Looking at the code more closely, the issue seems to be here:

In AssignmentManager.processDeadServersAndRegionsInTransition():

...
failoverCleanupDone();
if (!failover) {
  // Fresh cluster startup.
  LOG.info("Clean cluster startup. Assigning user regions");
  assignAllUserRegions(allRegions);
}
...

As soon as a single node is missing, failover is set to true, so no
clean-startup assignment is done. Later the servers are recovered from
the "crash" (which a clean restart also counts as), and then all
unassigned regions (all of them at this point, since they were left
unassigned earlier) are assigned round-robin.
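To make the effect concrete, here is a minimal, hypothetical sketch (not the actual HBase code; the class, method, and server names are made up for illustration) of how a single missing server flips the failover flag and skips the locality-preserving clean-startup branch:

```java
import java.util.*;

public class FailoverSketch {
    // Models the decision in processDeadServersAndRegionsInTransition():
    // returns true when the clean-startup path runs (previous assignments
    // kept, locality preserved), false when the failover path runs
    // (regions later reassigned round-robin).
    static boolean cleanStartup(List<String> liveServers,
                                Map<String, String> lastAssignment) {
        // failover is set as soon as ANY previously-known server is missing.
        boolean failover = !liveServers.containsAll(lastAssignment.values());
        return !failover;
    }

    public static void main(String[] args) {
        Map<String, String> last = new HashMap<>();
        last.put("region1", "serverA");
        last.put("region2", "serverB");

        // All servers came back: clean startup, assignments retained.
        System.out.println(cleanStartup(Arrays.asList("serverA", "serverB"), last));
        // A single server missing: failover triggers for ALL regions.
        System.out.println(cleanStartup(Arrays.asList("serverA"), last));
    }
}
```

Note that the flag is all-or-nothing: one absent server out of hundreds sends every region, including those whose server is alive and holds their data locally, down the failover path.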



On Tue, Mar 14, 2017 at 1:54 PM, Lars George <lars.geo...@gmail.com> wrote:
> Hi,
>
> I had this happen at multiple clusters recently where after the
> restart the locality dropped from close to or exactly 100% down to
> single digits. The reason is that all regions were completely shuffled
> and reassigned to random servers. Upon reading the (yet again
> non-trivial) assignment code, I found that a single server missing
> will trigger a full "recovery" of all servers, which includes a drop
> of all previous assignments and random new assignment.
>
> This is just terrible! In addition, I assumed that - at least the
> StochasticLoadBalancer - checks which node holds most of a region's
> data locality-wise and picks that server. But that is _not_ the
> case! It just spreads everything seemingly randomly across the
> servers.
>
> To me this is a big regression (or straight bug) given that a single
> server out of, for example, hundreds could trigger that and destroy
> the locality completely. Running a major compaction is not an approach
> for many reasons.
>
> This used to work better, why that regression?
>
> Lars
