[ https://issues.apache.org/jira/browse/SOLR-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030666#comment-16030666 ]
Erick Erickson commented on SOLR-10780:
---------------------------------------

As the original author of all that REBALANCELEADERS stuff, I'll be happy to see it go away; it's always been arcane ;)...

The intent of the original was to prevent hundreds of leaders from landing on the same Solr instance when there were many, many shards spread across many machines and each machine hosted a replica of each shard. In that case there was measurable performance degradation: even though the extra work for any one leader wasn't onerous, the cumulative extra work was. And since there is no use for BALANCESHARDUNIQUE other than assigning preferredLeader (that I know of), BALANCESHARDUNIQUE and the REBALANCELEADERS API commands are overkill.

I think the intent of this functionality can be implemented much more simply. When a replica comes up, after it becomes active it could examine the state of the collection and, if it notes "too many" leaders on a particular node, simply request that it become the leader of its shard. By waiting until it's active, we should avoid situations where a replica wants to become the leader but hasn't synced.

I think this is a legitimate part of the general autoscaling effort; the time is now.

Let's say I have 100 nodes, 100 shards and 100 replicas per shard, so each node hosts one replica of each shard. Now I run around and start up all the nodes. How do we avoid unnecessary leadership changes? Maybe throttle this somehow? And what if two replicas of the same shard request leadership at the same time?

Or is this the Overseer's job? Something like a "balancing thread" that notices this condition and sends "you should be leader" messages to particular replicas. Or something that has a global view of what's happening cluster-wide (as yet undefined)...
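A minimal sketch of one way to make the "too many leaders" check deterministic, so that at most one replica per shard asks for leadership and leaders move only off overloaded nodes. Everything here is an assumption for illustration: `plan_leader_moves` and `should_request_leadership` are hypothetical names, not Solr or Overseer APIs. The cluster state is modelled as a plain dict. "Too many" is taken to mean more than ceil(shards / live nodes) leaders on one node.

```python
# Hypothetical sketch: neither function is a Solr or Overseer API.
# Assumed model: state maps shard name -> {"leader": node, "replicas": [node, ...]},
# with each replica identified by the node that hosts it.
import math
from collections import Counter


def plan_leader_moves(state, live_nodes):
    """Greedily pick new leaders only for shards whose leader's node is overloaded.

    Assumes live_nodes is non-empty; "too many" is assumed to mean more than
    ceil(shards / live nodes) leaders on one node.
    """
    counts = Counter({node: 0 for node in live_nodes})
    for shard in state.values():
        counts[shard["leader"]] += 1
    limit = math.ceil(len(state) / len(live_nodes))

    moves = {}
    for name in sorted(state):  # fixed order so every caller computes the same plan
        shard = state[name]
        leader = shard["leader"]
        if counts[leader] <= limit:
            continue  # leader's node is within budget: no leadership change
        candidates = [n for n in shard["replicas"]
                      if n in live_nodes and n != leader and counts[n] < limit]
        if not candidates:
            continue  # nowhere better to go; tolerate the imbalance
        # Fewest leaders first, node name as a deterministic tie-break.
        target = min(candidates, key=lambda n: (counts[n], n))
        moves[name] = target
        counts[leader] -= 1
        counts[target] += 1
    return moves


def should_request_leadership(state, live_nodes, shard_name, my_node):
    """Replica-side variant: a newly ACTIVE replica asks to lead only if the plan picks it."""
    return plan_leader_moves(state, live_nodes).get(shard_name) == my_node
```

Consider three live nodes where node1 came up first and holds all three leaders. In that case the plan moves shard1 to node2 and shard2 to node3, and it leaves shard3 alone. If every replica evaluates the same snapshot, only the chosen replica of each shard asks, which addresses the "two replicas at once" worry. Real replicas can see stale or differing state, though. That is an argument for running the same plan from a single place, such as an Overseer-side balancing thread, and throttling the moves it sends.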
> A new collection property autoRebalanceLeaders
> -----------------------------------------------
>
>                 Key: SOLR-10780
>                 URL: https://issues.apache.org/jira/browse/SOLR-10780
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>            Reporter: Noble Paul
>
> In SolrCloud, the first replica to get started in a given shard becomes the
> leader of that shard. This is a problem during cluster restarts: the first
> node to get started has all the leaders, and that node ends up being very
> heavily loaded. The solution we have today is to invoke a REBALANCELEADERS
> command explicitly so that the system ends up with a uniform distribution of
> leaders across nodes. This is a manual operation, and we can make the system
> do it automatically.
> So each collection can have an {{autoRebalanceLeaders}} flag. If it is set
> to true, whenever a replica becomes {{ACTIVE}} in a shard, a
> {{REBALANCELEADERS}} is invoked for that shard.
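A minimal sketch of the trigger the issue proposes, under stated assumptions. `autoRebalanceLeaders` is the flag proposed here, not an existing Solr setting. `on_replica_state_change` is a hypothetical hook name, and the base URL is assumed. The `REBALANCELEADERS` and `BALANCESHARDUNIQUE` actions and their `collection`, `maxAtOnce`, `maxWaitSeconds` and `property` parameters are existing Collections API parameters. As documented, REBALANCELEADERS is scoped to a whole collection rather than a single shard, and it only moves leadership to replicas that carry the `preferredLeader` property. For that reason the sketch also builds the BALANCESHARDUNIQUE call that assigns the property.

```python
# Hypothetical sketch of the proposed autoRebalanceLeaders trigger.
# autoRebalanceLeaders and on_replica_state_change are assumptions; the
# Collections API actions and parameter names below are real, the base URL is not.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed node address


def balance_preferred_leader_url(collection):
    """Spread the preferredLeader property so each shard has one, across nodes."""
    params = {
        "action": "BALANCESHARDUNIQUE",
        "collection": collection,
        "property": "preferredLeader",
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def on_replica_state_change(collection_props, collection, old_state, new_state):
    """Return a REBALANCELEADERS URL if this transition should trigger one, else None."""
    if not collection_props.get("autoRebalanceLeaders", False):
        return None  # flag not set: keep today's manual behaviour
    if old_state == "active" or new_state != "active":
        return None  # only react to a replica *becoming* active
    params = {
        "action": "REBALANCELEADERS",
        "collection": collection,  # the command is collection-wide, not per shard
        "maxAtOnce": 1,            # throttle: queue one leader reassignment at a time
        "maxWaitSeconds": 60,      # how long to wait for reassignments to complete
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"
```

Keeping `maxAtOnce` low is one possible answer to the restart stampede raised in the comment above. Many replicas becoming active at once would still each fire a request, however, so a real implementation would likely need to coalesce these triggers.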