[ 
https://issues.apache.org/jira/browse/SOLR-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030666#comment-16030666
 ] 

Erick Erickson commented on SOLR-10780:
---------------------------------------

As the original author of all that REBALANCELEADERS stuff, I'll be happy to see 
it go away; it's always been arcane ;)....

The intent of the original was to prevent 100s of leaders being on the same 
Solr instance in cases where there were many, many shards spread across many 
machines and each machine would host a replica of each shard. In that case 
measurable performance degradation happened because, even though the extra work 
for the leader wasn't onerous, the cumulative extra work was.

And since there is no use for BALANCESHARDUNIQUE other than preferredLeader 
(that I know of), this and the REBALANCELEADERS API commands are overkill.
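
For reference, the manual two-step workflow being discussed can be sketched 
as follows. This only builds the Collections API URLs and sends nothing; the 
base URL and the collection name "mycoll" are assumptions for illustration, 
and the helper function is hypothetical.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL (hypothetical helper; no request is sent)."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


BASE = "http://localhost:8983/solr"  # assumed local Solr node
COLLECTION = "mycoll"                # hypothetical collection name

# Step 1: spread the preferredLeader property evenly across nodes.
assign_url = collections_api_url(
    BASE, "BALANCESHARDUNIQUE", collection=COLLECTION, property="preferredLeader"
)

# Step 2: ask the cluster to make the preferredLeader replicas the actual leaders.
rebalance_url = collections_api_url(BASE, "REBALANCELEADERS", collection=COLLECTION)

print(assign_url)
print(rebalance_url)
```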

I think the intent of this functionality can be implemented much more simply. 
Once a replica comes up and becomes active, it could examine the state of the 
collection and, if it notes "too many" leaders on a particular node, simply 
request that it become the leader of its shard.

By waiting until it's active, we should avoid conditions where a replica wants 
to become the leader but hasn't synced.
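
A minimal sketch of that replica-side check, assuming a simplified in-memory 
view of the collection state. The dictionary layout, the function names and 
the max_leaders_per_node threshold are all hypothetical and are not Solr 
APIs; "too many" is left undefined above, so the threshold is a stand-in.

```python
from collections import Counter


def leaders_per_node(shards):
    """Count how many shard leaders each node currently hosts."""
    counts = Counter()
    for shard in shards.values():
        leader = shard.get("leader")
        if leader is not None:
            counts[shard["replicas"][leader]["node"]] += 1
    return counts


def should_request_leadership(shards, shard_name, replica_name, max_leaders_per_node):
    """Decide whether this replica should ask to lead its shard."""
    shard = shards[shard_name]
    me = shard["replicas"][replica_name]
    # Only an active replica may ask, so an unsynced replica never volunteers.
    if me["state"] != "active":
        return False
    leader = shard.get("leader")
    if leader is None or leader == replica_name:
        return False
    counts = leaders_per_node(shards)
    leader_node = shard["replicas"][leader]["node"]
    # Ask only if the current leader's node is over the threshold and taking
    # over would not push our own node over it.
    return (counts[leader_node] > max_leaders_per_node
            and counts[me["node"]] < max_leaders_per_node)


# Hypothetical state: node1 leads both shards, node2 leads none.
state = {
    "shard1": {"leader": "r1", "replicas": {
        "r1": {"node": "node1", "state": "active"},
        "r2": {"node": "node2", "state": "active"}}},
    "shard2": {"leader": "r3", "replicas": {
        "r3": {"node": "node1", "state": "active"},
        "r4": {"node": "node2", "state": "recovering"}}},
}

active_replica_asks = should_request_leadership(state, "shard1", "r2", 1)
recovering_replica_asks = should_request_leadership(state, "shard2", "r4", 1)
```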

I think this is quite legitimate as part of the general autoscaling effort; the 
time is now.

Let's say I have 100 nodes, 100 shards and 100 replicas/shard. That is, each 
node hosts one replica for each shard. Now I run around and start up all the 
nodes. How do we avoid unnecessary leadership changes? Maybe throttle this 
somehow?

Or what if two replicas for the same shard request leadership at the same time....

Or is this the Overseer's job? Something like a "balancing thread" that notices 
this condition and sends "you should be leader" messages to particular 
replicas. Or something that has a global view of what's happening cluster wide 
(as yet undefined)...
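
One way to picture the "global view" option is a single planner that looks at 
the whole cluster and emits a capped number of leader moves per pass. This is 
a sketch under stated assumptions, not a proposal for the actual Overseer 
code: the input layout, plan_leader_moves and max_moves are hypothetical. 
Because one planner decides, each shard gets at most one move per pass, and 
the cap acts as the throttle.

```python
from collections import Counter


def plan_leader_moves(shards, max_moves):
    """Return up to max_moves (shard, from_node, to_node) leader moves.

    shards maps shard name -> {"leader_node": str, "replica_nodes": [str]},
    where replica_nodes lists nodes hosting an active replica of that shard.
    """
    counts = Counter(s["leader_node"] for s in shards.values())
    moves = []
    for shard_name in sorted(shards):
        if len(moves) >= max_moves:
            break  # throttle: stop once this pass has used its budget
        shard = shards[shard_name]
        src = shard["leader_node"]
        candidates = [n for n in shard["replica_nodes"] if n != src]
        if not candidates:
            continue
        # Prefer the node that currently hosts the fewest leaders.
        dst = min(candidates, key=lambda n: (counts[n], n))
        # Move only if it strictly improves the balance between the two nodes.
        if counts[src] - counts[dst] >= 2:
            moves.append((shard_name, src, dst))
            counts[src] -= 1
            counts[dst] += 1
    return moves


# Hypothetical cluster: n1 started first and leads all four shards.
cluster = {
    name: {"leader_node": "n1", "replica_nodes": ["n1", "n2"]}
    for name in ("s1", "s2", "s3", "s4")
}

unthrottled = plan_leader_moves(cluster, max_moves=10)
throttled = plan_leader_moves(cluster, max_moves=1)
```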


> A new collection property autoRebalanceLeaders 
> -----------------------------------------------
>
>                 Key: SOLR-10780
>                 URL: https://issues.apache.org/jira/browse/SOLR-10780
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Noble Paul
>
> In SolrCloud, the first replica to start in a given shard becomes the 
> leader of that shard. This is a problem during cluster restarts: the first 
> node to start holds all the leaders and ends up being very heavily 
> loaded. The solution we have today is to invoke a REBALANCELEADERS command 
> explicitly so that the system ends up with a uniform distribution of leaders 
> across nodes. This is a manual operation and we can make the system do it 
> automatically. 
> So each collection can have an {{autoRebalanceLeaders}} flag. If it is set 
> to true, whenever a replica becomes {{ACTIVE}} in a shard, a 
> {{REBALANCELEADERS}} is invoked for that shard.



