[
https://issues.apache.org/jira/browse/SOLR-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson updated SOLR-6691:
---------------------------------
Attachment: BalanceLeaderTester.java
SOLR-6691.patch
OK, I think this is finally working as I expect. The attached java file is a
stand-alone program that stresses the heck out of shard leader election. The
idea is that you fire it up against a collection and it
1> takes the initial state
2> tries to issue the preferred leader command to a random replica on each
shard.
3> issues the rebalanceleaders command
4> verifies that all the shard leader election queues have one entry for all
the nodes that were there originally.
5> verifies that the actual leader is the preferred leader
6> goes to <2>.
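
For readers without the attachment, here is a rough sketch of the pieces one
pass of that loop needs. It is not the code in BalanceLeaderTester.java; the
class and method names are made up for illustration, and the base URL,
collection, shard and replica names are assumed to be URL-safe strings supplied
by the caller. The two URL builders use the Collections API actions
ADDREPLICAPROP and REBALANCELEADERS; the two checks correspond to steps 4> and
5>.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical helpers for one pass of the stress loop (illustration only). */
final class RebalanceLoopSketch {

    /** Step 2: Collections API request marking one replica as preferredLeader. */
    static String preferredLeaderUrl(String baseUrl, String collection,
                                     String shard, String replica) {
        // Assumes all names are already URL-safe; no encoding is done here.
        return baseUrl + "/admin/collections?action=ADDREPLICAPROP"
                + "&collection=" + collection
                + "&shard=" + shard
                + "&replica=" + replica
                + "&property=preferredLeader&property.value=true";
    }

    /** Step 3: Collections API request asking for leaders to be rebalanced. */
    static String rebalanceLeadersUrl(String baseUrl, String collection) {
        return baseUrl + "/admin/collections?action=REBALANCELEADERS"
                + "&collection=" + collection;
    }

    /** Step 2 helper: choose a random replica of a shard. */
    static String pickRandom(List<String> replicas, Random rnd) {
        return replicas.get(rnd.nextInt(replicas.size()));
    }

    /** Step 4: the queue holds exactly one entry per originally present node. */
    static boolean queueIntact(Set<String> originalNodes, List<String> queueNow) {
        // Same size plus same set of names rules out both losses and duplicates.
        return queueNow.size() == originalNodes.size()
                && new HashSet<>(queueNow).equals(originalNodes);
    }

    /** Step 5: the actual leader is the replica that was marked preferred. */
    static boolean leaderIsPreferred(String actualLeader, String preferredLeader) {
        return actualLeader != null && actualLeader.equals(preferredLeader);
    }
}
```

A driver would record the original nodes per shard once (step 1>), then repeat:
pick a replica with pickRandom, send the two requests, read the election queue
and leader back from ZooKeeper, and stop as soon as queueIntact or
leaderIsPreferred returns false.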
Note that the guts of this test are in the new unit test.
I had to change the leader election code to make all this predictable, and that
makes me a little nervous given how difficult it all was to get working in the
first place. But the external test code beats _all_ the leader election code up
pretty fiercely, which gives me hope.
So I have a couple of options here:
1> go ahead and check it in. 5.0 appears to be receding here so it has some
time to bake before release
2> check it in to trunk and let it bake there for a while, perhaps until after
5.0 is cut, then merge and bake.
Opinions?
> REBALANCELEADERS needs to change the leader election queue.
> -----------------------------------------------------------
>
> Key: SOLR-6691
> URL: https://issues.apache.org/jira/browse/SOLR-6691
> Project: Solr
> Issue Type: Bug
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Attachments: BalanceLeaderTester.java, SOLR-6691.patch
>
>
> The original code (SOLR-6517) assumed that changes in the clusterstate after
> issuing a command to the overseer to change the leader indicated that the
> leader was successfully changed. Fortunately, Noble clued me in that this
> isn't the case and that the potential leader needs to insert itself in the
> leader election queue before triggering the change leader command.
> The node inserting itself at the front of the queue should probably also
> happen in BALANCESHARDUNIQUE when the preferredLeader property is assigned.
> [~noble.paul] Do evil things happen if a node joins at the head but it's
> _already_ in the queue? These ephemeral nodes in the queue are watching each
> other. So if node1 is the leader you have
> node1 <- node2 <- node3 <- node4
> where <- means "watches".
> Now, if node3 puts itself at the head of the list, you have
> {code}
> node1 <- node2
>       <- node3 <- node4
> {code}
> I _think_ when I was looking at this it all "just worked".
> 1> node 1 goes down. Nodes 2 and 3 duke it out but there's code to ensure
> that node3 becomes the leader and node2 inserts itself at the end so it's
> watching node 4.
> 2> node 2 goes down, nobody gets notified and it doesn't matter.
> 3> node 3 goes down, node 4 gets notified and starts watching node 2 by
> inserting itself at the end of the list.
> 4> node 4 goes down, nobody gets notified and it doesn't matter.
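
The "who gets notified" part of the four cases in the description can be
checked with a toy model. This is only a sketch of the watch relationships as
drawn in the diagram: it assumes that joining at the head means the node starts
watching the current leader, it does not model the re-election or re-queueing
that follows a notification, and none of the names below come from Solr's
election code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the watch chain (hypothetical, not Solr's implementation). */
final class WatchChainSketch {
    // watcher -> node it watches; the leader watches nobody, so it has no entry.
    private final Map<String, String> watches = new LinkedHashMap<>();
    private final String leader;

    /** Builds the linear chain: each node watches the one queued before it. */
    WatchChainSketch(List<String> queue) {
        leader = queue.get(0);
        for (int i = 1; i < queue.size(); i++) {
            watches.put(queue.get(i), queue.get(i - 1));
        }
    }

    /** Assumption: a node that joins at the head now watches the leader. */
    void joinAtHead(String node) {
        if (!node.equals(leader)) {
            watches.put(node, leader);
        }
    }

    /** Returns the nodes whose watch would fire if the given node went away. */
    List<String> notifiedWhenLost(String node) {
        List<String> notified = new ArrayList<>();
        for (Map.Entry<String, String> e : watches.entrySet()) {
            if (e.getValue().equals(node)) {
                notified.add(e.getKey());
            }
        }
        return notified;
    }
}
```

Built from node1..node4 and followed by joinAtHead("node3"), the model reports
node2 and node3 for the loss of node1, node4 for the loss of node3, and nobody
for the loss of node2 or node4, which matches cases 1> through 4>.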