[ https://issues.apache.org/jira/browse/SOLR-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243764#comment-16243764 ]
Shalin Shekhar Mangar commented on SOLR-11472:
----------------------------------------------

Here's the sequence of events:
{code}
core_node3 is leader for .system collection
Test starts a new node at port 50071
Node Added Trigger fires and a plan is computed.
action=MOVEREPLICA&collection=.system&targetNode=127.0.0.1:50071_solr&replica=core_node3 is processed first and core_node8 is added on port 50071 but before it recovers fully, the leader node core_node3 is unloaded
core_node6 becomes the leader and asks core_node8 to recover
action=MOVEREPLICA&collection=.system&targetNode=127.0.0.1:50071_solr&replica=core_node6
now core_node6 is to be moved and core_node10 is added on port 50071 but before it can recover, core_node6 is also unloaded
system_shard1_replica_n2 on port 49937 becomes the leader and asks core_node8 and core_node10 to sync with it but before they can recover the test stops node 49937.
The NodeLostTrigger fires and tries to create a new replica
But leader election cannot happen because no nodes have any data and/or none of them were active before.
{code}

The crux of the issue is that move replica unloaded the leader before the newly added replica became active.

Actually, Andrzej has fixed this problem already in SOLR-11448.

The leader election issue seen in these logs is a known problem in SolrCloud. Mark Miller created SOLR-7065 to address the gridlock of leader election in such cases.

I'll audit Jenkins again to see if this test has failed since SOLR-11448 was committed. If not, then I'll close this issue.

> Leader election bug
> -------------------
>
>                 Key: SOLR-11472
>                 URL: https://issues.apache.org/jira/browse/SOLR-11472
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 7.1, master (8.0)
>            Reporter: Andrzej Bialecki
>            Assignee: Shalin Shekhar Mangar
>         Attachments: Console_output_of_AutoscalingHistoryHandlerTest_failure.txt
>
>
> SOLR-11407 uncovered a bug in leader election, where the same failing node is
> retried indefinitely.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
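
For readers following the sequence of events in the comment above, the ordering problem (the leader is unloaded before its replacement is active) can be illustrated with a small toy model. This is only a sketch and not Solr code: the `Shard` class, its election rule ("only an active replica can lead"), and the `wait_for_active` flag are all assumptions made for illustration. The replica names are taken from the comment. It does not claim to show how SOLR-11448 was actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Shard:
    # Hypothetical model: replica name -> state ("active" or "recovering")
    replicas: Dict[str, str] = field(default_factory=dict)
    leader: Optional[str] = None

    def elect_leader(self) -> Optional[str]:
        # Assumed rule: only a replica that is already active may become leader.
        candidates = sorted(n for n, s in self.replicas.items() if s == "active")
        self.leader = candidates[0] if candidates else None
        return self.leader

    def recover_all(self) -> None:
        # Recovering replicas can only sync if a leader exists.
        if self.leader is not None:
            for name in self.replicas:
                self.replicas[name] = "active"

    def remove(self, name: str) -> None:
        # Unload a replica; re-run the election if it was the leader.
        del self.replicas[name]
        if self.leader == name:
            self.elect_leader()

    def move_replica(self, source: str, target: str, wait_for_active: bool) -> None:
        self.replicas[target] = "recovering"
        if wait_for_active:
            self.recover_all()  # let the new replica catch up first
        self.remove(source)     # then unload the old one


def simulate(wait_for_active: bool) -> Optional[str]:
    shard = Shard({
        "core_node3": "active",
        "core_node6": "active",
        "system_shard1_replica_n2": "active",
    })
    shard.leader = "core_node3"
    shard.move_replica("core_node3", "core_node8", wait_for_active)
    shard.move_replica("core_node6", "core_node10", wait_for_active)
    shard.remove("system_shard1_replica_n2")  # the test stops node 49937
    return shard.leader


if __name__ == "__main__":
    print("without waiting:", simulate(wait_for_active=False))
    print("with waiting:   ", simulate(wait_for_active=True))
```

In this model, moving replicas without waiting leaves only recovering replicas once node 49937 is stopped, so no leader can be elected. Waiting for the new replica before unloading the old one keeps at least one active candidate at every step.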