[ https://issues.apache.org/jira/browse/SOLR-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219733#comment-17219733 ]
Shalin Shekhar Mangar commented on SOLR-14942:
----------------------------------------------

Thanks Hoss. I have updated the PR with code comments. Mike Drob also gave some feedback on the PR, which has been incorporated as well. I intend to merge to master over the weekend.

> Reduce leader election time on node shutdown
> --------------------------------------------
>
>                 Key: SOLR-14942
>                 URL: https://issues.apache.org/jira/browse/SOLR-14942
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 7.7.3, 8.6.3
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The credit for this issue and investigation belongs to [~caomanhdat]. I am
> merely reporting the issue and creating PRs based on his work.
> The shutdown process waits for all replicas/cores to be closed before
> removing the election node of the leader. This can take some time due to
> index flush or merge activities on the leader cores, and it delays new
> leaders from being elected.
> This process happens in CoreContainer.shutdown():
> # zkController.preClose(): remove the current node from live_nodes and change
> the state of all cores on this node to DOWN. Assuming that the current node
> is hosting the leader of a shard, the shard becomes leaderless after this
> method is called, since the state of the leader is now DOWN. The leader
> election process is not triggered for the shard, since the election node is
> still held by the current node.
> # Wait for all cores to be loaded (if there are any).
> # SolrCores.close(): close all cores.
> # zkController.close(): this is where all ephemeral nodes are removed from ZK,
> including the election nodes created by this node. Only from this point on
> can other replicas in the shard take part in the leader election.
> Note that CoreContainer.shutdown() is invoked when Jetty/Solr nodes receive a
> SIGTERM signal.
> On receiving SIGTERM, Jetty will also stop accepting new connections and new
> requests. This is a very important factor, since even if the leader replica
> is ACTIVE and its node is in live_nodes, the shard will be considered
> leaderless if no one can index to that shard. Therefore shards become
> leaderless as soon as the node (which contains the shard’s leader) receives
> SIGTERM.
> Therefore, the longer steps 1, 2 and 3 take to finish, the longer shards
> remain leaderless. The time needed for step 3 scales with the number of cores,
> so the more cores a node has, the worse it gets. This time is spent in
> IndexWriter.close(), where the system will:
> # Flush all pending updates to disk
> # Wait for all merges to finish (this is most likely the meaty part)
> The proposal is to change the shutdown process to:
> # Wait for all in-flight indexing requests and replication requests to
> complete
> # Remove election nodes
> # Close all replicas/cores
> This ensures that index flushes or merges no longer block new leader
> elections.
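
To make the difference between the two orderings concrete, here is a rough back-of-the-envelope model in Java. It is not Solr code and does not call any Solr API: the class name, method names, and all durations are made up for illustration. It assumes cores are closed one after another (so close times add up), and it counts only the time until the election nodes are removed, not the time the election itself then takes.

```java
/**
 * Toy timing model of the two shutdown orders described in SOLR-14942.
 * Hypothetical sketch only: names and numbers are illustrative assumptions.
 */
public final class ShutdownOrderSketch {

    /**
     * Current order: preClose, wait for core loading, close all cores,
     * and only then remove the election nodes. Assumes sequential core close.
     */
    public static long currentOrderMs(long preCloseMs, long waitForLoadMs, long[] coreCloseMs) {
        long total = preCloseMs + waitForLoadMs;
        for (long ms : coreCloseMs) {
            total += ms; // flush + merge time spent closing each core
        }
        return total;
    }

    /**
     * Proposed order: wait for in-flight requests, then remove the election
     * nodes. Core close time no longer precedes election node removal.
     */
    public static long proposedOrderMs(long drainInFlightMs) {
        return drainInFlightMs;
    }

    public static void main(String[] args) {
        // Made-up close times for three cores, in milliseconds.
        long[] coreCloseMs = {3_000L, 8_000L, 1_500L};

        long current = currentOrderMs(200L, 0L, coreCloseMs);
        long proposed = proposedOrderMs(500L);

        System.out.println("current order:  " + current + " ms before election nodes are removed");
        System.out.println("proposed order: " + proposed + " ms before election nodes are removed");
    }
}
```

With these invented numbers, the current order keeps the election nodes for 12700 ms after SIGTERM, while the proposed order releases them after 500 ms. The point of the model is only that the per-core close time drops out of the leaderless window once election nodes are removed before the cores are closed.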