[ https://issues.apache.org/jira/browse/SOLR-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360622#comment-14360622 ]
Anshum Gupta commented on SOLR-7109:
------------------------------------

Looks good, Shalin. There's one thing I'd like to point out: you've changed the signature of ZkController.ensureReplicaInLeaderInitiatedRecovery(), which is a public method. Though it's advanced and internal, changing it might break back-compat for developers.

> Indexing threads stuck during network partition can put leader into down state
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-7109
>                 URL: https://issues.apache.org/jira/browse/SOLR-7109
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10.3, 5.0
>            Reporter: Shalin Shekhar Mangar
>             Fix For: Trunk, 5.1
>
>         Attachments: SOLR-7109.patch, SOLR-7109.patch
>
>
> I found this recently while running some Jepsen tests. I found that some
> threads get stuck on zk operations for a long time in the
> ZkController.updateLeaderInitiatedRecoveryState method, and when they wake up
> they go ahead with setting the LIR state to down. But in the meantime, a new
> leader has been elected, and sometimes you'd get into a state where the leader
> itself is put into recovery, causing the shard to reject all writes.
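
For illustration only, the race described in the issue (a thread stalls on a ZooKeeper operation, then writes a stale "down" state after a new leader has been elected) can be sketched in plain Java. This is not the attached patch and does not use Solr or ZooKeeper APIs. ShardState, electLeader, markDownIfStillLeader and stateOf are hypothetical names, and the in-memory object is an assumed stand-in for cluster state. The idea shown is that the leadership check and the state write happen as one atomic step, so a stale writer is rejected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LirGuardSketch {

    // Hypothetical in-memory stand-in for the shard state kept in ZooKeeper.
    static final class ShardState {
        private String leader;
        private final Map<String, String> replicaState = new ConcurrentHashMap<>();

        ShardState(String leader) {
            this.leader = leader;
        }

        synchronized void electLeader(String newLeader) {
            this.leader = newLeader;
        }

        // Marks 'replica' as down only if 'caller' is still the leader.
        // The check and the write share one lock, so a caller that lost
        // leadership while it was stalled cannot apply its stale decision.
        synchronized boolean markDownIfStillLeader(String caller, String replica) {
            if (!caller.equals(leader)) {
                return false; // caller is no longer the leader
            }
            if (replica.equals(leader)) {
                return false; // never put the current leader into recovery
            }
            replicaState.put(replica, "down");
            return true;
        }

        // Replicas with no recorded state are treated as active in this sketch.
        String stateOf(String replica) {
            return replicaState.getOrDefault(replica, "active");
        }
    }

    public static void main(String[] args) {
        ShardState shard = new ShardState("node1");

        // node1 decides to mark node2 as down, then stalls.
        // While it is stalled, node2 is elected leader.
        shard.electLeader("node2");

        // node1 wakes up and tries to apply its stale decision.
        boolean applied = shard.markDownIfStillLeader("node1", "node2");
        System.out.println("applied=" + applied + ", node2=" + shard.stateOf("node2"));
        // prints: applied=false, node2=active
    }
}
```

Without the guard, the stale write would put the new leader (node2) into the down state, which matches the symptom in the report. In a real cluster the check and the write would have to be made atomic on the ZooKeeper side; how the patch achieves that is not shown here.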