[ 
https://issues.apache.org/jira/browse/SOLR-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640506#comment-14640506
 ] 

Shalin Shekhar Mangar commented on SOLR-7819:
---------------------------------------------

bq. I think we already do this, look at DistributedUpdateProcessor.java around 
line 883, if we are unable to set the LIR node, we start a thread to keep 
retrying the node set.

Umm, it looks like the reverse to me. If we are unable to set the LIR node, or 
if there is an exception, then sendRecoveryCommand=false and we do not create 
the LeaderInitiatedRecoveryThread at all?
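
To make the control flow I am describing concrete, here is a minimal standalone 
sketch. It is not the actual DistributedUpdateProcessor code: the class 
LirFlowSketch, the LirMarker interface and handleFailedReplica are hypothetical 
names made up for illustration, and only sendRecoveryCommand and 
LeaderInitiatedRecoveryThread come from the discussion above. The assumption is 
simply that a failed or throwing LIR write leaves sendRecoveryCommand false, so 
nothing is started that could retry later.

```java
import java.util.ArrayList;
import java.util.List;

public class LirFlowSketch {

    /** Hypothetical stand-in for the ZK write that puts a replica into LIR. */
    interface LirMarker {
        boolean markReplicaDown(String replicaUrl) throws Exception;
    }

    /**
     * Mirrors the flow as I read it: the recovery thread is only "started"
     * (recorded in the list here) when the LIR write succeeded.
     */
    static boolean handleFailedReplica(String replicaUrl, LirMarker marker,
                                       List<String> startedThreads) {
        boolean sendRecoveryCommand;
        try {
            sendRecoveryCommand = marker.markReplicaDown(replicaUrl);
        } catch (Exception e) {
            // An exception (e.g. connection loss) also disables the recovery command.
            sendRecoveryCommand = false;
        }
        if (sendRecoveryCommand) {
            // Stand-in for creating a LeaderInitiatedRecoveryThread.
            startedThreads.add(replicaUrl);
        }
        return sendRecoveryCommand;
    }

    public static void main(String[] args) {
        List<String> started = new ArrayList<>();
        handleFailedReplica("replica1", url -> true, started);
        handleFailedReplica("replica2", url -> false, started);
        handleFailedReplica("replica3", url -> {
            throw new IllegalStateException("ZK connection lost");
        }, started);
        System.out.println(started); // prints [replica1]
    }
}
```

In this sketch only replica1 gets a recovery thread. The two cases where the 
LIR write fails or throws start nothing, which is the opposite of "start a 
thread to keep retrying".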

> ZkController.ensureReplicaInLeaderInitiatedRecovery does not respect 
> retryOnConnLoss
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-7819
>                 URL: https://issues.apache.org/jira/browse/SOLR-7819
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.2, 5.2.1
>            Reporter: Shalin Shekhar Mangar
>              Labels: Jepsen
>             Fix For: 5.3, Trunk
>
>
> SOLR-7245 added a retryOnConnLoss parameter to 
> ZkController.ensureReplicaInLeaderInitiatedRecovery so that indexing threads 
> do not hang during a partition on ZK operations. However, some of those 
> changes were unintentionally reverted by SOLR-7336 in 5.2.
> I found this while running Jepsen tests on 5.2.1 where a hung update managed 
> to put a leader into a 'down' state (I'm still investigating and will open a 
> separate issue about this problem).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org