[ 
https://issues.apache.org/jira/browse/SOLR-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132960#comment-14132960
 ] 

Alan Woodward commented on SOLR-6511:
-------------------------------------

I think you might have some extra stuff on the end of the patch?

Digging a bit further into the logs, I can see that maxTries is set to 1 because 
ensureReplicaInLeaderInitiatedRecovery throws a SessionExpiredException 
(presumably because ZK has noticed the network blip and removed the relevant 
ephemeral node).  Maybe maxTries should *always* be set to 120?

One thing that might be nice here would be to add a utility method to 
ZkController called something like ensureLeadership(CloudDescriptor cd), which 
checks if the core described by the CloudDescriptor really is the current 
leader according to ZK, and throws an exception if it isn't.
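
A rough sketch of the kind of check I mean. It is self-contained and does 
not use the real Solr classes: LeaderRegistry and CoreIdentity are 
hypothetical stand-ins for the ZK leader lookup and for the CloudDescriptor 
fields that would be needed, and IllegalStateException is only a placeholder 
for whatever exception type would fit best.

```java
/** Hypothetical stand-in for looking up the current shard leader in ZK. */
interface LeaderRegistry {
    /** Returns the core node name of the shard's current leader, or null if none is registered. */
    String currentLeader(String collection, String shardId);
}

/** Hypothetical, simplified stand-in for the CloudDescriptor fields used by the check. */
final class CoreIdentity {
    final String collection;
    final String shardId;
    final String coreNodeName;

    CoreIdentity(String collection, String shardId, String coreNodeName) {
        this.collection = collection;
        this.shardId = shardId;
        this.coreNodeName = coreNodeName;
    }
}

final class LeadershipCheck {
    private LeadershipCheck() {}

    /**
     * Throws if the given core is not the leader the registry currently reports.
     * A missing leader (null) is treated the same as a different leader.
     */
    static void ensureLeadership(LeaderRegistry registry, CoreIdentity core) {
        String leader = registry.currentLeader(core.collection, core.shardId);
        if (!core.coreNodeName.equals(leader)) {
            throw new IllegalStateException(
                core.coreNodeName + " is not the leader of " + core.collection
                    + "/" + core.shardId + " (registry reports: " + leader + ")");
        }
    }
}
```

Callers would invoke this before acting as leader (for example, before 
putting a replica into recovery), so that a core which has lost its 
leadership fails fast instead of retrying.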

> Fencepost error in LeaderInitiatedRecoveryThread
> ------------------------------------------------
>
>                 Key: SOLR-6511
>                 URL: https://issues.apache.org/jira/browse/SOLR-6511
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Timothy Potter
>         Attachments: SOLR-6511.patch
>
>
> At line 106:
> {code}
>     while (continueTrying && ++tries < maxTries) {
> {code}
> should be
> {code}
>     while (continueTrying && ++tries <= maxTries) {
> {code}
> This is only a problem when called from DistributedUpdateProcessor, as it can 
> have maxTries set to 1, which means the loop is never actually run.
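
The off-by-one in the description above can be reproduced in isolation. This 
is only an illustration, not the actual LeaderInitiatedRecoveryThread code: 
it assumes tries starts at 0 and that continueTrying stays true, and the 
class and method names are made up for the example.

```java
final class FencepostDemo {
    private FencepostDemo() {}

    /** Counts loop-body executions with the original condition (++tries < maxTries). */
    static int iterationsWithLessThan(int maxTries) {
        int tries = 0;      // assumption: the counter starts at 0
        int iterations = 0;
        while (++tries < maxTries) {
            iterations++;
        }
        return iterations;
    }

    /** Counts loop-body executions with the proposed condition (++tries <= maxTries). */
    static int iterationsWithLessOrEqual(int maxTries) {
        int tries = 0;      // assumption: the counter starts at 0
        int iterations = 0;
        while (++tries <= maxTries) {
            iterations++;
        }
        return iterations;
    }
}
```

With maxTries = 1 the first variant runs the body zero times and the second 
runs it once; with maxTries = 120 they run 119 and 120 times respectively.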



