[
https://issues.apache.org/jira/browse/SOLR-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131972#comment-14131972
]
Timothy Potter commented on SOLR-6511:
--------------------------------------
bq. what ought to happen here is that replica2 sends a message back saying "no
need, I'm the leader, I'll take it from here, thanks". But because of the
fencepost error, the message to replica2 is never actually sent, and replica1
then writes replica2's state as DOWN into the LIRT zk node.
The more I think about this, the less I see how the fencepost error gets hit
here. maxTries will be 120 when replica1 is setting replica2 to down, so the
loop body should still run.
So I think the real fix is to do what Alan suggests: have the new leader
respond with "no need, I'm the leader, I'll take it from here, thanks".
The patch I posted earlier has some good improvements in it, but I think we
need a unit test that proves the code works correctly for the scenario
described above.
> Fencepost error in LeaderInitiatedRecoveryThread
> ------------------------------------------------
>
> Key: SOLR-6511
> URL: https://issues.apache.org/jira/browse/SOLR-6511
> Project: Solr
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Timothy Potter
> Attachments: SOLR-6511.patch
>
>
> At line 106:
> {code}
> while (continueTrying && ++tries < maxTries) {
> {code}
> should be
> {code}
> while (continueTrying && ++tries <= maxTries) {
> {code}
> This is only a problem when called from DistributedUpdateProcessor, as it can
> have maxTries set to 1, which means the loop is never actually run.
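
To make the loop-count arithmetic concrete, here is a minimal stand-alone sketch. It is not the actual LeaderInitiatedRecoveryThread code: the class and method names are hypothetical, and the loop body is reduced to a counter. It only compares how many times each condition lets the body run for maxTries of 1 and 120.

```java
// Hypothetical demo class; only the two loop conditions come from the issue.
class FencepostDemo {

    // Number of times the body runs with the original condition (++tries < maxTries).
    static int iterationsOriginal(int maxTries) {
        boolean continueTrying = true; // stays true here; the real code may clear it
        int tries = 0;
        int ran = 0;
        while (continueTrying && ++tries < maxTries) {
            ran++;
        }
        return ran;
    }

    // Number of times the body runs with the proposed condition (++tries <= maxTries).
    static int iterationsProposed(int maxTries) {
        boolean continueTrying = true;
        int tries = 0;
        int ran = 0;
        while (continueTrying && ++tries <= maxTries) {
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        for (int maxTries : new int[] {1, 120}) {
            System.out.println("maxTries=" + maxTries
                + " original=" + iterationsOriginal(maxTries)
                + " proposed=" + iterationsProposed(maxTries));
        }
    }
}
```

With maxTries of 1 the original condition never enters the body, while with maxTries of 120 it still runs 119 times, which matches the point that the fencepost alone would not stop the message from being sent in the replica1/replica2 scenario.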