[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112149#comment-13112149
 ] 

Benjamin Reed commented on ZOOKEEPER-1194:
------------------------------------------

can we break this out from ZOOKEEPER-1192? most of the work in this patch is in the tests 
anyway. i think that would keep this review simpler.

a couple of small comments:

1) you misspell "follower" as "follwer"
2) you should add a comment in testLeaderInConnectingFollwers explaining the 
importance of the joins after the starts
3) in the Mock Threads, when you detect an error, set it to a string that 
describes the error so that you can print it on the assert
4) testLeaderInElectingFollwers can be simplified by not starting the leader 
and making sure the followers fail. this also removes the need for the sleep
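
Comments 2 and 3 could look roughly like the sketch below. It is only an illustration, not code from the patch: MockThread, its message field, and the Callable it runs are hypothetical names made up for the example. The thread records a descriptive string instead of a bare flag, and all threads are started before any of them is joined so that they can run concurrently (presumably what the joins-after-starts ordering is protecting in the real test).

```java
import java.util.concurrent.Callable;

// Hypothetical mock thread for a test; the names are illustrative only.
class MockThread extends Thread {
    private final Callable<Long> call;   // the operation under test
    private final long expected;         // epoch the test expects back
    volatile String message;             // stays null unless an error is detected

    MockThread(String name, Callable<Long> call, long expected) {
        super(name);
        this.call = call;
        this.expected = expected;
    }

    @Override
    public void run() {
        try {
            long got = call.call();
            if (got != expected) {
                // Describe the error so the assert can print it.
                message = getName() + ": expected epoch " + expected + " but got " + got;
            }
        } catch (Exception e) {
            message = getName() + ": unexpected exception " + e;
        }
    }
}

class MockThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        MockThread ok = new MockThread("follower-1", () -> 8L, 8L);
        MockThread bad = new MockThread("follower-2", () -> 6L, 8L);

        // Start every thread first, then join. Joining right after each start
        // would run the threads one at a time, and a thread blocked on a
        // barrier would never be released by the threads started after it.
        ok.start();
        bad.start();
        ok.join();
        bad.join();

        // In a JUnit 4 test this would be Assert.assertNull(t.message, t.message),
        // so a failure prints the description recorded by the thread.
        System.out.println(ok.getName() + " error: " + ok.message);
        System.out.println(bad.getName() + " error: " + bad.message);
    }
}
```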



> Two possible race conditions during leader establishment
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1194
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1194
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.0
>            Reporter: Alexander Shraer
>            Assignee: Alexander Shraer
>             Fix For: 3.4.0
>
>         Attachments: zookeeper-1194-ver1.patch, zookeeper-1194.patch
>
>
> Leader.getEpochToPropose() and Leader.waitForNewEpoch() act as barriers - 
> they make sure that a leader/follower can return from calling the method only 
> once connectingFollowers (or electingFollowers) contains a quorum. But these 
> methods don't make sure that the leader itself is in 
> connectingFollowers/electingFollowers. So the leader has not necessarily reached 
> the barrier when followers pass it. This can cause the following problems:
> 1. If the leader is not in connectingFollowers when a LearnerHandler returns 
> from getEpochToPropose(), then the epoch sent by the leader to the follower 
> might be smaller than the leader's own last accepted epoch.
> 2. If the leader is not in electingFollowers when a LearnerHandler returns from 
> waitForNewEpoch(), then the leader will send a NEWLEADER message to followers, 
> and the followers will respond, but it is possible that the NEWLEADER message 
> is not in outstandingProposals when these NEWLEADER acks arrive, which will 
> cause the NEWLEADER acks to be dropped.
> To fix this I propose to explicitly check that the leader is in 
> connectingFollowers/electingFollowers before anyone can pass these barriers.
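
The proposed check can be sketched as follows. This is a simplified illustration under stated assumptions, not the attached patch: the class name EpochBarrier, the leaderSid field, the simple-majority quorum, and the timeout parameter are made up for the example. Only the connectingFollowers set and the getEpochToPropose() name come from the description above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified stand-in for the epoch barrier; not ZooKeeper code.
class EpochBarrier {
    private final long leaderSid;    // assumed: server id of the leader itself
    private final int quorumSize;    // assumed: simple majority of the ensemble
    private final Set<Long> connectingFollowers = new HashSet<>();
    private long epoch = -1;
    private boolean waitingForNewEpoch = true;

    EpochBarrier(long leaderSid, int ensembleSize) {
        this.leaderSid = leaderSid;
        this.quorumSize = ensembleSize / 2 + 1;
    }

    synchronized long getEpochToPropose(long sid, long lastAcceptedEpoch, long timeoutMs)
            throws InterruptedException, TimeoutException {
        if (!waitingForNewEpoch) {
            return epoch;            // barrier already passed, epoch is final
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connectingFollowers.add(sid);
        // The explicit check: a quorum alone is not enough, the leader must be in it.
        if (connectingFollowers.contains(leaderSid)
                && connectingFollowers.size() >= quorumSize) {
            waitingForNewEpoch = false;
            notifyAll();
            return epoch;
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (waitingForNewEpoch) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                // The caller stays registered in connectingFollowers after a timeout.
                throw new TimeoutException("timed out waiting for a quorum that includes the leader");
            }
            wait(remaining);
        }
        return epoch;
    }

    public static void main(String[] args) throws Exception {
        EpochBarrier barrier = new EpochBarrier(1L, 3);
        // Two followers form a quorum of a 3-server ensemble, but must not pass alone.
        for (long sid : new long[] {3L, 2L}) {
            try {
                barrier.getEpochToPropose(sid, 5L, 50L);
            } catch (TimeoutException e) {
                System.out.println("follower " + sid + " blocked: " + e.getMessage());
            }
        }
        // The leader's last accepted epoch (7) is now taken into account.
        System.out.println("leader proposes epoch " + barrier.getEpochToPropose(1L, 7L, 50L));
    }
}
```

Without the contains(leaderSid) condition, the second follower in main would pass the barrier with epoch 6, which is smaller than the leader's last accepted epoch of 7. That is problem 1 above.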

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
