[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808081#comment-13808081 ]

Germán Blanco commented on ZOOKEEPER-1732:
------------------------------------------

Yes, that is what I mean.
The round value in the votes is updated in updateElectionVote() after the 
election is finished.
In the previous code (without the patch), the vote at the end of the election
carried the epoch of the leader, that is, the epoch the new leader had when the
election started.
In the code with the patch, updateElectionVote() updates the vote to the epoch
the leader is using after the election is finished, which is one more than the
epoch it was using when the election started.
I think that if "newEpoch-1" is used to update the election vote, things should
be OK. In that case, servers with and without the patch should have the same
epoch value in the vote after the election is finished.
It is very good that [~rgs] spotted this so soon, since otherwise it would have
shown up in every upgrade from 3.4.5 to 3.4.6. On the other hand, the
consequences are not too serious: it only happens when servers running
different versions are in the same quorum, and only when an ensemble is already
running (so there should be no interruption of the service).
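
The epoch arithmetic above can be illustrated with a small Java sketch. Everything in it is hypothetical: `Vote`, `voteBeforePatch`, `voteWithPatch`, `voteWithProposedFix` and `hasMajorityFor` are illustrative names, not ZooKeeper classes, and the majority check is a simplified stand-in for how a joining server matches the votes it receives from an established ensemble. It assumes, as the comment describes, that the leader's new epoch is exactly one more than its epoch when the election started.

```java
import java.util.List;

// Hypothetical, simplified model of the epoch carried in an election vote once
// leader election has finished. None of these names are ZooKeeper classes.
public class ElectionVoteEpochSketch {

    // Simplified vote: the proposed leader and the epoch carried in the vote.
    record Vote(long leaderId, long peerEpoch) {}

    // Without the patch: the vote keeps the epoch the leader had when the
    // election started.
    static Vote voteBeforePatch(long leaderId, long epochAtElectionStart) {
        return new Vote(leaderId, epochAtElectionStart);
    }

    // With the patch: the vote is updated to newEpoch, the epoch the leader uses
    // after the election (assumed to be epochAtElectionStart + 1).
    static Vote voteWithPatch(long leaderId, long newEpoch) {
        return new Vote(leaderId, newEpoch);
    }

    // Suggested change: update the vote with newEpoch - 1 instead.
    static Vote voteWithProposedFix(long leaderId, long newEpoch) {
        return new Vote(leaderId, newEpoch - 1);
    }

    // Hypothetical, simplified join check: a joining server accepts a leader
    // only if a strict majority of the ensemble reports a vote identical to
    // the candidate (the record's equals() compares both fields).
    static boolean hasMajorityFor(Vote candidate, List<Vote> received, int ensembleSize) {
        long matching = received.stream().filter(candidate::equals).count();
        return matching > ensembleSize / 2;
    }

    public static void main(String[] args) {
        long leaderId = 1;
        long epochAtElectionStart = 5;
        long newEpoch = epochAtElectionStart + 1;
        int ensembleSize = 3;

        // Two established servers, one still on 3.4.5, report their votes to a joiner.
        Vote unpatched = voteBeforePatch(leaderId, epochAtElectionStart);
        List<Vote> withPatch = List.of(unpatched, voteWithPatch(leaderId, newEpoch));
        List<Vote> withFix = List.of(unpatched, voteWithProposedFix(leaderId, newEpoch));

        System.out.println("unpatched + patched agree: "
                + hasMajorityFor(unpatched, withPatch, ensembleSize));
        System.out.println("unpatched + newEpoch-1 agree: "
                + hasMajorityFor(unpatched, withFix, ensembleSize));
    }
}
```

Running `main` prints `false` for the mixed unpatched/patched case, because the two established servers report different epochs and neither vote reaches a majority of three, and `true` once the patched server updates its vote with `newEpoch - 1`.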

> ZooKeeper server unable to join established ensemble
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-1732
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection
>    Affects Versions: 3.4.5
>         Environment: Windows 7, Java 1.7
>            Reporter: Germán Blanco
>            Assignee: Germán Blanco
>            Priority: Blocker
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz, 
> ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, 
> ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-b3.4.patch, 
> ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, 
> ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch
>
>
> I have a test in which I do a rolling restart of three ZooKeeper servers and 
> it was failing from time to time.
> I ran the tests in a loop until the failure reappeared, and it seems that at
> some point one of the servers is unable to join the ensemble formed by the
> other two.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
