[
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724085#comment-13724085
]
Germán Blanco commented on ZOOKEEPER-1732:
------------------------------------------
I now think that we also need to do something about the peerEpoch. I can't
explain why this hasn't failed so far in my tests; maybe the corner case
causing this problem is even less likely than I thought. But the peer
epoch value does get sent from the leader to the follower after election,
right? So it would be possible to just update the value in the follower's
leader election information during the synchronization phase of the Zab
protocol, instead of loosening the restriction. That way, there will be at
least one check verifying that all the votes come from an ensemble established
with the same epoch.
What do you think?
I will also run the tests again with a trace to see when the inconsistent
ensemble is created.
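
To make the proposal concrete, here is a rough Java sketch of the idea. It is
not the ZooKeeper code and not the attached patch: Vote, FollowerState,
onLeaderEpoch and sameEnsemble are hypothetical names, and the sketch assumes
the mismatch arises because the follower keeps the peerEpoch it had at
election time while the leader moves on to a newer one.

```java
import java.util.Arrays;
import java.util.List;

public class PeerEpochSketch {

    /** Hypothetical, simplified vote: only the fields needed for this illustration. */
    static final class Vote {
        final long leaderId;
        final long peerEpoch;

        Vote(long leaderId, long peerEpoch) {
            this.leaderId = leaderId;
            this.peerEpoch = peerEpoch;
        }
    }

    /** Hypothetical follower-side leader election information. */
    static final class FollowerState {
        private Vote currentVote;

        FollowerState(Vote initialVote) {
            this.currentVote = initialVote;
        }

        Vote getCurrentVote() {
            return currentVote;
        }

        /** Assumed hook: called during synchronization once the leader's epoch is known. */
        void onLeaderEpoch(long leaderEpoch) {
            currentVote = new Vote(currentVote.leaderId, leaderEpoch);
        }
    }

    /** Strict check: every vote names the same leader and the same peerEpoch. */
    static boolean sameEnsemble(List<Vote> votes) {
        Vote reference = votes.get(0);
        for (Vote v : votes) {
            if (v.leaderId != reference.leaderId || v.peerEpoch != reference.peerEpoch) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Server 3 is elected while everyone still reports epoch 5.
        FollowerState follower = new FollowerState(new Vote(3L, 5L));
        // The leader then establishes epoch 6 and reports it in its own vote.
        Vote leaderVote = new Vote(3L, 6L);

        // A joining server sees two different epochs and the strict check fails.
        System.out.println(sameEnsemble(Arrays.asList(leaderVote, follower.getCurrentVote())));

        // With the proposed update during synchronization, the votes agree again.
        follower.onLeaderEpoch(6L);
        System.out.println(sameEnsemble(Arrays.asList(leaderVote, follower.getCurrentVote())));
    }
}
```

The point of the sketch is that the strict comparison in sameEnsemble stays as
it is; only the follower's stored vote changes.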
> ZooKeeper server unable to join established ensemble
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Environment: Windows 7, Java 1.7
> Reporter: Germán Blanco
> Priority: Blocker
> Fix For: 3.5.0, 3.4.6
>
> Attachments: zklog.tar.gz, ZOOKEEPER-1732.patch
>
>
> I have a test in which I do a rolling restart of three ZooKeeper servers and
> it was failing from time to time.
> I ran the tests in a loop until the failure came out and it seems that at
> some point one of the servers is unable to join the ensemble formed by the
> other two.