[
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Germán Blanco updated ZOOKEEPER-1732:
-------------------------------------
Attachment: ZOOKEEPER-1732.patch
Thanks a lot for the insight!
So, as I understand it:
- The check for the ensemble voting for the joining server may remain.
It is true that this is checked again in checkLeader (which covers additional
cases).
Since we set all votes to round 0 and the current logicalclock starts
counting at 1, the joining server will not pass the check in checkLeader.
However, the additional check is more readable, and it avoids adding these
invalid votes to the outofelection list and running the rest of the
checks.
- The other check is removed, since those votes cannot reach this point.
- According to your explanation, the check of round and zxid when joining an
ensemble may be dropped.
It does not help prevent the server from joining a broken minority ensemble,
since those servers will have the same epoch and zxid anyway.
- The checking of the zxid also needs to be *loosened*.
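To make the round argument in the first point concrete, here is a minimal
sketch. It is a simplified model and not the FastLeaderElection code: the class
RoundCheckSketch and its methods are hypothetical, and it assumes only what is
stated above (votes set to round 0, a logical clock that starts at 1 and only
increases).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of the round comparison; hypothetical, not ZooKeeper code. */
class RoundCheckSketch {
    // Assumption taken from the comment above: the local round starts at 1.
    private final AtomicLong logicalClock = new AtomicLong(1);

    /** Moves to the next election round and returns the new round number. */
    long startNewRound() {
        return logicalClock.incrementAndGet();
    }

    /** A vote passes only if it carries the receiver's current round. */
    boolean sameRound(long voteRound) {
        return voteRound == logicalClock.get();
    }

    public static void main(String[] args) {
        RoundCheckSketch sketch = new RoundCheckSketch();
        // A vote reset to round 0 never matches a clock that starts at 1.
        System.out.println("round 0 accepted: " + sketch.sameRound(0));
        System.out.println("round 1 accepted: " + sketch.sameRound(1));
        sketch.startNewRound();
        System.out.println("round 0 accepted after new round: " + sketch.sameRound(0));
    }
}
```

Because the clock never goes back to 0, a round-0 vote is rejected in every
round, which is why the earlier explicit check is a readability and
short-circuit aid rather than a correctness requirement.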
ZOOKEEPER-1732.patch removes the checking of zxid and adds a test case.
In order to implement the test case, one additional line in
FastLeaderElection.java was changed.
It replaces logicalclock with current.getElectionEpoch(), which I believe must
hold the same value when not in LOOKING state.
That allows me to change the value of the round sent in FLE Notifications.
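The idea behind that one-line change can be sketched as follows. This is a
hypothetical model and not the patch itself: ReplyRoundSketch, its nested Vote
class and both replyRound methods are made-up names, and the only thing taken
from the description above is that the round is read from the current vote
instead of from logicalclock.

```java
/** Hypothetical model of where the reply round comes from; not the actual patch. */
class ReplyRoundSketch {
    /** Minimal stand-in for a vote that remembers the round it was cast in. */
    static final class Vote {
        private final long leaderId;
        private final long electionEpoch;

        Vote(long leaderId, long electionEpoch) {
            this.leaderId = leaderId;
            this.electionEpoch = electionEpoch;
        }

        long getId() { return leaderId; }
        long getElectionEpoch() { return electionEpoch; }
    }

    private final long logicalClock = 1;    // local round counter, fixed here
    private Vote current = new Vote(1, 1);  // vote held while not LOOKING

    /** Lets a test install a vote with an arbitrary round. */
    void setCurrentVote(Vote vote) { current = vote; }

    /** Before the change: the reply carries the local counter. */
    long replyRoundFromClock() { return logicalClock; }

    /** After the change: the reply carries the round stored in the current vote. */
    long replyRoundFromVote() { return current.getElectionEpoch(); }

    public static void main(String[] args) {
        ReplyRoundSketch server = new ReplyRoundSketch();
        server.setCurrentVote(new Vote(1, 5));
        System.out.println("from clock: " + server.replyRoundFromClock());
        System.out.println("from vote:  " + server.replyRoundFromVote());
    }
}
```

In this model the two values agree as long as nobody replaces the current vote,
and a test can drive the round that is sent simply by installing a vote with
the round it wants.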
I am guessing also that it is ok to SUBMIT the patch.
> ZooKeeper server unable to join established ensemble
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Environment: Windows 7, Java 1.7
> Reporter: Germán Blanco
> Priority: Blocker
> Fix For: 3.5.0, 3.4.6
>
> Attachments: zklog.tar.gz, ZOOKEEPER-1732.patch
>
>
> I have a test in which I do a rolling restart of three ZooKeeper servers and
> it was failing from time to time.
> I ran the tests in a loop until the failure came out and it seems that at
> some point one of the servers is unable to join the ensemble formed by the
> other two.