[
https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716227#comment-13716227
]
Germán Blanco commented on ZOOKEEPER-1732:
------------------------------------------
bq. For option 3, we add LE information to the LearnerInfo message ...
How would you add the information to the LearnerInfo message? If the receiving
side doesn't support the same version of the protocol it will not be able to
parse the message, right? Do you have in mind to use some currently unused field?
My suggestion was that in the LearnerInfo message the Learner only reports
that it supports the additional information, i.e. by sending 0x10001 as the
protocol version. The Leader sees this and includes the leader election
information together with the LeaderInfo only if the Learner supports it.
The Learner receives the message and reads the leader election info only if
the protocol version reported by the Leader is also 0x10001. At that point the
Learner can just update its leader election information with what it got from
the Leader. No new message that way :-)
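To make the negotiation concrete, here is a rough sketch of how I picture it.
This is not the actual Leader/Learner code: the class HandshakeSketch, the
LeInfo holder, the byte layout of the reply and the use of 0x10000 as the
version without leader election info are all assumptions made up for
illustration. It only shows that the extra fields are written and read when
both sides advertise 0x10001, and are absent otherwise.

```java
import java.nio.ByteBuffer;

/** Illustrative only: names and wire layout are hypothetical, not ZooKeeper's. */
public class HandshakeSketch {
    // Assumed version value for peers without leader election (LE) info.
    static final int BASE_VERSION = 0x10000;
    // Version value suggested in this comment for peers that support LE info.
    static final int LE_INFO_VERSION = 0x10001;

    /** Hypothetical holder for the LE data piggybacked on the LeaderInfo reply. */
    static final class LeInfo {
        final long leaderSid;
        final long electionEpoch;

        LeInfo(long leaderSid, long electionEpoch) {
            this.leaderSid = leaderSid;
            this.electionEpoch = electionEpoch;
        }
    }

    /**
     * Leader side: always writes a version, and appends the LE info only
     * when the Learner advertised support for it in its LearnerInfo.
     */
    static byte[] buildLeaderInfo(int learnerVersion, LeInfo current) {
        boolean extended = learnerVersion >= LE_INFO_VERSION;
        ByteBuffer buf = ByteBuffer.allocate(extended ? 4 + 8 + 8 : 4);
        buf.putInt(extended ? LE_INFO_VERSION : BASE_VERSION);
        if (extended) {
            buf.putLong(current.leaderSid);
            buf.putLong(current.electionEpoch);
        }
        return buf.array();
    }

    /**
     * Learner side: reads the LE info only if the Leader's version says it
     * is there. Returns null when the Leader sent the plain reply.
     */
    static LeInfo readLeaderInfo(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int leaderVersion = buf.getInt();
        if (leaderVersion < LE_INFO_VERSION) {
            return null; // old-style reply, nothing more to parse
        }
        long leaderSid = buf.getLong();
        long electionEpoch = buf.getLong();
        return new LeInfo(leaderSid, electionEpoch);
    }
}
```

A peer that only knows the old version never sees the extra bytes, so it
should not hit the parsing problem I asked about above.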
bq. If we don't change the recovery handshake and use the other approach I
outlined, then I believe all changes are concentrated in the FLE class, ...
I also see it that way, and I would do it exactly as you say. It is around 20
lines of code near the end of FastLeaderElection$Messenger$WorkerReceiver,
something like this: [https://gist.github.com/germanblanco/6060741]. I don't
see any changes in QuorumPeer, but maybe I am missing something.
bq. There are some FLE test cases that implement a mock server. I think we
should do something similar here. Instead of trying to reproduce the race, we
could just test that the follower correctly updates its information upon
receiving a notification.
Sounds very good.
> ZooKeeper server unable to join established ensemble
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Environment: Windows 7, Java 1.7
> Reporter: Germán Blanco
> Priority: Blocker
> Fix For: 3.5.0, 3.4.6
>
> Attachments: zklog.tar.gz
>
>
> I have a test in which I do a rolling restart of three ZooKeeper servers and
> it was failing from time to time.
> I ran the tests in a loop until the failure came out and it seems that at
> some point one of the servers is unable to join the ensemble formed by the
> other two.