[ https://issues.apache.org/jira/browse/ZOOKEEPER-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807102#comment-13807102 ]
Raul Gutierrez Segales commented on ZOOKEEPER-1732:
---------------------------------------------------

[~fpj], [~abranzyck]: did you guys test this patch when joining a cluster of servers running without this patch (i.e.: trunk, only without this patch)? After rolling the first 2 followers - in a 5 member ensemble - the 3rd follower fails to join with this:

{noformat}
2013-10-28 18:43:18,134 - INFO [WorkerReceiver[myid=4]] - Notification: 4 (n.leader), 0x8900000415 (n.zxid), 0x6 (n.round), LOOKING (n.state), 4 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,134 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x880000002c (n.zxid), 0xffffffffffffffff (n.round), FOLLOWING (n.state), 0 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,135 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x880000002c (n.zxid), 0x6 (n.round), LEADING (n.state), 2 (n.sid), 0x88 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,135 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x880000002c (n.zxid), 0x6 (n.round), FOLLOWING (n.state), 3 (n.sid), 0x88 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2013-10-28 18:43:18,136 - INFO [WorkerReceiver[myid=4]] - Notification: 2 (n.leader), 0x880000002c (n.zxid), 0xffffffffffffffff (n.round), FOLLOWING (n.state), 1 (n.sid), 0x89 (n.peerEPoch), LOOKING (my state)0 (n.config version)
{noformat}

I am guessing IGNOREVALUE (0xffffffffffffffff) as the round value is causing issues? What was the expected behavior here (i.e.: when dealing with cluster members without this patch during an upgrade)?
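To make the guess about the round value concrete, here is a minimal Java sketch. It assumes IGNOREVALUE is -1 stored as a signed 64-bit long (which is what 0xffffffffffffffff prints as), and it uses a hypothetical `sameRound` check that simply compares rounds for equality. The class and method names are made up for illustration and this is not the actual election code; it only shows why a receiver in round 0x6 might not count a notification carrying the sentinel.

```java
// Illustrative sketch only; names here are hypothetical, not ZooKeeper's API.
class RoundValueSketch {
    // Assumption: the sentinel round value is -1 as a signed 64-bit long.
    static final long IGNOREVALUE = -1L;

    // Formats a long the way the n.round field appears in the log (0x + hex).
    static String toHex(long value) {
        return "0x" + Long.toHexString(value);
    }

    // Hypothetical receiver-side check: accept a notification only if its
    // round equals the receiver's own current round.
    static boolean sameRound(long notificationRound, long localRound) {
        return notificationRound == localRound;
    }

    public static void main(String[] args) {
        long localRound = 0x6L; // round of the LOOKING server (myid=4) in the log

        // The sentinel prints exactly as seen in the log output.
        System.out.println(toHex(IGNOREVALUE));

        // A notification carrying the sentinel does not match round 0x6 ...
        System.out.println(sameRound(IGNOREVALUE, localRound));

        // ... while one carrying round 0x6 does.
        System.out.println(sameRound(0x6L, localRound));
    }
}
```

If the unpatched receiver applies an equality check of this kind, the two FOLLOWING notifications with round 0xffffffffffffffff would be set aside, which would be consistent with the failure to join shown in the log. Whether that is what actually happens is the open question here.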
> ZooKeeper server unable to join established ensemble
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1732
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1732
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Environment: Windows 7, Java 1.7
> Reporter: Germán Blanco
> Assignee: Germán Blanco
> Priority: Blocker
> Fix For: 3.4.6, 3.5.0
>
> Attachments: CREATE_INCONSISTENCIES_patch.txt, zklog.tar.gz,
> ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-3.4.patch,
> ZOOKEEPER-1732-3.4.patch, ZOOKEEPER-1732-b3.4.patch,
> ZOOKEEPER-1732-b3.4.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch,
> ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch, ZOOKEEPER-1732.patch
>
>
> I have a test in which I do a rolling restart of three ZooKeeper servers and
> it was failing from time to time.
> I ran the tests in a loop until the failure came out and it seems that at
> some point one of the servers is unable to join the ensemble formed by the
> other two.

--
This message was sent by Atlassian JIRA (v6.1#6144)