[ https://issues.apache.org/jira/browse/ZOOKEEPER-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009878#comment-13009878 ]
Vishal K commented on ZOOKEEPER-1026:
-------------------------------------

The logs are very difficult to follow. Also, I think the sequence of events as described does not match the logs. Here is what we can see in node1_after_restart:

2048:151124 [QuorumPeer:/0.0.0.0:2888] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader  <==== NODE 2 WENT DOWN HERE

13497: at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375)  <==== NODE 3 WENT DOWN HERE

17926:309591 [LearnerHandler-/13.0.0.12:52753] WARN org.apache.zookeeper.server.quorum.LearnerHandler - ******* GOODBYE /13.0.0.12:52753 ********  <==== NODE 2 WENT DOWN HERE AGAIN

18044:310757 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Leader - Shutdown called  <==== NODE 1 shut down again (which is why node 3 became the leader?)

There seem to be a few restarts after this as well:

grep -n "tickTime set to 3000" node1_after_restart.log
[...]
23685:1831 [pool-1-thread-3] INFO org.apache.zookeeper.server.quorum.QuorumPeer - tickTime set to 3000
28491:1479 [pool-1-thread-2] INFO org.apache.zookeeper.server.ZooKeeperServer - tickTime set to 3000

So anything beyond time 310757 is not as described in the bug report. Also note that, because of ZOOKEEPER-975, after receiving notifications from all peers a node can go into the LEADING state, where it waits for up to initLimit * tickTime. It will then time out, restart FLE, and go into the FOLLOWING state.
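The regression the reporter sees below (a sequential create returning 0000003292 right after 0000003348) is consistent with how ZooKeeper derives the sequence suffix: the server appends the parent znode's child version (cversion) as a zero-padded 10-digit number. A minimal Python sketch of that numbering — a simplified model, not ZooKeeper's actual code path — shows how a leader with a stale copy of the parent would hand out a lower number:

```python
def next_sequential_name(prefix: str, parent_cversion: int) -> str:
    """Model of the "%010d" suffix ZooKeeper appends for sequential creates.

    Simplified assumption: the suffix is the parent znode's cversion at the
    time of the create.
    """
    return "%s%010d" % (prefix, parent_cversion)

# Up-to-date leader: the /zkrsm parent has seen ~3348 child creations.
print(next_sequential_name("/zkrsm/0000000000000000_record", 3348))

# Stale leader (e.g. node #3, which last saw record 3291 before restarting
# and then won the election): its parent cversion is lower, so the next
# sequential name regresses below names already handed out.
print(next_sequential_name("/zkrsm/0000000000000000_record", 3292))
```

Under this model, the regressed name is exactly one greater than the last record the stale node saw, which matches the reporter's observation in the description below.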
> Sequence number assignment decreases after old node rejoins cluster
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1026
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Jeremy Stribling
>         Attachments: ZOOKEEPER-1026.logs.tgz
>
>
> I ran into a weird case where a ZooKeeper server rejoins the cluster after missing several operations, and then a client creates a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories — just some partial server logs and the evidence as seen by the client. I haven't tried reproducing it yet; I just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete):
>
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below)
> 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348
> 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so I'm not exactly sure on the ordering of this step)
> 12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292
>
> Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted.
>
> Here is the sequence of session establishments as seen from the C client of node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, #3=13.0.0.13):
>
> 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec780a3020000, negotiated timeout=6000
> 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec782f5100002, negotiated timeout=6000
> 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.12:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
>
> I will attach logs for all nodes after each of their restarts, and a partial log for node #3 from before its restart.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
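The anomaly in steps #9 and #12 — a later sequential create returning a lower number than an earlier one — can be detected mechanically by comparing the 10-digit numeric suffixes of the paths the client gets back from successive creates. A small illustrative check (the helper names here are mine, not from the report or the ZooKeeper API):

```python
import re

def seq_suffix(path: str) -> int:
    """Extract the 10-digit sequence suffix ZooKeeper appends to sequential nodes."""
    m = re.search(r"(\d{10})$", path)
    if m is None:
        raise ValueError("no sequence suffix in %r" % path)
    return int(m.group(1))

def find_regressions(paths):
    """Return (earlier, later) pairs where a later create got a lower-or-equal number."""
    out = []
    for prev, cur in zip(paths, paths[1:]):
        if seq_suffix(cur) <= seq_suffix(prev):
            out.append((prev, cur))
    return out

# The two creates from steps #9 and #12 of the report, in creation order:
created = [
    "/zkrsm/0000000000000000_record0000003348",
    "/zkrsm/0000000000000000_record0000003292",
]
print(find_regressions(created))  # the pair above is flagged as a regression
```

Running a check like this over a client's create history would pin down exactly when the numbering went backwards, independent of the server logs.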