[ https://issues.apache.org/jira/browse/ZOOKEEPER-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009906#comment-13009906 ]
Jeremy Stribling commented on ZOOKEEPER-1026:
---------------------------------------------

{quote}
The logs are really very difficult to follow.
{quote}

What would make them easier to follow? I am about to replace the relative timestamps with wallclock timestamps, and I will try to reproduce. Is there anything else that might help?

{quote}
2048:151124 [QuorumPeer:/0.0.0.0:2888] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader <==== NODE 2 WENT DOWN HERE
13497: at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375) ===> NODE 3 WENT DOWN HERE
17926:309591 [LearnerHandler-/13.0.0.12:52753] WARN org.apache.zookeeper.server.quorum.LearnerHandler - ******* GOODBYE /13.0.0.12:52753 ******** ===> NODE2 WENT DOWN HERE AGAIN
{quote}

Correct, node2 dies twice during the period I've described (steps #6 and #10).

{quote}
18044:310757 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Leader - Shutdown called ===> NODE1 shutdown again (which is why node 3 became the leader)?
{quote}

This happens because my node1 client code, which runs in tandem with the ZK server code, hits an assert failure when it detects the sequence numbers decreasing, and it kills the server code as well; our framework then restarts everything. So this is AFTER the client creates a file with a decreased seqno. Nothing after this point in the node1 file contributed to the bug. (I left it in the log in case it showed anything interesting after the restart.)

{quote}
If you grep for 0000000000000000_record0000003292 and 0000000000000000_record0000003348 in node2_after_restart.log we can see that znode ending with 3292 was at least accessed before 3348.
{quote}

Yes, while node2 was alive, nodes up to 3348 were accessed. That 3292 is the original node with that sequence number. The second one created by node1 does not appear to show up in the logs labeled as such -- I'm not sure how to get the ZooKeeperServer to print a log message containing the name of the file it created.

{quote}
I am confused to see that node1_after_restart.log does not have any trace of sequential znodes beyond /zkrsm/0000000000000000_record0000000921 (especially because Node1 was the leader when the znodes in question were created).
{quote}

It looks to me like node #2 was the leader for a large chunk of time, based on these lines in node1_after_restart:

2535 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 126 (n.leader), 8589936862 (n.zxid), 2 (n.round), LEADING (n.state), 126 (n.sid), LOOKING (my state)
...
154676 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), 126 (n.sid), LEADING (my state)

After that, there are a bunch of create messages within /zkrsm, but no GetDatas.

{quote}
I am also not sure whether to trust the cxid in these messages or not. If I am not mistaken, they represent the current transaction id as seen by the client. How can we isolate the transaction that did the creates?
{quote}

This confused me as well. If there's anything I can do to get more accurate logs, please let me know. I'm not sure if I can reproduce this yet, but I have a strategy I'm planning to try.
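To clarify what that assert does: it just parses the 10-digit sequence suffix that the server appends to each /zkrsm child name and checks that newly created names never go backwards. The real check lives in our C client, so the following is only a rough Java sketch of the idea (class and method names are illustrative, not our actual code):

{code:java}
// Illustrative sketch only -- our real check is in the C client, not this code.
// It parses the 10-digit sequence suffix that ZooKeeper appends to a sequential
// znode name and asserts that the sequence numbers never decrease.
public class SeqMonotonicityCheck {
    private long lastSeq = -1;

    public void onCreated(String createdPath) {
        // e.g. "/zkrsm/0000000000000000_record0000003292" -> 3292
        long seq = Long.parseLong(createdPath.substring(createdPath.length() - 10));
        if (seq <= lastSeq) {
            // This is the condition that kills our node1 process in step #12.
            throw new AssertionError("sequence decreased: " + seq + " after " + lastSeq);
        }
        lastSeq = seq;
    }
}
{code}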
> Sequence number assignment decreases after old node rejoins cluster
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1026
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Jeremy Stribling
>         Attachments: ZOOKEEPER-1026.logs.tgz
>
>
> I ran into a weird case where a ZooKeeper server rejoins the cluster after missing several operations, and then a client creates a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete):
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below).
> 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348
> 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step)
> 12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292
> Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted.
> Here is the sequence of session establishments as seen from the C client of node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, #3=13.0.0.13):
> 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec780a3020000, negotiated timeout=6000
> 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec782f5100002, negotiated timeout=6000
> 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.12:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
> I will attach logs for all nodes after each of their restarts, and a partial log for node #3 from before its restart.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
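For anyone trying to reproduce this: step 2 of the scenario above boils down to each node periodically issuing a sequential, non-ephemeral create under /zkrsm. A minimal Java sketch of that call follows; the connect string, ports, and session timeout are illustrative, and the actual client in this report is the C client, so this is only an approximation of the workload:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ZkrsmCreateSketch {
    public static void main(String[] args) throws Exception {
        // Connect string and timeout are illustrative; the report's client is
        // the C client talking to the three servers 13.0.0.11/.12/.13.
        ZooKeeper zk = new ZooKeeper(
                "13.0.0.11:2181,13.0.0.12:2181,13.0.0.13:2181", 6000, event -> { });

        // The server appends a 10-digit sequence number to the supplied prefix,
        // yielding names like /zkrsm/0000000000000000_record0000003292.
        String created = zk.create("/zkrsm/0000000000000000_record",
                new byte[0],
                Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("created " + created);

        zk.close();
    }
}
{code}

The bug is that, with a quorum reshuffling as in steps 9-12, two such creates can come back with sequence numbers that are not monotonically increasing (3348 followed by 3292).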