[ https://issues.apache.org/jira/browse/ZOOKEEPER-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009906#comment-13009906 ]
Jeremy Stribling commented on ZOOKEEPER-1026:
---------------------------------------------

{quote}
The logs are really very difficult to follow.
{quote}

What would make them easier to follow? I am about to replace the relative timestamps with wallclock timestamps, and I will try to reproduce. Is there anything else that might help?

{quote}
2048:151124 [QuorumPeer:/0.0.0.0:2888] WARN org.apache.zookeeper.server.quorum.Learner - Exception when following the leader <==== NODE 2 WENT DOWN HERE
13497: at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:375) ===> NODE 3 WENT DOWN HERE
17926:309591 [LearnerHandler-/13.0.0.12:52753] WARN org.apache.zookeeper.server.quorum.LearnerHandler - ******* GOODBYE /13.0.0.12:52753 ******** ===> NODE2 WENT DOWN HERE AGAIN
{quote}

Correct, node2 dies twice during the period I've described (steps #6 and #10).

{quote}
18044:310757 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Leader - Shutdown called ===> NODE1 shutdown again (which is why node 3 became the leader)?
{quote}

This happens because my node1 client code, which runs in tandem with the ZK server code, hits an assert failure when it detects the sequence numbers decreasing, and it kills the server code as well; our framework then restarts everything. So this is AFTER the client creates a file with a decreased seqno. Nothing after this point in the node1 file contributed to the bug. (I left it in the log in case it showed anything interesting after the restart.)

{quote}
If you grep for 0000000000000000_record0000003292 and 0000000000000000_record0000003348 in node2_after_restart.log we can see that znode ending with 3292 was at least accessed before 3348.
{quote}

Yes, while node2 was alive, nodes up to 3348 were accessed. That 3292 is the original node with that sequence number. The second one created by node1 does not appear to show up in the logs labeled as such -- I'm not sure how to get the ZooKeeperServer to print a log message containing the name of the file it created.

{quote}
I am confused to see that node1_after_restart.log does not have any trace of sequential znodes beyond /zkrsm/0000000000000000_record0000000921 (especially because Node1 was the leader when the znodes in question were created).
{quote}

It looks to me like node #2 was the leader for a large chunk of time, based on these lines in node1_after_restart:

2535 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 126 (n.leader), 8589936862 (n.zxid), 2 (n.round), LEADING (n.state), 126 (n.sid), LOOKING (my state)
...
154676 [WorkerReceiver Thread] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LOOKING (n.state), 126 (n.sid), LEADING (my state)

After that, there are a bunch of create messages within /zkrsm, but no GetDatas.

{quote}
I am also not sure whether to trust the cxid in these messages or not. If I am not mistaken, they represent the current transaction id as seen by the client. How can we isolate the transaction that did the creates?
{quote}

This confused me as well. If there's anything I can do to get more accurate logs, please let me know. I'm not sure if I can reproduce this yet, but I have a strategy I'm planning to try.
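To clarify what that assert does: it just parses the 10-digit sequence suffix that the server appends to each /zkrsm child name and checks that newly created names never go backwards. The real check lives in our C client, so the following is only a rough Java sketch of the idea (class and method names are illustrative, not our actual code):

{code:java}
// Illustrative sketch only -- our real check is in the C client, not this code.
// It parses the 10-digit sequence suffix that ZooKeeper appends to a sequential
// znode name and asserts that the sequence numbers never decrease.
public class SeqMonotonicityCheck {
    private long lastSeq = -1;

    public void onCreated(String createdPath) {
        // e.g. "/zkrsm/0000000000000000_record0000003292" -> 3292
        long seq = Long.parseLong(createdPath.substring(createdPath.length() - 10));
        if (seq <= lastSeq) {
            // This is the condition that kills our node1 process in step #12.
            throw new AssertionError("sequence decreased: " + seq + " after " + lastSeq);
        }
        lastSeq = seq;
    }
}
{code}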
> Sequence number assignment decreases after old node rejoins cluster
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1026
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Jeremy Stribling
>         Attachments: ZOOKEEPER-1026.logs.tgz
>
>
> I ran into a weird case where a ZooKeeper server rejoins the cluster after missing several operations, and then a client creates a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete):
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below).
> 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348
> 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step)
> 12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292
> Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted.
> Here is the sequence of session establishments as seen from the C client of node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, #3=13.0.0.13):
> 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec780a3020000, negotiated timeout=6000
> 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec782f5100002, negotiated timeout=6000
> 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.12:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
> I will attach logs for all nodes after each of their restarts, and a partial log for node #3 from before its restart.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
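For anyone trying to reproduce this: step 2 of the scenario above boils down to each node periodically issuing a sequential, non-ephemeral create under /zkrsm. A minimal Java sketch of that call follows; the connect string, ports, and session timeout are illustrative, and the actual client in this report is the C client, so this is only an approximation of the workload:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ZkrsmCreateSketch {
    public static void main(String[] args) throws Exception {
        // Connect string and timeout are illustrative; the report's client is
        // the C client talking to the three servers 13.0.0.11/.12/.13.
        ZooKeeper zk = new ZooKeeper(
                "13.0.0.11:2181,13.0.0.12:2181,13.0.0.13:2181", 6000, event -> { });

        // The server appends a 10-digit sequence number to the supplied prefix,
        // yielding names like /zkrsm/0000000000000000_record0000003292.
        String created = zk.create("/zkrsm/0000000000000000_record",
                new byte[0],
                Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("created " + created);

        zk.close();
    }
}
{code}

The bug is that, with a quorum reshuffling as in steps 9-12, two such creates can come back with sequence numbers that are not monotonically increasing (3348 followed by 3292).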