[jira] [Commented] (ZOOKEEPER-1026) Sequence number assignment decreases after old node rejoins cluster

Vishal K (JIRA) Tue, 22 Mar 2011 15:44:45 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009885#comment-13009885
 ]


Vishal K commented on ZOOKEEPER-1026:
-------------------------------------

If you grep for 0000000000000000_record0000003292 and
0000000000000000_record0000003348 in node2_after_restart.log we can
see that znode ending with 3292 was at least accessed before 3348.

2943:153670 [FollowerRequestProcessor:126] TRACE 
org.apache.zookeeper.server.quorum.FollowerRequestProcessor  - 
:Fsessionid:0x7e2ec782ff5f0000 type:getDate cxid:0x4d830245 
zxid:0xfffffffffffffffe \
txntype:unknown reqpath:/zkrsm/0000000000000000_record0000003292
[...]
4340:156963 [FollowerRequestProcessor:126] TRACE 
org.apache.zookeeper.server.quorum.FollowerRequestProcessor  - 
:Fsessionid:0x7e2ec782ff5f0000 type:getDate cxid:0x4d8302bb 
zxid:0xfffffffffffffffe \
txntype:unknown reqpath:/zkrsm/0000000000000000_record0000003348

The first column above indicates line number. The type (getDate) is a
typo and should be getData.

Similarly, in log3_after_restart:

32497:24471 [ProcessThread:-1] TRACE 
org.apache.zookeeper.server.PrepRequestProcessor  - 
:Psessionid:0x252ec7856ce60000 type:getDate cxid:0x4d8300c1 
zxid:0xfffffffffffffffe txntype:unknown reqpath\
:/zkrsm/0000000000000000_record0000003292
[...]
37513:27851 [ProcessThread:-1] TRACE 
org.apache.zookeeper.server.PrepRequestProcessor  - 
:Psessionid:0x252ec7856ce60001 type:getDate cxid:0x4d8300f3 
zxid:0xfffffffffffffffe txntype:unknown reqpath\
:/zkrsm/0000000000000000_record0000003348

I am confused to see that node1_after_restart.log does not have any trace of 
sequential znodes beyond /zkrsm/0000000000000000_record0000000921 (especially 
because Node1 was the leader when the znodes in questions were created).

I am also not sure whether to trust the cxid in these messages or not. If I
am not mistaken, they represent the current transaction id as seen by the 
client.
How can we isolate the transaction that did the creates?


> Sequence number assignment decreases after old node rejoins cluster
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1026
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Jeremy Stribling
>         Attachments: ZOOKEEPER-1026.logs.tgz
>
>
> I ran into a weird case where a Zookeeper server rejoins the cluster after 
> missing several operations, and then a client creates a new sequential node 
> that has a number earlier than the last node it created.  I don't have full 
> logs, or a live system in this state, or any data directories, just some 
> partial server logs and the evidence as seen by the client.  Haven't tried 
> reproducing it yet, just wanted to see if anyone here had any ideas.  Here's 
> the scenario (probably more info than necessary, but trying to be complete)
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes 
> #1, #2, and #3 below):
> 2) Nodes periodically (and throughout this whole timeline) create sequential, 
> non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates 
> /zkrsm/0000000000000000_record0000003348
> 10) 5:52:08: Node #2 dies after getting notified of 
> /zkrsm/0000000000000000_record0000003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have 
> wallclock timestamps, so not exactly sure on the ordering of this step)
> 12) 5:52:15: Node #1 successfully creates 
> /zkrsm/0000000000000000_record0000003292
> Note that the node created in step #12 is lower than the one created in step 
> #9, and is exactly one greater than the last node seen by node #3 before it 
> restarted.
> Here is the sequence of session establishments as seen from the C client of 
> node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, 
> #3=13.0.0.13):
> 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: 
> session establishment complete on server [13.0.0.13:2888], 
> sessionId=0x252ec780a3020000, negotiated timeout=6000
> 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: 
> session establishment complete on server [13.0.0.13:2888], 
> sessionId=0x252ec782f5100002, negotiated timeout=6000
> 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: 
> session establishment complete on server [13.0.0.12:2888], 
> sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: 
> session establishment complete on server [13.0.0.11:2888], 
> sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: 
> session establishment complete on server [13.0.0.11:2888], 
> sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
> I will attach logs for all nodes after each of their restarts, and a partial 
> log for node #3 from before its restart.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1026) Sequence number assignment decreases after old node rejoins cluster

Reply via email to