Sequence number assignment decreases after old node rejoins cluster
-------------------------------------------------------------------

                 Key: ZOOKEEPER-1026
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.3.3
            Reporter: Jeremy Stribling


I ran into a weird case where a ZooKeeper server rejoins the cluster after 
missing several operations, and then a client creates a new sequential node 
whose sequence number is lower than that of the last node it created.  I 
don't have full logs, a live system in this state, or any data directories; 
all I have are some partial server logs and the evidence as seen by the 
client.  I haven't tried reproducing it yet; I just wanted to see if anyone 
here has any ideas.  Here's the scenario (probably more info than necessary, 
but trying to be complete):

1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, 
#2, and #3 below).
2) Nodes periodically (and throughout this whole timeline) create sequential, 
non-ephemeral nodes under the /zkrsm parent node.
3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
4) 5:47:06: Node #1 restarts and rejoins
5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
6) 5:49:29: Node #2 restarts and rejoins
7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
8) 5:52:02: Node #3 restarts and begins the rejoining process
9) 5:52:08: Node #1 successfully creates 
/zkrsm/0000000000000000_record0000003348
10) 5:52:08: Node #2 dies after getting notified of 
/zkrsm/0000000000000000_record0000003348
11) Around 5:52:10: Node #3 is elected leader (the ZK server log doesn't have 
wallclock timestamps, so I'm not exactly sure where this step falls in the 
ordering)
12) 5:52:15: Node #1 successfully creates 
/zkrsm/0000000000000000_record0000003292

Note that the sequence number of the node created in step #12 is lower than 
that of the node created in step #9, and is exactly one greater than that of 
the last node seen by node #3 before it restarted.
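
To make that concrete, here is a minimal sketch (not our actual client code) 
of what the client does through the ZooKeeper C API.  The /zkrsm parent, the 
"_record" name prefix, and the 6000 ms session timeout come from this report; 
the ensemble addresses, ports, and everything else are placeholders.

/* Minimal sketch, not the real client: create sequential znodes under /zkrsm
 * and check that the 10-digit suffix the server appends never goes backwards.
 * Assumes /zkrsm already exists; servers/ports are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <zookeeper/zookeeper.h>

static volatile int connected = 0;

static void watcher(zhandle_t *zh, int type, int state,
                    const char *path, void *ctx)
{
    if (type == ZOO_SESSION_EVENT && state == ZOO_CONNECTED_STATE)
        connected = 1;
}

int main(void)
{
    zhandle_t *zh = zookeeper_init(
        "13.0.0.11:2888,13.0.0.12:2888,13.0.0.13:2888",  /* placeholder ensemble */
        watcher, 6000, NULL, NULL, 0);
    if (!zh)
        return 1;
    while (!connected)                  /* wait for session establishment */
        usleep(10000);

    char created[512];
    long last_seq = -1;
    for (int i = 0; i < 5; i++) {
        int rc = zoo_create(zh, "/zkrsm/0000000000000000_record", "", 0,
                            &ZOO_OPEN_ACL_UNSAFE, ZOO_SEQUENCE,
                            created, sizeof(created));
        if (rc != ZOK) {
            fprintf(stderr, "zoo_create failed: %d\n", rc);
            break;
        }
        /* The server appends a zero-padded 10-digit sequence number. */
        long seq = atol(created + strlen(created) - 10);
        printf("created %s (seq %ld)\n", created, seq);
        if (seq <= last_seq)
            fprintf(stderr, "sequence went backwards: %ld after %ld\n",
                    seq, last_seq);
        last_seq = seq;
    }
    zookeeper_close(zh);
    return 0;
}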

Here is the sequence of session establishments as seen by the C client on 
node #1 after its restart (IP addresses: node #1 = 13.0.0.11, 
node #2 = 13.0.0.12, node #3 = 13.0.0.13):

2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: 
session establishment complete on server [13.0.0.13:2888], 
sessionId=0x252ec780a3020000, negotiated timeout=6000
2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: 
session establishment complete on server [13.0.0.13:2888], 
sessionId=0x252ec782f5100002, negotiated timeout=6000
2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: 
session establishment complete on server [13.0.0.12:2888], 
sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: 
session establishment complete on server [13.0.0.11:2888], 
sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: 
session establishment complete on server [13.0.0.11:2888], 
sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
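
For reference, those INFO lines are printed by the C client library itself 
(in check_events) each time the session is established or re-established; the 
application only supplies the ensemble string and a global watcher.  Below is 
a minimal sketch of such a client, with addresses and ports as placeholders 
rather than our real configuration:

/* Minimal sketch: the library logs "session establishment complete" at INFO
 * level on every (re)connect; the watcher below just reports the session id
 * the application sees.  Ensemble string is a placeholder. */
#include <stdio.h>
#include <unistd.h>
#include <zookeeper/zookeeper.h>

static void watcher(zhandle_t *zh, int type, int state,
                    const char *path, void *ctx)
{
    if (type != ZOO_SESSION_EVENT)
        return;
    if (state == ZOO_CONNECTED_STATE)
        printf("connected, session id 0x%llx\n",
               (unsigned long long)zoo_client_id(zh)->client_id);
    else if (state == ZOO_EXPIRED_SESSION_STATE)
        printf("session expired; a new handle is required\n");
}

int main(void)
{
    zoo_set_debug_level(ZOO_LOG_LEVEL_INFO);     /* show the INFO lines */
    zhandle_t *zh = zookeeper_init(
        "13.0.0.11:2888,13.0.0.12:2888,13.0.0.13:2888",
        watcher, 6000, NULL, NULL, 0);
    if (!zh)
        return 1;
    pause();                                     /* keep the session open */
    zookeeper_close(zh);
    return 0;
}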

I will attach logs for all nodes after each of their restarts, and a partial 
log for node #3 from before its restart.

