[
https://issues.apache.org/jira/browse/ZOOKEEPER-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009907#comment-13009907
]
Vishal K commented on ZOOKEEPER-1026:
-------------------------------------
Could it be this:
A. Notice that suddently node 2 goes in following state:
2533 [WorkerReceiver Thread] INFO
org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 126
(n.leader), 17179869911 (n.zxid), 1 (n.round), LOOKING (n.state), 126 (n.sid),
LOOKING (my state)
2533 [WorkerReceiver Thread] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new
notification message. My id = 126
2533 [WorkerReceiver Thread] INFO
org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 37
(n.leader), 17179869831 (n.zxid), 3 (n.round), LOOKING (n.state), 37 (n.sid),
LOOKING (my state)
2533 [WorkerReceiver Thread] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new
notification message. My id = 126
2533 [WorkerReceiver Thread] INFO
org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 37
(n.leader), 17179869831 (n.zxid), 3 (n.round), LEADING (n.state), 37 (n.sid),
LOOKING (my state)
2534 [QuorumPeer:/0.0.0.0:2888] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - id: 126, proposed id:
126, zxid: 17179869911, proposed zxid: 17179869911
2534 [QuorumPeer:/0.0.0.0:2888] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: From =
126, Proposed leader = 126, Porposed zxid = 17179869911, Proposed epoch = 1
2534 [QuorumPeer:/0.0.0.0:2888] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - id: 37, proposed id:
126, zxid: 17179869831, proposed zxid: 17179869911
2534 [QuorumPeer:/0.0.0.0:2888] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: From =
37, Proposed leader = 37, Porposed zxid = 17179869831, Proposed epoch = 3
2534 [QuorumPeer:/0.0.0.0:2888] DEBUG
org.apache.zookeeper.server.quorum.FastLeaderElection - I'm a participant: 126
B. Due to ZOOKEEPER-975, 3 goes in LEADING state because it receives history of
old notifications from node 1 and node2.
node3_after_restart:
2653 [WorkerReceiver Thread] INFO
org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 37
(n.leader), 17179869831 (n.zxid), 1 (n.round), LOOKING (n.state), 37 (n.sid),
LOOKING (my state)
//If have received from all nodes, then terminate
if ((self.getVotingView().size() == recvset.size())
&&
(self.getQuorumVerifier().getWeight(proposedLeader) != 0)){
self.setPeerState((proposedLeader ==
self.getId()) ?
ServerState.LEADING: learningState());
leaveInstance();
return new Vote(proposedLeader, proposedZxid);
}
C. After which node 2 starts following node 3 and then node 1 starts following
node 3 (because 2 is following 3)
due to:
default:
/*
* There is at most one leader for each epoch, so if a
* peer claims to be the leader for an epoch, then that
* peer must be the leader (no* arbitrary failures
* assumed). Now, if there is no quorum supporting
* this leader, then processes will naturally move
* to a new epoch.
*/
if(n.epoch == logicalclock){
recvset.put(n.sid, new Vote(n.leader, n.zxid,
n.epoch));
if((n.state == ServerState.LEADING) ||
(termPredicate(recvset, new Vote(n.leader,
n.zxid, n.epoch, n.state))
&& checkLeader(outofelection,
n.leader, n.epoch)) ){
self.setPeerState((n.leader == self.getId()) ?
ServerState.LEADING: learningState());
leaveInstance();
return new Vote(n.leader, n.zxid);
}
}
> Sequence number assignment decreases after old node rejoins cluster
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-1026
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1026
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.3.3
> Reporter: Jeremy Stribling
> Attachments: ZOOKEEPER-1026.logs.tgz
>
>
> I ran into a weird case where a Zookeeper server rejoins the cluster after
> missing several operations, and then a client creates a new sequential node
> that has a number earlier than the last node it created. I don't have full
> logs, or a live system in this state, or any data directories, just some
> partial server logs and the evidence as seen by the client. Haven't tried
> reproducing it yet, just wanted to see if anyone here had any ideas. Here's
> the scenario (probably more info than necessary, but trying to be complete)
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes
> #1, #2, and #3 below):
> 2) Nodes periodically (and throughout this whole timeline) create sequential,
> non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates
> /zkrsm/0000000000000000_record0000003348
> 10) 5:52:08: Node #2 dies after getting notified of
> /zkrsm/0000000000000000_record0000003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have
> wallclock timestamps, so not exactly sure on the ordering of this step)
> 12) 5:52:15: Node #1 successfully creates
> /zkrsm/0000000000000000_record0000003292
> Note that the node created in step #12 is lower than the one created in step
> #9, and is exactly one greater than the last node seen by node #3 before it
> restarted.
> Here is the sequence of session establishments as seen from the C client of
> node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12,
> #3=13.0.0.13):
> 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632:
> session establishment complete on server [13.0.0.13:2888],
> sessionId=0x252ec780a3020000, negotiated timeout=6000
> 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632:
> session establishment complete on server [13.0.0.13:2888],
> sessionId=0x252ec782f5100002, negotiated timeout=6000
> 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632:
> session establishment complete on server [13.0.0.12:2888],
> sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632:
> session establishment complete on server [13.0.0.11:2888],
> sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
> 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632:
> session establishment complete on server [13.0.0.11:2888],
> sessionId=0xd72ec7856d0f0001, negotiated timeout=6000
> I will attach logs for all nodes after each of their restarts, and a partial
> log for node #3 from before its restart.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira