[
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469516#comment-16469516
]
Bogdan Kanivets commented on ZOOKEEPER-2959:
--------------------------------------------
I think this is ready to merge. There are 3 PRs for 3.4, 3.5 and master.
Steps to reproduce the bug:
Start with 3 servers. Config:
{code:java}
clientPort=2181
leaderServes=yes
server.1=<server.1-ip>:2888:3888
server.2=<server.2-ip>:2888:3888
server.3=<server.3-ip>:2888:3888:observer
{code}
On server.2, block the follower port (2888) for traffic coming from server.1:
{code:java}
sudo iptables -A INPUT -s <server.1-ip> -p tcp --destination-port 2888 -j DROP
{code}
Start server.1, server.2 and server.3.
Wait for server.2 to declare itself leader and then fail in
waitForNewLeaderAck:
{code:java}
2018-04-16 20:56:25,990 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@361] - LEADING - LEADER ELECTION TOOK - 3903
2018-04-16 20:56:27,275 [myid:2] - INFO [LearnerHandler-/<server.3-ip>:29223:LearnerHandler@329] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@136ca5bc
2018-04-16 20:56:27,281 [myid:2] - INFO [LearnerHandler-/<server.3-ip>:29223:LearnerHandler@384] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
2018-04-16 20:56:27,281 [myid:2] - INFO [LearnerHandler-/<server.3-ip>:29223:LearnerHandler@393] - leader and follower are in sync, zxid=0x0
2018-04-16 20:56:27,282 [myid:2] - INFO [LearnerHandler-/<server.3-ip>:29223:LearnerHandler@458] - Sending DIFF
2018-04-16 20:56:27,291 [myid:2] - INFO [LearnerHandler-/<server.3-ip>:29223:LearnerHandler@518] - Received NEWLEADER-ACK message from 3
2018-04-16 20:56:47,283 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@502] - Shutting down
2018-04-16 20:56:47,284 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@508] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Waiting for a quorum of followers, only synced with sids: [ 2 ]
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:508)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:406)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:859)
{code}
On server.2, check the currentEpoch file: currentEpoch has been incremented
even though server.2 only ever synced with itself (sids: [ 2 ]). This is the
bug. The epoch is incremented in getEpochToPropose because server.3, an
observer, is counted in connectingFollowers.
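
To make the miscount concrete, here is a minimal sketch of the scenario above. It is a simplified model, not ZooKeeper code: the class ObserverQuorumSketch and both helper methods are hypothetical names, and the size-only check only imitates the behavior described in the issue (membership of the sids is not verified).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the quorum check; not ZooKeeper's classes.
public class ObserverQuorumSketch {

    // Majority check that looks only at the size of the set and never
    // verifies that each sid is a voting participant.
    public static boolean containsQuorumBySize(Set<Long> acks, int participantCount) {
        return acks.size() > participantCount / 2;
    }

    // Variant that discards non-participants (observers) before counting.
    public static boolean containsQuorumOfParticipants(Set<Long> acks, Set<Long> participants) {
        Set<Long> voting = new HashSet<>(acks);
        voting.retainAll(participants);
        return voting.size() > participants.size() / 2;
    }

    public static void main(String[] args) {
        // server.1 and server.2 are participants; server.3 is an observer.
        Set<Long> participants = new HashSet<>(Arrays.asList(1L, 2L));
        // server.1 is blocked, so only the leader (2) and the observer (3) connect.
        Set<Long> connecting = new HashSet<>(Arrays.asList(2L, 3L));

        System.out.println(containsQuorumBySize(connecting, participants.size()));
        System.out.println(containsQuorumOfParticipants(connecting, participants));
    }
}
```

In this model the size-only check accepts the set {2, 3} as a quorum, while the filtered variant rejects it because only sid 2 is a participant.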
> ignore accepted epoch and LEADERINFO ack from observers when a newly elected
> leader computes new epoch
> ------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10, 3.5.3
> Reporter: xiangyq000
> Assignee: Bogdan Kanivets
> Priority: Blocker
>
> Once the ZooKeeper cluster finishes the election for new leader, all learners
> report their accepted epoch to the leader for the computation of new cluster
> epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet<Long> connectingFollowers = new HashSet<Long>();
> public long getEpochToPropose(long sid, long lastAcceptedEpoch)
>         throws InterruptedException, IOException {
>     synchronized(connectingFollowers) {
>         if (!waitingForNewEpoch) {
>             return epoch;
>         }
>         if (lastAcceptedEpoch >= epoch) {
>             epoch = lastAcceptedEpoch+1;
>         }
>         connectingFollowers.add(sid);
>         QuorumVerifier verifier = self.getQuorumVerifier();
>         if (connectingFollowers.contains(self.getId()) &&
>                 verifier.containsQuorum(connectingFollowers)) {
>             waitingForNewEpoch = false;
>             self.setAcceptedEpoch(epoch);
>             connectingFollowers.notifyAll();
>         } else {
>             long start = Time.currentElapsedTime();
>             long cur = start;
>             long end = start + self.getInitLimit()*self.getTickTime();
>             while(waitingForNewEpoch && cur < end) {
>                 connectingFollowers.wait(end - cur);
>                 cur = Time.currentElapsedTime();
>             }
>             if (waitingForNewEpoch) {
>                 throw new InterruptedException("Timeout while waiting for epoch from quorum");
>             }
>         }
>         return epoch;
>     }
> }
> {code}
> The computation produces a result once:
> # The leader has called the method "getEpochToPropose"
> # The number of reporters is greater than half of the participants.
> The problem is that an observer also sends its accepted epoch to the
> leader, and this procedure treats observers as participants.
> Suppose that the cluster consists of 1 leader, 2 followers and 1 observer,
> and that the leader and the observer have reported their accepted epochs while
> neither of the followers has. The connectingFollowers set then contains
> two elements, and a size of 2 is greater than half of the 3 participants.
> QuorumVerifier#containsQuorum therefore returns true, because it
> does not check whether the elements of its argument are participants.
> The same flaw exists in
> org.apache.zookeeper.server.quorum.Leader#waitForEpochAck
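
One way to picture what the issue title asks for (ignoring the accepted epoch reported by observers) is the non-blocking sketch below. It is only an illustration under stated assumptions: EpochProposalSketch, report and getEpoch are hypothetical names, the wait/notify timeout logic of the real method is left out, and this is not presented as the change made in the PRs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of epoch negotiation; not ZooKeeper's Leader class.
public class EpochProposalSketch {
    private final Set<Long> participants;   // voting members only
    private final long leaderSid;
    private final Set<Long> connectingFollowers = new HashSet<>();
    private long epoch = -1;
    private boolean waitingForNewEpoch = true;

    public EpochProposalSketch(Set<Long> participants, long leaderSid) {
        this.participants = new HashSet<>(participants);
        this.leaderSid = leaderSid;
    }

    // Records one report and returns true once the new epoch is decided.
    public synchronized boolean report(long sid, long lastAcceptedEpoch) {
        if (!waitingForNewEpoch) {
            return true;
        }
        // Assumed guard: a report from a non-participant (observer) neither
        // raises the epoch nor counts toward the quorum.
        if (!participants.contains(sid)) {
            return false;
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connectingFollowers.add(sid);
        if (connectingFollowers.contains(leaderSid)
                && connectingFollowers.size() > participants.size() / 2) {
            waitingForNewEpoch = false;
        }
        return !waitingForNewEpoch;
    }

    public synchronized long getEpoch() {
        return epoch;
    }

    public static void main(String[] args) {
        // 1 leader (sid 1), 2 followers (sids 2, 3) and 1 observer (sid 4).
        EpochProposalSketch p =
                new EpochProposalSketch(new HashSet<>(Arrays.asList(1L, 2L, 3L)), 1L);
        System.out.println(p.report(1L, 5L)); // leader alone: not decided
        System.out.println(p.report(4L, 9L)); // observer: ignored, still not decided
        System.out.println(p.report(2L, 5L)); // a follower completes the majority
        System.out.println(p.getEpoch());
    }
}
```

With the guard in place, the leader plus the observer is no longer enough to settle the epoch in the 1 leader, 2 followers, 1 observer example; a follower has to report first.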
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)