[
https://issues.apache.org/jira/browse/ZOOKEEPER-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048939#comment-18048939
]
Xin Chen commented on ZOOKEEPER-4040:
-------------------------------------
[~pf] Hello, did you try this on a newer ZooKeeper version, such as 3.6.2? I
tried, and the issue can still be reproduced every time, because the relevant code in
org.apache.zookeeper.server.quorum.Learner#registerWithLeader has never changed:
{code:java}
} else if (newEpoch == self.getAcceptedEpoch()) {
    // since we have already acked an epoch equal to the leaders, we cannot ack
    // again, but we still need to send our lastZxid to the leader so that we can
    // sync with it if it does assume leadership of the epoch.
    // the -1 indicates that this reply should not count as an ack for the new epoch
    wrappedEpochBytes.putInt(-1);
} else {
    throw new IOException("Leaders epoch, " + newEpoch
            + " is less than accepted epoch, " + self.getAcceptedEpoch());
}
{code}
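For anyone skimming the thread, here is a minimal, self-contained sketch (invented class and method names, not ZooKeeper source) of the three branches above, showing why a learner whose persisted acceptedEpoch is larger than the epoch of the newly elected leader lands in the exception branch on every retry:
{code:java}
public class EpochCheckSketch {
    // mirrors the comparison in Learner#registerWithLeader
    static String decide(long newEpoch, long acceptedEpoch) {
        if (newEpoch > acceptedEpoch) {
            return "ack the new epoch " + newEpoch;                    // normal case
        } else if (newEpoch == acceptedEpoch) {
            return "reply with -1 (no ack)";                           // already acked this epoch
        } else {
            return "IOException: leader epoch " + newEpoch
                    + " is less than accepted epoch " + acceptedEpoch; // the branch this issue hits
        }
    }

    public static void main(String[] args) {
        // server2 persisted acceptedEpoch=2 while the re-elected leader proposes epoch 1,
        // so every registration attempt ends in the exception branch
        System.out.println(decide(1, 2));
    }
}
{code}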
[~symat] [~ztzg] [~fekelund] [~akai12] I propose a seemingly simple fix: when this
situation is encountered, forcibly update the learner's acceptedEpoch to the newEpoch
sent by the leader. The next time the replica tries to join the cluster through
election, it then avoids this infinite loop of errors and joins successfully.
Following is the bugfix code:
{code:java}
} else if (newEpoch == self.getAcceptedEpoch()) {
    ...
} else {
    final long oldAcceptedEpoch = self.getAcceptedEpoch();
    LOG.warn("Leaders epoch, {} is less than accepted epoch, {}", newEpoch, oldAcceptedEpoch);
    // To avoid getting stuck in an infinite loop when the acceptedEpoch of a learner is
    // greater than that of the leader, forcibly set the local epoch to the leader's epoch
    self.setAcceptedEpoch(newEpoch);
    throw new IOException("Leaders epoch, " + newEpoch
            + " is less than accepted epoch, " + oldAcceptedEpoch);
}
{code}
If you are concerned about acceptedEpoch and currentEpoch becoming inconsistent, the
two can be updated together:
{code:java}
} else if (newEpoch == self.getAcceptedEpoch()) {
    ...
} else {
    final long oldAcceptedEpoch = self.getAcceptedEpoch();
    LOG.warn("Leaders epoch, {} is less than accepted epoch, {}", newEpoch, oldAcceptedEpoch);
    // To avoid getting stuck in an infinite loop when the acceptedEpoch of a learner is
    // greater than that of the leader, forcibly set the local epochs to the leader's epoch
    self.setCurrentEpoch(newEpoch);
    self.setAcceptedEpoch(newEpoch);
    throw new IOException("Leaders epoch, " + newEpoch
            + " is less than accepted epoch, " + oldAcceptedEpoch);
}
{code}
I hope someone can reply and let me know whether this fix is feasible. I have
reproduced the issue following the steps in the description and verified that this
bugfix lets the replica rejoin the cluster and return to normal.
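To make that concrete, below is a tiny, self-contained simulation (plain Java with invented names, not ZooKeeper source or a real test) of why the forced update breaks the retry loop: after the first failed registration the learner's acceptedEpoch already matches the leader's epoch, so the next election attempt no longer hits the exception branch.
{code:java}
public class RetryLoopSketch {
    static long acceptedEpoch = 2;        // server2's persisted acceptedEpoch
    static final long LEADER_EPOCH = 1;   // epoch proposed by the newly elected leader (server3)

    // returns true when registration would proceed, false when it would throw IOException
    static boolean tryRegister(boolean withFix) {
        if (LEADER_EPOCH >= acceptedEpoch) {
            return true;                   // ack branch or "-1" branch: the learner can sync
        }
        if (withFix) {
            acceptedEpoch = LEADER_EPOCH;  // the proposed fix: adopt the leader's epoch
        }
        return false;                      // exception branch: follower shuts down and re-elects
    }

    public static void main(String[] args) {
        System.out.println(tryRegister(true));  // false: the first attempt still fails
        System.out.println(tryRegister(true));  // true: the next attempt succeeds, loop broken
    }
}
{code}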
> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
> --------------------------------------------------------------------
>
> Key: ZOOKEEPER-4040
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4040
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.5, 3.5.8, 3.6.2
> Reporter: pengfei
> Priority: Major
> Attachments: image-2020-12-28-18-20-07-842.png,
> image-2020-12-28-18-23-14-073.png, image-2020-12-28-18-25-31-960.png,
> image-2020-12-28-18-28-07-015.png
>
>
> h4. Overview (mechanically translated from ZOOKEEPER-4039):
> A node whose acceptedEpoch is too large cannot join the cluster.
> After the leader receives the acceptedEpoch of more than half of the nodes, it sets
> its own acceptedEpoch to the maximum of those values plus 1. If the leader goes down
> at that point, its acceptedEpoch is 1 larger than that of the other nodes. If this
> node then restarts, is elected leader again, and goes down again, the remaining nodes
> re-elect a leader whose epoch is smaller than the acceptedEpoch of the original
> leader, which causes the original node to keep cycling between leader election and
> the follower state.
> Steps to reproduce:
> 3 nodes: server1, server2, server3
> Start server1 and server2, then stop both at the red dot shown below. At this point
> the acceptedEpoch of server2 is 1.
> Restart server1 and server2, then stop both at the red dot shown below. At this point
> the acceptedEpoch of server2 is 2.
> Restart server1 and server3, wait for them to elect server3 as leader, then start
> server2; the following exception is thrown repeatedly.
> h4. errorlog:
> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
>     at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:353)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
> 2020-12-28 18:09:25,176 [myid:2] - INFO
> [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2182)(secure=disabled):Follower@201] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
>
> h4. sample:
> cluster servers: server1, server2, server3
> * start server1 and server2, then shut them down when they reach the point shown
> below; now the acceptedEpoch of server2 is 1, server1 is 0, server3 is 0
> !image-2020-12-28-18-23-14-073.png!
> * repeat step 1; now the acceptedEpoch of server1 is 0, server2 is 2, server3 is 0
> !image-2020-12-28-18-25-31-960.png!
> * then start server1 and server3, wait until the leader of the cluster is server3,
> then start server2; the error below is generated
> !image-2020-12-28-18-28-07-015.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)