[ https://issues.apache.org/jira/browse/ZOOKEEPER-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258800#comment-17258800 ]
Mate Szalay-Beko commented on ZOOKEEPER-4040:
---------------------------------------------

[~pf], thanks for reporting the issue!

Could you please try to reproduce this issue with one of the latest stable ZooKeeper versions (3.5.8 or 3.6.2)? I remember I recently did some fixes around this part of the code in ZOOKEEPER-3829. Others may have pushed fixes as well. 3.5.5 was released in May 2019.

> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4040
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4040
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.5
>            Reporter: pengfei
>            Priority: Major
>         Attachments: image-2020-12-28-18-20-07-842.png,
> image-2020-12-28-18-23-14-073.png, image-2020-12-28-18-25-31-960.png,
> image-2020-12-28-18-28-07-015.png
>
>
> h4. Overview (mechanically translated from ZOOKEEPER-4039):
> A node whose acceptedEpoch is too large cannot join the cluster.
> After the leader receives the acceptedEpoch of more than half of the nodes,
> it sets its own acceptedEpoch to the maximum of those values plus 1. If the
> leader goes down at this point, its acceptedEpoch is left 1 larger than that
> of the other nodes. If this node then restarts, is elected leader again, and
> goes down again, the remaining nodes elect a new leader. The epoch of this
> new leader is smaller than the acceptedEpoch of the original leader, so the
> original node keeps switching between the looking and follower states.
> Steps to reproduce:
> 3 nodes: server1, server2, server3
> Start server1 and server2, then stop both at the red dot below. At this
> point the acceptedEpoch of server2 is 1.
> Restart server1 and server2, then stop both at the red dot below. At this
> point the acceptedEpoch of server2 is 2.
> Restart server1 and server3, wait until they elect server3 as the leader,
> and then start server2. The following exception is logged repeatedly.
> h4. errorlog:
> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
>     at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:353)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
> 2020-12-28 18:09:25,176 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2182)(secure=disabled):Follower@201] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
>
> h4. sample:
> The cluster consists of server1, server2, and server3.
> * Start server1 and server2, then shut them down when they reach the point
> shown below. Now the acceptedEpoch of server2 is 1, server1 is 0, and
> server3 is 0.
> !image-2020-12-28-18-23-14-073.png!
> * Repeat step 1. Now the acceptedEpoch of server1 is 0, server2 is 2, and
> server3 is 0.
> !image-2020-12-28-18-25-31-960.png!
> * Start server1 and server3, wait until server3 is the leader of the
> cluster, then start server2. This produces the error below.
> !image-2020-12-28-18-28-07-015.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
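The rejection reported in the quoted issue can be modeled with a small sketch. This is a simplified illustration and not the actual code in Learner.registerWithLeader: the class name EpochCheckSketch and the method checkLeaderEpoch are hypothetical, and the three-way comparison is an assumption about how a follower treats the epoch proposed by the leader. Only the exception message is taken from the report.

```java
import java.io.IOException;

/** Simplified model of the follower-side epoch check; not ZooKeeper source. */
final class EpochCheckSketch {

    /**
     * Hypothetical helper: compares the epoch proposed by the leader with the
     * follower's locally stored acceptedEpoch and returns the acceptedEpoch
     * the follower would hold afterwards.
     */
    static long checkLeaderEpoch(long leaderEpoch, long acceptedEpoch) throws IOException {
        if (leaderEpoch > acceptedEpoch) {
            // Assumed behavior: the follower adopts the newer epoch.
            return leaderEpoch;
        }
        if (leaderEpoch == acceptedEpoch) {
            // Assumed behavior: nothing changes.
            return acceptedEpoch;
        }
        // The case from the report: the follower has already accepted a
        // higher epoch than the one the current leader proposes.
        throw new IOException("Leaders epoch, " + leaderEpoch
                + " is less than accepted epoch, " + acceptedEpoch);
    }

    public static void main(String[] args) {
        try {
            // server2 from the report: acceptedEpoch=2, leader (server3) proposes epoch 1.
            checkLeaderEpoch(1, 2);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the values from the report (leader epoch 1, acceptedEpoch 2 on server2), the sketch takes the last branch on every attempt, which matches the exception being logged repeatedly.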