[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258800#comment-17258800
 ] 

Mate Szalay-Beko commented on ZOOKEEPER-4040:
---------------------------------------------

[~pf], thanks for reporting the issue!
Could you please try to reproduce it with one of the latest stable ZooKeeper 
versions (3.5.8 or 3.6.2)? 
I remember fixing some things around this part of the code recently in 
ZOOKEEPER-3829, and others may have pushed fixes as well. 3.5.5 was released 
in May 2019.

> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4040
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4040
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.5
>            Reporter: pengfei
>            Priority: Major
>         Attachments: image-2020-12-28-18-20-07-842.png, 
> image-2020-12-28-18-23-14-073.png, image-2020-12-28-18-25-31-960.png, 
> image-2020-12-28-18-28-07-015.png
>
>
> h4. Overview (mechanically translated from ZOOKEEPER-4039):
> A node whose acceptedEpoch is too large cannot join the cluster.
> After the leader receives the acceptedEpoch of more than half of the nodes, 
> it sets its own acceptedEpoch to the maximum of those values plus 1. If the 
> leader goes down at this point, its acceptedEpoch is 1 larger than that of 
> the other nodes. If this node then restarts, is elected leader again, and 
> goes down again, the gap grows. When the remaining nodes elect a new leader, 
> that leader's epoch is smaller than the acceptedEpoch of the original 
> leader, so the original node keeps switching between the LOOKING and 
> FOLLOWING states.
> Steps to reproduce:
> Use 3 nodes: server1, server2, server3.
> Start server1 and server2, then stop both at the red dot shown below. At 
> this point, the acceptedEpoch of server2 is 1.
> Restart server1 and server2, then stop both again at the red dot shown 
> below. At this point, the acceptedEpoch of server2 is 2.
> Restart server1 and server3, wait until they elect server3 as the leader, 
> and then start server2. The following exception is then logged repeatedly.
> h4. errorlog:
> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
>  at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:353)
>  at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
>  at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
> 2020-12-28 18:09:25,176 [myid:2] - INFO  [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2182)(secure=disabled):Follower@201] - shutdown called
> java.lang.Exception: shutdown Follower
>  at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
>  at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
>  
> h4. sample:
> The cluster consists of server1, server2, and server3.
>  * Start server1 and server2, then shut them down when they reach the point 
> shown below. Now the acceptedEpoch of server2 is 1, server1 is 0, and 
> server3 is 0.  !image-2020-12-28-18-23-14-073.png!
>  * Repeat step 1. Now the acceptedEpoch of server1 is 0, server2 is 2, and 
> server3 is 0.  !image-2020-12-28-18-25-31-960.png!
>  * Start server1 and server3, wait until server3 is the leader of the 
> cluster, then start server2. This produces the error below.  
> !image-2020-12-28-18-28-07-015.png!
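
The scenario in the quoted report can be modelled with a small standalone 
sketch. This is not the ZooKeeper implementation: the class EpochSketch, the 
method proposeEpoch, and the simplified signature of registerWithLeader are 
hypothetical and only illustrate the rule described above (the leader 
proposes the quorum's maximum acceptedEpoch plus 1, and a follower rejects a 
leader epoch lower than its own acceptedEpoch). The exception text mirrors 
the one in the error log.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Simplified model of the epoch handshake; all names are illustrative.
public class EpochSketch {

    // Leader side: propose the largest acceptedEpoch reported by the
    // quorum members, plus 1.
    public static long proposeEpoch(List<Long> quorumAcceptedEpochs) {
        long max = 0;
        for (long epoch : quorumAcceptedEpochs) {
            max = Math.max(max, epoch);
        }
        return max + 1;
    }

    // Follower side: accept the leader's epoch unless it is lower than the
    // locally stored acceptedEpoch. Returns the follower's acceptedEpoch
    // after a successful registration.
    public static long registerWithLeader(long leaderEpoch, long acceptedEpoch)
            throws IOException {
        if (leaderEpoch < acceptedEpoch) {
            throw new IOException("Leaders epoch, " + leaderEpoch
                    + " is less than accepted epoch, " + acceptedEpoch);
        }
        return leaderEpoch;
    }

    public static void main(String[] args) {
        // server1 and server3 both report acceptedEpoch=0 and form a quorum.
        long leaderEpoch = proposeEpoch(Arrays.asList(0L, 0L));
        System.out.println("leader epoch = " + leaderEpoch);

        // server2 joins later with acceptedEpoch=2 and is rejected.
        try {
            registerWithLeader(leaderEpoch, 2L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, server2 fails on every attempt as long as the quorum formed by 
server1 and server3 keeps proposing an epoch below 2, which matches the 
repeated exception in the report.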



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
