[ https://issues.apache.org/jira/browse/ZOOKEEPER-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048939#comment-18048939 ]

Xin Chen edited comment on ZOOKEEPER-4040 at 1/4/26 9:39 AM:
-------------------------------------------------------------

[~pf] Hello, did you try this on a newer ZooKeeper version, such as 3.6.2? I tried, and the issue can still be reproduced every time, because the code in question, org.apache.zookeeper.server.quorum.Learner#registerWithLeader, has never changed.

 
{code:java}
} else if (newEpoch == self.getAcceptedEpoch()) {
    // since we have already acked an epoch equal to the leaders, we cannot ack
    // again, but we still need to send our lastZxid to the leader so that we can
    // sync with it if it does assume leadership of the epoch.
    // the -1 indicates that this reply should not count as an ack for the new epoch
    wrappedEpochBytes.putInt(-1);
} else {
    throw new IOException("Leaders epoch, " + newEpoch + " is less than accepted epoch, " + self.getAcceptedEpoch());
}
{code}
 

[~symat] [~ztzg] [~fekelund] [~akai12] I propose a seemingly simple fix: when this situation occurs, force the replica's acceptedEpoch to be updated to the newEpoch sent by the leader. The next time the replica tries to join the cluster through election, it then avoids this endless loop of errors and joins successfully.

Here is the bugfix code:
{code:java}
} else if (newEpoch == self.getAcceptedEpoch()) {
    ...
} else {
    LOG.warn("Leaders epoch, {} is less than accepted epoch, {}", newEpoch, self.getAcceptedEpoch());
    // To avoid getting stuck in an infinite loop when the acceptedEpoch of a learner is
    // greater than that of the leader, forcibly set the local epoch to the leader's epoch
    self.setAcceptedEpoch(newEpoch);
    throw new IOException("Leaders epoch, " + newEpoch + " is less than accepted epoch, " + self.getAcceptedEpoch());
}
{code}
If you are concerned about acceptedEpoch becoming inconsistent with currentEpoch, both can be updated together:
{code:java}
} else if (newEpoch == self.getAcceptedEpoch()) {
    ...
} else {
    LOG.warn("Leaders epoch, {} is less than accepted epoch, {}", newEpoch, self.getAcceptedEpoch());
    // To avoid getting stuck in an infinite loop when the acceptedEpoch of a learner is
    // greater than that of the leader, forcibly set the local epoch to the leader's epoch
    self.setCurrentEpoch(newEpoch);
    self.setAcceptedEpoch(newEpoch);
    throw new IOException("Leaders epoch, " + newEpoch + " is less than accepted epoch, " + self.getAcceptedEpoch());
}
{code}
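To make the failure loop and the effect of the forced update concrete, below is a small standalone sketch. It is not ZooKeeper code: the class name EpochHandshakeModel, its fields, and the forceDownOnMismatch flag are hypothetical, and the sketch only models the epoch comparison performed in Learner#registerWithLeader.
{code:java}
// Standalone model of the epoch check in Learner#registerWithLeader.
// NOT ZooKeeper code; the class, fields and the forceDownOnMismatch flag are
// hypothetical and exist only to illustrate the proposed fix.
import java.io.IOException;

public class EpochHandshakeModel {

    // The learner has persisted epoch 2, but the re-elected leader proposes
    // epoch 1 (the scenario from the issue description).
    static long acceptedEpoch = 2;
    static final long LEADER_EPOCH = 1;

    // Mirrors the three branches of the check shown above.
    static void register(long newEpoch, boolean forceDownOnMismatch) throws IOException {
        if (newEpoch > acceptedEpoch) {
            acceptedEpoch = newEpoch;        // normal case: accept and ack the new epoch
        } else if (newEpoch == acceptedEpoch) {
            // already acked this epoch: reply with -1 and still sync with the leader
        } else {
            if (forceDownOnMismatch) {
                acceptedEpoch = newEpoch;    // proposed fix: force the local epoch down
            }
            throw new IOException("Leaders epoch, " + newEpoch
                    + " is less than accepted epoch, " + acceptedEpoch);
        }
    }

    public static void main(String[] args) {
        // Without the fix, every election round ends the same way, forever.
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                register(LEADER_EPOCH, false);
            } catch (IOException e) {
                System.out.println("attempt " + attempt + ": " + e.getMessage());
            }
        }
        // With the fix, the current round still throws, but acceptedEpoch is now
        // lowered, so the next round takes the equal-epoch branch and passes.
        try {
            register(LEADER_EPOCH, true);
        } catch (IOException e) {
            System.out.println("forced update round: " + e.getMessage());
        }
        try {
            register(LEADER_EPOCH, false);
            System.out.println("next round: handshake passes, learner can sync");
        } catch (IOException e) {
            System.out.println("next round: " + e.getMessage());
        }
    }
}
{code}
In this simplified model the forced update makes the next election round fall into the newEpoch == acceptedEpoch branch, which matches the recovery I describe above.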
I hope someone can reply and let me know whether this fix is feasible. I have reproduced the issue following the steps described, and verified that with this bugfix the replica can join the cluster and return to normal.

 


> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4040
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4040
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.5, 3.5.8, 3.6.2
>            Reporter: pengfei
>            Priority: Major
>         Attachments: image-2020-12-28-18-20-07-842.png, 
> image-2020-12-28-18-23-14-073.png, image-2020-12-28-18-25-31-960.png, 
> image-2020-12-28-18-28-07-015.png
>
>
> h4. Overview (mechanically translated from ZOOKEEPER-4039):
> The acceptedEpoch is too large and the corresponding node cannot join the 
> cluster.
> After the leader receives the acceptedEpoch of more than half of the nodes, it 
> sets its own acceptedEpoch to the maximum of those values plus 1. If the leader 
> goes down at this point, its acceptedEpoch is 1 larger than that of the other 
> nodes. If this node then restarts, is elected leader again, and goes down 
> again, the remaining nodes re-elect a leader whose epoch is smaller than the 
> original leader's acceptedEpoch, so the original node keeps cycling between 
> looking for a leader and switching to the follower state.
> Steps to reproduce:
> 3 nodes: server1, server2, server3
> Start server1 and server2, then stop both at the red dot shown below. At this 
> point server2's acceptedEpoch is 1.
> Restart server1 and server2, then stop both again at the red dot shown below. 
> At this point server2's acceptedEpoch is 2.
> Restart server1 and server3 and wait for them to elect server3 as leader, then 
> start server2; the following exception is thrown repeatedly.
> h4. errorlog:
> java.io.IOException: Leaders epoch, 1 is less than accepted epoch, 2
>     at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:353)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1271)
> 2020-12-28 18:09:25,176 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2182)(secure=disabled):Follower@201] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1275)
>  
> h4. sample:
> cluster servers: server1, server2, server3
>  * Start server1 and server2, then shut them down when they reach the point 
> below. Now the acceptedEpoch of server2 is 1, server1 is 0, server3 is 0.  
> !image-2020-12-28-18-23-14-073.png!
>  * Repeat step 1. Now the acceptedEpoch of server1 is 0, server2 is 2, server3 
> is 0.  !image-2020-12-28-18-25-31-960.png!
>  * Start server1 and server3 and wait until server3 is the leader of the 
> cluster, then start server2; the error below is generated.  
> !image-2020-12-28-18-28-07-015.png!


