[ https://issues.apache.org/jira/browse/CURATOR-653?focusedWorklogId=817941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-817941 ]
ASF GitHub Bot logged work on CURATOR-653:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Oct/22 10:07
            Start Date: 18/Oct/22 10:07
    Worklog Time Spent: 10m
      Work Description: tisonkun commented on PR #436:
URL: https://github.com/apache/curator/pull/436#issuecomment-1282147354

   Merging...


Issue Time Tracking
-------------------

    Worklog Id:     (was: 817941)
    Time Spent: 2h 20m  (was: 2h 10m)

> Double leader for LeaderLatch
> -----------------------------
>
>                 Key: CURATOR-653
>                 URL: https://issues.apache.org/jira/browse/CURATOR-653
>             Project: Apache Curator
>          Issue Type: Task
>          Components: Recipes
>            Reporter: Zili Chen
>            Assignee: Zili Chen
>            Priority: Major
>             Fix For: 5.4.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Reported by @woaishixiaoxiao:
> When I use LeaderLatch to select a leader, there is a double-leader
> phenomenon.
> The timeline is as follows:
> 1. The zk cluster switches its leader node because of zxid overflow. The
> cluster is unavailable to the outside world.
> 2. Client A (not the leader before the zxid overflow) and client B (the
> leader before the zxid overflow) enter the suspended state. Client B sets
> its leader status to false.
> 3. The zk cluster completes the leader node election and the cluster is
> back to normal.
> 4. Client A enters the reconnected state and calls the reset function,
> setting its leader status to false.
> 5. Client B enters the reconnected state and calls the reset function,
> setting its leader status to false. It deletes its old path.
> 6. Client A receives the preNodeDeleteEvent, then calls getChildren on
> zkServer. It finds that it has the smallest sequence number and sets
> itself as leader.
> 7. Client B creates a new temporary node, then calls getChildren on
> zkServer. It finds that it is not the node with the smallest sequence
> number and listens for the delete event of the previous node.
> 8. Client A deletes its old path.
> 9. Client B receives the preNodeDeleteEvent, then calls getChildren on
> zkServer. It finds that it has the smallest sequence number and sets
> itself as leader.
> 10. Client A creates a new temporary node, then calls getChildren on
> zkServer. It finds that it is not the node with the smallest sequence
> number and listens for the delete event of the previous node, but it does
> not set itself to the non-leader state. Because of step 6, client A is
> still in the leader state.
> 11. Now client A and client B are both leader at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
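The reported timeline can be replayed deterministically with a small in-memory model. The sketch below is not Curator code: the Client class, the children map, and the create/delete/checkLeadership helpers are hypothetical stand-ins, and checkLeadership is written on the assumption, taken from step 10 of the report, that a client which is not the smallest node only re-watches its predecessor and leaves its leader flag untouched.

```java
import java.util.TreeMap;

public class DoubleLeaderReplay {
    /** Hypothetical stand-in for one LeaderLatch participant; not Curator code. */
    static final class Client {
        final String name;
        boolean leader;
        Integer node; // sequence number of the node this client believes is its own
        Client(String name) { this.name = name; }
    }

    // In-memory stand-in for the latch path's children: sequence number -> owner.
    static final TreeMap<Integer, String> children = new TreeMap<>();
    static int nextSeq = 0;

    static void create(Client c) {
        c.node = nextSeq++;
        children.put(c.node, c.name);
    }

    static void delete(int seq) {
        children.remove(seq);
    }

    // Models "getChildren, then decide" as described in the report.
    static void checkLeadership(Client c) {
        if (!children.isEmpty() && children.firstKey().equals(c.node)) {
            c.leader = true;
        }
        // Otherwise: watch the predecessor. The leader flag is NOT cleared here,
        // which is the behaviour the report describes in step 10.
    }

    /** Replays steps 2 to 10 and returns { A is leader, B is leader }. */
    static boolean[] replay() {
        children.clear();
        nextSeq = 0;
        Client a = new Client("A");
        Client b = new Client("B");
        create(b);               // before the overflow: B holds the smallest node
        create(a);
        b.leader = true;

        b.leader = false;        // step 2: suspended, B drops leadership
        a.leader = false;        // step 4: A resets; old node deletion still pending
        int aOld = a.node;
        b.leader = false;        // step 5: B resets and deletes its old node
        delete(b.node);
        checkLeadership(a);      // step 6: A's old node is now the smallest
        create(b);               // step 7: B creates a new node, is not smallest
        checkLeadership(b);
        delete(aOld);            // step 8: A's old node is finally deleted
        checkLeadership(b);      // step 9: B is now the smallest
        create(a);               // step 10: A creates a new node, is not smallest
        checkLeadership(a);
        return new boolean[] { a.leader, b.leader };
    }

    public static void main(String[] args) {
        boolean[] r = replay();
        System.out.println("A leader: " + r[0] + ", B leader: " + r[1]);
    }
}
```

In this model both flags end up true, matching step 11. Clearing the flag in the non-smallest branch of checkLeadership would make the replay end with only B as leader; whether that corresponds to the change made in PR #436 is not stated in this message.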