[ https://issues.apache.org/jira/browse/CURATOR-653?focusedWorklogId=815212&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-815212 ]

ASF GitHub Bot logged work on CURATOR-653:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Oct/22 13:35
            Start Date: 10/Oct/22 13:35
    Worklog Time Spent: 10m 
      Work Description: XComp commented on PR #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-1273322972

   Yes, but most of the diff comes from the last commit
([adaee91](https://github.com/apache/curator/pull/436/commits/adaee91289014f06b60e512b8377de7f9fe4ebd6)).
I'm OK with reverting that one if you think it's too much. The changes that
address my review comments on this PR are in the other commits of #436, i.e.
every commit except
[adaee91](https://github.com/apache/curator/pull/436/commits/adaee91289014f06b60e512b8377de7f9fe4ebd6).




Issue Time Tracking
-------------------

    Worklog Id:     (was: 815212)
    Time Spent: 1h 50m  (was: 1h 40m)

> Double leader for LeaderLatch
> -----------------------------
>
>                 Key: CURATOR-653
>                 URL: https://issues.apache.org/jira/browse/CURATOR-653
>             Project: Apache Curator
>          Issue Type: Task
>          Components: Recipes
>            Reporter: Zili Chen
>            Assignee: Zili Chen
>            Priority: Major
>             Fix For: 5.4.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Reported by @woaishixiaoxiao:
> When I use LeaderLatch for leader election, two clients can both end up as
> leader at the same time (a double-leader phenomenon).
> The timeline is as follows:
> 1. The ZooKeeper cluster switches its leader node because of zxid overflow.
> The cluster is unavailable to clients during the switch.
> 2. Client A (not the leader before the zxid overflow) and client B (the
> leader before the zxid overflow) enter the SUSPENDED state. B sets its leader
> status to false.
> 3. The ZooKeeper cluster completes its leader election and returns to
> normal.
> 4. Client A enters the RECONNECTED state and calls reset(), which sets its
> leader status to false.
> 5. Client B enters the RECONNECTED state and calls reset(), which sets its
> leader status to false and deletes its old path.
> 6. Client A receives the delete event for its predecessor node and calls
> getChildren on the ZooKeeper server. It finds that its node (still its old
> one) has the smallest sequence number and sets itself as leader.
> 7. Client B creates a new ephemeral sequential node and calls getChildren.
> Its node does not have the smallest sequence number, so it watches its
> predecessor node for deletion.
> 8. Client A deletes its old path.
> 9. Client B receives the delete event for its predecessor node and calls
> getChildren. It finds that its node has the smallest sequence number and sets
> itself as leader.
> 10. Client A creates a new ephemeral sequential node and calls getChildren.
> Its node does not have the smallest sequence number, so it watches its
> predecessor node for deletion. However, it does not set its leader status
> back to false, so because of step 6, A is still in the leader state.
> 11. Clients A and B are now both leaders at the same time.
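To make the interleaving easier to follow, here is a minimal, single-threaded Java model of the timeline above. It is a hypothetical sketch, not Curator code: `LatchRaceModel`, `FakeZk`, `Client`, `checkLeadership`, and `replay` are invented names, a `TreeSet` of sequence numbers stands in for the latch's ephemeral sequential znodes, and watch callbacks are replaced by explicit `checkLeadership()` calls at the points where the timeline says an event fires. The `fixed` flag models the behaviour that step 10 suggests is missing (clearing the leader flag when the client's node is not the smallest); whether Curator's actual fix takes exactly this form is an assumption of this sketch.

```java
import java.util.TreeSet;

/**
 * Hypothetical, single-threaded model of the CURATOR-653 timeline (not Curator code).
 * A TreeSet of sequence numbers stands in for the latch's ephemeral sequential znodes,
 * and watch callbacks are replaced by explicit checkLeadership() calls.
 */
final class LatchRaceModel {

    /** Stand-in for the latch directory on the ZooKeeper server. */
    static final class FakeZk {
        private final TreeSet<Long> nodes = new TreeSet<>();
        private long nextSeq = 0;

        long createEphemeralSequential() {
            long seq = nextSeq++;
            nodes.add(seq);
            return seq;
        }

        void delete(long seq) {
            nodes.remove(seq);
        }

        boolean isSmallest(long seq) {
            return !nodes.isEmpty() && nodes.first() == seq;
        }
    }

    /** One latch participant; "leader" is its local view of leadership. */
    static final class Client {
        private final FakeZk zk;
        private final boolean clearWhenNotSmallest; // true models the assumed fix
        private Long ourPath; // null while the client holds no node
        boolean leader;

        Client(FakeZk zk, boolean clearWhenNotSmallest) {
            this.zk = zk;
            this.clearWhenNotSmallest = clearWhenNotSmallest;
        }

        void createNode() {
            ourPath = zk.createEphemeralSequential();
        }

        void deleteOldNode() {
            if (ourPath != null) {
                zk.delete(ourPath);
                ourPath = null;
            }
        }

        void clearLeadership() {
            leader = false;
        }

        void checkLeadership() {
            if (ourPath == null) {
                return;
            }
            if (zk.isSmallest(ourPath)) {
                leader = true;
            } else if (clearWhenNotSmallest) {
                leader = false; // assumed fix: a client whose node is not smallest is not leader
            }
            // Buggy variant: the flag is left untouched (a real client would now watch its predecessor).
        }
    }

    /** Replays steps 1-11 in the reported order; returns {A is leader, B is leader}. */
    static boolean[] replay(boolean fixed) {
        FakeZk zk = new FakeZk();
        Client b = new Client(zk, fixed);
        Client a = new Client(zk, fixed);
        b.createNode();          // before the outage B holds the smallest node...
        a.createNode();
        b.checkLeadership();     // ...so B is leader
        a.checkLeadership();

        b.clearLeadership();     // 2: SUSPENDED, B drops leadership
        a.clearLeadership();     // 4: A's reset() clears its flag
        b.clearLeadership();     // 5: B's reset() clears its flag...
        b.deleteOldNode();       //    ...and deletes B's old node
        a.checkLeadership();     // 6: A's old node is now the smallest, so A becomes leader
        b.createNode();          // 7: B's new node is not the smallest
        b.checkLeadership();
        a.deleteOldNode();       // 8: A's reset() deletes A's old node
        b.checkLeadership();     // 9: B's node is now the smallest, so B becomes leader
        a.createNode();          // 10: A's new node is not the smallest
        a.checkLeadership();
        return new boolean[] {a.leader, b.leader}; // 11
    }

    public static void main(String[] args) {
        boolean[] buggy = replay(false);
        boolean[] fixed = replay(true);
        System.out.println("buggy: A=" + buggy[0] + ", B=" + buggy[1]); // buggy: A=true, B=true
        System.out.println("fixed: A=" + fixed[0] + ", B=" + fixed[1]); // fixed: A=false, B=true
    }
}
```

Running `main` prints `buggy: A=true, B=true` followed by `fixed: A=false, B=true`. The model only inspects the state after step 11; between steps 9 and 10, A's stale flag is still set in both variants, so this sketch says nothing about that window.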



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
