[jira] [Work logged] (CURATOR-653) Double leader for LeaderLatch

2022-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CURATOR-653?focusedWorklogId=813330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-813330
 ]

ASF GitHub Bot logged work on CURATOR-653:
--

Author: ASF GitHub Bot
Created on: 29/Sep/22 13:34
Start Date: 29/Sep/22 13:34
Worklog Time Spent: 10m 
  Work Description: eolivelli commented on PR #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-1262293140

   @tisonkun thanks for fixing the test.
   @woaishixiaoxiao do you agree with @tisonkun's fix?




Issue Time Tracking
---

Worklog Id: (was: 813330)
Time Spent: 0.5h  (was: 20m)

> Double leader for LeaderLatch
> -
>
> Key: CURATOR-653
> URL: https://issues.apache.org/jira/browse/CURATOR-653
> Project: Apache Curator
>  Issue Type: Task
>  Components: Recipes
>Reporter: Zili Chen
>Assignee: Zili Chen
>Priority: Major
> Fix For: 5.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Reported by @woaishixiaoxiao:
> When I use LeaderLatch to elect a leader, a double-leader situation can
> occur.
> The timeline is as follows:
> 1. The ZooKeeper cluster switches its leader node because of zxid overflow.
> The cluster is unavailable to the outside world.
> 2. Client A (not the leader before the zxid overflow) and client B (the
> leader before the zxid overflow) enter the suspended state. Client B sets
> its leader status to false.
> 3. The ZooKeeper cluster completes its leader election and returns to
> normal.
> 4. Client A enters the reconnected state and calls the reset function,
> which sets its leader status to false.
> 5. Client B enters the reconnected state and calls the reset function,
> which sets its leader status to false and deletes its old path.
> 6. Client A receives the preNodeDeleteEvent and calls getChildren on the
> ZooKeeper server. It finds that its node has the smallest sequence number
> and sets itself as leader.
> 7. Client B creates a new ephemeral node and calls getChildren on the
> ZooKeeper server. It finds that its node does not have the smallest
> sequence number and watches for the deletion of the preceding node.
> 8. Client A deletes its old path.
> 9. Client B receives the preNodeDeleteEvent and calls getChildren on the
> ZooKeeper server. It finds that its node has the smallest sequence number
> and sets itself as leader.
> 10. Client A creates a new ephemeral node and calls getChildren on the
> ZooKeeper server. It finds that its node does not have the smallest
> sequence number and watches for the deletion of the preceding node, but it
> does not set itself to the non-leader state. Because of step 6, A is still
> in the leader state.
> 11. Client A and client B are now both leaders at the same time.
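
The timeline above can be replayed with a toy in-memory model. The sketch below is not Curator code: `Election`, `Client`, and `leadersAfterTimeline` are hypothetical names, a sorted map stands in for the children of the latch path, and the `clearWhenNotSmallest` flag is an assumed stand-in for the behavior discussed in PR #398.

```java
import java.util.TreeMap;

/** Toy replay of the reported timeline; an illustration only, not Curator code. */
class DoubleLeaderSketch {

    /** Hypothetical stand-in for the latch parent node: sequence number -> owner. */
    static final class Election {
        private final TreeMap<Integer, String> children = new TreeMap<>();
        private int nextSeq = 0;

        int create(String owner) {
            children.put(nextSeq, owner);
            return nextSeq++;
        }

        void delete(int seq) {
            children.remove(seq);
        }

        boolean isSmallest(int seq) {
            return !children.isEmpty() && children.firstKey() == seq;
        }
    }

    /** Hypothetical client that tracks only a leader flag and its current node. */
    static final class Client {
        final String name;
        final boolean clearWhenNotSmallest; // assumed switch for the discussed fix
        boolean leader;
        int seq;

        Client(String name, boolean clearWhenNotSmallest) {
            this.name = name;
            this.clearWhenNotSmallest = clearWhenNotSmallest;
        }

        /** Creates a new node, then checks leadership. */
        void join(Election e) {
            seq = e.create(name);
            check(e);
        }

        /** Becomes leader if it owns the smallest node; optionally clears the flag otherwise. */
        void check(Election e) {
            if (e.isSmallest(seq)) {
                leader = true;
            } else if (clearWhenNotSmallest) {
                leader = false;
            }
        }
    }

    /** Replays steps 2 to 10 and returns how many clients believe they lead. */
    static int leadersAfterTimeline(boolean clearWhenNotSmallest) {
        Election e = new Election();
        Client b = new Client("B", clearWhenNotSmallest);
        Client a = new Client("A", clearWhenNotSmallest);
        b.join(e);          // before the outage: B owns the smallest node and leads
        a.join(e);          // A follows
        b.leader = false;   // step 2: suspended, B drops leadership
        a.leader = false;   // step 4: A resets, old node not deleted yet
        b.leader = false;   // step 5: B resets ...
        e.delete(b.seq);    // ... and deletes its old node
        a.check(e);         // step 6: A's old node is now the smallest
        b.join(e);          // step 7: B recreates its node, not the smallest
        e.delete(a.seq);    // step 8: A deletes its old node
        b.check(e);         // step 9: B's new node is now the smallest
        a.join(e);          // step 10: A recreates its node, not the smallest
        return (a.leader ? 1 : 0) + (b.leader ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println("leaders without clearing: " + leadersAfterTimeline(false));
        System.out.println("leaders with clearing: " + leadersAfterTimeline(true));
    }
}
```

In this model the replay ends with two leaders when the flag is left untouched in step 10, and with one leader when a failed check also clears the flag.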



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [curator] eolivelli commented on pull request #398: CURATOR-653: fix potential double leader for LeaderLatch

2022-09-29 Thread GitBox


eolivelli commented on PR #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-1262293140

   @tisonkun thanks for fixing the test.
   @woaishixiaoxiao do you agree with @tisonkun's fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: Next 5.4.0 release ?

2022-09-29 Thread tison
Hi Enrico,

Thanks for starting this discussion. +1 for a new release.

I'll pick up one patch to see if we can include it in the release:
https://github.com/apache/curator/pull/398. See also my last comment there.

Best,
tison.


On Thu, Sep 29, 2022 at 20:50, Enrico Olivelli wrote:

> Hello,
> Is there any showstopper for a new 5.4.0 release?
>
> We have many goodies, like supporting ZooKeeper 3.7.1 and using
> ZooKeeperServerEmbedded.
>
> If no one objects, I will start a release in the next few days.
>
> Best regards
> Enrico
>


[jira] [Work logged] (CURATOR-653) Double leader for LeaderLatch

2022-09-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CURATOR-653?focusedWorklogId=813329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-813329
 ]

ASF GitHub Bot logged work on CURATOR-653:
--

Author: ASF GitHub Bot
Created on: 29/Sep/22 13:24
Start Date: 29/Sep/22 13:24
Worklog Time Spent: 10m 
  Work Description: tisonkun commented on PR #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-1262277533

   I adjusted the test to inject forced `reset`s instead of depending on 
connection loss. Although this means the test no longer reproduces a real-world 
case, I still agree with calling `setLeadership(false)` when `checkLeadership` 
finds that the latch isn't the leader. `setLeadership(false)` is idempotent.
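
To illustrate why an extra `setLeadership(false)` call is harmless, here is a minimal sketch, assuming the flag is an atomic boolean and that observers are notified only when it changes. The names `setLeadership` and `checkLeadership` come from the comment above; the class, the `ownsSmallestNode` parameter, and the notification counter are hypothetical, and this is not Curator's actual source.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of an idempotent leadership flag; an illustration only, not Curator code. */
class LeadershipFlagSketch {
    private final AtomicBoolean hasLeadership = new AtomicBoolean(false);
    // Hypothetical counter standing in for listener callbacks.
    private final AtomicInteger notifications = new AtomicInteger();

    /** Sets the flag and reports whether it actually changed. */
    boolean setLeadership(boolean newValue) {
        boolean oldValue = hasLeadership.getAndSet(newValue);
        if (oldValue != newValue) {
            notifications.incrementAndGet(); // notify only on a real transition
            return true;
        }
        return false; // repeated calls with the same value do nothing
    }

    /** Assumed shape of the check: always write the result, leader or not. */
    void checkLeadership(boolean ownsSmallestNode) {
        setLeadership(ownsSmallestNode);
    }

    boolean hasLeadership() {
        return hasLeadership.get();
    }

    int notificationCount() {
        return notifications.get();
    }

    public static void main(String[] args) {
        LeadershipFlagSketch latch = new LeadershipFlagSketch();
        latch.checkLeadership(true);  // becomes leader
        latch.checkLeadership(false); // loses leadership
        latch.checkLeadership(false); // repeated call, no further effect
        System.out.println("leader=" + latch.hasLeadership()
                + " notifications=" + latch.notificationCount());
    }
}
```

Because the setter compares the old and new values, calling it on every failed check costs nothing when the latch was already a non-leader, while it clears a stale leader flag when one is present.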




Issue Time Tracking
---

Worklog Id: (was: 813329)
Time Spent: 20m  (was: 10m)

> Double leader for LeaderLatch
> -
>
> Key: CURATOR-653
> URL: https://issues.apache.org/jira/browse/CURATOR-653
> Project: Apache Curator
>  Issue Type: Task
>  Components: Recipes
>Reporter: Zili Chen
>Assignee: Zili Chen
>Priority: Major
> Fix For: 5.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[GitHub] [curator] tisonkun commented on pull request #398: CURATOR-653: fix potential double leader for LeaderLatch

2022-09-29 Thread GitBox


tisonkun commented on PR #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-1262277533

   I adjusted the test to inject forced `reset`s instead of depending on 
connection loss. Although this means the test no longer reproduces a real-world 
case, I still agree with calling `setLeadership(false)` when `checkLeadership` 
finds that the latch isn't the leader. `setLeadership(false)` is idempotent.





Next 5.4.0 release ?

2022-09-29 Thread Enrico Olivelli
Hello,
Is there any showstopper for a new 5.4.0 release?

We have many goodies, like supporting ZooKeeper 3.7.1 and using
ZooKeeperServerEmbedded.

If no one objects, I will start a release in the next few days.

Best regards
Enrico