[
https://issues.apache.org/jira/browse/CURATOR-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845147#comment-17845147
]
Gian Merlino commented on CURATOR-696:
--------------------------------------
I think we have been seeing this same sequence of events, leading to two active
leaders, in Apache Druid since updating to Curator 5.4. Details of what we saw
are here:
https://github.com/apache/druid/issues/16411#issuecomment-2103564632
One theory is that the change in CURATOR-644 from {{reset()}} to
{{getChildren()}} on reconnection leads to a situation where the server does
not realize that its znode no longer exists: the latch recipe sees an
ephemeral node with the expected name, but that node belongs to a previous
session and goes away when the previous session expires. A possible fix
would be to check that the session that owns the old znode matches the
current session, rather than checking only the name.
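
The difference between a name-only check and a name-plus-session check can be
sketched with a small self-contained simulation. This is not Curator code:
the class {{LatchOwnershipSketch}}, its methods, the path, and the session ids
are all hypothetical stand-ins for a ZooKeeper server's view of ephemeral
nodes. In real ZooKeeper the owning session of an ephemeral node is available
from {{Stat.getEphemeralOwner()}} and the client's own session id from
{{ZooKeeper.getSessionId()}}; how such a check would be wired into
LeaderLatch is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of ephemeral znodes; not part of Curator or ZooKeeper.
final class LatchOwnershipSketch {
    // Maps each ephemeral node path to the id of the session that created it.
    private final Map<String, Long> ephemeralOwners = new HashMap<>();

    void createEphemeral(String path, long sessionId) {
        ephemeralOwners.put(path, sessionId);
    }

    // Expiring a session removes every ephemeral node that the session owns.
    void expireSession(long sessionId) {
        ephemeralOwners.values().removeIf(owner -> owner == sessionId);
    }

    // Name-only check: true even if the node belongs to an older session.
    boolean existsByName(String path) {
        return ephemeralOwners.containsKey(path);
    }

    // Name-plus-session check: true only if the node exists and the given session owns it.
    boolean ownedBy(String path, long sessionId) {
        Long owner = ephemeralOwners.get(path);
        return owner != null && owner == sessionId;
    }

    public static void main(String[] args) {
        LatchOwnershipSketch zk = new LatchOwnershipSketch();
        long oldSession = 0x1001L; // assumed id of the session that is about to expire
        long newSession = 0x1002L; // assumed id of the session after reconnecting
        String path = "/latch/member-0000000000"; // illustrative path only

        zk.createEphemeral(path, oldSession);

        // The client reconnects under a new session while the old node still exists.
        System.out.println("name-only check passes: " + zk.existsByName(path));
        System.out.println("session check passes:   " + zk.ownedBy(path, newSession));

        // The old session expires and its node disappears.
        zk.expireSession(oldSession);
        System.out.println("node exists afterwards: " + zk.existsByName(path));
    }
}
```

In this model the name-only check accepts the stale node right up until the
old session expires, while the session check rejects it immediately, which
would give the client a chance to create a fresh node under its current
session.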
> Double leader for LeaderLatch
> -----------------------------
>
> Key: CURATOR-696
> URL: https://issues.apache.org/jira/browse/CURATOR-696
> Project: Apache Curator
> Issue Type: Bug
> Affects Versions: 5.4.0, 5.5.0
> Reporter: lurna
> Assignee: Enrico Olivelli
> Priority: Critical
>
> When I use LeaderLatch to elect a leader, a double-leader situation can
> occur.
> The timeline is as follows:
> 1. Client A connects and sets its leader status to true.
> 2. ZooKeeper goes offline until client A's session expires.
> 3. ZooKeeper comes back online; client A reconnects and sets its leader
> status to true using the old path.
> 4. ZooKeeper deletes the old path (client A's) because the session expired.
> 5. Client A does not notice that its node has been deleted and continues to
> believe that it is the leader.
> 6. Client B connects and, because ZooKeeper has no latch nodes, sets its
> leader status to true.
> 7. Client A and client B are now both leader at the same time.
>
> This seems to be caused by CURATOR-644 and CURATOR-645.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)