Jordan Zimmerman created CURATOR-3:
--------------------------------------

             Summary: LeaderLatch race condition causing extra nodes to be 
added in Zookeeper
                 Key: CURATOR-3
                 URL: https://issues.apache.org/jira/browse/CURATOR-3
             Project: Apache Curator
          Issue Type: Bug
          Components: Recipes
    Affects Versions: 2.0.0
            Reporter: Jordan Zimmerman
            Assignee: Jordan Zimmerman


From https://github.com/Netflix/curator/issues/265

Looks like there's a race condition in LeaderLatch. If LeaderLatch.close() is 
called at the right time while the latch's watch handler is running, the latch 
will place another node in Zookeeper after the latch is closed.

Basically how it happens is this:

1) I have two processes contesting a LeaderLatch, ProcessA and ProcessB. 
ProcessA is leader.
2) ProcessA loses leadership somehow (it releases, its connection goes down, 
etc.)
3) This triggers ProcessB's watch handler, which checks that the state is 
still STARTED and, if so, has the LeaderLatch re-evaluate whether it is leader.
4) While the watch handler is running, close() is called on the LeaderLatch on 
ProcessB. This sets the LeaderLatch state to CLOSED, removes the znode from ZK 
and closes off the LeaderLatch.
5) The watch handler has already checked that the state is STARTED, so it does 
a getChildren() on the latch path, and finds the latch's znode is missing. It 
goes ahead and calls reset(), which places a new znode in Zookeeper.
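
The interleaving in steps 3 to 5 can be replayed deterministically with a small 
in-memory model. This is only a sketch of the check-then-act pattern described 
above, not Curator's code: FakeZk, FakeLatch, and every method name below are 
hypothetical stand-ins, and the "ZooKeeper" here is just a sorted set of node 
names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the race; none of these types are Curator classes.
class LatchRaceSketch {
    enum State { STARTED, CLOSED }

    // In-memory stand-in for the children of the latch path in ZooKeeper.
    static final class FakeZk {
        private final Set<String> children = new TreeSet<>();
        private int seq = 0;

        String create(String owner) {
            String node = owner + "-" + (seq++);
            children.add(node);
            return node;
        }

        void delete(String node) { children.remove(node); }

        List<String> getChildren() { return new ArrayList<>(children); }
    }

    // Stand-in for the latch on ProcessB, split so the steps can be interleaved.
    static final class FakeLatch {
        private final FakeZk zk;
        private final String owner;
        private State state = State.STARTED;
        private String ourNode;

        FakeLatch(FakeZk zk, String owner) {
            this.zk = zk;
            this.owner = owner;
            this.ourNode = zk.create(owner);
        }

        // Step 3: the watch handler checks the state once, up front.
        boolean watchSeesStarted() { return state == State.STARTED; }

        // Step 4: close() marks the latch CLOSED and removes its znode.
        void close() {
            state = State.CLOSED;
            zk.delete(ourNode);
            ourNode = null;
        }

        // Step 5: the handler continues on its stale check, finds the znode
        // missing, and "resets" by creating a new one.
        void watchContinues() {
            if (ourNode == null || !zk.getChildren().contains(ourNode)) {
                ourNode = zk.create(owner);
            }
        }
    }

    // Replays steps 3, 4, 5 in order and returns the nodes left behind.
    static List<String> runInterleaving() {
        FakeZk zk = new FakeZk();
        FakeLatch latchB = new FakeLatch(zk, "ProcessB");
        boolean sawStarted = latchB.watchSeesStarted(); // step 3
        latchB.close();                                 // step 4
        if (sawStarted) {
            latchB.watchContinues();                    // step 5
        }
        return zk.getChildren();
    }

    public static void main(String[] args) {
        // The latch is closed, yet one node remains.
        System.out.println("nodes left after close(): " + runInterleaving());
    }
}
```

In this model the closed latch leaves one node behind (ProcessB-1), which 
corresponds to the orphaned znode in the report.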

Result: The LeaderLatch is closed, but there is still a node in Zookeeper that 
isn't associated with any LeaderLatch and won't go away until the session goes 
down. Subsequent LeaderLatches at this path can never get leadership while that 
session is up.
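
One way to avoid leaving the orphaned node is to re-check the state under the 
same lock that close() takes, immediately before creating a node. The sketch 
below illustrates that idea only. It is an assumption for discussion, not the 
fix applied in Curator, and GuardedLatch and its methods are hypothetical. A 
real latch talks to ZooKeeper remotely and often asynchronously, so a local 
lock alone may not be enough there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical mitigation sketch; not Curator's implementation.
class LatchGuardSketch {
    enum State { STARTED, CLOSED }

    static final class GuardedLatch {
        // In-memory stand-in for the children of the latch path.
        private final Set<String> zkChildren = new TreeSet<>();
        private State state = State.STARTED;
        private String ourNode;
        private int seq = 0;

        GuardedLatch() { ourNode = createNode(); }

        private String createNode() {
            String node = "latch-" + (seq++);
            zkChildren.add(node);
            return node;
        }

        synchronized boolean isStarted() { return state == State.STARTED; }

        synchronized void close() {
            state = State.CLOSED;
            zkChildren.remove(ourNode);
            ourNode = null;
        }

        // The state check and the node creation happen under one lock, so
        // close() cannot slip in between them. Returns false if the latch was
        // already closed and nothing was created.
        synchronized boolean resetIfStillStarted() {
            if (state != State.STARTED) {
                return false;
            }
            if (ourNode == null || !zkChildren.contains(ourNode)) {
                ourNode = createNode();
            }
            return true;
        }

        synchronized List<String> children() { return new ArrayList<>(zkChildren); }
    }

    // Same ordering as the bug report: early check, then close(), then reset.
    static List<String> runInterleaving() {
        GuardedLatch latch = new GuardedLatch();
        boolean sawStarted = latch.isStarted();
        latch.close();
        if (sawStarted) {
            latch.resetIfStillStarted(); // re-checks the state and does nothing
        }
        return latch.children();
    }

    public static void main(String[] args) {
        System.out.println("nodes left after close(): " + runInterleaving());
    }
}
```

With the guarded reset, the same ordering of calls leaves no node behind in 
this model.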
