Doug Jones created CURATOR-62:
---------------------------------

             Summary: Leader Election Deadlock
                 Key: CURATOR-62
                 URL: https://issues.apache.org/jira/browse/CURATOR-62
             Project: Apache Curator
          Issue Type: Bug
            Reporter: Doug Jones
            Assignee: Jordan Zimmerman


I've noticed that it is possible for a leader election to deadlock if a thread 
is interrupted while it is trying to acquire the mutex for the election.

I've created a forced example of this here: 
https://github.com/dfjones/curator/commit/544220b1e6b51c2718a7d3511a74962ff1c5ff48

You can reproduce the deadlock by using my modified code and running the 
LeaderSelectorExample. Some leaders may execute, but on my system I eventually 
see a deadlock. Note that I only see the deadlock when running against a remote 
ZooKeeper server rather than the embedded test server. I'm using ZooKeeper 3.4.5 
on Mac OS X 10.8.4.

From what I can tell by inspecting the ZK state and watching in the debugger, 
the thread that is interrupted is able to successfully create the lock object 
in ZK. However, due to the interrupt an exception is generated and 
LockInternals#internalLockLoop never runs. Later, in LeaderSelector#doWork, 
when mutex.release() is called, it fails at the check for lockData.

Once this occurs, the lock object is left behind in ZK as the oldest entry, so 
every later participant waits behind it and the election deadlocks.
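
To make the sequence easier to follow, here is a minimal in-memory sketch of 
what I think is happening. It does not use Curator or ZooKeeper at all: the 
class OrphanedLockSketch, its fields, and its methods are hypothetical 
stand-ins I made up for illustration, and the failing release is only my 
assumption about how the lockData check behaves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; none of these names come from Curator.
class OrphanedLockSketch {
    // Stand-in for the sequential lock objects stored in ZK, oldest first.
    final List<String> nodes = new ArrayList<>();
    // Stand-in for the mutex's client-side record of which thread owns which node.
    final Map<Thread, String> lockData = new HashMap<>();
    private int nextSeq = 0;

    // Returns true if the lock was acquired, false where a real client would wait.
    boolean tryAcquire(boolean interruptAfterCreate) throws InterruptedException {
        String node = "lock-" + nextSeq++;
        nodes.add(node); // the create succeeds on the server
        if (interruptAfterCreate) {
            // The interrupt surfaces here, so the lock loop below never runs
            // and nothing is recorded in lockData.
            throw new InterruptedException("interrupted after create");
        }
        if (!nodes.get(0).equals(node)) {
            nodes.remove(node); // give up instead of blocking, to keep the sketch finite
            return false;
        }
        lockData.put(Thread.currentThread(), node);
        return true;
    }

    void release() {
        String node = lockData.remove(Thread.currentThread());
        if (node == null) {
            // Assumed behaviour: release refuses to proceed without lockData,
            // so the node created earlier is never cleaned up.
            throw new IllegalMonitorStateException("no lockData for this thread");
        }
        nodes.remove(node);
    }

    public static void main(String[] args) {
        OrphanedLockSketch sketch = new OrphanedLockSketch();
        try {
            sketch.tryAcquire(true);
        } catch (InterruptedException e) {
            System.out.println("acquire interrupted");
        } finally {
            try {
                sketch.release();
            } catch (IllegalMonitorStateException e) {
                System.out.println("release failed: " + e.getMessage());
            }
        }
        System.out.println("nodes left behind: " + sketch.nodes);
        try {
            System.out.println("next acquire succeeded: " + sketch.tryAcquire(false));
        } catch (InterruptedException e) {
            throw new AssertionError(e); // not reachable when interruptAfterCreate is false
        }
    }
}
```

In this model the first attempt leaves "lock-0" behind, and every later 
attempt finds it at the head of the list. A real participant would block at 
that point rather than return false, which matches the deadlock I see.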



--
This message was sent by Atlassian JIRA
(v6.1#6144)
