[ https://issues.apache.org/jira/browse/HADOOP-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-9183:
------------------------------
    Attachment: HADOOP-9183.patch

This patch fixes the problem by making two changes.

First, the queue of events in WatcherWithClientRef is dispensed with; instead, the process method blocks until the ZK object is set back on the watcher. This should be acceptable since the set operation is a simple method call, so there is minimal overhead.

Second, the locking order ActiveStandbyElector -> WatcherWithClientRef is enforced, to prevent cycles.

Note also that the CountDownLatch can safely have its countDown() method called outside the synchronized section (which exists to protect the ZK field). Indeed it must, since getNewZooKeeper is holding the ActiveStandbyElector object lock while it waits for the ZK connection event. This means that the event cannot be processed until the lock is released (this is already the current behaviour), but we still need to signal that the connect event was received.

> Potential deadlock in ActiveStandbyElector
> ------------------------------------------
>
>                 Key: HADOOP-9183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9183
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.0.2-alpha
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: 2_jcarder_result_1.png, 3_jcarder_result_0.png, HADOOP-9183.patch
>
>
> A jcarder run found a potential deadlock in the locking of ActiveStandbyElector and ActiveStandbyElector.WatcherWithClientRef. No deadlock has been seen in practice; this is just a theoretical possibility at the moment.
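
For readers who want to see the locking discipline described in the patch comment in concrete form, here is a minimal sketch. It is not the code from HADOOP-9183.patch and does not use the ZooKeeper API: ElectorSketch, Watcher, Client, connect, handleEvent and the other names are hypothetical stand-ins for ActiveStandbyElector, WatcherWithClientRef, the ZK object and getNewZooKeeper, and the wait/notifyAll mechanism is only one assumed way of making process block until the client is set. The sketch shows the latch being counted down outside any lock, and the two locks only ever being nested in the order elector -> watcher.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ActiveStandbyElector; NOT the code from the patch.
class ElectorSketch {
    /** Hypothetical stand-in for the ZooKeeper client handle. */
    static final class Client {}

    private Client client;        // guarded by the elector lock (this)
    private int eventsProcessed;  // guarded by the elector lock (this)
    Thread eventThread;           // simulates the ZooKeeper event thread

    /** Hypothetical stand-in for WatcherWithClientRef. */
    final class Watcher {
        private Client zk;  // guarded by the watcher lock
        final CountDownLatch connected = new CountDownLatch(1);

        // Called by the elector while it holds its own lock: elector -> watcher.
        synchronized void setClient(Client zk) {
            this.zk = zk;
            notifyAll();
        }

        // Blocks until setClient has run; wait() releases the watcher lock.
        private synchronized Client awaitClient() throws InterruptedException {
            while (zk == null) {
                wait();
            }
            return zk;
        }

        // Runs on the event thread.
        void process() throws InterruptedException {
            // Signal first, outside any lock: connect() is waiting on this latch
            // while holding the elector lock, so it must not require that lock.
            connected.countDown();
            Client source = awaitClient();  // watcher lock is released on return
            handleEvent(source);            // takes the elector lock only
        }
    }

    private synchronized void handleEvent(Client source) {
        // Ignore events from a client that is no longer the current one.
        if (source == client) {
            eventsProcessed++;
        }
    }

    synchronized int getEventsProcessed() {
        return eventsProcessed;
    }

    // Plays the role of getNewZooKeeper: holds the elector lock throughout.
    synchronized Client connect() throws InterruptedException {
        Watcher watcher = new Watcher();
        Client created = new Client();
        eventThread = new Thread(() -> {
            try {
                watcher.process();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        eventThread.setDaemon(true);
        eventThread.start();
        if (!watcher.connected.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("no connect event received");
        }
        watcher.setClient(created);  // elector lock held, then watcher lock
        client = created;
        return created;
    }

    public static void main(String[] args) throws InterruptedException {
        ElectorSketch elector = new ElectorSketch();
        elector.connect();
        elector.eventThread.join();
        System.out.println("events processed: " + elector.getEventsProcessed());
    }
}
```

In this sketch the event thread never holds the watcher lock while asking for the elector lock, so the only nesting that occurs is elector -> watcher. If countDown() were moved into a section that needed the elector lock, connect() would wait on the latch until its timeout, which is the situation the note about the CountDownLatch describes.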