[ https://issues.apache.org/jira/browse/ZOOKEEPER-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061765#comment-15061765 ]
Flavio Junqueira commented on ZOOKEEPER-2347: --------------------------------------------- To be consistent, I'm reposting the comment I made in the other jira here. bq. We have made requestsInProcess an AtomicInteger in ZOOKEEPER-1504, removing the synchronization of the decInProcess method. We should just make the same change here for the 3.4 branch. > Deadlock shutting down zookeeper > -------------------------------- > > Key: ZOOKEEPER-2347 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2347 > Project: ZooKeeper > Issue Type: Bug > Affects Versions: 3.4.7 > Reporter: Ted Yu > Assignee: Rakesh R > Priority: Blocker > Fix For: 3.4.8 > > Attachments: ZOOKEEPER-2347-br-3.4.patch, testSplitLogManager.stack > > > HBase recently upgraded to zookeeper 3.4.7 > In one of the tests, TestSplitLogManager, there is reproducible hang at the > end of the test. > Below is snippet from stack trace related to zookeeper: > {code} > "main-EventThread" daemon prio=5 tid=0x00007fd27488a800 nid=0x6f1f waiting on > condition [0x000000011834b000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000007c5b8d3a0> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main-SendThread(localhost:59510)" daemon prio=5 tid=0x00007fd274eb4000 > nid=0x9513 waiting on condition [0x0000000118042000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101) > at > org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:997) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) > "SyncThread:0" prio=5 tid=0x00007fd274d02000 nid=0x730f waiting for monitor > entry [0x00000001170ac000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.zookeeper.server.ZooKeeperServer.decInProcess(ZooKeeperServer.java:512) > - waiting to lock <0x00000007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:144) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:200) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131) > "main-EventThread" daemon prio=5 tid=0x00007fd2753a3800 nid=0x711b waiting on > condition [0x0000000117a30000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000007c9b106b8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501) > "main" prio=5 tid=0x00007fd276000000 nid=0x1903 in Object.wait() > [0x0000000108aa1000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x00000007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1281) > - locked <0x00000007c5b66400> (a > org.apache.zookeeper.server.SyncRequestProcessor) > at java.lang.Thread.join(Thread.java:1355) > at > org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:213) > at > org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:770) > at > org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:478) > - locked <0x00000007c5b62128> (a > org.apache.zookeeper.server.ZooKeeperServer) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:266) > at > org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.shutdown(MiniZooKeeperCluster.java:301) > {code} > Note the address (0x00000007c5b66400) in the last hunk which seems to > indicate some form of deadlock. > According to Camille Fournier: > We made shutdown synchronized. But decrementing the requests is > also synchronized and called from a different thread. So yeah, deadlock. > This came in with ZOOKEEPER-1907 -- This message was sent by Atlassian JIRA (v6.3.4#6332)