[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229774#comment-15229774 ]

Rakesh R commented on ZOOKEEPER-2380:
-------------------------------------

Thanks [~arshad.mohammad] for the updates. Overall the patch looks good; please take a look at the following comments.

# {{RaceConditionTest.java}} is indented with tabs. Please change the indentation to spaces.
# Please change the {{null != quorumPeer}} checks to {{quorumPeer != null}}. That is the style we use, and it would be good to follow it.
# Please remove the following overridden method {{#runFromConfig()}} from class {{TestQPMain}}; it is not required.
{code}
@Override
public void runFromConfig(QuorumPeerConfig config)
        throws IOException, AdminServerException {
    super.runFromConfig(config);
}
{code}
Also, remove the unused code.
{code}
/**
 * it is same as Container feature is disabled
 */
// setupContainerManager();
{code}
# Give the following classes and method the private access specifier.
{code}
static class MockTestQPMain
static class MockSyncRequestProcessor
static class MockProposalRequestProcessor
protected MainThread[] startQuorum()
{code}
# Replace {{e.printStackTrace();}} with {{LOG.warn("")}} logging.
# {{"Leader must have gone into LOOKING state"}}: this should be {{Leader failed to transition to LOOKING state}}, right?
# Good unit test. Could you please document, in the unit test case, the sequence of steps that results in the deadlock? That would make the test easier to maintain and convey the idea quickly to others.
> Deadlock while shutting down the zookeeper
> ------------------------------------------
>
>                 Key: ZOOKEEPER-2380
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2380
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Blocker
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-2380-01.patch, ZOOKEEPER-2380-02.patch, ZOOKEEPER-2380-03.patch
>
>
> ZooKeeper deadlocks while shutting itself down, which makes the ZooKeeper service unavailable because the deadlocked server is the leader. Here is the thread dump:
> {code}
> "QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)" #25 prio=5 os_prio=0 tid=0x7fbc502a6800 nid=0x834 in Object.wait() [0x7fbc4d9a8000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1245)
>     - locked <0xfeb78000> (a org.apache.zookeeper.server.SyncRequestProcessor)
>     at java.lang.Thread.join(Thread.java:1319)
>     at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:196)
>     at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:90)
>     at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:1016)
>     at org.apache.zookeeper.server.quorum.LeaderRequestProcessor.shutdown(LeaderRequestProcessor.java:78)
>     at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:561)
>     - locked <0xfeb61e20> (a org.apache.zookeeper.server.quorum.LeaderZooKeeperServer)
>     at org.apache.zookeeper.server.quorum.QuorumZooKeeperServer.shutdown(QuorumZooKeeperServer.java:169)
>     - locked <0xfeb61e20> (a org.apache.zookeeper.server.quorum.LeaderZooKeeperServer)
>     at org.apache.zookeeper.server.quorum.LeaderZooKeeperServer.shutdown(LeaderZooKeeperServer.java:102)
>     - locked <0xfeb61e20> (a org.apache.zookeeper.server.quorum.LeaderZooKeeperServer)
>     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:637)
>     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:590)
>     - locked <0xfeb781a0> (a org.apache.zookeeper.server.quorum.Leader)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1108)
>
> "SyncThread:1" #46 prio=5 os_prio=0 tid=0x7fbc5848f000 nid=0x867 waiting for monitor entry [0x7fbc4ca9]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.zookeeper.server.quorum.Leader.processAck(Leader.java:784)
>     - waiting to lock <0xfeb781a0> (a org.apache.zookeeper.server.quorum.Leader)
>     at org.apache.zookeeper.server.quorum.AckRequestProcessor.processRequest(AckRequestProcessor.java:46)
>     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:183)
>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> {code}
> Leader.lead() calls shutdown() from inside the synchronized block, so it holds the lock on the Leader.java instance:
> {code}
> while (true) {
>     synchronized (this) {
>         long start = Time.currentElapsedTime();
>         .
> {code}
> In the shutdown flow, SyncThread tries to acquire the lock on the same Leader.java instance.
> The Leader thread has acquired the lock and is waiting for SyncThread to shut down. SyncThread is waiting for the lock in order to complete its shutdown. This is how ZooKeeper entered into deadlock.
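The lock ordering visible in the thread dump above can be reproduced in isolation. The following is only a minimal sketch, not ZooKeeper code: the class `ShutdownDeadlockSketch`, the `leaderLock` object and the method `joinWhileHoldingLock` are hypothetical names, and the join uses a timeout (which the real shutdown path does not) purely so that the demonstration returns instead of hanging.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the Leader / SyncThread interaction; not ZooKeeper code. */
final class ShutdownDeadlockSketch {
    /** Plays the role of the Leader instance whose monitor both threads need. */
    private final Object leaderLock = new Object();

    /**
     * Holds leaderLock and joins a worker that needs the same lock.
     * Returns true if the worker finished while the lock was still held.
     * timeoutMs must be greater than 0 (0 would mean "wait forever").
     */
    boolean joinWhileHoldingLock(long timeoutMs) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread syncThread = new Thread(() -> {
            try {
                lockHeld.await(); // begin the final flush only once the lock is taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (leaderLock) {
                // Stand-in for the Leader monitor that processAck() waits for in the dump.
            }
        }, "SyncThread-sketch");
        syncThread.setDaemon(true);
        syncThread.start();

        try {
            boolean finishedUnderLock;
            synchronized (leaderLock) { // like lead(): shutdown runs inside the block
                lockHeld.countDown();
                // The real code joins with no timeout and therefore never returns.
                syncThread.join(timeoutMs);
                finishedUnderLock = !syncThread.isAlive();
            }
            syncThread.join(); // lock released: the worker can now finish
            return finishedUnderLock;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

Calling `new ShutdownDeadlockSketch().joinWhileHoldingLock(200)` returns `false`: the worker cannot finish as long as the joining thread holds the monitor, which is the same cycle the dump shows between the QuorumPeer thread and SyncThread.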
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206226#comment-15206226 ]

Arshad Mohammad commented on ZOOKEEPER-2380:
--------------------------------------------

bq. -1 core tests. The patch failed core unit tests.

No test case failed. One test case was skipped, and it is not related to this patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206199#comment-15206199 ]

Hadoop QA commented on ZOOKEEPER-2380:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12794739/ZOOKEEPER-2380-03.patch
  against trunk revision 1736090.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 5 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3114//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3114//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3114//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186951#comment-15186951 ]

Rakesh R commented on ZOOKEEPER-2380:
-------------------------------------

Thank you [~arshad.mohammad] for the updates. Since this is a critical issue, it would be good to back the fix with unit tests. I know it is not that simple, but could you check whether unit tests can be written by stubbing, for example {{MockSyncRequestProcessor extends SyncRequestProcessor}}? Any other approach is welcome too.
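One generic way for a unit test to notice a hanging shutdown is to run the call on a helper thread and bound the wait. This is only a sketch of that idea under stated assumptions, not the test that was eventually attached to this issue: `ShutdownTimeoutCheck` and `completesWithin` are hypothetical names, and the `Runnable` stands in for whatever shutdown call a stubbed processor would exercise.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper; not part of ZooKeeper. */
final class ShutdownTimeoutCheck {
    private ShutdownTimeoutCheck() {
    }

    /** Runs the action on a daemon helper thread; true if it returns within timeoutMs. */
    static boolean completesWithin(Runnable action, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-under-test");
            t.setDaemon(true); // a stuck action must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(action);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still running: treat as a hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("action failed", e.getCause());
        } finally {
            executor.shutdownNow(); // interrupts the helper if it is still blocked
        }
    }
}
```

A test could then assert that `completesWithin(server::shutdown, someTimeout)` is true, where `server` and the timeout value are placeholders chosen by the test author. Note that a thread blocked on monitor entry does not respond to the interrupt sent by `shutdownNow()`, which is why the helper thread is a daemon.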
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186939#comment-15186939 ]

Rakesh R commented on ZOOKEEPER-2380:
-------------------------------------

Just adding a note to avoid confusion: this issue is not related to ZOOKEEPER-2347. The call sequence and the execution path are completely different for the two issues.
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182637#comment-15182637 ]

Hadoop QA commented on ZOOKEEPER-2380:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12791715/ZOOKEEPER-2380-02.patch
  against trunk revision 1733679.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3090//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3090//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3090//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182008#comment-15182008 ]

Rakesh R commented on ZOOKEEPER-2380:
-------------------------------------

Thanks [~arshad.mohammad] for the patch. Instead of checking {{shutdownMessage != null}} inside the loop on every iteration, how about setting {{shutdownMessage}} and breaking out of the while loop? After the loop, if a message exists, make the shutdown call. Also, could you check whether this behavior can be unit tested?
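The shape of that suggestion can be sketched as follows. This is a hedged illustration, not the patch itself: `DeferredShutdownSketch`, `ticksBeforeFailure`, the placeholder reason string and the latch-driven worker are all hypothetical, and only the `shutdownMessage` variable name is taken from the comment above.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch of "record the reason, break, then shut down"; not the actual patch. */
final class DeferredShutdownSketch {
    private final CountDownLatch stopRequested = new CountDownLatch(1);
    private final Thread syncThread;
    private volatile String lastShutdownReason;

    DeferredShutdownSketch() {
        syncThread = new Thread(() -> {
            try {
                stopRequested.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            processAck(); // the final flush needs this object's monitor
        }, "SyncThread-sketch");
        syncThread.setDaemon(true);
    }

    private synchronized void processAck() {
        // Stand-in for work that must hold the monitor.
    }

    /** Single use. Loops under the monitor, then shuts down outside it; true if the worker stopped. */
    boolean lead(int ticksBeforeFailure) {
        syncThread.start();
        String shutdownMessage = null;
        int tick = 0;
        while (true) {
            synchronized (this) {
                tick++;
                if (tick >= ticksBeforeFailure) { // stand-in for a failed liveness check
                    shutdownMessage = "placeholder reason";
                    break; // leaving the loop also releases the monitor
                }
            }
        }
        if (shutdownMessage != null) {
            shutdown(shutdownMessage);
        }
        return !syncThread.isAlive();
    }

    private void shutdown(String reason) {
        lastShutdownReason = reason;
        stopRequested.countDown();
        try {
            syncThread.join(); // safe here: the caller holds no monitor the worker needs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    String lastShutdownReason() {
        return lastShutdownReason;
    }
}
```

With this ordering `new DeferredShutdownSketch().lead(3)` returns `true`, because the join happens only after the synchronized block has been exited and the worker is free to take the monitor for its last step.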
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181789#comment-15181789 ]

Arshad Mohammad commented on ZOOKEEPER-2380:
--------------------------------------------

bq. -1 core tests. The patch failed core unit tests.

The failed test is not related to this patch. I verified locally that it passes.
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181548#comment-15181548 ]

Hadoop QA commented on ZOOKEEPER-2380:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12791608/ZOOKEEPER-2380-01.patch
  against trunk revision 1733679.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3087//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3087//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3087//console

This message is automatically generated.

> Deadlock while shutting down the zookeeper
> ------------------------------------------
>
>                 Key: ZOOKEEPER-2380
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2380
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Blocker
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-2380-01.patch
>
>
> Zookeeper enters into deadlock while shutting down itself, thus making zookeeper service unavailable as deadlocked server is a leader.
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180222#comment-15180222 ]

Rakesh R commented on ZOOKEEPER-2380:
-------------------------------------

Good catch, [~arshad.mohammad]. We could probably try moving the {{this.shutdown}} call outside the synchronized block.

*Deadlock sequence is:*

Quorum Leader Thread:
1=> {{leader.lead()}}
2=> Say the leader loses quorum and calls {{this.shutdown()}}. This shutdown call is made under {{synchronized (this)}} [Leader.java#L554|https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java#L554]
3=> Shutdown triggers {{firstProcessor.shutdown();}} and reaches the {{SyncRequestProcessor#shutdown()}} call.
4=> Now SyncRequestProcessor adds {{requestOfDeath}} and waits in {{this.join()}};

SyncThread:
1=> SyncRequestProcessor forwards the request to its next processor, {{nextProcessor.processRequest(si);}}, which is AckRequestProcessor.
2=> AckRequestProcessor forwards the request to the leader via {{leader.processAck}}
3=> {{leader.processAck}} is synchronized at the method level and therefore requires the {{leader.this}} lock [Leader.java#L783|https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java#L783]. Because {{leader.this}} is already held at [Leader.java#L554|https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java#L554] by the leader thread, which is itself waiting for the SyncThread to exit, this results in deadlock.

> Deadlock while shutting down the zookeeper
> ------------------------------------------
>
>                 Key: ZOOKEEPER-2380
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2380
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Blocker
>
> Zookeeper enters into deadlock while shutting down itself, thus making zookeeper service unavailable as deadlocked server is a leader.
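The suggestion in the comment above, moving the shutdown call outside the synchronized block, can be sketched as follows. This is an illustration under stated assumptions, not the attached patch and not ZooKeeper code: the class `ShutdownOutsideLockSketch`, its queue, the `REQUEST_OF_DEATH` sentinel object, and the `lostQuorum` flag are hypothetical names that loosely mirror the roles in the deadlock sequence. The decision to shut down is made while holding the monitor, but the join happens only after the monitor has been released.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ShutdownOutsideLockSketch {
    // Hypothetical sentinel that tells the worker to exit.
    private static final Object REQUEST_OF_DEATH = new Object();

    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private int ackCount; // guarded by this

    private final Thread syncThread = new Thread(() -> {
        try {
            while (true) {
                Object request = queue.take();
                if (request == REQUEST_OF_DEATH) {
                    return;
                }
                processAck(); // needs the monitor of this object
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }, "SyncThread-sketch");

    public synchronized void processAck() {
        ackCount++;
    }

    public synchronized int ackCount() {
        return ackCount;
    }

    /** Returns true if the worker thread exited within timeoutMs (must be > 0). */
    public boolean leadAndShutdown(int requests, long timeoutMs) {
        syncThread.setDaemon(true);
        syncThread.start();
        for (int i = 0; i < requests; i++) {
            queue.add("request-" + i);
        }
        boolean lostQuorum;
        synchronized (this) {
            // Only decide under the lock; the sketch pretends the quorum check failed.
            lostQuorum = true;
        }
        // The monitor is released here, so the worker can still run processAck().
        return lostQuorum && shutdown(timeoutMs);
    }

    private boolean shutdown(long timeoutMs) {
        queue.add(REQUEST_OF_DEATH);
        try {
            syncThread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !syncThread.isAlive();
    }
}
```

With the join performed outside the monitor, the worker can drain its pending requests, see the sentinel, and exit, so the join completes. Whether releasing the lock before shutdown is safe for the rest of the real `Leader` state is a separate question that this sketch does not address.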
[jira] [Commented] (ZOOKEEPER-2380) Deadlock while shutting down the zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179387#comment-15179387 ]

Arshad Mohammad commented on ZOOKEEPER-2380:
--------------------------------------------

This is very similar to ZOOKEEPER-2347, but it is a different issue.

> Deadlock while shutting down the zookeeper
> ------------------------------------------
>
>                 Key: ZOOKEEPER-2380
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2380
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Blocker
>
> Zookeeper enters into deadlock while shutting down itself, thus making zookeeper service unavailable as deadlocked server is a leader.