[ https://issues.apache.org/jira/browse/QPID-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653924#comment-14653924 ]
Alex Rudyy edited comment on QPID-3521 at 8/5/15 3:50 PM:
----------------------------------------------------------

It seems that the changes implemented in revision [r1693542|https://svn.apache.org/r1693542] might cause a deadlock on the 0-9 path when a Session is closed whilst failover is in progress. Here is the thread dump demonstrating the issue:
{noformat}
"Failover" prio=10 tid=0x00007fe0d804e000 nid=0x657c waiting on condition [0x00007fe0cf1f0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000f41c2528> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
	at org.apache.qpid.client.AMQSession.drainDispatchQueue(AMQSession.java:2306)
	at org.apache.qpid.client.AMQSession.drainDispatchQueueWithDispatcher(AMQSession.java:3697)
	at org.apache.qpid.client.AMQSession_0_8.resubscribe(AMQSession_0_8.java:186)
	at org.apache.qpid.client.AMQConnectionDelegate_8_0.resubscribeSessions(AMQConnectionDelegate_8_0.java:379)
	at org.apache.qpid.client.AMQConnection.resubscribeSessions(AMQConnection.java:1387)
	at org.apache.qpid.client.failover.FailoverHandler.run(FailoverHandler.java:221)
	- locked <0x00000000f35412c0> (a java.lang.Object)
	at java.lang.Thread.run(Thread.java:745)

"Dispatcher-2-Conn-84" prio=10 tid=0x00007fe1341c0000 nid=0x657a waiting for monitor entry [0x00007fe0cf4f3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3492)
	- waiting to lock <0x00000000f35c8e78> (a java.lang.Object)
	- locked <0x00000000f3cbea70> (a java.lang.Object)
	at org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3279)
	at org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3272)
	at org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
	at org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3410)
	- locked <0x00000000f3cbea70> (a java.lang.Object)
	at java.lang.Thread.run(Thread.java:745)

"main" prio=10 tid=0x00007fe134008800 nid=0x58f1 waiting for monitor entry [0x00007fe13d822000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.qpid.client.AMQSession.close(AMQSession.java:728)
	- waiting to lock <0x00000000f35412c0> (a java.lang.Object)
	- locked <0x00000000f35c8e78> (a java.lang.Object)
	at org.apache.qpid.client.AMQSession.close(AMQSession.java:447)
	at org.apache.qpid.client.failover.FailoverBehaviourTest.sessionCloseWhileFailoverImpl(FailoverBehaviourTest.java:1705)
	at org.apache.qpid.client.failover.FailoverBehaviourTest.testClientAcknowledgedSessionCloseWhileFailover(FailoverBehaviourTest.java:702)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at junit.framework.TestCase.runTest(TestCase.java:176)
	at org.apache.qpid.test.utils.QpidTestCase.runTest(QpidTestCase.java:171)
	at junit.framework.TestCase.runBare(TestCase.java:141)
	at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:332)
	at junit.framework.TestResult$1.protect(TestResult.java:122)
	at junit.framework.TestResult.runProtected(TestResult.java:142)
	at junit.framework.TestResult.run(TestResult.java:125)
	at junit.framework.TestCase.run(TestCase.java:129)
	at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:156)
	at junit.framework.TestSuite.runTest(TestSuite.java:255)
	at junit.framework.TestSuite.run(TestSuite.java:250)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{noformat}
In the thread dump above, Session#close() is invoked from the "main" thread. As part of Session#close(), the main thread acquires _messageDeliveryLock (<0x00000000f35c8e78>) and then waits for the failover mutex (<0x00000000f35412c0>). The failover mutex is held by the "Failover" thread, which is itself waiting for the Dispatcher thread to drain the pre-dispatch queue. However, the Dispatcher thread needs _messageDeliveryLock to perform the clean-up, so it stays BLOCKED and the application hangs.
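To make the wait cycle easier to follow, here is a minimal, self-contained Java sketch that reproduces the same pattern with plain monitors and a CountDownLatch. It is not Qpid code: deliveryLock, failoverMutex and drained are hypothetical stand-ins for _messageDeliveryLock, the failover mutex and the latch awaited in AMQSession.drainDispatchQueue(), and the "Closer" thread plays the role of "main" calling Session#close(). The extra staging latches only force the interleaving seen in the dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the wait cycle from the thread dump.
 * None of these names are Qpid APIs; they only mirror the roles in the dump.
 */
public class FailoverCloseDeadlockSketch {

    /** Starts the three threads and returns true once they are stuck in the cycle. */
    public static boolean reproduce(long timeoutMillis) {
        final Object deliveryLock = new Object();   // stand-in for _messageDeliveryLock
        final Object failoverMutex = new Object();  // stand-in for the failover mutex
        final CountDownLatch drained = new CountDownLatch(1); // stand-in for the drainDispatchQueue() latch
        // Staging latches that force the interleaving seen in the dump.
        final CountDownLatch failoverHeld = new CountDownLatch(1);
        final CountDownLatch closerHoldsDelivery = new CountDownLatch(1);

        // "Failover": holds the failover mutex while waiting for the dispatcher to drain.
        Thread failover = startDaemon("Failover", () -> {
            synchronized (failoverMutex) {
                failoverHeld.countDown();
                awaitQuietly(drained);
            }
        });
        // "Dispatcher": needs the delivery lock to finish the clean-up and release the latch.
        Thread dispatcher = startDaemon("Dispatcher", () -> {
            awaitQuietly(closerHoldsDelivery);
            synchronized (deliveryLock) {
                drained.countDown();
            }
        });
        // "main" calling Session#close(): delivery lock first, then the failover mutex.
        Thread closer = startDaemon("Closer", () -> {
            awaitQuietly(failoverHeld);
            synchronized (deliveryLock) {
                closerHoldsDelivery.countDown();
                synchronized (failoverMutex) {
                    // Never reached while the Failover thread holds the mutex.
                }
            }
        });

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (closer.getState() == Thread.State.BLOCKED
                    && dispatcher.getState() == Thread.State.BLOCKED
                    && failover.getState() == Thread.State.WAITING) {
                return true; // same BLOCKED / BLOCKED / WAITING picture as the thread dump
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static Thread startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // stuck threads must not keep the JVM alive
        t.start();
        return t;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce(5_000)
                ? "deadlock reproduced: Closer and Dispatcher BLOCKED, Failover WAITING"
                : "no deadlock observed");
    }
}
```

Each edge of the cycle matches one thread in the dump: Closer waits for a monitor owned by Failover, Failover waits for a latch only Dispatcher can release, and Dispatcher waits for a monitor owned by Closer. The locks are created per call, so calling reproduce() again is not affected by threads left stuck from an earlier run.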
> failover process for the 0-8 client does not clear the pre-dispatch queue
> -------------------------------------------------------------------------
>
>                 Key: QPID-3521
>                 URL: https://issues.apache.org/jira/browse/QPID-3521
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Client
>            Reporter: Robbie Gemmell
>            Assignee: Keith Wall
>              Labels: failover
>         Attachments: clear-dispatch-queue-on-failover.diff
>
>
> failover process for the 0-8 client does not clear the pre-dispatch queue,
> only the consumer receive queue.
> This is currently masked by an issue with the rollbackMark. The changes made
> in QPID-3546 to fix the 0-10 client path need to be applied to the 0-8/9/9-1
> client path when this issue is resolved.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)