[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028247#comment-13028247 ] Rajith Attapattu commented on QPID-3214: Committed Gordon's suggested fix. It seems for some situations the exception listener is the only way to get notified about a session exception. However there was a bug which prevented exceptions being notified in some cases(see QPID-3233) and has since been fixed. Therefore I believe Gordon's suggestion is probably the correct way to go. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022110#comment-13022110 ] Gordon Sim commented on QPID-3214: -- AMQSession_0_10.java seems to be duplicating in part what the transport.Session is doing (i.e. holding exception). In the trace above the threads are each trying to set the current exception on the session to the same thing I believe. My suggested 'fix' btw is a workaround of the deadlock only. It doesn't try to untangle the code. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022150#comment-13022150 ] Rajith Attapattu commented on QPID-3214: I am actually keen to fix the exception handling part rather than addressing the deadlock directly. I agree with Gordon that the exception handling code could be improved a bit. I am hoping to work out a patch soon. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:573) -
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022341#comment-13022341 ] Rajith Attapattu commented on QPID-3214: After looking at the code, I believe the exception handling is not exactly correct. 1. For starters we shouldn't really notify the connection's exception listener when there is a session level error. If we avoid that we can certainly get rid of the deadlock. The JMS API doc clearly states, If a JMS provider detects a serious problem with a Connection object, it informs the Connection object's ExceptionListener, if one has been registered. It does this by calling the listener's onException method, passing it a JMSException argument describing the problem. So certainly execution exceptions which only invalidates the session should not be notified via the exception listener. 2. Going further if one looks at the AMQConnection.java exceptionReceived() method, it again looks wrong to me. It seems that an execution exception can in fact close the underlying connection (and the rest of the sessions) due to the following piece of logic. if (hardError(cause)) { closer = (!_closed.getAndSet(true)) || closer; { _logger.info(Closing AMQConnection due to : + cause); } } Digging further for any AMQException hardError method will return true. Therefore we need to rework that piece of code. 3. The 0-10 code client (if using the Java IO transport) has a separate code path to notify the Connection if a connection level error is there. Therefore we definitely do not need to notify anything from the session. I believe the MINA transport notifies the IOException etc via the ProtocolSession, perhaps the reason for this code in the first place. For now I believe we need to do two things. 1. Remove the following lines from AMQSession_0_10.setCurrentException(SessionException se) method AMQException amqe = new AMQException(AMQConstant.getConstant(code), se.getMessage(), se.getCause()); _connection.exceptionReceived(amqe); 2. Fix the AMQConnection.isHardError method and the AMQException.isHardError method. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022399#comment-13022399 ] Rajith Attapattu commented on QPID-3214: I used review board to post the patch just to get the hang of it. https://reviews.apache.org/r/642/ Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:573) - locked0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:535)