[ https://issues.apache.org/jira/browse/ARTEMIS-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erich Duda updated ARTEMIS-1507: -------------------------------- Description: This issue was hit in test {{MultiThreadRandomReattachTest}}. There are several client's threads doing some work, while connection fail is simulated. The test expects that all threads finish without exceptions. This issue causes that some client's threads sometime fail with an exception {{AMQ119014: Timed out after waiting 30,000 ms for response when sending packet XXX}}. As you can see in the following test output, the {{ActiveMQConnectionTimedOutException}} is thrown even though the test ran only few milliseconds, so the 30 seconds timeout cannot expire. {code} [main] 09:16:35,054 INFO [org.apache.activemq.artemis.core.server] #*#*# Starting test: testJ()... #test testJ [main] 09:16:35,056 INFO [org.apache.activemq.artemis.tests.integration] Beginning iteration 0 [main] 09:16:35,056 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=false,journalDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/journal0-L,bindingsDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/bindings0-L,largeMessagesDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/large-msg0-L,pagingDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/page0-L) [main] 09:16:35,056 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT [main] 09:16:35,064 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live [main] 09:16:35,064 INFO [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.4.0-SNAPSHOT [localhost, nodeID=376dc7b0-c099-11e7-a996-fa163ef1f361] [Timer-10] 09:16:35,623 INFO [org.apache.activemq.artemis.tests.integration] ** Failing connection [Timer-10] 09:16:35,623 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: blah [code=NOT_CONNECTED] [Timer-10] 09:16:35,628 INFO [org.apache.activemq.artemis.tests.integration] ** Fail complete [main] 09:16:35,640 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.4.0-SNAPSHOT [376dc7b0-c099-11e7-a996-fa163ef1f361] stopped, uptime 0.584 seconds [main] 09:16:35,704 INFO [org.apache.activemq.artemis.tests.integration] Beginning iteration 1 [main] 09:16:35,706 INFO [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=false,journalDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/journal0-L,bindingsDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/bindings0-L,largeMessagesDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/large-msg0-L,pagingDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/page0-L) [main] 09:16:35,706 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT [main] 09:16:35,712 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live [main] 09:16:35,712 INFO [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.4.0-SNAPSHOT [localhost, nodeID=37d0f5e8-c099-11e7-a996-fa163ef1f361] [Timer-10] 09:16:36,433 INFO [org.apache.activemq.artemis.tests.integration] ** Failing connection [Timer-10] 09:16:36,433 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: blah [code=NOT_CONNECTED] [Timer-10] 09:16:36,436 INFO [org.apache.activemq.artemis.tests.integration] ** Fail complete [Thread-2101] 09:16:36,438 ERROR [org.apache.activemq.artemis.tests.integration] Failed to run test: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 105] at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:401) [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:319) [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.addSessionMetadata(ActiveMQSessionContext.java:377) [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.addMetaData(ClientSessionImpl.java:1305) [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase.doTestJ(MultiThreadRandomReattachTestBase.java:974) [:] at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase$10.run(MultiThreadRandomReattachTestBase.java:164) [:] at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$1Runner.run(MultiThreadReattachSupportTestBase.java:105) [:] {code} There is an issue in {{ChannelImpl::sendBlocking}} method. {code:java} while (!closed && (response == null || (response.getType() != PacketImpl.EXCEPTION && response.getType() != expectedPacket)) && toWait > 0) { try { sendCondition.await(toWait, TimeUnit.MILLISECONDS); } catch (InterruptedException e) { throw new ActiveMQInterruptedException(e); } if (response != null && response.getType() != PacketImpl.EXCEPTION && response.getType() != expectedPacket) { ActiveMQClientLogger.LOGGER.packetOutOfOrder(response, new Exception("trace")); } if (closed) { break; } final long now = System.currentTimeMillis(); toWait -= now - start; start = now; } if (response == null) { throw ActiveMQClientMessageBundle.BUNDLE.timedOutSendingPacket(connection.getBlockingCallTimeout(), packet.getType()); } {code} bq. When waiting upon a Condition, a "spurious wakeup" is permitted to occur, in general, as a concession to the underlying platform semantics. This has little practical impact on most application programs as a Condition should always be waited upon in a loop, testing the state predicate that is being waited for. An implementation is free to remove the possibility of spurious wakeups but it is recommended that applications programmers always assume that they can occur and so always wait in a loop. \[1\] \[1\] https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Condition.html As it is stated in the JDK API documentation, the {{sendCondition.await}} can "spuriously wakes up" so in the body of while cycle it should be checked the state of conditions to which the thread waits. If the thread wakes up and the channel is closed, the thread jumps out from the while cycle, but after the cycle there is no check if the timeout expired or not. The first check is whether response is null or not. In this case the response is null so the timed out exception is thrown. was: This issue was hit in test {{MultiThreadRandomReattachTest}}. There are several client's threads doing some work, while connection fail is simulated. The test expects that all threads finish without exceptions. This issue causes that some client's threads sometime fail with an exception {{AMQ119014: Timed out after waiting 30,000 ms for response when sending packet XXX}}. As you can see in the following test output, the {{ActiveMQConnectionTimedOutException}} is thrown even though the test ran only few milliseconds, so the 30 seconds timeout cannot expire. There is an issue in {{ChannelImpl::sendBlocking}} method. {code:java} while (!closed && (response == null || (response.getType() != PacketImpl.EXCEPTION && response.getType() != expectedPacket)) && toWait > 0) { try { sendCondition.await(toWait, TimeUnit.MILLISECONDS); } catch (InterruptedException e) { throw new ActiveMQInterruptedException(e); } if (response != null && response.getType() != PacketImpl.EXCEPTION && response.getType() != expectedPacket) { ActiveMQClientLogger.LOGGER.packetOutOfOrder(response, new Exception("trace")); } if (closed) { break; } final long now = System.currentTimeMillis(); toWait -= now - start; start = now; } if (response == null) { throw ActiveMQClientMessageBundle.BUNDLE.timedOutSendingPacket(connection.getBlockingCallTimeout(), packet.getType()); } {code} bq. When waiting upon a Condition, a "spurious wakeup" is permitted to occur, in general, as a concession to the underlying platform semantics. This has little practical impact on most application programs as a Condition should always be waited upon in a loop, testing the state predicate that is being waited for. An implementation is free to remove the possibility of spurious wakeups but it is recommended that applications programmers always assume that they can occur and so always wait in a loop. \[1\] \[1\] https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Condition.html As it is stated in the JDK API documentation, the {{sendCondition.await}} can "spuriously wakes up" so in the body of while cycle it should be checked the state of conditions to which the thread waits. If the thread wakes up and the channel is closed, the thread jumps out from the while cycle, but after the cycle there is no check if the timeout expired or not. The first check is whether response is null or not. In this case the response is null so the timed out exception is thrown. > ActiveMQConnectionTimedOutException is thrown even though no timeout expired > ---------------------------------------------------------------------------- > > Key: ARTEMIS-1507 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1507 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker > Affects Versions: 2.3.0 > Reporter: Erich Duda > > This issue was hit in test {{MultiThreadRandomReattachTest}}. There are > several client's threads doing some work, while connection fail is simulated. > The test expects that all threads finish without exceptions. > This issue causes that some client's threads sometime fail with an exception > {{AMQ119014: Timed out after waiting 30,000 ms for response when sending > packet XXX}}. > As you can see in the following test output, the > {{ActiveMQConnectionTimedOutException}} is thrown even though the test ran > only few milliseconds, so the 30 seconds timeout cannot expire. > {code} > [main] 09:16:35,054 INFO [org.apache.activemq.artemis.core.server] #*#*# > Starting test: testJ()... > #test testJ > [main] 09:16:35,056 INFO [org.apache.activemq.artemis.tests.integration] > Beginning iteration 0 > [main] 09:16:35,056 INFO [org.apache.activemq.artemis.core.server] > AMQ221000: live Message Broker is starting with configuration Broker > Configuration > (clustered=false,journalDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/journal0-L,bindingsDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/bindings0-L,largeMessagesDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/large-msg0-L,pagingDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/page0-L) > [main] 09:16:35,056 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-server]. Adding protocol support > for: CORE > [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol > support for: AMQP > [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol > support for: STOMP > [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding > protocol support for: OPENWIRE > [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol > support for: HORNETQ > [main] 09:16:35,057 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol > support for: MQTT > [main] 09:16:35,064 INFO [org.apache.activemq.artemis.core.server] > AMQ221007: Server is now live > [main] 09:16:35,064 INFO [org.apache.activemq.artemis.core.server] > AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.4.0-SNAPSHOT > [localhost, nodeID=376dc7b0-c099-11e7-a996-fa163ef1f361] > [Timer-10] 09:16:35,623 INFO [org.apache.activemq.artemis.tests.integration] > ** Failing connection > [Timer-10] 09:16:35,623 WARN [org.apache.activemq.artemis.core.client] > AMQ212037: Connection failure has been detected: blah [code=NOT_CONNECTED] > [Timer-10] 09:16:35,628 INFO [org.apache.activemq.artemis.tests.integration] > ** Fail complete > [main] 09:16:35,640 INFO [org.apache.activemq.artemis.core.server] > AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.4.0-SNAPSHOT > [376dc7b0-c099-11e7-a996-fa163ef1f361] stopped, uptime 0.584 seconds > [main] 09:16:35,704 INFO [org.apache.activemq.artemis.tests.integration] > Beginning iteration 1 > [main] 09:16:35,706 INFO [org.apache.activemq.artemis.core.server] > AMQ221000: live Message Broker is starting with configuration Broker > Configuration > (clustered=false,journalDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/journal0-L,bindingsDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/bindings0-L,largeMessagesDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/large-msg0-L,pagingDirectory=/home/hudson/hudson_workspace/workspace/artemis-project-testsuite-rhel-eduda/b87e4e55/tests/integration-tests/./target/tmp/junit6436722967996899345/page0-L) > [main] 09:16:35,706 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-server]. Adding protocol support > for: CORE > [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol > support for: AMQP > [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol > support for: STOMP > [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding > protocol support for: OPENWIRE > [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol > support for: HORNETQ > [main] 09:16:35,707 INFO [org.apache.activemq.artemis.core.server] > AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol > support for: MQTT > [main] 09:16:35,712 INFO [org.apache.activemq.artemis.core.server] > AMQ221007: Server is now live > [main] 09:16:35,712 INFO [org.apache.activemq.artemis.core.server] > AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.4.0-SNAPSHOT > [localhost, nodeID=37d0f5e8-c099-11e7-a996-fa163ef1f361] > [Timer-10] 09:16:36,433 INFO [org.apache.activemq.artemis.tests.integration] > ** Failing connection > [Timer-10] 09:16:36,433 WARN [org.apache.activemq.artemis.core.client] > AMQ212037: Connection failure has been detected: blah [code=NOT_CONNECTED] > [Timer-10] 09:16:36,436 INFO [org.apache.activemq.artemis.tests.integration] > ** Fail complete > [Thread-2101] 09:16:36,438 ERROR > [org.apache.activemq.artemis.tests.integration] Failed to run test: > ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT > message=AMQ119014: Timed out after waiting 30,000 ms for response when > sending packet 105] > at > org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:401) > [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] > at > org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:319) > [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] > at > org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.addSessionMetadata(ActiveMQSessionContext.java:377) > [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] > at > org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.addMetaData(ClientSessionImpl.java:1305) > [artemis-core-client-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT] > at > org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase.doTestJ(MultiThreadRandomReattachTestBase.java:974) > [:] > at > org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase$10.run(MultiThreadRandomReattachTestBase.java:164) > [:] > at > org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$1Runner.run(MultiThreadReattachSupportTestBase.java:105) > [:] > {code} > There is an issue in {{ChannelImpl::sendBlocking}} method. > {code:java} > while (!closed && (response == null || (response.getType() != > PacketImpl.EXCEPTION && response.getType() != expectedPacket)) && toWait > 0) > { > try { > sendCondition.await(toWait, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > throw new ActiveMQInterruptedException(e); > } > if (response != null && response.getType() != PacketImpl.EXCEPTION && > response.getType() != expectedPacket) { > ActiveMQClientLogger.LOGGER.packetOutOfOrder(response, new > Exception("trace")); > } > if (closed) { > break; > } > final long now = System.currentTimeMillis(); > toWait -= now - start; > start = now; > } > if (response == null) { > throw > ActiveMQClientMessageBundle.BUNDLE.timedOutSendingPacket(connection.getBlockingCallTimeout(), > packet.getType()); > } > {code} > bq. When waiting upon a Condition, a "spurious wakeup" is permitted to occur, > in general, as a concession to the underlying platform semantics. This has > little practical impact on most application programs as a Condition should > always be waited upon in a loop, testing the state predicate that is being > waited for. An implementation is free to remove the possibility of spurious > wakeups but it is recommended that applications programmers always assume > that they can occur and so always wait in a loop. \[1\] > \[1\] > https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/Condition.html > As it is stated in the JDK API documentation, the {{sendCondition.await}} can > "spuriously wakes up" so in the body of while cycle it should be checked the > state of conditions to which the thread waits. > If the thread wakes up and the channel is closed, the thread jumps out from > the while cycle, but after the cycle there is no check if the timeout expired > or not. The first check is whether response is null or not. In this case the > response is null so the timed out exception is thrown. -- This message was sent by Atlassian JIRA (v6.4.14#64029)