[ https://issues.apache.org/jira/browse/ARTEMIS-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138219#comment-16138219 ]
ASF GitHub Bot commented on ARTEMIS-1368: ----------------------------------------- Github user dudaerich commented on the issue: https://github.com/apache/activemq-artemis/pull/1486 Thanks guys. I've updated the commit based on your feedback. I am running the wildfly test now. > Artemis gets to state when it doesn't respond to producer > --------------------------------------------------------- > > Key: ARTEMIS-1368 > URL: https://issues.apache.org/jira/browse/ARTEMIS-1368 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker > Affects Versions: 1.5.5, 2.2.0 > Reporter: Erich Duda > > *Scenario* > * There are two servers configured in colocated replicated HA > * There are two producers. Each one sends messages on different server to > InQueue. > * There is MDB on server 2 which resends messages from InQueue to OutQueue > * During the resending of messages, server 2 is restarted. > * After all messages are resent, receiver is connected to server 1 and it > receives all messages. > *Expectation:* All messages sent by producers to InQueue are received by > receiver from OutQueue. > *Reality:* After the restart of server 2, the server 1 gets into the state > when it stops to respond to the producer. Producer sends a bulk of messages > which are marked as duplicates by server, but the exception packet is not > sent to producer. See below for more detailed description what is happening > on the server. > *Customer impact:* Artemis may get into the state when it is not able to work > properly. This can lead to unavailability of service. > This is *regression* against *7.0.z*. > The issue wasn't reported earlier, because the test failed due to JBEAP-7968 > before 7.1.0.ER3. > *Detail description of what happened on server* > I looked at the server traces to see what happened when server received > session commit packet from producer. Based on the traces, the server behaved > correctly and it even scheduled to send response packet with the duplication > exception. > {code} > 06:17:28,206 TRACE > [org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler] > (Thread-6 > (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@37386953)) > ServerSessio > nPacketHandler::scheduling > response::PACKET(ActiveMQExceptionMessage)[type=20, channelID=0, > packetObject=ActiveMQExceptionMessage, exception= > ActiveMQDuplicateIdException[errorType=DUPLICATE_ID_REJECTED message= > Duplicate message detected - message will not be routed. Message > information:LargeServerMessage[messageID=7034,durable=true,userID=f3b4e359-7834-11e7-bd40-001b217d6dc3,priority=4, > timestamp=Thu Aug 03 06:17:28 E > DT 2017,expiration=0, durable=true, > address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=829bf1a6-7834-11e7-bd40-001b217d6dc3,count=1699,counter=1700,_AMQ_DUPL_ID=1bfb61bf-2eca-4f33-9723-5998dcd84ed515 > 01755380929,_AMQ_LARGE_SIZE=409615,color=RED]]@971671687]] > {code} > The problem is that after this event, I cannot find message which would say > that the packet was sent. As you can see in code snippet below, the > "scheduling response" says that it was registered IOCallback, which will send > the packet once it is triggered. > {code:java} > private void sendResponse(final Packet confirmPacket, > final Packet response, > final boolean flush, > final boolean closeChannel) { > if (logger.isTraceEnabled()) { > logger.trace("ServerSessionPacketHandler::scheduling response::" + > response); > } > storageManager.afterCompleteOperations(new IOCallback() { > @Override > public void onError(final int errorCode, final String errorMessage) { > ActiveMQServerLogger.LOGGER.errorProcessingIOCallback(errorCode, > errorMessage); > ActiveMQExceptionMessage exceptionMessage = new > ActiveMQExceptionMessage(ActiveMQExceptionType.createException(errorCode, > errorMessage)); > doConfirmAndResponse(confirmPacket, exceptionMessage, flush, > closeChannel); > if (logger.isTraceEnabled()) { > logger.trace("ServerSessionPacketHandler::exception response > sent::" + exceptionMessage); > } > } > @Override > public void done() { > if (logger.isTraceEnabled()) { > logger.trace("ServerSessionPacketHandler::regular response > sent::" + response); > } > doConfirmAndResponse(confirmPacket, response, flush, > closeChannel); > } > }); > } > {code} > It is odd that the callback wasn't never triggered. This assumption confirms > the warning printed at stopping of the server. In the OperationContext, there > were still some callbacks which weren't triggered. > {code} > 06:22:41,377 WARN [org.apache.activemq.artemis.core.server] (ServerService > Thread Pool -- 124) AMQ222105: Could not finish context execution in 10 > seconds: java.lang.Exception: warning > at > org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.waitContextCompletion(ServerSessionImpl.java:1141) > [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.closeAllServerSessions(ActiveMQServerImpl.java:1103) > [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:888) > [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:793) > [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStop(ActiveMQServerImpl.java:688) > [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:681) > [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.stop(JMSServerManagerImpl.java:433) > [artemis-jms-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1] > at > org.wildfly.extension.messaging.activemq.jms.JMSService.doStop(JMSService.java:217) > [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4] > at > org.wildfly.extension.messaging.activemq.jms.JMSService.access$100(JMSService.java:64) > [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4] > at > org.wildfly.extension.messaging.activemq.jms.JMSService$2.run(JMSService.java:121) > [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [rt.jar:1.8.0_131] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [rt.jar:1.8.0_131] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [rt.jar:1.8.0_131] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [rt.jar:1.8.0_131] > at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131] > at org.jboss.threads.JBossThread.run(JBossThread.java:320) > [jboss-threads-2.2.1.Final-redhat-1.jar:2.2.1.Final-redhat-1] > {code} > In the attachment you can find also thread dumps from the server. I didn't > find there any deadlocks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)