[ 
https://issues.apache.org/jira/browse/ARTEMIS-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138219#comment-16138219
 ] 

ASF GitHub Bot commented on ARTEMIS-1368:
-----------------------------------------

Github user dudaerich commented on the issue:

    https://github.com/apache/activemq-artemis/pull/1486
  
    Thanks guys. I've updated the commit based on your feedback. I am running
    the WildFly test now.


> Artemis gets to state when it doesn't respond to producer
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-1368
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1368
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 1.5.5, 2.2.0
>            Reporter: Erich Duda
>
> *Scenario*
> * There are two servers configured in colocated replicated HA
> * There are two producers. Each one sends messages to InQueue on a different 
> server.
> * There is MDB on server 2 which resends messages from InQueue to OutQueue
> * During the resending of messages, server 2 is restarted.
> * After all messages are resent, receiver is connected to server 1 and it 
> receives all messages.
> *Expectation:* All messages sent by producers to InQueue are received by 
> receiver from OutQueue.
> *Reality:* After the restart of server 2, server 1 gets into a state in 
> which it stops responding to the producer. The producer sends a batch of 
> messages which the server marks as duplicates, but the exception packet is 
> not sent to the producer. See below for a more detailed description of what 
> happens on the server.
> *Customer impact:* Artemis may get into a state in which it is not able to 
> work properly. This can lead to unavailability of the service.
> This is a *regression* against *7.0.z*.
> The issue wasn't reported earlier, because the test failed due to JBEAP-7968 
> before 7.1.0.ER3.
> *Detailed description of what happened on the server*
> I looked at the server traces to see what happened when the server received 
> the session commit packet from the producer. Based on the traces, the server 
> behaved correctly and even scheduled the response packet with the duplicate 
> ID exception.
> {code}
> 06:17:28,206 TRACE 
> [org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler] 
> (Thread-6 
> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@37386953))
>  ServerSessionPacketHandler::scheduling 
> response::PACKET(ActiveMQExceptionMessage)[type=20, channelID=0, 
> packetObject=ActiveMQExceptionMessage, exception= 
> ActiveMQDuplicateIdException[errorType=DUPLICATE_ID_REJECTED message=
> Duplicate message detected - message will not be routed. Message 
> information:LargeServerMessage[messageID=7034,durable=true,userID=f3b4e359-7834-11e7-bd40-001b217d6dc3,priority=4,
>  timestamp=Thu Aug 03 06:17:28 EDT 2017,expiration=0, durable=true, 
> address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=829bf1a6-7834-11e7-bd40-001b217d6dc3,count=1699,counter=1700,_AMQ_DUPL_ID=1bfb61bf-2eca-4f33-9723-5998dcd84ed51501755380929,_AMQ_LARGE_SIZE=409615,color=RED]]@971671687]]
> {code}
> The problem is that after this event, I cannot find any message saying that 
> the packet was sent. As you can see in the code snippet below, the 
> "scheduling response" trace means that an IOCallback was registered, which 
> sends the packet once it is triggered.
> {code:java}
> private void sendResponse(final Packet confirmPacket,
>                              final Packet response,
>                              final boolean flush,
>                              final boolean closeChannel) {
>       if (logger.isTraceEnabled()) {
>          logger.trace("ServerSessionPacketHandler::scheduling response::" + response);
>       }
>       storageManager.afterCompleteOperations(new IOCallback() {
>          @Override
>          public void onError(final int errorCode, final String errorMessage) {
>             ActiveMQServerLogger.LOGGER.errorProcessingIOCallback(errorCode, errorMessage);
>             ActiveMQExceptionMessage exceptionMessage = new ActiveMQExceptionMessage(ActiveMQExceptionType.createException(errorCode, errorMessage));
>             doConfirmAndResponse(confirmPacket, exceptionMessage, flush, closeChannel);
>             if (logger.isTraceEnabled()) {
>                logger.trace("ServerSessionPacketHandler::exception response sent::" + exceptionMessage);
>             }
>          }
>          @Override
>          public void done() {
>             if (logger.isTraceEnabled()) {
>                logger.trace("ServerSessionPacketHandler::regular response sent::" + response);
>             }
>             doConfirmAndResponse(confirmPacket, response, flush, closeChannel);
>          }
>       });
>    }
> {code}
> It is odd that the callback was never triggered. This assumption is 
> confirmed by the warning printed when the server was stopped. In the 
> OperationContext, there were still some callbacks which had not been 
> triggered.
> {code}
> 06:22:41,377 WARN  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 124) AMQ222105: Could not finish context execution in 10 seconds: java.lang.Exception: warning
>         at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.waitContextCompletion(ServerSessionImpl.java:1141) [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.closeAllServerSessions(ActiveMQServerImpl.java:1103) [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:888) [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:793) [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStop(ActiveMQServerImpl.java:688) [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:681) [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.stop(JMSServerManagerImpl.java:433) [artemis-jms-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at org.wildfly.extension.messaging.activemq.jms.JMSService.doStop(JMSService.java:217) [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
>         at org.wildfly.extension.messaging.activemq.jms.JMSService.access$100(JMSService.java:64) [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
>         at org.wildfly.extension.messaging.activemq.jms.JMSService$2.run(JMSService.java:121) [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
>         at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
>         at org.jboss.threads.JBossThread.run(JBossThread.java:320) [jboss-threads-2.2.1.Final-redhat-1.jar:2.2.1.Final-redhat-1]
> {code}
> The attachment also contains thread dumps from the server. I didn't find 
> any deadlocks in them.
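
The deferred-callback behaviour described in the report can be illustrated with a small, self-contained sketch. It is not the Artemis implementation: SketchContext and its methods (operationStarted, operationDone, waitingCallbacks) are hypothetical names, and only afterCompleteOperations is borrowed from the snippet quoted above. It assumes, purely for illustration, that a callback is held back while operations are pending; the report itself does not establish why the real callback stayed untriggered.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PendingCallbackSketch {

    /** Hypothetical, simplified stand-in for an operation context; not the Artemis class. */
    static final class SketchContext {
        private int pending;                                  // operations started but not yet completed
        private final Queue<Runnable> waiting = new ArrayDeque<>();

        /** Record that an operation has started and must complete before callbacks run. */
        synchronized void operationStarted() {
            pending++;
        }

        /** Record one completion; release the waiting callbacks once nothing is pending. */
        void operationDone() {
            Queue<Runnable> toRun = new ArrayDeque<>();
            synchronized (this) {
                if (pending > 0) {
                    pending--;
                }
                if (pending == 0) {
                    toRun.addAll(waiting);
                    waiting.clear();
                }
            }
            // Run callbacks outside the lock so they cannot block other callers.
            toRun.forEach(Runnable::run);
        }

        /** Run the callback immediately if nothing is pending, otherwise defer it. */
        void afterCompleteOperations(Runnable callback) {
            synchronized (this) {
                if (pending > 0) {
                    waiting.add(callback);
                    return;
                }
            }
            callback.run();
        }

        /** Number of callbacks that have been registered but not yet run. */
        synchronized int waitingCallbacks() {
            return waiting.size();
        }
    }

    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        context.operationStarted();   // first operation: its completion arrives below
        context.operationStarted();   // second operation: its completion never arrives here

        context.afterCompleteOperations(() -> System.out.println("response sent"));
        context.operationDone();      // only one of the two completions is reported

        System.out.println("callbacks still waiting: " + context.waitingCallbacks());
    }
}
```

In this sketch the "response sent" line is never printed and one callback stays in the queue, which is the same shape as a scheduled response with no matching "sent" trace and callbacks still waiting at shutdown.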



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
