[jira] [Commented] (ARTEMIS-1368) Artemis gets to state when it doesn't respond to producer

ASF GitHub Bot (JIRA) Wed, 23 Aug 2017 02:49:22 -0700

    [ 
https://issues.apache.org/jira/browse/ARTEMIS-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138160#comment-16138160
 ]


ASF GitHub Bot commented on ARTEMIS-1368:
-----------------------------------------

Github user michaelandrepearce commented on the issue:

    https://github.com/apache/activemq-artemis/pull/1486
  
    @dudaerich agreed, essentially you're catching the case where enabled is 
changed between time the execute is called and its actually run, i think there 
maybe some possible tidy up could be done.
    
    currently:
    ```
          if (enabled) {
             replicationStream.execute(() -> {
                if (enabled) {
                   pendingTokens.add(repliToken);
                   flowControl(packet.expectedEncodeSize());
                   replicatingChannel.send(packet);
                } else {
                   packet.release();
                   repliToken.replicationDone();
                }
             });
          } else {
             // Already replicating channel failed, so just play the action now
             runItNow = true;
             packet.release();
          }
    
          // Execute outside lock
    
          if (runItNow) {
             repliToken.replicationDone();
          }
    ```
    
    as it seems the old lock that once existed is remove run it now can 
probably be removed and look like the same logic block inside you added to 
execute:
    
    e.g.
    ```
          if (enabled) {
             replicationStream.execute(() -> {
                if (enabled) {
                   pendingTokens.add(repliToken);
                   flowControl(packet.expectedEncodeSize());
                   replicatingChannel.send(packet);
                } else {
                   packet.release();
                   repliToken.replicationDone();
                }
             });
          } else {
             // Already replicating channel failed, so just play the action now
             packet.release();
             repliToken.replicationDone();
          }
    ```
    
    next is the question of do we need to have this duplicated check, could 
this just be all inside the runnable, do we need to do double check? e.g just 
have
    
    ```
    replicationStream.execute(() -> {
                if (enabled) {
                   pendingTokens.add(repliToken);
                   flowControl(packet.expectedEncodeSize());
                   replicatingChannel.send(packet);
                } else {
                   packet.release();
                   repliToken.replicationDone();
                }
             });
    ```



> Artemis gets to state when it doesn't respond to producer
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-1368
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1368
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 1.5.5, 2.2.0
>            Reporter: Erich Duda
>
> *Scenario*
> * There are two servers configured in colocated replicated HA
> * There are two producers. Each one sends messages on different server to 
> InQueue.
> * There is MDB on server 2 which resends messages from InQueue to OutQueue
> * During the resending of messages, server 2 is restarted.
> * After all messages are resent, receiver is connected to server 1 and it 
> receives all messages.
> *Expectation:* All messages sent by producers to InQueue are received by 
> receiver from OutQueue.
> *Reality:* After the restart of server 2, the server 1 gets into the state 
> when it stops to respond to the producer. Producer sends a bulk of messages 
> which are marked as duplicates by server, but the exception packet is not 
> sent to producer. See below for more detailed description what is happening 
> on the server.
> *Customer impact:* Artemis may get into the state when it is not able to work 
> properly. This can lead to unavailability of service.
> This is *regression* against *7.0.z*.
> The issue wasn't reported earlier, because the test failed due to JBEAP-7968 
> before 7.1.0.ER3.
> *Detail description of what happened on server*
> I looked at the server traces to see what happened when server received 
> session commit packet from producer. Based on the traces, the server behaved 
> correctly and it even scheduled to send response packet with the duplication 
> exception.
> {code}
> 06:17:28,206 TRACE 
> [org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler] 
> (Thread-6 
> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@37386953))
>  ServerSessio
> nPacketHandler::scheduling 
> response::PACKET(ActiveMQExceptionMessage)[type=20, channelID=0, 
> packetObject=ActiveMQExceptionMessage, exception= 
> ActiveMQDuplicateIdException[errorType=DUPLICATE_ID_REJECTED message=
> Duplicate message detected - message will not be routed. Message 
> information:LargeServerMessage[messageID=7034,durable=true,userID=f3b4e359-7834-11e7-bd40-001b217d6dc3,priority=4,
>  timestamp=Thu Aug 03 06:17:28 E
> DT 2017,expiration=0, durable=true, 
> address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=829bf1a6-7834-11e7-bd40-001b217d6dc3,count=1699,counter=1700,_AMQ_DUPL_ID=1bfb61bf-2eca-4f33-9723-5998dcd84ed515
> 01755380929,_AMQ_LARGE_SIZE=409615,color=RED]]@971671687]]
> {code}
> The problem is that after this event, I cannot find message which would say 
> that the packet was sent. As you can see in code snippet below, the 
> "scheduling response" says that it was registered IOCallback, which will send 
> the packet once it is triggered.
> {code:java}
> private void sendResponse(final Packet confirmPacket,
>                              final Packet response,
>                              final boolean flush,
>                              final boolean closeChannel) {
>       if (logger.isTraceEnabled()) {
>          logger.trace("ServerSessionPacketHandler::scheduling response::" + 
> response);
>       }
>       storageManager.afterCompleteOperations(new IOCallback() {
>          @Override
>          public void onError(final int errorCode, final String errorMessage) {
>             ActiveMQServerLogger.LOGGER.errorProcessingIOCallback(errorCode, 
> errorMessage);
>             ActiveMQExceptionMessage exceptionMessage = new 
> ActiveMQExceptionMessage(ActiveMQExceptionType.createException(errorCode, 
> errorMessage));
>             doConfirmAndResponse(confirmPacket, exceptionMessage, flush, 
> closeChannel);
>             if (logger.isTraceEnabled()) {
>                logger.trace("ServerSessionPacketHandler::exception response 
> sent::" + exceptionMessage);
>             }
>          }
>          @Override
>          public void done() {
>             if (logger.isTraceEnabled()) {
>                logger.trace("ServerSessionPacketHandler::regular response 
> sent::" + response);
>             }
>             doConfirmAndResponse(confirmPacket, response, flush, 
> closeChannel);
>          }
>       });
>    }
> {code}
> It is odd that the callback wasn't never triggered. This assumption confirms 
> the warning printed at stopping of the server. In the OperationContext, there 
> were still some callbacks which weren't triggered.
> {code}
> 06:22:41,377 WARN  [org.apache.activemq.artemis.core.server] (ServerService 
> Thread Pool -- 124) AMQ222105: Could not finish context execution in 10 
> seconds: java.lang.Exception: warning
>         at 
> org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.waitContextCompletion(ServerSessionImpl.java:1141)
>  [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.closeAllServerSessions(ActiveMQServerImpl.java:1103)
>  [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:888)
>  [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:793)
>  [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStop(ActiveMQServerImpl.java:688)
>  [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:681)
>  [artemis-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.stop(JMSServerManagerImpl.java:433)
>  [artemis-jms-server-1.5.5.006-redhat-1.jar:1.5.5.006-redhat-1]
>         at 
> org.wildfly.extension.messaging.activemq.jms.JMSService.doStop(JMSService.java:217)
>  [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
>         at 
> org.wildfly.extension.messaging.activemq.jms.JMSService.access$100(JMSService.java:64)
>  [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
>         at 
> org.wildfly.extension.messaging.activemq.jms.JMSService$2.run(JMSService.java:121)
>  [wildfly-messaging-activemq-7.1.0.GA-redhat-4.jar:7.1.0.GA-redhat-4]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [rt.jar:1.8.0_131]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [rt.jar:1.8.0_131]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [rt.jar:1.8.0_131]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [rt.jar:1.8.0_131]
>         at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
>         at org.jboss.threads.JBossThread.run(JBossThread.java:320) 
> [jboss-threads-2.2.1.Final-redhat-1.jar:2.2.1.Final-redhat-1]
> {code}
> In the attachment you can find also thread dumps from the server. I didn't 
> find there any deadlocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (ARTEMIS-1368) Artemis gets to state when it doesn't respond to producer

Reply via email to