[ https://issues.apache.org/jira/browse/ARTEMIS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531770#comment-17531770 ]
David Bennion commented on ARTEMIS-3809:
----------------------------------------

I have still been thinking about this scenario, and your explanation of a resolution to make this system more robust makes total sense. A robust delivery system that times out is really important, and the possibility of sending a single packet and then vanishing seems like a plausible edge case. So that is all good.

The piece of my situation that still doesn't make complete sense to me, though, is that all of these messages occur within a single JVM using the InVM transporter. I don't (that I know of) have any error in the log that indicates something went wrong. So how did I arrive at the point where a single packet of a large message made it through and the send of the rest of it was abandoned without a trace?

With the fix that you are proposing (which I believe is a correct and valuable fix), would it not be true that my situation would simply get a failed message delivery for that message and continue on past it?

> LargeMessageControllerImpl hangs the message consume
> -----------------------------------------------------
>
>                 Key: ARTEMIS-3809
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3809
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.21.0
>        Environment: OS: Windows Server 2019
>                      JVM: OpenJDK 64-Bit Server VM Temurin-17.0.1+12
>                      Max Memory (-Xmx): 6GB
>                      Allocated to JVM: 4.168GB
>                      Currently in use: 3.398GB (heap 3.391GB, non-heap 0.123GB)
>            Reporter: David Bennion
>            Priority: Major
>              Labels: test-stability
>         Attachments: image-2022-05-03-10-51-46-872.png
>
> I wondered if this might be a recurrence of issue ARTEMIS-2293, but this happens on 2.21.0 and I can see the code change in LargeMessageControllerImpl.
>
> Using the default min-large-message-size of 100K.
>
> Many messages are passing through the broker when this happens. I would anticipate that most of the messages are smaller than 100K, but clearly some of them must exceed it. After some number of messages, a particular consumer ceases to consume messages.
>
> After the system became "hung" I was able to get a stack trace, and I could identify that the system is stuck in an Object.wait() for a notify that appears to never come.
>
> Here is the trace I was able to capture:
> {code:java}
> Thread-2 (ActiveMQ-client-global-threads) id=78 state=TIMED_WAITING
>     - waiting on <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
>     - locked <0x43523a75> (a org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl)
>     at java.base@17.0.1/java.lang.Object.wait(Native Method)
>     at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.waitCompletion(LargeMessageControllerImpl.java:294)
>     at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:268)
>     at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.checkBuffer(ClientLargeMessageImpl.java:157)
>     at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.getBodyBuffer(ClientLargeMessageImpl.java:89)
>     at mypackage.MessageListener.handleMessage(MessageListener.java:46)
> {code}
>
> The app can run either as a single node using the InVM transporter or as a cluster using TCP. To my knowledge, I have only seen this issue occur on InVM.
> I am not an expert in this code, but I can tell from the call stack that 0 must be the value of timeWait passed into waitCompletion(). But from what I can discern of the code changes in 2.21.0, it should be adjusting the readTimeout to the timeout of the message (I think?) so that the read eventually gives up rather than remaining blocked forever.
>
> We have persistenceEnabled = false, which leads me to believe that the only disk activity for messages should be related to large messages(?).
>
> On a machine and context where this was consistently happening, I adjusted the min-large-message-size upwards and the problem went away (a sketch of that client-side adjustment is included below). This makes sense for my application, but ultimately, if a message crosses the threshold to become large, it appears to hang the consumer indefinitely.
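As a sketch of the workaround described in the report (raising the large-message threshold on the client and bounding the receive so a consumer thread is never parked indefinitely), the following uses the Artemis core client API. The InVM URL vm://0, the queue name example.queue, the 1 MiB threshold, and the 30s receive timeout are assumptions for illustration only, not values taken from the report:

{code:java}
import org.apache.activemq.artemis.api.core.client.ActiveMQClient;
import org.apache.activemq.artemis.api.core.client.ClientConsumer;
import org.apache.activemq.artemis.api.core.client.ClientMessage;
import org.apache.activemq.artemis.api.core.client.ClientSession;
import org.apache.activemq.artemis.api.core.client.ClientSessionFactory;
import org.apache.activemq.artemis.api.core.client.ServerLocator;

public class LargeMessageWorkaroundSketch {

   public static void main(String[] args) throws Exception {
      // Raise the large-message threshold so payloads that previously crossed the
      // default 100K boundary are sent as regular messages (a workaround, not a fix).
      ServerLocator locator = ActiveMQClient.createServerLocator("vm://0")
            .setMinLargeMessageSize(1024 * 1024); // example value: 1 MiB

      ClientSessionFactory factory = locator.createSessionFactory();
      ClientSession session = factory.createSession();
      try {
         session.start();
         ClientConsumer consumer = session.createConsumer("example.queue"); // hypothetical queue
         // Bounded receive: the calling thread gives up after 30s instead of waiting forever.
         ClientMessage message = consumer.receive(30_000);
         if (message != null) {
            int bodyBytes = message.getBodyBuffer().readableBytes();
            System.out.println("received " + bodyBytes + " body bytes");
            message.acknowledge();
         }
      } finally {
         session.close();
         factory.close();
         locator.close();
      }
   }
}
{code}

Note that the receive timeout only bounds how long the consumer waits for a message to arrive; the hang reported here happens later, inside getBodyBuffer() on a large message, which is why raising the threshold merely sidesteps the problem for payloads below the new limit.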
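For the fix being discussed (having the pending read give up after a timeout rather than blocking forever), the shape of the behavior is a deadline-bounded wait. The following is a plain-Java illustration of that pattern only; it is not the actual LargeMessageControllerImpl code, and the class and method names are made up:

{code:java}
// Illustrative only: a deadline-bounded wait, the behavior the reporter
// expected from waitCompletion(). All names here are hypothetical.
public class BoundedCompletionWait {

   private boolean complete;

   public synchronized void markComplete() {
      complete = true;
      notifyAll();
   }

   /**
    * Waits up to readTimeoutMillis for completion instead of waiting forever.
    * Returns false if the deadline passes, so the caller can fail the message
    * delivery and move on rather than hanging the consumer thread.
    */
   public synchronized boolean waitCompletion(long readTimeoutMillis) throws InterruptedException {
      long deadline = System.currentTimeMillis() + readTimeoutMillis;
      while (!complete) {
         long remaining = deadline - System.currentTimeMillis();
         if (remaining <= 0) {
            return false;        // give up: the remaining packets never arrived in time
         }
         wait(remaining);        // woken early by markComplete(), or spuriously
      }
      return true;
   }
}
{code}

Under that pattern, a consumer in the situation described above would get a timed-out (failed) delivery for the one message whose remaining packets never arrived, instead of an indefinitely parked thread, which is what the comment above is asking about.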