[ 
https://issues.apache.org/jira/browse/AMQ-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492733#comment-14492733
 ] 

Timothy Bish commented on AMQ-5712:
-----------------------------------

Thanks for the patch. I had uncovered much the same in my testing and have a 
smaller but similar test case using the AMQP test client.  The issue occurs due 
to a race when the temp store is initialized by a change in memory usage that 
trips the limit, causing the in-memory messages to need to be sent to disk.  The 
problem is that in the Queue send we don't see that temp storage is full yet 
and try to do the add.  While your fix does work around this, it would result in 
the loss of the message(s) that arrive while this is happening, which is not 
something we would want to do in the case of a Queue, which has specific QOS 
guarantees.  

I am taking a look at how things work today and rethinking some of the layering 
of add vs tryAdd as it seems a bit wrong to me and can lead to this sort of 
error in more than one case as you have pointed out.  

> Broker can deadlock when using queues while producers wait on disk space
> ------------------------------------------------------------------------
>
>                 Key: AMQ-5712
>                 URL: https://issues.apache.org/jira/browse/AMQ-5712
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.11.1
>            Reporter: Christopher L. Shannon
>
> I am experiencing a deadlock when using a Queue with non-persistent messages. 
>  The queue has a cursor high memory water mark set (right now at 70%).  When 
> a producer is producing messages quickly to the queue and that limit gets 
> hit, the broker can deadlock.   I have tried setting producerWindowSize and 
> alwaysSyncSend which did not seem to help. When the broker hits that limit, I 
> am unable to do things like purge the queue.  Consumers can also deadlock as 
> well. 
> Note that this appears to be the same issue as described in this ticket here: 
> AMQ-2475 .  The difference is that I am using a Queue and not a Topic and the 
> fix for this appears to only have been for Topics.
> The problem appears to be in the Queue class on line 1852 inside the 
> {{cursorAdd}} method.  The method being called is {{return 
> messages.addMessageLast(msg);}} which will block indefinitely if there is no 
> space available, which in turn ties up the {{messagesLock}} from being used 
> by any other threads.  We have seen a deadlock where consumers can't consume 
> because they are waiting on this lock.   It looks like in AMQ-2475 part of 
> the fix was to replace {{messages.addMessageLast(msg)}} with 
> {{messages.tryAddMessageLast(msg, 10)}}.  I also noticed that not all of the 
> message cursors support {{tryAddMessageLast}}, which could be a problem.  
> {{FilePendingMessageCursor}} implements it but the rest of the cursors 
> (notably {{StoreQueueCursor}}) simply delegate back to {{addMessageLast}} in 
> the parent class.  So part of this fix may require implementing 
> {{tryAddMessageLast}} across more cursors.
> Here is part of the thread dump showing the stuck producer:
> {code}
> "ActiveMQ Transport: ssl:///192.168.3.142:38589" daemon prio=10 tid=0x00007fb46c006000 nid=0x3b1a runnable [0x00007fb4b8a0d000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000cfb13cd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176)
>         at org.apache.activemq.usage.Usage.waitForSpace(Usage.java:103)
>         at org.apache.activemq.usage.Usage.waitForSpace(Usage.java:90)
>         at org.apache.activemq.usage.Usage.waitForSpace(Usage.java:80)
>         at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.tryAddMessageLast(FilePendingMessageCursor.java:235)
>         - locked <0x00000000d2015ee0> (a org.apache.activemq.broker.region.cursors.FilePendingMessageCursor)
>         at org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.addMessageLast(FilePendingMessageCursor.java:207)
>         - locked <0x00000000d2015ee0> (a org.apache.activemq.broker.region.cursors.FilePendingMessageCursor)
>         at org.apache.activemq.broker.region.cursors.StoreQueueCursor.addMessageLast(StoreQueueCursor.java:97)
>         - locked <0x00000000d1f20908> (a org.apache.activemq.broker.region.cursors.StoreQueueCursor)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
