> On May 13, 2013, 3:22 p.m., Robbie Gemmell wrote: > > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java, > > lines 744-745 > > <https://reviews.apache.org/r/10738/diff/2/?file=288993#file288993line744> > > > > What impact does dropping the message on the floor here have on the > > client? > > > > Earlier points in the call hierarchy seem to make effort to do other > > things with the message when detecting session/consumer close, so is there > > any impact from not doing so? E.g a message getting stuck acquired for a > > now-closed consumer? > > > > Does any particular attention need paid to the overridden 0-10 specific > > version of this method? > > rajith attapattu wrote: > The message will be dropped if a consumer (or session is closed or > closing). > When a consumer is closed, any messages acquired but not acknowledged > should be be made available to another consumer by the broker. > These messages will be marked redelivered. > Is this the same situation for 0-8/0-9 ? > > The situation is the same as a consumer closing with a bunch of unacked > messages when in CLIENT_ACK mode. > Alternatively we could reject the message (release in 0-10 terms). But I > don't think this is required, given that we will be closing the consumer > anyways. > > > >Earlier points in the call hierarchy seem to make effort to do other > things with the message when detecting session/consumer close > By "other things" are you referring to a reject? In 0-10 AFIK you don't > need to do it. > The situation is the same as a consumer closing with a bunch of unacked > messages when in CLIENT_ACK mode. > > >Does any particular attention need paid to the overridden 0-10 specific > version of this method? > IMO adding "if (!(isClosed() || isClosing()))" is required to prevent the > 0-10 specific method from issuing credit (and receiving more messages, that > will eventually get released). > There is a chance where the consumer could be marked closed, but not yet > reached the point of sending a cancel, by the time messageFlow() is called. > The above check should prevent it. > > Robbie Gemmell wrote: > "Did you mean race condition (instead of deadlock)?" > > I really meant deadlock. One of the uses of _messageDeliveryLock being > removed was added as part of the fix for a deadlock > (https://issues.apache.org/jira/browse/QPID-3911) and so I wonder what effect > removing it again will have. It may be the other changes mean it isn't a > problem, but I think it needs to be closely considered. > > "The message will be dropped if a consumer (or session is closed or > closing). > When a consumer is closed, any messages acquired but not acknowledged > should be be made available to another consumer by the broker. > These messages will be marked redelivered. > Is this the same situation for 0-8/0-9 ?" > > That isn't actually the case, transferred messages are the responsibility > of the session and remain acquired after consumer close until such point as > the session explicitly does something with them. I believe this is true of > 0-8/9/91 as well, and probably explains the origin of the reject/release code > I referred to earlier in the call tree than the changes would now allow the > message to be silently dropped. > > From the 0-10 spec: "Canceling a subscription MUST NOT affect pending > transfers. A transfer made prior to canceling transfers to the destination > MUST be able to be accepted, released, acquired, or rejected after the > subscription is canceled." > > rajith attapattu wrote: > Indeed, Gordon mentioned that to me and the 2nd patch (the one before I > did today) takes care of rejecting messages from the consumer. We don't need > to do the same when the session is closing, as when the session ends, the any > unacked messages are put back on the queue. > > rajith attapattu wrote: > As for QPID-3911, > There is a deadlock, albeit a bit different (involving the same locks) > from QPID-3911, that do happen in similar circumstances. > However this deadlock appears to manifest with or without this patch, > which leads me to believe that _messageDeliveryLock is not the right solution > for QPID-3911. > Sadly the solution for QPID-3911 made it worse as there are at least two > distinct cases of deadlocks involving _messageDeliveryLock. > 1. Btw _lock and _messageDeliveryLock > 2. Btw _messageDeliveryLock and _failoverMutex. > > We definitely need to find a solution for the deadlocks (at least 3 > cases) btw failoverMutex and _lock (in AMQSession), which seems to be the > root of all evil :) > We might have to drop one lock (most likely _lock) and see if we can > provide an alternative strategy to guarantee correct behaviour. >
Sorry I meant dopping _failoverMutex (not _lock in AMQSession). It might also be an opportunity to fix our less than stellar failover. - rajith ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10738/#review20483 ----------------------------------------------------------- On May 21, 2013, 2:48 p.m., rajith attapattu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/10738/ > ----------------------------------------------------------- > > (Updated May 21, 2013, 2:48 p.m.) > > > Review request for qpid, Robbie Gemmell, Weston Price, and Rob Godfrey. > > > Description > ------- > > There are at least 3 cases where the deadlock btw _messageDeliveryLock and > _failoverMutex surfaces. Among them sending a message inside onMessage() and > the session being closed due to an error (causing the deadlock) seems to come > up a lot in production environments. There is also a deadlock btw > _messageDeliveryLock and _lock (AMQSession.java) which happens less > frequently. > The messageDeliveryLock is used to ensure that we don't close the session in > the middle of a message delivery. In order to do this we hold the lock across > onMessage(). > This causes several issues in addition to the potential to deadlock. If an > onMessage call takes longer/wedged then you cannot close the session or > failover will not happen until it returns as the same thread is holding the > failoverMutex. > > Based on an idea from Rafi, I have come up with a solution to get rid of > _messageDeliveryLock and instead use an alternative strategy to achieve > similar functionality. > In order to ensure that close() doesn't proceed until the message deliveries > currently in progress completes, an atomic counter is used to keep track of > message deliveries in progress. > The close() will wait until the count falls to zero before proceeding. No new > deliveries will be initiated bcos the close method will mark the session as > closed. > The wait has a timeout to ensure that a longer running or wedged onMessage() > will not hold up session close. > There is a slim chance that before a session being marked as closed a message > delivery could be initiated, but not yet gotten to the point of updating the > counter, hence waitForMsgDeliveryToFinish() will see it as zero and proceed > with close. But in comparison to the issues with _messageDeliveryLock, I > believe it's acceptable. > > There is an issue if MessageConsumer close is called outside of Session > close. This can be solved in a similar manner. I will wait until the current > review is complete and then post the solution for the MessageConsumer close. > I will commit them both together. > > > This addresses bug QPID-4574. > https://issues.apache.org/jira/browse/QPID-4574 > > > Diffs > ----- > > > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQConnection.java > 1484812 > > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java > 1484812 > > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java > 1484812 > > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer_0_10.java > 1484812 > > http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/util/ConditionManager.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/10738/diff/ > > > Testing > ------- > > Java test suite, tests from customers and QE around the deadlock situation. > > > Thanks, > > rajith attapattu > >
