Client does not ensure connection is closed before attempting failover
----------------------------------------------------------------------
Key: QPID-1949
URL: https://issues.apache.org/jira/browse/QPID-1949
Project: Qpid
Issue Type: Bug
Affects Versions: M4, 0.5
Reporter: Martin Ritchie
* Summary:
* A user has reported message loss from their application. After the
* broker is bounced, the 'lost' messages are delivered to the broker.
*
* Note:
* The client was using Spring so that may influence the situation.
*
* Issue:
* The log files show 7 instances of the following which result in 7
* missing messages.
*
* The client log files show:
*
* The broker log files show:
*
*
* The 7 missing messages have delivery tags 5-11, which indicates that
* they are sequentially the next messages from the broker.
*
* The only way for the 'without a handler' log to occur is if the consumer
* has been removed from the lookup table of the dispatcher.
* And the only way for the 'null message' log to occur on the broker is
* if the message does not exist in the unacked-map.
*
* The consumer is only removed from the list during session
* closure and failover.
*
* If the session was closed then the broker would requeue the unacked
* messages so the potential exists to have an empty map but the broker
* will not send a message out after the unacked map has been cleared.
*
* When failover occurs the _consumer map is cleared and the consumers are
* resubscribed. This is done without first stopping any existing
* dispatcher, so a message can be received after the _consumer map has
* been cleared, which is how the 'without a handler' log statement
* occurs.
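*
* A minimal sketch of that window, assuming a simplified dispatcher. The
* class, method names and log wording below are hypothetical stand-ins,
* not the real client code; only the idea of a consumer lookup map that
* is cleared while dispatch continues is taken from the description.
*
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the client-side dispatcher; not the real Qpid class.
class DispatcherRaceSketch {
    interface Handler { void onMessage(long deliveryTag); }

    // Plays the role of the _consumer map: consumer tag -> handler.
    final Map<String, Handler> consumers = new ConcurrentHashMap<>();
    // Collected log lines; the wording is a placeholder for the real log text.
    final List<String> log = new ArrayList<>();

    void dispatch(String consumerTag, long deliveryTag) {
        Handler handler = consumers.get(consumerTag);
        if (handler == null) {
            // No consumer registered for this tag any more.
            log.add("without a handler: tag=" + deliveryTag);
            return;
        }
        handler.onMessage(deliveryTag);
    }

    // Failover as described: the map is cleared but dispatch is never stopped.
    void clearConsumersForFailover() {
        consumers.clear();
    }

    static List<String> demo() {
        DispatcherRaceSketch d = new DispatcherRaceSketch();
        d.consumers.put("consumer-1", tag -> d.log.add("delivered: tag=" + tag));
        d.dispatch("consumer-1", 4);    // normal delivery
        d.clearConsumersForFailover();  // failover begins
        d.dispatch("consumer-1", 5);    // late message from the old connection
        return d.log;
    }
}
```
*
* Stopping the dispatcher before clearing the map would close this
* window in the sketch; whether that is the right fix for the real
* client is not established here.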
*
* Scenario:
*
* Looking over the logs, the sequence that best fits the events is as follows:
* - Something causes Mina to be delayed, causing the WriteTimeoutException.
* - This exception is received by AMQProtocolHandler#exceptionCaught
* - As the WriteTimeoutException is an IOException this will cause
* sessionClosed to be called to start failover.
* + This is potentially the issue here. All IOExceptions are treated
* as connection failure events.
* - Failover Runs
* + Failover assumes that the previous connection has been closed.
* + Failover binds the existing objects (AMQConnection/Session) to the
* new connection objects.
* - Everything is reported as being successfully failed over.
* However, what is neglected is that the original connection has not
* been closed.
* + So what occurs is that the broker sends a message to the consumer on
* the original connection, as it was not notified of the client
* failing over.
* As the client failover reuses the original AMQSession and Dispatcher,
* the new messages the broker sends to the old consumer arrive at the
* client and are processed by the same AMQSession and Dispatcher.
* However, as the failover process cleared the _consumer map and
* resubscribed the consumers, the Dispatcher does not recognise the
* delivery tag and so logs the 'without a handler' message.
* - The Dispatcher then attempts to reject the message, however,
* + The AMQSession/Dispatcher pair have been swapped to using a new Mina
* ProtocolSession as part of the failover process so the reject is
* sent down the second connection. The broker receives the Reject
* request but as the Message was sent on a different connection the
* unacked-map is empty and a 'message is null' log message is
* produced.
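*
* The last two steps can be modelled with a toy broker that keeps one
* unacked-map per connection. Everything in this sketch (class names,
* message bodies, returned strings) is hypothetical and only illustrates
* why a reject sent on the second connection finds nothing to reject.
*
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes do not mirror Qpid's broker or client code.
class StaleConnectionSketch {
    // Broker-side state for one connection: delivery tag -> unacked message.
    static class BrokerConnection {
        final Map<Long, String> unacked = new HashMap<>();

        void deliver(long tag, String body) {
            unacked.put(tag, body);
        }

        // Returns the (placeholder) log line produced when a reject arrives.
        String reject(long tag) {
            String message = unacked.remove(tag);
            return message == null ? "message is null" : "requeued " + message;
        }

        // Closing the connection requeues everything still unacknowledged.
        List<String> close() {
            List<String> requeued = new ArrayList<>(unacked.values());
            unacked.clear();
            return requeued;
        }
    }

    static String rejectAfterFailover(boolean closeOldFirst) {
        BrokerConnection oldConn = new BrokerConnection();
        BrokerConnection newConn = new BrokerConnection();
        oldConn.deliver(4, "m4"); // delivered before failover, not yet acked

        if (closeOldFirst) {
            // The broker learns the consumer is gone and requeues.
            return "old connection closed, requeued=" + oldConn.close();
        }
        // Old connection left open: the broker keeps delivering on it ...
        oldConn.deliver(5, "m5");
        // ... but the client session now writes to the new connection.
        return newConn.reject(5);
    }
}
```
*
* In this model rejectAfterFailover(false) reproduces the empty
* unacked-map case, while rejectAfterFailover(true) shows the message
* being requeued once the old connection is closed first.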
*
* Test Strategy:
*
* It should be easy to demonstrate if we can send an IOException to
* AMQProtocolHandler#exceptionCaught and then try sending a message.
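*
* A rough shape for such a test, assuming a fake transport and a
* simplified handler. FakeTransport, SketchHandler and the event strings
* are hypothetical; a real test would need to drive the actual
* AMQProtocolHandler, whose set-up is not shown here.
*
```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness; FakeTransport and SketchHandler are not Qpid or MINA classes.
class ExceptionInjectionSketch {
    static class FakeTransport {
        boolean closed = false;
        void close() { closed = true; }
    }

    static class SketchHandler {
        final FakeTransport transport;
        final boolean closeBeforeFailover;
        final List<String> events = new ArrayList<>();

        SketchHandler(FakeTransport transport, boolean closeBeforeFailover) {
            this.transport = transport;
            this.closeBeforeFailover = closeBeforeFailover;
        }

        // Mirrors the behaviour described: any IOException starts failover.
        void exceptionCaught(Throwable cause) {
            if (cause instanceof IOException) {
                if (closeBeforeFailover) {
                    transport.close(); // the step the issue says is missing
                }
                events.add("failover started, old transport closed=" + transport.closed);
            }
            // Handling of other exception types is outside this sketch.
        }
    }

    // Injects a simulated I/O failure and reports what the handler recorded.
    static String inject(boolean closeBeforeFailover) {
        SketchHandler handler = new SketchHandler(new FakeTransport(), closeBeforeFailover);
        handler.exceptionCaught(new IOException("simulated write timeout"));
        return handler.events.get(0);
    }
}
```
*
* A test built along these lines would then send a message and check
* that it is not delivered on the old connection.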