Client does not ensure connection is closed before attempting failover
----------------------------------------------------------------------
Key: QPID-1949
URL: https://issues.apache.org/jira/browse/QPID-1949
Project: Qpid
Issue Type: Bug
Affects Versions: M4, 0.5
Reporter: Martin Ritchie

Summary:

A user has reported message loss from their application. On bouncing of
the broker, the 'lost' messages are delivered to the broker.

Note:

The client was using Spring, so that may influence the situation.

Issue:

The log files show 7 instances of the following, which result in 7
missing messages.

The client log files show:

The broker log file shows:

The 7 missing messages have delivery tags 5-11, which indicates that they
are sequentially the next messages from the broker.

The only way for the 'without a handler' log to occur is if the consumer
has been removed from the lookup table of the dispatcher.
And the only way for the 'null message' log to occur on the broker is
if the message does not exist in the unacked map.

The consumer is only removed from the list during session closure and
failover.

If the session was closed, then the broker would requeue the unacked
messages, so the potential exists to have an empty map, but the broker
will not send a message out after the unacked map has been cleared.

When failover occurs, the _consumer map is cleared and the consumers are
resubscribed. This is done without first stopping any existing
dispatcher, so there exists the potential to receive a message after
the _consumer map has been cleared, which is how the 'without a handler'
log statement occurs.

Scenario:

Looking over the logs, the sequence that best fits the events is as follows:
 - Something causes Mina to be delayed, causing the WriteTimeoutException.
 - This exception is received by AMQProtocolHandler#exceptionCaught.
 - As the WriteTimeoutException is an IOException, this will cause
   sessionClosed to be called to start failover.
   + This is potentially the issue here. All IOExceptions are treated
     as connection failure events.
 - Failover runs.
   + Failover assumes that the previous connection has been closed.
   + Failover binds the existing objects (AMQConnection/Session) to the
     new connection objects.
 - Everything is reported as being successfully failed over.
   However, what is neglected is that the original connection has not
   been closed.
   + So the broker sends a message to the consumer on the original
     connection, as it was not notified of the client failing over.
     As the client failover reuses the original AMQSession and Dispatcher,
     the new messages the broker sends to the old consumer arrive at the
     client and are processed by the same AMQSession and Dispatcher.
     However, as the failover process cleared the _consumer map and
     resubscribed the consumers, the Dispatcher does not recognise the
     delivery tag and so logs the 'without a handler' message.
 - The Dispatcher then attempts to reject the message. However:
   + The AMQSession/Dispatcher pair have been swapped to using a new Mina
     ProtocolSession as part of the failover process, so the reject is
     sent down the second connection. The broker receives the Reject
     request, but as the message was sent on a different connection the
     unacknowledged map is empty and a 'message is null' log message is
     produced.

Test Strategy:

It should be easy to demonstrate if we can send an IOException to
AMQProtocolHandler#exceptionCaught and then try sending a message.
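
The scenario above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions, not Qpid code: the Connection and Session classes, the failover and deliver methods, and the use of a consumer tag as the lookup key are all hypothetical stand-ins for AMQConnection, AMQSession/Dispatcher and the _consumer map. It shows how a message arriving on the unclosed original connection finds no handler and has its reject routed to the new connection, and how closing the original connection first avoids that path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailoverSketch {

    /** Hypothetical stand-in for a network connection; not a Qpid class. */
    static final class Connection {
        final String name;
        boolean open = true;
        Connection(String name) { this.name = name; }
        void close() { open = false; }
    }

    /** Hypothetical stand-in for the session/dispatcher pair reused across failover. */
    static final class Session {
        Connection current;
        // Assumed lookup table: consumer tag -> consumer name (models the _consumer map).
        final Map<Integer, String> consumers = new HashMap<>();
        final List<String> log = new ArrayList<>();

        Session(Connection initial) { this.current = initial; }

        /** Rebinds the session to a new connection, optionally closing the old one first. */
        void failover(Connection next, boolean closeOldFirst) {
            if (closeOldFirst) {
                current.close();           // the step the issue says is missing
            }
            consumers.clear();             // the lookup table is cleared...
            current = next;
            consumers.put(2, "consumer");  // ...and the consumer is resubscribed under a new tag
        }

        /** Models a message arriving from the broker on the given connection. */
        void deliver(Connection from, int consumerTag, long deliveryTag) {
            if (!from.open) {
                log.add("dropped: " + from.name + " is closed");
                return;
            }
            if (!consumers.containsKey(consumerTag)) {
                // The reject goes out on the *current* connection, not the one
                // the message arrived on, so the broker cannot match it.
                log.add("without a handler: delivery tag " + deliveryTag
                        + ", reject sent on " + current.name);
                return;
            }
            log.add("delivered " + deliveryTag);
        }
    }

    /** Runs the scenario once and returns the resulting log lines. */
    static List<String> simulate(boolean closeOldFirst) {
        Connection oldConn = new Connection("connection-1");
        Connection newConn = new Connection("connection-2");
        Session session = new Session(oldConn);
        session.consumers.put(1, "consumer");

        session.failover(newConn, closeOldFirst);
        session.deliver(oldConn, 1, 5L);   // broker still uses the original connection
        session.deliver(newConn, 2, 1L);   // normal delivery on the new connection
        return session.log;
    }

    public static void main(String[] args) {
        System.out.println("old connection left open: " + simulate(false));
        System.out.println("old connection closed:    " + simulate(true));
    }
}
```

In the real client, closing the original connection would also let the broker requeue its unacked messages, as described under Issue; the sketch does not model the broker side.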