Alex Rudyy created QPID-6051:
--------------------------------

             Summary: [Java Broker] When client socket is get closed after 
transaction commit but before sending TxCommitOk back, a message might be 
deleted from a store but left on queue causing later on unhandled 
"AMQStoreException: Metadata not found" and broker crush
                 Key: QPID-6051
                 URL: https://issues.apache.org/jira/browse/QPID-6051
             Project: Qpid
          Issue Type: Bug
          Components: Java Broker
    Affects Versions: 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, 0.30
            Reporter: Alex Rudyy
            Assignee: Alex Rudyy


If socket is get closed on sending back to a client  command "TxCommitOk", the 
broker tries to run rollback actions for already committed transaction. This 
might cause a deletion of the message from a message store leaving the message 
enqueued on a queue.

If later on, a consumer tries to consume such message, the broker cannot find 
message metadata (as it was previously deleted) and crushes.

Here are the exception stack traces demonstrating the issue:
{noformat}
2014-08-08 12:16:00,843 ERROR [IoSender - /127.0.0.1:36625] util.Logger 
(Logger.java:78) - error in write thread
java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:313)
        at java.lang.Thread.run(Thread.java:722)
2014-08-08 12:16:00,848 ERROR [IoReceiver - /127.0.0.1:36625] 
txn.LocalTransaction$1 (LocalTransaction.java:338) - Failed to commit 
transaction
org.apache.qpid.transport.SenderClosedException: sender is closed
        at org.apache.qpid.transport.network.io.IoSender.send(IoSender.java:114)
        at org.apache.qpid.transport.network.io.IoSender.send(IoSender.java:40)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.writeFrame(AMQProtocolEngine.java:570)
        at 
org.apache.qpid.server.handler.TxCommitHandler$1.run(TxCommitHandler.java:73)
        at org.apache.qpid.server.AMQChannel$3.run(AMQChannel.java:975)
        at 
org.apache.qpid.server.txn.LocalTransaction$1.completeDeferredWork(LocalTransaction.java:333)
        at 
org.apache.qpid.server.txn.LocalTransaction$1.checkUnderlyingCompletion(LocalTransaction.java:321)
        at 
org.apache.qpid.server.txn.LocalTransaction$1.waitForCompletion(LocalTransaction.java:313)
        at 
org.apache.qpid.server.txn.LocalTransaction.sync(LocalTransaction.java:435)
        at org.apache.qpid.server.AMQChannel.sync(AMQChannel.java:1654)
        at 
org.apache.qpid.server.AMQChannel.receivedComplete(AMQChannel.java:221)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.receiveComplete(AMQProtocolEngine.java:273)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.received(AMQProtocolEngine.java:256)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.received(AMQProtocolEngine.java:85)
        at 
org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:118)
        at 
org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:37)
        at 
org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:313)
        ... 1 more
2014-08-08 12:16:00,849 ERROR [IoReceiver - /127.0.0.1:36625] 
protocol.AMQProtocolEngine (AMQProtocolEngine.java:260) - Unexpected exception 
when processing datablock
java.lang.RuntimeException: Failed to commit transaction
        at 
org.apache.qpid.server.txn.LocalTransaction$1.completeDeferredWork(LocalTransaction.java:341)
        at 
org.apache.qpid.server.txn.LocalTransaction$1.checkUnderlyingCompletion(LocalTransaction.java:321)
        at 
org.apache.qpid.server.txn.LocalTransaction$1.waitForCompletion(LocalTransaction.java:313)
        at 
org.apache.qpid.server.txn.LocalTransaction.sync(LocalTransaction.java:435)
        at org.apache.qpid.server.AMQChannel.sync(AMQChannel.java:1654)
        at 
org.apache.qpid.server.AMQChannel.receivedComplete(AMQChannel.java:221)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.receiveComplete(AMQProtocolEngine.java:273)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.received(AMQProtocolEngine.java:256)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.received(AMQProtocolEngine.java:85)
        at 
org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:118)
        at 
org.apache.qpid.server.protocol.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:37)
        at 
org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:152)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.qpid.transport.SenderClosedException: sender is closed
        at org.apache.qpid.transport.network.io.IoSender.send(IoSender.java:114)
        at org.apache.qpid.transport.network.io.IoSender.send(IoSender.java:40)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine.writeFrame(AMQProtocolEngine.java:570)
        at 
org.apache.qpid.server.handler.TxCommitHandler$1.run(TxCommitHandler.java:73)
        at org.apache.qpid.server.AMQChannel$3.run(AMQChannel.java:975)
        at 
org.apache.qpid.server.txn.LocalTransaction$1.completeDeferredWork(LocalTransaction.java:333)
        ... 12 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:313)
        ... 1 more
2014-08-08 12:45:14,173 ERROR [pool-4-thread-3] server.Main$1 (Main.java:367) - 
Uncaught exception, shutting down.
java.lang.RuntimeException: org.apache.qpid.AMQStoreException: Metadata not 
found for message with id 4204052254 [error code 541: internal error]
        at 
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1617)
        at 
org.apache.qpid.server.message.AMQMessage.getMessageMetaData(AMQMessage.java:93)
        at 
org.apache.qpid.server.message.AMQMessage.getMessagePublishInfo(AMQMessage.java:155)
        at 
org.apache.qpid.server.output.ProtocolOutputConverterImpl.createEncodedDeliverBody(ProtocolOutputConverterImpl.java:207)
        at 
org.apache.qpid.server.output.ProtocolOutputConverterImpl.writeDeliver(ProtocolOutputConverterImpl.java:70)
        at 
org.apache.qpid.server.protocol.AMQProtocolEngine$WriteDeliverMethod.deliverToClient(AMQProtocolEngine.java:1512)
        at 
org.apache.qpid.server.subscription.SubscriptionImpl.sendToClient(SubscriptionImpl.java:683)
        at 
org.apache.qpid.server.subscription.SubscriptionImpl$AckSubscription.send(SubscriptionImpl.java:293)
        at 
org.apache.qpid.server.queue.SimpleAMQQueue.deliverMessage(SimpleAMQQueue.java:877)
        at 
org.apache.qpid.server.queue.SimpleAMQQueue.attemptDelivery(SimpleAMQQueue.java:1710)
        at 
org.apache.qpid.server.queue.SimpleAMQQueue.processQueue(SimpleAMQQueue.java:1877)
        at org.apache.qpid.server.queue.QueueRunner.run(QueueRunner.java:71)
       at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.qpid.AMQStoreException: Metadata not found for message 
with id 4204052254 [error code 541: internal error]
        at 
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:1419)
        at 
org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1613)
        ... 14 more
{noformat}

It seems we should simply stop trying to rollback transactions if they fail 
after they have actually committed and allow the underlying exception to be 
re-thrown



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to