Hi,

Just raised a bug as a result of a CI failure for the SyncWaitTimeoutDelayTest.

It appears to me to be a protocol bug. Is anyone fluent in 0-10 able to
say whether the bug is also in 0-10?

Is there going to be a 0-9 update that might address this?

https://issues.apache.org/jira/browse/QPID-1262

The problem in a nutshell:

TxCommitOk is not correlated with the TxCommit that initiated the work
on the broker.
So if our broker takes a long time to perform a commit (e.g. using
SlowMessageStore) and the client times out waiting for the TxCommitOk
(as in the SyncWaitTimeoutDelayTest), then when a subsequent TxCommit is
sent, the late TxCommitOk for the first commit can signal the wait for
the second one by mistake.

AMQP Method Sequence:
[C]lient
[B]roker
[S]end
[R]eceive

CS: TxCommit  (a)
BR: TxCommit  (a)
// Broker takes a lot of time
// Client times out waiting for TxCommitOk (a)
CS: TxCommit  (b)
BS: TxCommitOk (a)
CR: TxCommitOk  (a)
// At this point the client thinks that its commit (b) has succeeded;
// it hasn't.
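
To make the sequence above concrete, here is a minimal Java sketch of a client-side wait that has no correlation. It is not the actual Qpid client code; the class UncorrelatedCommitWaiter and its methods are hypothetical names used only to model "wait for the next TxCommitOk".

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Qpid code: the client waits for "the next
// TxCommitOk" and has no way to tell which TxCommit it answers.
final class UncorrelatedCommitWaiter {
    private final Semaphore okReceived = new Semaphore(0);

    /** Called for every TxCommitOk frame that arrives from the broker. */
    void onCommitOk() {
        okReceived.release();
    }

    /** Returns true if any TxCommitOk arrives within the timeout. */
    boolean awaitCommitOk(long timeoutMillis) {
        try {
            return okReceived.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this model, a wait that times out for the first commit, followed by the late TxCommitOk for that commit, makes the wait for the second commit return true even though the broker has not answered the second TxCommit.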

My only thoughts were
a) add correlation ids to the TxCommit TxCommitOk pairs, as was done
above for clarity in the explanation.
b) close the session in the event of a timeout and re-establish session.
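
For option a), a rough Java sketch of what the client side might look like, assuming the protocol were extended so that TxCommit and TxCommitOk carried a matching id. No such field exists in 0-9 today; the class CorrelatedCommitWaiter and its methods are hypothetical names, not Qpid API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: assumes TxCommit/TxCommitOk carry a correlation id,
// which AMQP 0-9 does not define.
final class CorrelatedCommitWaiter {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /** Registers a commit and returns the id that would travel with the TxCommit. */
    long sendCommit() {
        long id = nextId.incrementAndGet();
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    /** Called when a TxCommitOk with this id arrives; false if nobody is waiting for it. */
    boolean onCommitOk(long id) {
        CompletableFuture<Void> reply = pending.get(id);
        if (reply == null) {
            return false; // late reply for a commit that already timed out
        }
        return reply.complete(null);
    }

    /** Waits for the TxCommitOk matching id; false on timeout. The id is forgotten either way. */
    boolean awaitCommitOk(long id, long timeoutMillis) {
        CompletableFuture<Void> reply = pending.get(id);
        if (reply == null) {
            return false;
        }
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pending.remove(id);
        }
    }
}
```

Here a TxCommitOk for commit (a) that arrives after the client has given up on (a) is simply dropped, so it cannot signal the wait for commit (b). The client would still not know whether (a) was applied on the broker, which is where option b) might still be needed.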

Thoughts?
-- 
Martin Ritchie
