Hi,

I just raised a bug as a result of a CI failure in SyncWaitTimeoutDelayTest.
It appears to me to be a protocol bug. Is anyone fluent in 0-10 able to say whether the bug is also present in 0-10? Is there going to be a 0-9 update that might address this?

https://issues.apache.org/jira/browse/QPID-1262

The problem in a nutshell: TxCommitOk is not correlated with the TxCommit that initiated the work on the broker. So if our broker takes a long time to perform a commit (using SlowMessageStore) and the client times out waiting for the TxCommitOk (as in SyncWaitTimeoutDelayTest), then when a subsequent TxCommit is sent, the TxCommitOk returned for the first commit can signal the second wait by mistake.

AMQP method sequence:
[C]lient [B]roker [S]end [R]eceive

CS: TxCommit (a)
BR: TxCommit (a)
// Broker takes a lot of time
// Client times out waiting for TxCommit (a)
CS: TxCommit (b)
BS: TxCommitOk (a)
CR: TxCommitOk (a)
// At this point the client thinks that its commit (b) has succeeded; it hasn't.

My only thoughts were:
a) add correlation ids to the TxCommit/TxCommitOk pairs, as was done above for clarity in the explanation;
b) close the session in the event of a timeout and re-establish the session.

Thoughts?

-- Martin Ritchie
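
For illustration, option (a) might look roughly like the sketch below on the client side. This is only a sketch under assumptions: CommitCorrelator and its methods are hypothetical names, not part of the Qpid client, and AMQP 0-9 TxCommit/TxCommitOk carry no such id today, so the id field itself is assumed. The idea is that a wait which times out forgets its id, so a late TxCommitOk for that id is recognised as stale instead of releasing the next wait.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: not Qpid API, and the correlation id is an assumed
// extension to TxCommit/TxCommitOk, not something AMQP 0-9 defines.
public class CommitCorrelator {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /** Registers a new commit and returns the id that would be sent with TxCommit. */
    public long begin() {
        long id = nextId.incrementAndGet();
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    /**
     * Waits for the TxCommitOk carrying this id.
     * Returns true if it arrived in time, false on timeout or interruption.
     * Either way the id is forgotten afterwards, so a late reply is treated as stale.
     */
    public boolean await(long id, long timeoutMillis) {
        CompletableFuture<Void> reply = pending.get(id);
        if (reply == null) {
            return false; // unknown or already finished id
        }
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            pending.remove(id);
        }
    }

    /**
     * Called when a TxCommitOk with the given id is received.
     * Returns false if no wait is registered for that id (a stale reply).
     */
    public boolean onCommitOk(long id) {
        CompletableFuture<Void> reply = pending.get(id);
        if (reply == null) {
            return false;
        }
        reply.complete(null);
        return true;
    }
}
```

Replaying the sequence above: begin() for (a), await times out, begin() for (b), then onCommitOk for (a) returns false and the wait for (b) is left untouched. Option (b), closing the session on timeout, avoids the need for ids entirely at the cost of re-establishing the session.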
