Hi all,
FailoverDurableSubTransactionTest sometimes fails, especially in GitHub
Actions and in my own environment. It does not seem to happen on Apache
Jenkins, which suggests a race condition.
FailoverDurableSubTransactionTest.testFailoverCommitListener:352 DLQ empty
expected:<0> but was:<10>
The test fails because 10 messages unexpectedly end up in the Dead Letter
Queue (DLQ).
I looked at the test and I have a hypothesis:
1/ the test intentionally breaks the connection during commit (see plugin)
2/ before the commit breaks, the consumer has already received 10 messages
as part of the transaction, but the transaction is not yet committed at
that point
3/ the transport reconnects, the broker correctly redelivers the 10
uncommitted messages
4/ the client thinks they are duplicates and sends a poison ack to the
broker, which tells it the messages are problematic and should go to the DLQ
To me, it's a client-side issue. Messages previously received as part of a
transaction that was never committed should probably not be treated as
duplicates when the broker redelivers them after reconnecting.
I looked at the following message in the logs when the test fails:
2025-12-17 01:22:46,438 [ Session Task-1] - WARN ActiveMQMessageConsumer -
> ID:runnervm6qbrg-43909-1765934560760-30:2:1:1 suppressing duplicate
> delivery on connection, poison acking: MessageDispatch {commandId = 0,
> responseRequired = false, consumerId =
> ID:runnervm6qbrg-43909-1765934560760-30:2:1:1, destination =
> topic://Failover.WithTx, message = ActiveMQTextMessage {commandId = 15,
> responseRequired = true, messageId =
> ID:runnervm6qbrg-43909-1765934560760-32:1:1:1:12, originalDestination =
> null, originalTransactionId = null, producerId =
> ID:runnervm6qbrg-43909-1765934560760-32:1:1:1, destination =
> topic://Failover.WithTx, transactionId = null, deliveryTime = 0, expiration
> = 0, timestamp = 1765934566409, arrival = 0, brokerInTime = 1765934566409,
> brokerOutTime = 1765934566433, correlationId = null, replyTo = null,
> persistent = true, type = null, priority = 4, groupID = null, groupSequence
> = 0, targetConsumerId = null, compressed = false, userID = null, content =
> org.apache.activemq.util.ByteSequence@39112b96, marshalledProperties =
> org.apache.activemq.util.ByteSequence@2c85f629, dataStructure = null,
> redeliveryCounter = 1, size = 0, properties = {ID=11}, readOnlyProperties =
> true, readOnlyBody = true, droppable = false, jmsXGroupFirstForConsumer =
> false, text = Test message}, redeliveryCounter = 1}
I'm wondering whether previouslyDeliveredMessages (in ActiveMQMessageConsumer)
is correctly populated in all failover scenarios, especially with
asynchronous dispatch.
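
To make the suspicion concrete, here is a toy model of the race I have in
mind. It is NOT ActiveMQ code: the class, the method names and the two sets
are hypothetical stand-ins for the connection-level duplicate check and for
the consumer's record of messages delivered in the open transaction. Whether
the real client has this window is exactly what remains to be verified.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the suspected race. Hypothetical names, not the ActiveMQ API. */
public class DuplicateDeliveryModel {

    public enum Outcome { DELIVERED, TRACKED_REDELIVERY, POISON_ACKED }

    // Stand-in for the duplicate audit: every message id ever dispatched.
    private final Set<String> audit = new HashSet<>();
    // Messages recorded as delivered in the currently open transaction.
    private final Set<String> deliveredInTx = new HashSet<>();
    // Snapshot taken on transport interruption (assumed analogue of
    // previouslyDeliveredMessages).
    private final Set<String> previouslyDelivered = new HashSet<>();

    /** Broker dispatches a message to the consumer (first time or redelivery). */
    public Outcome dispatch(String messageId) {
        if (audit.add(messageId)) {
            return Outcome.DELIVERED;          // never seen before
        }
        if (previouslyDelivered.contains(messageId)) {
            return Outcome.TRACKED_REDELIVERY; // expected redelivery after failover
        }
        return Outcome.POISON_ACKED;           // treated as a duplicate, goes to DLQ
    }

    /** The consumer thread records the message as part of the open transaction. */
    public void recordDelivered(String messageId) {
        deliveredInTx.add(messageId);
    }

    /** Transport drops: only messages already recorded make it into the snapshot. */
    public void onTransportInterrupted() {
        previouslyDelivered.addAll(deliveredInTx);
        deliveredInTx.clear();
    }

    public static void main(String[] args) {
        // Ordering A: the delivery is recorded before the interruption.
        DuplicateDeliveryModel a = new DuplicateDeliveryModel();
        a.dispatch("m1");
        a.recordDelivered("m1");
        a.onTransportInterrupted();
        System.out.println("recorded before interruption: " + a.dispatch("m1"));

        // Ordering B: the interruption wins the race against the recording.
        DuplicateDeliveryModel b = new DuplicateDeliveryModel();
        b.dispatch("m2");
        b.onTransportInterrupted();
        b.recordDelivered("m2");
        System.out.println("recorded after interruption: " + b.dispatch("m2"));
    }
}
```

In this model the only difference between the two orderings is whether the
delivery was recorded before the interruption took its snapshot. Ordering A
yields TRACKED_REDELIVERY, ordering B yields POISON_ACKED, which would match
both the timing-dependent failures and the "poison acking" warning above.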
Fixing the test or reworking it would probably hide an underlying bug.
--
Jean-Louis Monteiro
http://twitter.com/jlouismonteiro
http://www.tomitribe.com