Hi all,

After spending a lot of time understanding this part (quite tricky), I've
spent a lot of time with various attempts. But I was always breaking
something else.
I finally came to a simpler fix that works better without breaking other
things.

I created a PR for it: https://github.com/apache/activemq/pull/1616

I ran it many times in a loop locally. I'm not sure it will pass the build
because of all sorts of flaky tests.

Feel free to review the PR and provide feedback.



--
Jean-Louis Monteiro
http://twitter.com/jlouismonteiro
http://www.tomitribe.com


On Thu, Dec 18, 2025 at 4:58 PM Arthur Naseef <[email protected]> wrote:

> Not sure if I understood the entire flow here correctly, so take this with
> a grain of salt.
>
> In a transacted session on a client using failover, there are going to
> cases where the broker and client get out-of-sync, and so the broker
> sending the same messages to the client (aka duplicates) sounds reasonable.
>
> Does the broker remember all of the messages delivered in the transaction,
> and then accept them all as consumed on the commit packet from the client?
> Or, is there an explicit ACK packet sent by the client for each message?
> I'm wondering how we prevent a race condition on the broker delivering a
> message to the client, and the client sending back a commit, both at the
> same time.  It's been a while since I looked at this code...
>
> Also in the case identified, the problem is that messages end up in the
> DLQ, right?  Does everything else work correctly (i.e. the consumer
> received a copy of each message and processed each)?  If so, then perhaps
> we need a way to recognize that race condition and deal with it correctly
> within the broker?
>
> Art
>
>
>
> On Wed, Dec 17, 2025 at 8:55 AM Jean-Louis Monteiro <
> [email protected]> wrote:
>
> > Thanks for confirming. That was my understanding as well, that's why I
> did
> > not fix the test itself.
> >
> > I need to finish the connection leak in the PooledConnectionFactory and
> > finish the Github Actions work with Jean Baptiste.
> > I'll create an issue in JIRA for now and start thinking about fixing the
> > ActiveMQMessageConsumer.
> >
> > When I have a PR up and running, I'll let you know because it will
> require
> > more eyes on it.
> >
> > --
> > Jean-Louis Monteiro
> > http://twitter.com/jlouismonteiro
> > http://www.tomitribe.com
> >
> >
> > On Wed, Dec 17, 2025 at 4:24 PM Christopher Shannon <
> > [email protected]> wrote:
> >
> > > HI Jean-Louis,
> > >
> > > I agree that this sounds like a bug if the scenario is playing out as
> you
> > > describe. If the consumer is receiving duplicate messages that were not
> > > committed before then they should of course not be considered a
> > duplicate.
> > > The broker re-sending is what it is supposed to do as the connection
> was
> > > broken and the previous transaction was not committed by the consumer.
> > >
> > > As you pointed out, the previouslyDeliveredMessages is where this
> should
> > be
> > > tracked, that gets populated a bit later though if I recall (it's been
> a
> > > while since i've been in that code). The messages first go into the
> > > deliveredMessages linked list before being moved, so without actually
> > > debugging it my assumption is maybe the messages make it to
> > > deliveredMessages but don't make it all the way to
> > > previouslyDeliveredMessages before the connection is broken, so there
> may
> > > need to be client work to fix that if that is what is happening.
> > >
> > > Chris
> > >
> > > On Wed, Dec 17, 2025 at 5:48 AM Jean-Louis Monteiro <
> > > [email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > FailoverDurableSubTransactionTest sometimes fails, especially in
> Github
> > > > Actions or my own environment. In Apache Jenkins, it looks like it
> does
> > > not
> > > > happen which suggests probably a race condition problem.
> > > >
> > > > FailoverDurableSubTransactionTest.testFailoverCommitListener:352 DLQ
> > > empty
> > > >  expected:<0> but was:<10>
> > > >
> > > > The test fails because 10 messages unexpectedly end up in the Dead
> > Letter
> > > > Queue (DLQ).
> > > > I looked at the test and I have an hypothesis ...
> > > > 1/ the test intentionally breaks the connection during commit (see
> > > plugin)
> > > > 2/ before the commit breaks, the consumer has already received 10
> > > messages
> > > > as part of the transaction, but transaction is not yet committed at
> > this
> > > > moment
> > > > 3/ the transport reconnects, the broker correctly redelivers the 10
> > > > uncommitted messages
> > > > 4/ client thinks they are duplicates and sends an ack to the broker
> to
> > > > indicate the messages are problematic and should go to DLQ
> > > >
> > > > To me, it's a client side issue. The messages previously received as
> > part
> > > > of the transaction that is never committed should probably not be
> > treated
> > > > as duplicated when the broker redelivers them after reconnecting.
> > > >
> > > > I tried looking at the following message in the logs when the test
> > fails
> > > >
> > > > 2025-12-17 01:22:46,438 [ Session Task-1] - WARN
> > ActiveMQMessageConsumer
> > > -
> > > > > ID:runnervm6qbrg-43909-1765934560760-30:2:1:1 suppressing duplicate
> > > > > delivery on connection, poison acking: MessageDispatch {commandId =
> > 0,
> > > > > responseRequired = false, consumerId =
> > > > > ID:runnervm6qbrg-43909-1765934560760-30:2:1:1, destination =
> > > > > topic://Failover.WithTx, message = ActiveMQTextMessage {commandId =
> > 15,
> > > > > responseRequired = true, messageId =
> > > > > ID:runnervm6qbrg-43909-1765934560760-32:1:1:1:12,
> > originalDestination =
> > > > > null, originalTransactionId = null, producerId =
> > > > > ID:runnervm6qbrg-43909-1765934560760-32:1:1:1, destination =
> > > > > topic://Failover.WithTx, transactionId = null, deliveryTime = 0,
> > > > expiration
> > > > > = 0, timestamp = 1765934566409, arrival = 0, brokerInTime =
> > > > 1765934566409,
> > > > > brokerOutTime = 1765934566433, correlationId = null, replyTo =
> null,
> > > > > persistent = true, type = null, priority = 4, groupID = null,
> > > > groupSequence
> > > > > = 0, targetConsumerId = null, compressed = false, userID = null,
> > > content
> > > > =
> > > > > org.apache.activemq.util.ByteSequence@39112b96,
> > marshalledProperties =
> > > > > org.apache.activemq.util.ByteSequence@2c85f629, dataStructure =
> > null,
> > > > > redeliveryCounter = 1, size = 0, properties = {ID=11},
> > > > readOnlyProperties =
> > > > > true, readOnlyBody = true, droppable = false,
> > > jmsXGroupFirstForConsumer =
> > > > > false, text = Test message}, redeliveryCounter = 1}
> > > > >
> > > > >
> > > > I'm wondering if previouslyDeliveredMessages (in
> > ActiveMQMessageConsumer)
> > > > is not correctly populated in all failover scenarios, especially
> > > > asynchronous dispatch.
> > > >
> > > > Fixing the test or reworking it would probably hide an underlying
> bug.
> > > > --
> > > > Jean-Louis Monteiro
> > > > http://twitter.com/jlouismonteiro
> > > > http://www.tomitribe.com
> > > >
> > >
> >
>

Reply via email to