Re: [PR] [ISSUE #9791] Prefer original producer when checking transaction state [rocketmq]

via GitHub Wed, 01 Jul 2026 01:36:51 -0700


qianye1001 commented on PR #10564:
URL: https://github.com/apache/rocketmq/pull/10564#issuecomment-4852191511

Thanks for the patch — but I'd argue this scenario is a **misuse of the
producer group abstraction, not a bug**, and doesn't need a broker-side fix.

Per the official docs, a producer group is explicitly defined as "*a
collection of the same type of Producer, which sends the same type of messages
with **consistent logic***"
([Concept.md](https://github.com/apache/rocketmq/blob/develop/docs/en/Concept.md)).
The transaction check-back mechanism is designed around exactly this
invariant: "*If a transaction message is sent and the original producer crashes
after sending, the broker will contact **other producers in the same producer
group** to commit or rollback the transactional message*"
([Design_Transaction.md](https://github.com/apache/rocketmq/blob/develop/docs/en/Design_Transaction.md)).
Any producer in the group must be able to answer a check for any transaction
sent by that group — that fungibility is what makes the check-back
fault-tolerant.

If different topics have independent transaction state /
`TransactionListener` logic, the correct fix is on the user side: **use
separate producer groups per business/topic**. As @contrueCT already pointed
out in #9791, another option is a shared state store (Redis / DB) so
`checkLocalTransaction` returns consistent results across all producers in the
group.
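
To make the shared-state option concrete, here is a minimal sketch of what "consistent results across all producers" could look like. Everything in it is an assumption for illustration: `SharedStateCheckSketch`, `TxnState`, `recordOutcome`, and `check` are hypothetical names, the in-memory map merely stands in for Redis or a DB table, and none of this is the RocketMQ client API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedStateCheckSketch {
    // Hypothetical outcome type; not the RocketMQ client enum.
    enum TxnState { COMMIT, ROLLBACK, UNKNOWN }

    // Hypothetical stand-in for a shared store (Redis / DB) keyed by transaction id.
    static final Map<String, TxnState> SHARED_STORE = new ConcurrentHashMap<>();

    // Called by whichever producer executed the local transaction.
    static void recordOutcome(String transactionId, TxnState state) {
        SHARED_STORE.put(transactionId, state);
    }

    // What a check callback on ANY producer in the group would do.
    // producerInstance is deliberately unused: the answer must not depend on it.
    static TxnState check(String producerInstance, String transactionId) {
        return SHARED_STORE.getOrDefault(transactionId, TxnState.UNKNOWN);
    }

    public static void main(String[] args) {
        // producer-A ran the local transaction and recorded the outcome.
        recordOutcome("txn-1", TxnState.COMMIT);

        // Either producer gives the same answer for txn-1.
        System.out.println(check("producer-A", "txn-1")); // COMMIT
        System.out.println(check("producer-B", "txn-1")); // COMMIT

        // A transaction with no recorded outcome stays undecided.
        System.out.println(check("producer-B", "txn-2")); // UNKNOWN
    }
}
```

With this shape it no longer matters which group member the broker asks, which is the property the group abstraction assumes.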

Concerns with the current patch:

1. It papers over the misuse rather than surfacing it; users hitting this
will silently keep depending on "sticky" routing that isn't part of the
contract.
2. It weakens the group-level fault tolerance. If the originating client is
still alive but its `TransactionListener` is stateless / has lost state (e.g.
after a restart), we now prefer it over healthier peers that could have
answered via a shared state backend. Fallback to round-robin only kicks in when
the preferred channel is missing, not when it's semantically the wrong one to
ask.
3. It adds a permanent message property (`__TXN_PRODUCER_CID__`) to every
half message for a case that has a clean user-side solution. The per-message
cost is small, but once shipped the property has to be supported indefinitely.
4. Rolling-upgrade + rebalancing edge cases (old producer sends half, new
broker checks back, or vice versa) add reasoning surface that isn't paying for
itself if the root cause is a config mistake.
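
Concern 2 can be illustrated with a rough model of the selection rule as I read it. This is not the code from the patch: `CheckTargetSketch` and `pickCheckTarget` are hypothetical names, and clients are reduced to plain id strings instead of channels.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckTargetSketch {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Hypothetical model: prefer the original client if it is connected,
    // otherwise fall back to round-robin over the connected clients.
    static String pickCheckTarget(String originalClientId, List<String> liveClients) {
        if (liveClients.isEmpty()) {
            return null; // nobody to ask
        }
        if (originalClientId != null && liveClients.contains(originalClientId)) {
            // Chosen purely because it is connected; nothing here knows
            // whether it can still answer the check correctly.
            return originalClientId;
        }
        int index = Math.floorMod(COUNTER.getAndIncrement(), liveClients.size());
        return liveClients.get(index);
    }

    public static void main(String[] args) {
        // "A" restarted and lost its local state, but it is connected again,
        // so every check for its transactions still goes to "A".
        List<String> all = Arrays.asList("A", "B", "C");
        System.out.println(pickCheckTarget("A", all)); // A
        System.out.println(pickCheckTarget("A", all)); // A

        // Only when "A" is absent does the fallback reach its peers.
        List<String> withoutA = Arrays.asList("B", "C");
        System.out.println(pickCheckTarget("A", withoutA)); // B or C
    }
}
```

The fallback branch is keyed on connectivity only, so a connected original producer that can no longer answer correctly is never skipped.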

Suggestion: close this and instead update the docs (Design_Transaction /
best-practices) to explicitly call out "one producer group per transactional
business — don't multiplex unrelated topics into one group". Would be happy to
see a docs PR for that.

cc @wang-jiahua — no offense intended, the diagnosis in the issue is
accurate, I just think the fix belongs on the user side.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
