qianye1001 commented on PR #10564:
URL: https://github.com/apache/rocketmq/pull/10564#issuecomment-4852191511

   Thanks for the patch — but I'd argue this scenario is a **misuse of the 
producer group abstraction, not a bug**, and doesn't need a broker-side fix.
   
   Per the official docs, a producer group is explicitly defined as "*a 
collection of the same type of Producer, which sends the same type of messages 
with **consistent logic***" 
([Concept.md](https://github.com/apache/rocketmq/blob/develop/docs/en/Concept.md)).
 The transaction check-back mechanism is designed around exactly this 
invariant: "*If a transaction message is sent and the original producer crashes 
after sending, the broker will contact **other producers in the same producer 
group** to commit or rollback the transactional message*" 
([Design_Transaction.md](https://github.com/apache/rocketmq/blob/develop/docs/en/Design_Transaction.md)).
 Any producer in the group must be able to answer a check for any transaction 
sent by that group — that fungibility is what makes the check-back 
fault-tolerant.
   
   If different topics have independent transaction state / 
`TransactionListener` logic, the correct fix is on the user side: **use 
separate producer groups per business/topic**. As @contrueCT already pointed 
out in #9791, another option is a shared state store (Redis / DB) so 
`checkLocalTransaction` returns consistent results across all producers in the 
group.
   
   Concerns with the current patch:
   
   1. It papers over the misuse rather than surfacing it; users hitting this 
will silently keep depending on "sticky" routing that isn't part of the 
contract.
   2. It weakens the group-level fault tolerance. If the originating client is 
still alive but its `TransactionListener` is stateless / has lost state (e.g. 
after a restart), we now prefer it over healthier peers that could have 
answered via a shared state backend. Fallback to round-robin only kicks in when 
the preferred channel is missing, not when it's semantically the wrong one to 
ask.
   3. It adds a permanent message property (`__TXN_PRODUCER_CID__`) on every 
half message for a case that has a clean user-side solution — small per-message 
cost, but it's forever.
   4. Rolling-upgrade + rebalancing edge cases (old producer sends half, new 
broker checks back, or vice versa) add reasoning surface that isn't paying for 
itself if the root cause is a config mistake.
   
   Suggestion: close this and instead update the docs (Design_Transaction / 
best-practices) to explicitly call out "one producer group per transactional 
business — don't multiplex unrelated topics into one group". Would be happy to 
see a docs PR for that.
   
   cc @wang-jiahua — no offense intended, the diagnosis in the issue is 
accurate, I just think the fix belongs on the user side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to