[
https://issues.apache.org/jira/browse/KAFKA-20000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048806#comment-18048806
]
sanghyeok An edited comment on KAFKA-20000 at 1/3/26 3:17 AM:
--------------------------------------------------------------
[~jolshan] [~chia7712] [~francisgodinho]
Jumping in here to share my thoughts.
I think it would be better to keep the client implementation as is and focus on
the server side.
Instead of modifying the client, we should implement the retry logic for
{{CONCURRENT_TRANSACTIONS}} on the server side (specifically, within the Group
Coordinator), similar to how the broker currently handles the produce path in
TV2. It seems that while the Group Coordinator currently sends
{{AddPartitionsToTxn}} requests to the Transaction Coordinator, it doesn't
handle retries for {{CONCURRENT_TRANSACTIONS}} internally.
If we handle this on the server:
# We can reuse the existing {{add.partitions.to.txn.retry.backoff.ms}} config
for the {{TxnOffsetCommit}} path, as intended.
# It aligns with the design goal of TV2 by keeping clients thin. This
significantly simplifies the implementation for other Kafka clients (Go, Rust,
Python, etc.), as they won't need to implement complex backoff logic for this
specific error.
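To make the idea concrete, here is a minimal sketch of a server-side retry loop. It is not the actual Group Coordinator code: the class {{ConcurrentTxnRetrySketch}}, the {{Errors}} enum, the {{verifyWithRetry}} method, and the {{maxRetryMs}} budget are hypothetical names for illustration, and {{backoffMs}} stands in for the value of {{add.partitions.to.txn.retry.backoff.ms}}. A real broker would presumably schedule the retry on a timer rather than block a thread with a sleep.

```java
import java.util.function.Supplier;

// Hypothetical sketch: retry a verification call while it reports
// CONCURRENT_TRANSACTIONS, within a bounded time budget.
final class ConcurrentTxnRetrySketch {

    // Stand-in for the small subset of error codes this sketch cares about.
    enum Errors { NONE, CONCURRENT_TRANSACTIONS, INVALID_TXN_STATE }

    /**
     * Calls verify until it returns something other than CONCURRENT_TRANSACTIONS,
     * or until another backoff would exceed maxRetryMs. Returns the last result.
     *
     * verify     stands in for the AddPartitionsToTxn round trip (assumption)
     * backoffMs  stands in for add.partitions.to.txn.retry.backoff.ms
     * maxRetryMs hypothetical overall retry budget
     */
    static Errors verifyWithRetry(Supplier<Errors> verify, long backoffMs, long maxRetryMs) {
        long startNanos = System.nanoTime();
        while (true) {
            Errors result = verify.get();
            if (result != Errors.CONCURRENT_TRANSACTIONS) {
                return result; // success or a different error: no retry here
            }
            long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
            if (elapsedMs + backoffMs > maxRetryMs) {
                return result; // budget exhausted: surface the error to the client
            }
            try {
                Thread.sleep(backoffMs); // blocking only to keep the sketch simple
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return result;
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulate a coordinator that is busy for the first two attempts.
        Errors result = verifyWithRetry(
            () -> ++attempts[0] < 3 ? Errors.CONCURRENT_TRANSACTIONS : Errors.NONE,
            1L, 1000L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

With a loop like this on the server, the client only sees {{CONCURRENT_TRANSACTIONS}} once the retry budget is exhausted, so it needs no special backoff handling for this error.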
What do you think?
> Optimize retry backoff for CONCURRENT_TRANSACTIONS to improve TV2 throughput
> ----------------------------------------------------------------------------
>
> Key: KAFKA-20000
> URL: https://issues.apache.org/jira/browse/KAFKA-20000
> Project: Kafka
> Issue Type: Improvement
> Reporter: Chia-Ping Tsai
> Assignee: Francis Godinho
> Priority: Major
> Fix For: 4.3.0
>
>
> Transaction V2 introduces frequent state transitions (epoch bumps) that
> briefly reject concurrent requests with CONCURRENT_TRANSACTIONS. The default
> client retry backoff (100ms) is excessive for these transient locks, leading
> to unnecessary latency and degraded throughput. Reducing the backoff allows
> faster retries and smoother performance during state transitions.