sanghyeok An created KAFKA-20114:
------------------------------------
Summary: Fix race between requestInFlight and backoffDeadlineMs in
RPCProducerIdManager causing premature retries
Key: KAFKA-20114
URL: https://issues.apache.org/jira/browse/KAFKA-20114
Project: Kafka
Issue Type: Bug
Reporter: sanghyeok An
Assignee: sanghyeok An
Attachments: image-2026-02-03-08-53-05-655.png
RPCProducerIdManager uses two independent atomics, requestInFlight and
backoffDeadlineMs. Because the two are read and updated separately, a race
remains that can cause premature retries: maybeRequestNextBlock reads a stale
backoffDeadlineMs, and a concurrent failure of the in-flight request then
applies a new backoff and clears requestInFlight.
If the operations interleave such that:
* maybeRequestNextBlock reads backoffDeadlineMs before the failure handler
updates it, and
* the failure handler clears requestInFlight before maybeRequestNextBlock
attempts compareAndSet,
then maybeRequestNextBlock can successfully set requestInFlight and call
sendRequest immediately, effectively ignoring the newly applied retry backoff.
!image-2026-02-03-08-53-05-655.png|width=847,height=438!
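The interleaving described above can be replayed deterministically in a single thread. The following is a minimal sketch, not the actual RPCProducerIdManager code: the class BackoffRaceSketch, the methods onFailure, trySendUnchecked, trySendRechecked and replay, and the NO_BACKOFF constant are all hypothetical names, and the sketch assumes the failure handler writes backoffDeadlineMs before it clears requestInFlight. Under that assumption, one possible way to close the window is to re-read the deadline after winning the compareAndSet; whether the real fix takes this approach is not stated in this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the two atomics; not the real RPCProducerIdManager.
class BackoffRaceSketch {
    static final long NO_BACKOFF = -1L; // assumed sentinel for "no backoff applied"

    final AtomicBoolean requestInFlight = new AtomicBoolean(false);
    final AtomicLong backoffDeadlineMs = new AtomicLong(NO_BACKOFF);
    int sent = 0; // counts simulated sendRequest calls

    // Assumed failure handler order: publish the backoff, then release the flag.
    void onFailure(long nowMs, long retryBackoffMs) {
        backoffDeadlineMs.set(nowMs + retryBackoffMs);
        requestInFlight.set(false);
    }

    // Racy path: trusts a deadline that was read before the compareAndSet.
    boolean trySendUnchecked(long deadlineReadEarlierMs, long nowMs) {
        if (nowMs < deadlineReadEarlierMs) return false;
        if (!requestInFlight.compareAndSet(false, true)) return false;
        sent++; // stands in for sendRequest()
        return true;
    }

    // Sketch of a possible fix: re-read the deadline after winning the CAS.
    boolean trySendRechecked(long deadlineReadEarlierMs, long nowMs) {
        if (nowMs < deadlineReadEarlierMs) return false;
        if (!requestInFlight.compareAndSet(false, true)) return false;
        if (nowMs < backoffDeadlineMs.get()) {
            requestInFlight.set(false); // give the flag back and honor the backoff
            return false;
        }
        sent++; // stands in for sendRequest()
        return true;
    }

    // Replays the interleaving from the description; returns true if a request was sent.
    boolean replay(boolean recheck) {
        long nowMs = 1000L;
        requestInFlight.set(true);             // an earlier request is in flight
        long stale = backoffDeadlineMs.get();  // caller reads the deadline first
        onFailure(nowMs, 50L);                 // failure applies backoff, clears the flag
        return recheck ? trySendRechecked(stale, nowMs)
                       : trySendUnchecked(stale, nowMs);
    }

    public static void main(String[] args) {
        System.out.println("unchecked sent: " + new BackoffRaceSketch().replay(false));
        System.out.println("rechecked sent: " + new BackoffRaceSketch().replay(true));
    }
}
```

In the unchecked variant the request goes out at the same instant the 50 ms backoff is applied, which is the premature retry reported here. The rechecked variant relies on the write to backoffDeadlineMs being visible to any thread that observes the cleared flag, which holds only if the handler keeps the write order assumed above.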
--
This message was sent by Atlassian Jira
(v8.20.10#820010)