[ https://issues.apache.org/jira/browse/KAFKA-20114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sanghyeok An updated KAFKA-20114:
---------------------------------
Description:
RPCProducerIdManager uses two independent atomics, requestInFlight and
backoffDeadlineMs. A race remains that can cause premature retries:
maybeRequestNextBlock reads a stale backoffDeadlineMs, and a concurrent
in-flight request then fails, applying a new backoff and clearing requestInFlight.
The problem occurs when the operations interleave as follows:
* maybeRequestNextBlock reads backoffDeadlineMs before the failure handler updates it, and
* the failure handler clears requestInFlight before maybeRequestNextBlock attempts compareAndSet on it.
In that case, maybeRequestNextBlock successfully sets requestInFlight and calls
sendRequest immediately, ignoring the newly applied retry backoff.
!image-2026-02-03-08-53-05-655.png|width=1040,height=538!
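The interleaving above can be replayed deterministically with a minimal Java sketch. It assumes a simplified model: the class BlockRequester, its `interleave` hook, and the `recheckAfterCas` flag are hypothetical and are not the actual RPCProducerIdManager code. The sketch also assumes that the failure handler writes backoffDeadlineMs before it clears requestInFlight, which is the order the description suggests. Re-reading the deadline after a successful compareAndSet is shown as one possible mitigation. It is not necessarily the fix adopted for KAFKA-20114.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class Kafka20114RaceDemo {

    // Hypothetical, simplified model of the two independent atomics named in the issue.
    // This is NOT the real RPCProducerIdManager implementation.
    static final class BlockRequester {
        final AtomicBoolean requestInFlight = new AtomicBoolean(false);
        final AtomicLong backoffDeadlineMs = new AtomicLong(0L);
        final AtomicInteger requestsSent = new AtomicInteger(0);
        private final boolean recheckAfterCas;

        BlockRequester(boolean recheckAfterCas) {
            this.recheckAfterCas = recheckAfterCas;
        }

        // 'interleave' is a hypothetical test hook that runs between the deadline read
        // and the CAS, so the problematic interleaving can be replayed without threads.
        boolean maybeRequestNextBlock(long nowMs, Runnable interleave) {
            if (nowMs < backoffDeadlineMs.get()) {
                return false; // still backing off, judged from a possibly stale read
            }
            interleave.run();
            if (!requestInFlight.compareAndSet(false, true)) {
                return false; // another request already holds the in-flight slot
            }
            // Candidate mitigation: re-read the deadline after winning the CAS. Because the
            // failure handler writes the deadline before clearing the flag, a CAS that saw
            // the flag cleared by that handler also sees the new deadline.
            if (recheckAfterCas && nowMs < backoffDeadlineMs.get()) {
                requestInFlight.set(false); // give the slot back and respect the backoff
                return false;
            }
            sendRequest();
            return true;
        }

        // Failure path (assumed order): apply the new backoff first, then release the flag.
        void onRequestFailed(long nowMs, long backoffMs) {
            backoffDeadlineMs.set(nowMs + backoffMs);
            requestInFlight.set(false);
        }

        private void sendRequest() {
            requestsSent.incrementAndGet(); // stand-in for the real RPC
        }
    }

    // Replays the race: a request is already in flight, the caller reads the old deadline,
    // the in-flight request fails and sets a 100 ms backoff, and then the caller attempts the CAS.
    // Returns true if a new request was sent despite the fresh backoff.
    static boolean replay(boolean recheckAfterCas) {
        BlockRequester r = new BlockRequester(recheckAfterCas);
        r.requestInFlight.set(true); // an earlier request is outstanding
        long now = 1_000L;
        return r.maybeRequestNextBlock(now, () -> r.onRequestFailed(now, 100L));
    }

    public static void main(String[] args) {
        System.out.println("racy version sent request during backoff: " + replay(false));
        System.out.println("re-check version sent request during backoff: " + replay(true));
    }
}
```

Running `main` prints `true` for the racy variant and `false` for the re-check variant. The window could also be closed by folding both values into a single AtomicReference or by guarding them with one lock. The re-check is shown here only because it is the smallest change to this model.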
> Fix race between requestInFlight and backoffDeadlineMs in
> RPCProducerIdManager causing premature retries
> --------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20114
> URL: https://issues.apache.org/jira/browse/KAFKA-20114
> Project: Kafka
> Issue Type: Bug
> Reporter: sanghyeok An
> Assignee: sanghyeok An
> Priority: Minor
> Attachments: image-2026-02-03-08-53-05-655.png