[ 
https://issues.apache.org/jira/browse/KAFKA-20114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sanghyeok An updated KAFKA-20114:
---------------------------------
    Description: 
RPCProducerIdManager uses two independent atomics, requestInFlight and 
backoffDeadlineMs. There is a remaining race that can cause premature retries 
when maybeRequestNextBlock reads an outdated backoffDeadlineMs and then a 
concurrent in-flight failure applies a new backoff and clears requestInFlight.

If the interleaving happens such that:
 * maybeRequestNextBlock reads backoffDeadlineMs before the failure handler 
updates it, and
 * the failure handler clears requestInFlight before maybeRequestNextBlock 
attempts compareAndSet,

then maybeRequestNextBlock can successfully set requestInFlight and call 
sendRequest immediately, effectively ignoring the newly applied retry backoff.
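
The interleaving above can be replayed deterministically in a single thread. This is only a minimal sketch under stated assumptions, not the actual RPCProducerIdManager code: the class name, the helper methods (backoffElapsed, tryClaim, onFailure, replayInterleaving) and the 50 ms backoff value are hypothetical. Only the two field names and the ordering of the failure handler (apply backoff, then clear requestInFlight) come from the description.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class BackoffRaceSketch {
    // Stand-ins for the two independent atomics named in the issue.
    static final AtomicBoolean requestInFlight = new AtomicBoolean(false);
    static final AtomicLong backoffDeadlineMs = new AtomicLong(0L);
    static final long RETRY_BACKOFF_MS = 50L; // assumed value, for illustration only

    // First half of the check-then-act sequence: read the backoff deadline.
    static boolean backoffElapsed(long nowMs) {
        return nowMs >= backoffDeadlineMs.get();
    }

    // Second half: try to claim the single in-flight slot.
    static boolean tryClaim() {
        return requestInFlight.compareAndSet(false, true);
    }

    // Failure handler: apply a new backoff, then clear the in-flight flag.
    static void onFailure(long nowMs) {
        backoffDeadlineMs.set(nowMs + RETRY_BACKOFF_MS);
        requestInFlight.set(false);
    }

    // Replays the interleaving step by step; returns true if a request
    // would be sent while the newly applied backoff is still active.
    static boolean replayInterleaving() {
        long nowMs = 1_000L;
        requestInFlight.set(true);   // an earlier request is still in flight
        backoffDeadlineMs.set(0L);   // no backoff applied yet

        boolean sawNoBackoff = backoffElapsed(nowMs); // caller reads the old deadline
        onFailure(nowMs);                             // handler sets backoff, clears flag
        boolean claimed = tryClaim();                 // caller's compareAndSet now succeeds

        boolean backoffStillActive = nowMs < backoffDeadlineMs.get();
        return sawNoBackoff && claimed && backoffStillActive;
    }

    public static void main(String[] args) {
        System.out.println("premature send: " + replayInterleaving());
    }
}
```

In this replay the caller passes both checks even though the deadline was moved between them, which is the check-then-act gap across two independent atomics that the description refers to.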

 

!image-2026-02-03-08-53-05-655.png|width=1040,height=538!

  was:
RPCProducerIdManager uses two independent atomics, requestInFlight and 
backoffDeadlineMs. There is a remaining race that can cause premature retries 
when maybeRequestNextBlock reads an outdated backoffDeadlineMs and then a 
concurrent in-flight failure applies a new backoff and clears requestInFlight.

If the interleaving happens such that:
 * maybeRequestNextBlock reads backoffDeadlineMs before the failure handler 
updates it, and

 * the failure handler clears requestInFlight before maybeRequestNextBlock 
attempts compareAndSet,

then maybeRequestNextBlock can successfully set requestInFlight and call 
sendRequest immediately, effectively ignoring the newly applied retry backoff.

 

 

!image-2026-02-03-08-53-05-655.png|width=847,height=438!


> Fix race between requestInFlight and backoffDeadlineMs in 
> RPCProducerIdManager causing premature retries
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20114
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20114
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: sanghyeok An
>            Assignee: sanghyeok An
>            Priority: Minor
>         Attachments: image-2026-02-03-08-53-05-655.png
>
>
> RPCProducerIdManager uses two independent atomics, requestInFlight and 
> backoffDeadlineMs. There is a remaining race that can cause premature retries 
> when maybeRequestNextBlock reads an outdated backoffDeadlineMs and then a 
> concurrent in-flight failure applies a new backoff and clears requestInFlight.
> If the interleaving happens such that:
>  * maybeRequestNextBlock reads backoffDeadlineMs before the failure handler 
> updates it, and
>  * the failure handler clears requestInFlight before maybeRequestNextBlock 
> attempts compareAndSet,
> then maybeRequestNextBlock can successfully set requestInFlight and call 
> sendRequest immediately, effectively ignoring the newly applied retry backoff.
>  
> !image-2026-02-03-08-53-05-655.png|width=1040,height=538!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
