[ https://issues.apache.org/jira/browse/KAFKA-20114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sanghyeok An updated KAFKA-20114:
---------------------------------
    Description: 
RPCProducerIdManager uses two independent atomics, requestInFlight and 
backoffDeadlineMs. There is a remaining race that can cause premature retries 
when maybeRequestNextBlock reads an outdated backoffDeadlineMs and then a 
concurrent in-flight failure applies a new backoff and clears requestInFlight.

If the interleaving happens such that:
 * maybeRequestNextBlock reads backoffDeadlineMs before the failure handler 
updates it, and
 * the failure handler clears requestInFlight before maybeRequestNextBlock 
attempts compareAndSet,

then maybeRequestNextBlock can successfully set requestInFlight and call 
sendRequest immediately, effectively ignoring the newly applied retry backoff.
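
The interleaving above can be replayed deterministically in a single thread. The following is only a minimal sketch, not the actual RPCProducerIdManager code: the class BackoffRaceSketch, the split of maybeRequestNextBlock into backoffElapsed and tryClaimAndSend, the onRequestFailure handler, the counter standing in for sendRequest, and the 50 ms backoff are all hypothetical names and values chosen for illustration. It also assumes the failure handler writes the new deadline before clearing requestInFlight.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the two atomics; not the real Kafka class.
class BackoffRaceSketch {
    final AtomicBoolean requestInFlight = new AtomicBoolean(false);
    final AtomicLong backoffDeadlineMs = new AtomicLong(0L);
    int requestsSent = 0; // stands in for calls to sendRequest()

    // First half of maybeRequestNextBlock: read the deadline.
    // The result can be stale by the time the CAS below runs.
    boolean backoffElapsed(long nowMs) {
        return nowMs >= backoffDeadlineMs.get();
    }

    // Second half of maybeRequestNextBlock: claim the in-flight slot and send.
    boolean tryClaimAndSend(boolean backoffElapsed) {
        if (backoffElapsed && requestInFlight.compareAndSet(false, true)) {
            requestsSent++;
            return true;
        }
        return false;
    }

    // Failure handler (assumed order): apply the backoff, then clear the flag.
    void onRequestFailure(long nowMs, long retryBackoffMs) {
        backoffDeadlineMs.set(nowMs + retryBackoffMs);
        requestInFlight.set(false);
    }

    // Replays the problematic interleaving step by step.
    // Returns true if a request was sent before the new deadline expired.
    static boolean replayInterleaving() {
        BackoffRaceSketch m = new BackoffRaceSketch();
        long now = 1_000L;
        m.requestInFlight.set(true);             // a request is already in flight
        boolean stale = m.backoffElapsed(now);   // caller reads the old deadline
        m.onRequestFailure(now, 50L);            // handler sets backoff, clears flag
        boolean sent = m.tryClaimAndSend(stale); // CAS succeeds on the stale check
        return sent && now < m.backoffDeadlineMs.get();
    }

    public static void main(String[] args) {
        System.out.println("premature retry: " + replayInterleaving());
    }
}
```

In this sketch each atomic operation is individually correct; the premature send comes from the deadline check and the compareAndSet not being performed as one atomic step.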

 

!image-2026-02-03-08-53-05-655.png|width=1040,height=538!

*Previous discussion in other PR*

[https://github.com/apache/kafka/pull/21279#issuecomment-3836196135]

  was:
RPCProducerIdManager uses two independent atomics, requestInFlight and 
backoffDeadlineMs. There is a remaining race that can cause premature retries 
when maybeRequestNextBlock reads an outdated backoffDeadlineMs and then a 
concurrent in-flight failure applies a new backoff and clears requestInFlight.

If the interleaving happens such that:
 * maybeRequestNextBlock reads backoffDeadlineMs before the failure handler 
updates it, and
 * the failure handler clears requestInFlight before maybeRequestNextBlock 
attempts compareAndSet,

then maybeRequestNextBlock can successfully set requestInFlight and call 
sendRequest immediately, effectively ignoring the newly applied retry backoff.

 

!image-2026-02-03-08-53-05-655.png|width=1040,height=538!

### Previous discussion in other PR

https://github.com/apache/kafka/pull/21279#issuecomment-3836196135


> Fix race between requestInFlight and backoffDeadlineMs in 
> RPCProducerIdManager causing premature retries
> --------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20114
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20114
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: sanghyeok An
>            Assignee: sanghyeok An
>            Priority: Minor
>         Attachments: image-2026-02-03-08-53-05-655.png
>
>
> RPCProducerIdManager uses two independent atomics, requestInFlight and 
> backoffDeadlineMs. There is a remaining race that can cause premature retries 
> when maybeRequestNextBlock reads an outdated backoffDeadlineMs and then a 
> concurrent in-flight failure applies a new backoff and clears requestInFlight.
> If the interleaving happens such that:
>  * maybeRequestNextBlock reads backoffDeadlineMs before the failure handler 
> updates it, and
>  * the failure handler clears requestInFlight before maybeRequestNextBlock 
> attempts compareAndSet,
> then maybeRequestNextBlock can successfully set requestInFlight and call 
> sendRequest immediately, effectively ignoring the newly applied retry backoff.
>  
> !image-2026-02-03-08-53-05-655.png|width=1040,height=538!
>
> *Previous discussion in other PR*
> [https://github.com/apache/kafka/pull/21279#issuecomment-3836196135]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
