chickenchickenlove opened a new pull request, #21279:
URL: https://github.com/apache/kafka/pull/21279

   ### Description
   This PR fixes a race condition in 
`RPCProducerIdManager.maybeRequestNextBlock()` that can clobber a newly set 
retry backoff and cause premature retries.
   
   #### The Problem
   `maybeRequestNextBlock()` sends the controller request asynchronously and 
then unconditionally resets `backoffDeadlineMs` to `NO_RETRY`. On the response 
path, `handleUnsuccessfulResponse()` sets
`backoffDeadlineMs = now + RETRY_BACKOFF_MS`.
   
   Because the send is asynchronous, the unconditional reset in the request 
path can execute after the failure handler has already set the backoff. This 
overwrites the valid backoff with `NO_RETRY`. Consequently, a subsequent 
`generateProducerId()` call can send another request immediately instead of 
waiting out the backoff, leading to unnecessary controller traffic and flaky tests.
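The interleaving described above can be reproduced deterministically in a single thread. The sketch below is a simplified, hypothetical model, not Kafka's `RPCProducerIdManager`: it assumes `backoffDeadlineMs` is an `AtomicLong` (which the `compareAndSet`-based fix suggests), omits the producer-ID block bookkeeping, freezes the clock, and uses a hypothetical `channel` hook that runs the failure callback before the send returns, which is one ordering an asynchronous send permits. The class name, `sendCount`, and the 50 ms backoff are illustrative.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical, simplified model of the pre-fix request/response paths.
// Not Kafka's RPCProducerIdManager: names and constants are illustrative.
public class BuggyBackoffModel {
    public static final long NO_RETRY = -1L;
    static final long RETRY_BACKOFF_MS = 50L; // illustrative value

    final AtomicLong backoffDeadlineMs = new AtomicLong(NO_RETRY);
    final AtomicBoolean requestInFlight = new AtomicBoolean(false);
    final long nowMs = 1_000L; // fake clock, frozen for the demo
    int sendCount = 0;

    // Hypothetical channel hook: receives the failure callback when a request is sent.
    private final Consumer<Runnable> channel;

    BuggyBackoffModel(Consumer<Runnable> channel) {
        this.channel = channel;
    }

    void maybeRequestNextBlock() {
        long retryTimestamp = backoffDeadlineMs.get();
        if (retryTimestamp == NO_RETRY || nowMs >= retryTimestamp) {
            if (requestInFlight.compareAndSet(false, true)) {
                sendCount++;
                channel.accept(this::handleUnsuccessfulResponse); // "asynchronous" send
                backoffDeadlineMs.set(NO_RETRY); // unconditional reset: may clobber the backoff
            }
        }
    }

    void handleUnsuccessfulResponse() {
        backoffDeadlineMs.set(nowMs + RETRY_BACKOFF_MS);
        requestInFlight.set(false);
    }

    // Channel that fails the request before the send returns.
    static BuggyBackoffModel withInlineFailure() {
        return new BuggyBackoffModel(Runnable::run);
    }

    public static long deadlineAfterOneCall() {
        BuggyBackoffModel m = withInlineFailure();
        m.maybeRequestNextBlock();
        return m.backoffDeadlineMs.get();
    }

    public static int sendsAfterTwoCalls() {
        BuggyBackoffModel m = withInlineFailure();
        m.maybeRequestNextBlock();
        m.maybeRequestNextBlock(); // same instant: should still be inside the backoff window
        return m.sendCount;
    }

    public static void main(String[] args) {
        // prints "deadline=-1, sends=2": the backoff was lost and a second request went out
        System.out.println("deadline=" + deadlineAfterOneCall() + ", sends=" + sendsAfterTwoCalls());
    }
}
```

In this ordering the deadline written by `handleUnsuccessfulResponse()` is overwritten with `NO_RETRY`, so the second call at the same instant sends again instead of waiting out the backoff.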
   
   #### The Fix
   The fix replaces the unconditional reset with a conditional 
`compareAndSet`: the request path now resets the backoff only if the response 
handler has not updated it concurrently.
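A minimal sketch of the conditional reset, using a simplified, hypothetical model rather than Kafka's code: it assumes `backoffDeadlineMs` is an `AtomicLong`, omits the producer-ID block bookkeeping, freezes the clock, and uses a hypothetical `channel` hook to control when the failure callback runs. The request path remembers the deadline it observed and resets it with `compareAndSet(observed, NO_RETRY)`, so a backoff written by the response handler in the meantime is preserved. The class and method names, `sendCount`, and the 50 ms backoff are illustrative.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical, simplified model of the fixed request path; not Kafka's code.
public class FixedBackoffModel {
    public static final long NO_RETRY = -1L;
    static final long RETRY_BACKOFF_MS = 50L; // illustrative value

    final AtomicLong backoffDeadlineMs = new AtomicLong(NO_RETRY);
    final AtomicBoolean requestInFlight = new AtomicBoolean(false);
    final long nowMs = 1_000L; // fake clock, frozen for the demo
    int sendCount = 0;

    // Hypothetical channel hook: decides when (or whether) the failure callback runs.
    private final Consumer<Runnable> channel;

    FixedBackoffModel(Consumer<Runnable> channel) {
        this.channel = channel;
    }

    void maybeRequestNextBlock() {
        long retryTimestamp = backoffDeadlineMs.get();
        if (retryTimestamp == NO_RETRY || nowMs >= retryTimestamp) {
            if (requestInFlight.compareAndSet(false, true)) {
                sendCount++;
                channel.accept(this::handleUnsuccessfulResponse);
                // Reset only if the deadline is still the one observed above;
                // a backoff set by the response handler in the meantime survives.
                backoffDeadlineMs.compareAndSet(retryTimestamp, NO_RETRY);
            }
        }
    }

    void handleUnsuccessfulResponse() {
        backoffDeadlineMs.set(nowMs + RETRY_BACKOFF_MS);
        requestInFlight.set(false);
    }

    // Failure callback runs before the send returns.
    public static long deadlineAfterOneCall() {
        FixedBackoffModel m = new FixedBackoffModel(Runnable::run);
        m.maybeRequestNextBlock();
        return m.backoffDeadlineMs.get();
    }

    public static int sendsAfterTwoCalls() {
        FixedBackoffModel m = new FixedBackoffModel(Runnable::run);
        m.maybeRequestNextBlock();
        m.maybeRequestNextBlock(); // same instant: inside the backoff window, so no send
        return m.sendCount;
    }

    // No response yet: an expired deadline is still cleared after the send.
    public static long deadlineWhenResponsePending() {
        FixedBackoffModel m = new FixedBackoffModel(callback -> { });
        m.backoffDeadlineMs.set(900L); // expired relative to nowMs
        m.maybeRequestNextBlock();
        return m.backoffDeadlineMs.get();
    }

    public static void main(String[] args) {
        // prints "deadline=1050, sends=1, pending=-1"
        System.out.println("deadline=" + deadlineAfterOneCall()
                + ", sends=" + sendsAfterTwoCalls()
                + ", pending=" + deadlineWhenResponsePending());
    }
}
```

When the failure callback runs first, the backoff survives and the second call is suppressed. When no response has arrived yet, the compare-and-set still succeeds and clears an expired deadline, just as the unconditional reset did. In this model the check is safe because the handler always writes `now + RETRY_BACKOFF_MS`, which differs from any deadline the request path could have observed.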
   
   ### Sequence Diagram
   <img width="7840" height="8830" alt="ddd-1" src="https://github.com/user-attachments/assets/b7e079af-7dd3-4854-8719-503ed7cb0925" />
   
   ### Flaky tests fixed by this change
   - [Develocity flaky-test report for `ProducerIdManagerTest`](https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Asia%2FTaipei&tests.container=org.apache.kafka.coordinator.transaction.ProducerIdManagerTest&tests.sortField=FLAKY)
   - `ProducerIdManagerTest#testRetryBackoffOnNoResponse`
   - `ProducerIdManagerTest#testRetryBackoffOnAuthException`
   - `ProducerIdManagerTest#testRetryBackoffOnVersionMismatch`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
