chickenchickenlove opened a new pull request, #21279: URL: https://github.com/apache/kafka/pull/21279
### Description

This PR fixes a race condition in `RPCProducerIdManager.maybeRequestNextBlock()` that can clobber a newly set retry backoff and cause premature retries.

#### The Problem

`maybeRequestNextBlock()` sends the controller request asynchronously and then unconditionally resets `backoffDeadlineMs` to `NO_RETRY`. On the response path, `handleUnsuccessfulResponse()` sets `backoffDeadlineMs = now + RETRY_BACKOFF_MS`.

Because the send is asynchronous, the unconditional reset in the request path can execute after the failure handler has already set the backoff, overwriting the valid backoff with `NO_RETRY`. A subsequent `generateProducerId()` call can then re-send immediately, which causes unnecessary controller traffic and flaky test behavior.

#### The Fix

The fix replaces the unconditional reset with a conditional `compareAndSet`. The request path now resets the backoff only if the response handler has not updated it concurrently.

### Sequence Diagram

<img width="7840" height="8830" alt="ddd-1" src="https://github.com/user-attachments/assets/b7e079af-7dd3-4854-8719-503ed7cb0925" />

### Flaky tests fixed by this change

- https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Asia%2FTaipei&tests.container=org.apache.kafka.coordinator.transaction.ProducerIdManagerTest&tests.sortField=FLAKY
- `ProducerIdManagerTest#testRetryBackoffOnNoResponse`
- `ProducerIdManagerTest#testRetryBackoffOnAuthException`
- `ProducerIdManagerTest#testRetryBackoffOnVersionMismatch`
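
For readers who want to see the interleaving in isolation, here is a minimal sketch of the race and of the conditional reset. It is not the actual `RPCProducerIdManager` code: the class `BackoffResetSketch`, the helper methods `resetUnconditionally`, `resetIfUnchanged`, and `canSend`, and the concrete values of `NO_RETRY` and `RETRY_BACKOFF_MS` are assumptions made for illustration, and the asynchronous response is simulated by calling the failure handler before the request path performs its reset.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the backoff handling described in this PR.
// It is NOT the real Kafka class; names and constant values are illustrative.
class BackoffResetSketch {
    static final long NO_RETRY = -1L;          // assumed sentinel: no backoff pending
    static final long RETRY_BACKOFF_MS = 50L;  // assumed backoff duration

    final AtomicLong backoffDeadlineMs = new AtomicLong(NO_RETRY);

    // Response path: a failed request schedules the earliest permitted retry.
    void handleUnsuccessfulResponse(long nowMs) {
        backoffDeadlineMs.set(nowMs + RETRY_BACKOFF_MS);
    }

    // Request path, old behavior: always clears the deadline after sending.
    void resetUnconditionally() {
        backoffDeadlineMs.set(NO_RETRY);
    }

    // Request path, fixed behavior: clears the deadline only if it still holds
    // the value observed before the send, i.e. no handler updated it meanwhile.
    void resetIfUnchanged(long observedDeadlineMs) {
        backoffDeadlineMs.compareAndSet(observedDeadlineMs, NO_RETRY);
    }

    // A new request may be sent if no backoff is pending or the deadline passed.
    boolean canSend(long nowMs) {
        long deadline = backoffDeadlineMs.get();
        return deadline == NO_RETRY || nowMs >= deadline;
    }

    public static void main(String[] args) {
        // Interleaving: request path reads the deadline, sends, the response
        // fails "early" and sets a backoff, then the request path resets.
        BackoffResetSketch buggy = new BackoffResetSketch();
        buggy.backoffDeadlineMs.get();           // value read before sending
        buggy.handleUnsuccessfulResponse(200L);  // failure handler runs first
        buggy.resetUnconditionally();            // clobbers the new backoff
        System.out.println("unconditional reset, canSend: " + buggy.canSend(201L));

        BackoffResetSketch fixed = new BackoffResetSketch();
        long observed = fixed.backoffDeadlineMs.get();
        fixed.handleUnsuccessfulResponse(200L);
        fixed.resetIfUnchanged(observed);        // CAS fails, backoff survives
        System.out.println("conditional reset, canSend: " + fixed.canSend(201L));
    }
}
```

In this model the unconditional reset leaves the manager free to re-send one millisecond after the failure, while the `compareAndSet` variant keeps the backoff in place until the deadline passes. When no handler has intervened, the observed value is still current, so the conditional reset clears the deadline just as the unconditional one did.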
