jack2012aa opened a new pull request, #20791: URL: https://github.com/apache/kafka/pull/20791
# Description The test `testCloseWithZeroTimeoutFromCallerThread` is flaky. The consumer may gets all of the messages after the producer is force closed, while futures in the producer are completed exceptionally. The bug comes from a race condition introduced by `RecordAccumulator#close` and `RecordAccumulator#batchReady`. `RecordAccumulator#close` sets the closed flag to true, and `RecordAccumulator#batchReady` thinks the batch is sendable. As a result those batches are sent in the same `Sender#runOnce` call because `runOnce` doesn't check the `forceClose` flag. # Test An unit test is added to `SenderTest`. It asserts that after a sender is force closed no message should be sent or polled. # Change It is hard to fully eliminate the bug: `Sender#forceClose` can happen at any point of `Sender#runOnce` since they run in different threads. The only way to ensure that "no action is permitted after force close" is to lock `runOnce`, which is expensive. Adding a check on the flag before the poll in `runOnce` can reduce the chance of the bug. Now the race condition only happens if sender is force closed during the poll. Notice that this eliminates the flaky test. In the test scenario, if poll happens during the poll, the client has nothing to operate in this round, and there is no next run. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
