jack2012aa opened a new pull request, #20791:
URL: https://github.com/apache/kafka/pull/20791

   # Description
   The test `testCloseWithZeroTimeoutFromCallerThread` is flaky. The consumer 
may receive all of the messages after the producer is force closed, even though 
the corresponding futures in the producer are completed exceptionally.
   
   The bug comes from a race condition between `RecordAccumulator#close` 
and `RecordAccumulator#batchReady`. Once `RecordAccumulator#close` sets the 
closed flag to true, `RecordAccumulator#batchReady` treats the pending batches 
as sendable. As a result, those batches are sent in the same `Sender#runOnce` 
call because `runOnce` doesn't check the `forceClose` flag.
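
   To illustrate the readiness rule described above, here is a minimal sketch. 
It is a toy model, not Kafka's code: `ToyBatch`, `sendable`, and the linger 
parameter are hypothetical names, and the real `batchReady` considers more 
conditions. It only shows how a closed flag alone can make a fresh batch look 
sendable.

```java
public class BatchReadySketch {
    /** Toy stand-in for a batch's state; NOT Kafka's real batch class. */
    static final class ToyBatch {
        final boolean full;
        final long waitedMs;

        ToyBatch(boolean full, long waitedMs) {
            this.full = full;
            this.waitedMs = waitedMs;
        }
    }

    // Simplified, assumed readiness rule: a batch is sendable when it is full,
    // has waited at least lingerMs, or the accumulator has been closed.
    static boolean sendable(ToyBatch batch, long lingerMs, boolean closed) {
        boolean expired = batch.waitedMs >= lingerMs;
        return batch.full || expired || closed;
    }

    public static void main(String[] args) {
        ToyBatch fresh = new ToyBatch(false, 0L);
        // Before close: a fresh, non-full batch is held back.
        System.out.println(sendable(fresh, 100L, false)); // prints "false"
        // After close: the same batch is reported as sendable.
        System.out.println(sendable(fresh, 100L, true));  // prints "true"
    }
}
```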
   
   # Test
   A unit test is added to `SenderTest`. It asserts that after a sender is 
force closed, no message is sent or polled.
   
   # Change
   It is hard to fully eliminate the bug: `Sender#forceClose` can happen at any 
point of `Sender#runOnce` since they run in different threads. The only way to 
ensure that "no action is permitted after force close" is to lock `runOnce`, 
which is expensive.
   
   Adding a check on the flag before the poll in `runOnce` reduces the 
chance of the bug. Now the race condition only happens if the sender is force 
closed during the poll. Note that this eliminates the flaky test: in the test 
scenario, if the force close happens during the poll, the client has nothing to 
operate on in this round, and there is no next run.
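
   The effect of the check can be sketched with a deterministic toy model. 
Everything here is an assumption made for illustration: `ToyClient`, 
`ToySender`, the `guardBeforePoll` switch, and the `duringRun` hook are 
hypothetical and do not mirror the actual `Sender` implementation. The hook 
stands in for another thread calling force close at the start of a `runOnce` 
round, and the model assumes that `send` only queues a request while `poll` 
performs the write.

```java
import java.util.ArrayList;
import java.util.List;

public class ForceCloseGuardSketch {
    /** Toy network client: send() only queues, poll() performs the write. */
    static final class ToyClient {
        final List<String> queued = new ArrayList<>();
        final List<String> written = new ArrayList<>();

        void send(String batch) { queued.add(batch); }

        void poll() {
            written.addAll(queued);
            queued.clear();
        }
    }

    /** Toy sender loop body; a model of the race, not Kafka's Sender. */
    static final class ToySender {
        final ToyClient client = new ToyClient();
        final List<String> pending = new ArrayList<>();
        final boolean guardBeforePoll;
        volatile boolean forceClose = false;
        volatile boolean accumulatorClosed = false;

        ToySender(boolean guardBeforePoll) { this.guardBeforePoll = guardBeforePoll; }

        void forceClose() {
            forceClose = true;
            accumulatorClosed = true;
        }

        void runOnce(Runnable duringRun) {
            duringRun.run(); // hypothetical hook: the "other thread" acts here
            if (accumulatorClosed) {
                // Closed accumulator reports pending batches as sendable.
                for (String batch : pending) client.send(batch);
                pending.clear();
            }
            if (guardBeforePoll && forceClose) return; // the added check
            client.poll();
        }
    }

    // Runs one round in which force close lands before the readiness check
    // and returns what reached the (toy) network.
    static List<String> simulate(boolean guardBeforePoll) {
        ToySender sender = new ToySender(guardBeforePoll);
        sender.pending.add("batch-0");
        sender.runOnce(sender::forceClose);
        return sender.client.written;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints "[batch-0]"
        System.out.println(simulate(true));  // prints "[]"
    }
}
```

   Without the guard, the batch is written even though the sender was force 
closed. With the guard, the queued request never reaches the poll. The model 
does not cover a force close that lands while `poll` is already running, which 
is the remaining window described above.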


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
