Re: [PR] KAFKA-17439: Make polling for new records an explicit action/event in the new consumer [kafka]

via GitHub Fri, 25 Oct 2024 12:45:35 -0700


kirktrue commented on code in PR #17035:
URL: https://github.com/apache/kafka/pull/17035#discussion_r1817259110



##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java:
##########
@@ -1520,6 +1523,9 @@ private Fetch<K, V> pollForFetches(Timer timer) {
             return fetch;
         }
 
+        // send any new fetches (won't resend pending fetches)
+        sendFetches(timer);

Review Comment:
   The two `ConsumerDelegate` implementations work differently:
   
   * `AsyncKafkaConsumer`: `FetchRequestManager.poll()` will complete the 
event's `Future` on the background thread before it exits, i.e. before the 
thread starts the network I/O. Completing the `Future` starts the application 
thread racing toward logging that message and the background thread racing 
toward starting network I/O. I'll admit—I haven't dug through the code to 
surmise the relative costs of each thread's work before either cross their 
respective finish lines to win.
   * `ClassicKafkaConsumer`: `Fetcher.sendFetchesInternal()` calls 
`ConsumerNetworkClient.send()` to enqueue the request, but then it calls 
`NetworkClient.wakeup()`. Since the same `ConsumerNetworkClient` instance used 
by the consumer is also used by `AbstractCoordinator.HeartbeatThread`, it's 
technically possible that the heartbeat thread's `run()` method could start 
network I/O when it calls `NetworkClient.pollNoWakeup()`. Granted, that's a 
race that the application thread is much more likely to win given that the 
heartbeat thread runs much less frequently.
   
   Here are some points to consider:
   
   * The definition of the term "poll" as used in the log is open to 
interpretation. The term "poll" is everywhere, making its meaning ambiguous at 
any given point of use 😢
   * I agree there is a race condition (for both consumers, but more likely for 
the new consumer) that could result the the log message being emitted after the 
network I/O has commenced
   * For this to pose a problem to users, there needs to be other log entries 
that we're racing with, right?. We're trying to avoid the condition where the 
user is confused/mislead because the entries in the log are emitted in 
non-deterministic ordering.
   * The log line in question is only output at level `TRACE`, which I _assume_ 
is very rare for users to enable.
   
   Given the above, I'm of the opinion that it's an exercise in hair splitting 
to alter the logging. However, I could also just change it which would have 
been way less effort than researching, thinking, and composing this response 🤣
   
   If we leave the log line as it is, what would the effect be for the user?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] KAFKA-17439: Make polling for new records an explicit action/event in the new consumer [kafka]

Reply via email to