ableegoldman commented on a change in pull request #11085:
URL: https://github.com/apache/kafka/pull/11085#discussion_r672760705



##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
##########
@@ -719,10 +719,10 @@ void runOnce() {
 
         final long pollLatency = pollPhase();
 
-        // Shutdown hook could potentially be triggered and transit the thread state to PENDING_SHUTDOWN during #pollRequests().
-        // The task manager internal states could be uninitialized if the state transition happens during #onPartitionsAssigned().

Review comment:
       This comment and the short-circuit `return` were a fix for an NPE from a 
year or two ago, but it turns out we broke that fix when we encapsulated 
everything into `pollPhase`. [The 
fix](https://issues.apache.org/jira/browse/KAFKA-8620) was to return in between 
returning from `poll()` and calling `addRecordsToTasks()`, since we could end 
up with uninitialized tasks/TaskManager state if the shutdown hook was 
triggered during the rebalance callback. Now that both calls live inside 
`pollPhase`, a check placed after it comes too late to guard 
`addRecordsToTasks()`. 
   Luckily, at some point we shored up the task management logic so that the 
rebalance callbacks always proceed even if the thread has already been told to 
shut down, so we're not in any trouble here. This also means that, technically, 
we don't even need to `return` here anymore. There's no real reason to continue 
through the loop, though, so I just updated the comment and left the `return` 
as is.
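
To make the ordering issue concrete, here is a minimal, self-contained sketch. It is not the actual `StreamThread` code: `RunOnceSketch`, `FakeThread`, and the phase names are hypothetical stand-ins, and the sketch assumes that `pollPhase` both polls and hands records to the tasks, as described above.

```java
import java.util.ArrayList;
import java.util.List;

public class RunOnceSketch {
    enum State { RUNNING, PENDING_SHUTDOWN }

    /** Hypothetical stand-in for the stream thread; it only records which phases ran. */
    static final class FakeThread {
        State state = State.RUNNING;
        final List<String> phases = new ArrayList<>();
        private final boolean shutdownDuringPoll;

        FakeThread(boolean shutdownDuringPoll) {
            this.shutdownDuringPoll = shutdownDuringPoll;
        }

        // Models the encapsulated poll phase: poll, then add records to tasks.
        long pollPhase() {
            phases.add("poll");
            if (shutdownDuringPoll) {
                // Assumption: the shutdown hook fires during the rebalance callback.
                state = State.PENDING_SHUTDOWN;
            }
            // No state check happens here, so this step always runs.
            phases.add("addRecordsToTasks");
            return 0L;
        }

        void runOnce() {
            pollPhase();
            // The short-circuit only skips what comes after the poll phase.
            if (state == State.PENDING_SHUTDOWN) {
                return;
            }
            phases.add("process");
        }
    }

    public static void main(String[] args) {
        FakeThread thread = new FakeThread(true);
        thread.runOnce();
        System.out.println(thread.phases);
    }
}
```

In this model the early `return` skips only the `process` step; `addRecordsToTasks` still runs after a shutdown is requested mid-poll, which is why the original guard no longer protects it.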

##########
File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
##########
@@ -885,8 +885,8 @@ private long pollPhase() {
             records = pollRequests(pollTime);
         } else if (state == State.PENDING_SHUTDOWN) {
             // we are only here because there's rebalance in progress,
-            // just poll with zero to complete it
-            records = pollRequests(Duration.ZERO);
+            // just long poll to give it enough time to complete it
+            records = pollRequests(pollTime);

Review comment:
       This is the main fix; see the PR description for full context. I was 
actually wondering whether we should go even further and call `poll(MAX_VALUE)` 
instead, since there's really no reason to return from poll while the thread is 
shutting down but a rebalance is still in progress. Thoughts?
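
For comparison, the three options under discussion can be sketched as a small pure function. This is only an illustration: `ShutdownPollTimeoutSketch`, `Strategy`, and `timeoutFor` are hypothetical names, the other thread states are collapsed into `RUNNING`, and `MAX_VALUE` is assumed to mean `Long.MAX_VALUE` milliseconds.

```java
import java.time.Duration;

public class ShutdownPollTimeoutSketch {
    enum State { RUNNING, PENDING_SHUTDOWN }

    /** Hypothetical labels for the three candidate behaviors. */
    enum Strategy { ZERO, POLL_TIME, MAX_VALUE }

    // Picks the poll timeout; only the PENDING_SHUTDOWN branch differs by strategy.
    static Duration timeoutFor(State state, Duration pollTime, Strategy strategy) {
        if (state != State.PENDING_SHUTDOWN) {
            return pollTime;
        }
        switch (strategy) {
            case ZERO:
                // Old behavior: return immediately, possibly before the rebalance completes.
                return Duration.ZERO;
            case POLL_TIME:
                // This PR: give the rebalance the regular poll time to complete.
                return pollTime;
            default:
                // Proposed alternative: block until poll returns on its own.
                return Duration.ofMillis(Long.MAX_VALUE);
        }
    }

    public static void main(String[] args) {
        Duration pollTime = Duration.ofMillis(100);
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " -> "
                + timeoutFor(State.PENDING_SHUTDOWN, pollTime, strategy).toMillis() + " ms");
        }
    }
}
```

The sketch only selects a timeout; whether an unbounded poll is safe depends on how the consumer is woken up during shutdown, which is not modeled here.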



