ableegoldman commented on a change in pull request #11085: URL: https://github.com/apache/kafka/pull/11085#discussion_r672760705
########## File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java ########## @@ -719,10 +719,10 @@ void runOnce() { final long pollLatency = pollPhase(); - // Shutdown hook could potentially be triggered and transit the thread state to PENDING_SHUTDOWN during #pollRequests(). - // The task manager internal states could be uninitialized if the state transition happens during #onPartitionsAssigned(). Review comment: This comment and the short-circuit `return` was a fix for an NPE from a year or two ago, but it turns out we actually broke this fix when we encapsulated everything into the `pollPhase` -- [the fix](https://issues.apache.org/jira/browse/KAFKA-8620) was to return in between returning from `poll()` and calling `addRecordsToTasks()`, since we could end up with uninitialized tasks/TaskManager state if the shutdown hook was triggered during the rebalance callback. Luckily, at some point we happened to shore up the task management logic so that the rebalance callbacks will always proceed even if the thread has already been told to shut down, so we're not in any trouble here. This also means that technically, we don't even need to `return` here anymore -- but there's no real reason to continue through the loop, so I just updated the comment and left it as is. ########## File path: streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java ########## @@ -885,8 +885,8 @@ private long pollPhase() { records = pollRequests(pollTime); } else if (state == State.PENDING_SHUTDOWN) { // we are only here because there's rebalance in progress, - // just poll with zero to complete it - records = pollRequests(Duration.ZERO); + // just long poll to give it enough time to complete it + records = pollRequests(pollTime); Review comment: This is the main fix, see the PR description for full context. I was actually wondering if we shouldn't go even further and call `poll(MAX_VALUE)` instead, since there's really no reason to return from poll when the thread is shutting down but a rebalance is still in progress. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org