Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
cadonna commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1360616252

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()
     .values()
     .stream()
-    // TODO: once we remove state restoration from the stream thread, we can also remove
-    // the RESTORING state here, since there will not be any restoring tasks managed
-    // by the stream thread anymore.
-    .filter(t -> t.state() == Task.State.RUNNING || t.state() == Task.State.RESTORING)
+    .filter(t -> t.state() == Task.State.RUNNING)

Review Comment:
@guozhangwang as far as I can see in the code, a restoring active task does not return `true` from `commitNeeded()`. Thus, `postCommit()` is never called here. Do you agree?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
guozhangwang commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1359038910

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()
     .values()
     .stream()
-    // TODO: once we remove state restoration from the stream thread, we can also remove
-    // the RESTORING state here, since there will not be any restoring tasks managed
-    // by the stream thread anymore.
-    .filter(t -> t.state() == Task.State.RUNNING || t.state() == Task.State.RESTORING)
+    .filter(t -> t.state() == Task.State.RUNNING)

Review Comment:
Hey @cadonna, sorry I came late to this PR. One thing I'd like to raise is that in the past, we've seen active task restoration never complete under rolling restart / rebalance storm scenarios, since we kept losing the progress we had made so far when reviving. I'm not 100% sure whether this part of the code is related to that scenario; I just want to double check. If you have thought about it and concluded this would not be related, I'm relieved :)
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
cadonna merged PR #14508:
URL: https://github.com/apache/kafka/pull/14508
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
cadonna commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1356667200

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()

Review Comment:
I am not sure if we strictly need to do it, because as you say standby tasks have nothing to do with the ongoing transaction. I was merely referring to what the old code path does.
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
lucasbru commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1356569841

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()

Review Comment:
Actually, not so sure I understand this. Why do we want to checkpoint non-corrupted standby tasks here? The comment says `since this will force the ongoing txn to abort`, but what do standby tasks have to do with the ongoing txn?
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
lucasbru commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1355337776

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()

Review Comment:
Oh, right. Forgot the checkpointing is piggy-backed here.
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
cadonna commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1354810862

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()

Review Comment:
Actually, we want to commit (or rather, checkpoint) non-corrupted standby tasks which are owned by the state updater.
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
lucasbru commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1354615069

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()

Review Comment:
We could still use `tasks.allTasks()` here, since we certainly do not want to process tasks owned by the state updater, right? That would seem cleaner to me.
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
cadonna commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1351723672

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()
     .values()
     .stream()
-    // TODO: once we remove state restoration from the stream thread, we can also remove
-    // the RESTORING state here, since there will not be any restoring tasks managed
-    // by the stream thread anymore.
-    .filter(t -> t.state() == Task.State.RUNNING || t.state() == Task.State.RESTORING)
+    .filter(t -> t.state() == Task.State.RUNNING)

Review Comment:
Since restoring active tasks return `false` from `commitNeeded()` (they have never processed records and have never executed a punctuation), `preCommit()` and `postCommit()` are [never called on restoring active tasks in this specific code](https://github.com/apache/kafka/blob/c32d2338a7e0079e539b74eb16f0095380a1ce85/streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskExecutor.java#L141). This holds whether the state updater is enabled or disabled. Additionally, as far as I know, there is nothing to flush in a restoring active task, because restoration uses the state restore callback. In any case, the flush is never called, for the reason I pointed out above.
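The reasoning in this comment can be illustrated with a small, simplified model. This is only a sketch: `CommitFilterSketch`, `SketchTask`, `oldCandidates`, `newCandidates`, and `committedIds` are hypothetical names and not the Kafka Streams classes, and the sketch assumes, as the comment states, that a restoring active task always reports that no commit is needed.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified, hypothetical model of the argument; not the Kafka Streams implementation.
class CommitFilterSketch {
    enum State { RUNNING, RESTORING }

    // Stand-in for a task: only the fields needed for this argument.
    static final class SketchTask {
        final String id;
        final State state;
        final boolean commitNeeded;

        SketchTask(final String id, final State state, final boolean commitNeeded) {
            this.id = id;
            this.state = state;
            this.commitNeeded = commitNeeded;
        }
    }

    // Old filter: RUNNING and RESTORING tasks are commit candidates.
    static List<SketchTask> oldCandidates(final List<SketchTask> tasks) {
        return tasks.stream()
            .filter(t -> t.state == State.RUNNING || t.state == State.RESTORING)
            .collect(Collectors.toList());
    }

    // New filter: only RUNNING tasks are commit candidates.
    static List<SketchTask> newCandidates(final List<SketchTask> tasks) {
        return tasks.stream()
            .filter(t -> t.state == State.RUNNING)
            .collect(Collectors.toList());
    }

    // Models the commit step: candidates that do not need a commit are skipped.
    static Set<String> committedIds(final List<SketchTask> candidates) {
        return candidates.stream()
            .filter(t -> t.commitNeeded)
            .map(t -> t.id)
            .collect(Collectors.toSet());
    }

    public static void main(final String[] args) {
        // Assumption from the comment: the restoring task never needs a commit.
        final List<SketchTask> tasks = List.of(
            new SketchTask("0_0", State.RUNNING, true),
            new SketchTask("0_1", State.RUNNING, false),
            new SketchTask("0_2", State.RESTORING, false));

        // Both filters lead to the same set of committed tasks under that assumption.
        System.out.println(
            committedIds(oldCandidates(tasks)).equals(committedIds(newCandidates(tasks))));
    }
}
```

Under that assumption, dropping `RESTORING` from the filter changes the candidate list but not the set of tasks that actually go through the commit step.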
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
mjsax commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1350625946

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()
     .values()
     .stream()
-    // TODO: once we remove state restoration from the stream thread, we can also remove
-    // the RESTORING state here, since there will not be any restoring tasks managed
-    // by the stream thread anymore.
-    .filter(t -> t.state() == Task.State.RUNNING || t.state() == Task.State.RESTORING)
+    .filter(t -> t.state() == Task.State.RUNNING)

Review Comment:
I was just reading the TODO without thinking much about it... I guess we might still want to flush restoring tasks and write the checkpoint file (which is part of a commit), so should we execute `preCommit()` and `postCommit()` for those? I agree that we won't have input topic offsets to be committed (and there should not be any TX).
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
cadonna commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1350271668

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()
     .values()
     .stream()
-    // TODO: once we remove state restoration from the stream thread, we can also remove
-    // the RESTORING state here, since there will not be any restoring tasks managed
-    // by the stream thread anymore.
-    .filter(t -> t.state() == Task.State.RUNNING || t.state() == Task.State.RESTORING)
+    .filter(t -> t.state() == Task.State.RUNNING)

Review Comment:
I do not think that this is needed. Don't you agree that restoring active tasks do not need to be committed, with or without the state updater?
Re: [PR] MINOR: Only commit running active and standby tasks when tasks corrupted [kafka]
mjsax commented on code in PR #14508:
URL: https://github.com/apache/kafka/pull/14508#discussion_r1349252619

## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ##
@@ -223,10 +223,7 @@ boolean handleCorruption(final Set corruptedTasks) {
 final Collection tasksToCommit = allTasks()
     .values()
     .stream()
-    // TODO: once we remove state restoration from the stream thread, we can also remove
-    // the RESTORING state here, since there will not be any restoring tasks managed
-    // by the stream thread anymore.
-    .filter(t -> t.state() == Task.State.RUNNING || t.state() == Task.State.RESTORING)
+    .filter(t -> t.state() == Task.State.RUNNING)

Review Comment:
Given that we still have a feature flag, should we make this condition more complex and consider if the state updater thread is enabled or not, and check different conditions for both cases?
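For illustration only, a flag-dependent condition of the kind asked about here might look roughly like the sketch below. The class `FlagAwareFilterSketch`, the helper `commitCandidateFilter`, and the `stateUpdaterEnabled` parameter are hypothetical names introduced for this example; they are not taken from the PR or from `TaskManager`.

```java
import java.util.function.Predicate;

// Hypothetical sketch of a feature-flag-aware state filter; not code from the PR.
class FlagAwareFilterSketch {
    enum State { RUNNING, RESTORING }

    // Picks the state filter depending on whether the state updater is enabled.
    static Predicate<State> commitCandidateFilter(final boolean stateUpdaterEnabled) {
        if (stateUpdaterEnabled) {
            // With the state updater, the stream thread manages no restoring tasks.
            return s -> s == State.RUNNING;
        }
        // Without it, the old condition would be kept.
        return s -> s == State.RUNNING || s == State.RESTORING;
    }
}
```

The replies in this thread argue that such a split is unnecessary, because restoring active tasks never need a commit in either mode.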