[ https://issues.apache.org/jira/browse/KAFKA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501162#comment-17501162 ]
Guozhang Wang commented on KAFKA-6106:
--------------------------------------

As discussed with [~cadonna] offline, once https://issues.apache.org/jira/browse/KAFKA-10199 is completed, we should revisit this issue. One idea is to enable processing of ready-to-go tasks while the others are still being restored by the restore threads, using heuristics to decide which ready-to-go tasks should be processed. For example, tasks of upstream sub-topologies would get higher priority, while tasks of downstream sub-topologies can stay paused, since their inputs rely on the upstream tasks' processed outputs.

> Postpone normal processing of tasks within a thread until restoration of all
> tasks have completed
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6106
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.11.0.1, 1.0.0
>            Reporter: Guozhang Wang
>            Assignee: Kamal Chandraprakash
>            Priority: Major
>              Labels: new-streams-runtime-should-fix, newbie++
>             Fix For: 1.1.1, 2.0.0
>
>
> Let's say a stream thread hosts multiple tasks, A and B. At the very
> beginning, when A and B are assigned to the thread, the thread state is
> {{TASKS_ASSIGNED}}, and the thread starts restoring these two tasks during
> this state using the restore consumer while using the normal consumer for
> heartbeating.
> If task A's restoration completes earlier than task B's, then the thread
> will start processing A immediately even though it is still in the
> {{TASKS_ASSIGNED}} phase. But processing task A will slow down restoration of
> task B, since the thread is single-threaded. So the thread's transition to
> {{RUNNING}}, which happens once all of its assigned tasks have completed
> restoring and can be processed, will be delayed.
> Note that the streams instance's state will only transition to {{RUNNING}} when
> all of its threads have transitioned to {{RUNNING}}, so the instance's
> transition will also be delayed by this scenario.
> We had better not start processing ready tasks immediately, but instead
> focus on restoration during the {{TASKS_ASSIGNED}} state to shorten the
> overall time of the instance's state transition.
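
To make the prioritization idea from the comment concrete, here is a minimal sketch of one possible heuristic: among the tasks that have finished restoring, process those from upstream sub-topologies first, and leave still-restoring tasks out. Everything in it is an assumption made for illustration. `SketchTask`, `subtopologyDepth`, and `processingOrder` are hypothetical names, not Kafka Streams classes or APIs, and the notion of a numeric "depth" per sub-topology is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReadyTaskPrioritySketch {

    /** Hypothetical stand-in for a stream task; not the Kafka Streams Task interface. */
    static final class SketchTask {
        final String id;
        // Assumption: 0 means the sub-topology reads only from source topics;
        // larger values mean it consumes output produced by upstream sub-topologies.
        final int subtopologyDepth;
        final boolean restored;

        SketchTask(String id, int subtopologyDepth, boolean restored) {
            this.id = id;
            this.subtopologyDepth = subtopologyDepth;
            this.restored = restored;
        }
    }

    /**
     * Returns the restored (ready-to-go) tasks ordered upstream first.
     * Tasks that are still restoring are excluded from the result.
     */
    static List<SketchTask> processingOrder(List<SketchTask> assigned) {
        List<SketchTask> ready = new ArrayList<>();
        for (SketchTask task : assigned) {
            if (task.restored) {
                ready.add(task);
            }
        }
        // Stable sort: tasks with equal depth keep their assignment order.
        ready.sort(Comparator.comparingInt((SketchTask t) -> t.subtopologyDepth));
        return ready;
    }

    public static void main(String[] args) {
        List<SketchTask> assigned = List.of(
            new SketchTask("2_0", 2, true),   // downstream, restored
            new SketchTask("0_1", 0, true),   // upstream, restored
            new SketchTask("1_0", 1, false)); // still restoring, skipped
        for (SketchTask task : processingOrder(assigned)) {
            System.out.println(task.id);
        }
    }
}
```

A scheduler built on this ordering could also cap how many ready tasks it processes per loop iteration so that restoration is not starved; that choice is left open here.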