[ 
https://issues.apache.org/jira/browse/KAFKA-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239733#comment-16239733
 ] 

Guozhang Wang commented on KAFKA-6106:
--------------------------------------

Today IQ will not be allowed if the stream thread is not {{State.RUNNING}} and 
the stream thread can only turn to {{RUNNING}} if all its tasks are in RUNNING 
states. So technically we can fix it two ways:

1) the proposal mentioned in this JIRA.
2) change the StateStoreProvider class, to make stores be queryable on the task 
level than on the thread level. So that some tasks' stores can be queryable 
even though other stores are not.

I'm inclined to delay going on the second option for now (in other words, add 
this config to let users choose), because of the following:

a) most streaming applications that leverage on IQ (think: analytics, 
monitoring) can only function when all such state stores are queryable. Making 
only part of these stores to be queryable while others are still bootstrapping 
would not help these applications.
b) a known issue today for IQ is that states from different tasks are not from 
the same snapshots so that if they are logically correlated IQ has "phantom 
reads" scenarios; we have thought about how to remedy such issues and one of 
them is to leverage on exactly once semantics. In that case, moving forward 
some tasks while still restoring other tasks will simply make the tasks to be 
run at further different times.

If in the future we do observe that this is a common request where users do not 
care about partial reads or inconsistent reads across states, but are really 
keen to some of the states to be queryable ASAP (btw on top of my head I feel 
even in this case the right way to solve it would be task assignment 
improvements?), then we can consider the second option.

> Postpone normal processing of tasks within a thread until restoration of all 
> tasks have completed
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6106
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.11.0.1, 1.0.0
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>
> Let's say a stream thread hosts multiple tasks, A and B. At the very 
> beginning when A and B are assigned to the thread, the thread state is 
> {{TASKS_ASSIGNED}}, and the thread start restoring these two tasks during 
> this state using the restore consumer while using normal consumer for 
> heartbeating.
> If task A's restoration has completed earlier than task B, then the thread 
> will start processing A immediately even when it is still in the 
> {{TASKS_ASSIGNED}} phase. But processing task A will slow down restoration of 
> task B since it is single-thread. So the thread's transition to {{RUNNING}} 
> when all of its assigned tasks have completed restoring and now can be 
> processed will be delayed.
> Note that the streams instance's state will only transit to {{RUNNING}} when 
> all of its threads have transit to {{RUNNING}}, so the instance's transition 
> will also be delayed by this scenario.
> We'd better to not start processing ready tasks immediately, but instead 
> focus on restoration during the {{TASKS_ASSIGNED}} state to shorten the 
> overall time of the instance's state transition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to