[ 
https://issues.apache.org/jira/browse/FLINK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095459#comment-17095459
 ] 

Piotr Nowojski commented on FLINK-17012:
----------------------------------------

I'm not sure. I don't have a specific problem in mind, just a feeling that 
something might be an issue here. 

During unspilling, the resource usage/load might be very different. If someone 
wanted to use the `idleTime` metric to scale up/down in this phase, that could 
lead to skewed results. If `idleTime` were far lower than during normal 
processing, one could decide to scale up and keep scaling up in an infinite 
loop. Even if that stopped, `idleTime` could go back to normal after recovery, 
someone would decide to scale down, `idleTime` would again be very low during 
the next recovery, and the cycle could repeat.

Maybe we can ignore this problem for now, as unaligned checkpoints won't 
support scaling up/down initially, but it seems to me that to properly 
implement scaling up/down logic based on `idleTime` and the task stage status, 
we should take this into account as well. It feels like users should avoid 
rescaling during the unspilling phase.
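
To make the concern concrete, here is a minimal sketch of a scaling decision 
that holds while a task is not in normal processing. Everything in it is an 
assumption for illustration: the `TaskStage` values, the thresholds, and the 
idea of expressing `idleTime` as a ratio between 0 and 1 are hypothetical and 
are not existing Flink APIs.

```java
// Hypothetical sketch only: none of these names are existing Flink APIs.
final class IdleTimeScalingPolicy {

    /** Assumed task stages; the issue only proposes exposing such a stage. */
    enum TaskStage { INITIALIZING, UNSPILLING, RUNNING }

    enum Decision { SCALE_UP, SCALE_DOWN, HOLD }

    // Assumed thresholds on the fraction of time a task is idle (0.0 to 1.0).
    static final double SCALE_UP_BELOW = 0.10;
    static final double SCALE_DOWN_ABOVE = 0.60;

    static Decision decide(TaskStage stage, double idleRatio) {
        // During initialization or unspilling the idle ratio does not reflect
        // steady-state load, so the sample is ignored and no rescaling happens.
        if (stage != TaskStage.RUNNING) {
            return Decision.HOLD;
        }
        if (idleRatio < SCALE_UP_BELOW) {
            return Decision.SCALE_UP;
        }
        if (idleRatio > SCALE_DOWN_ABOVE) {
            return Decision.SCALE_DOWN;
        }
        return Decision.HOLD;
    }
}
```

With such a guard, a very low idle ratio seen while unspilling yields `HOLD` 
rather than `SCALE_UP`, which would break the scale-up/scale-down cycle 
described above.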

> Expose stage of task initialization
> -----------------------------------
>
>                 Key: FLINK-17012
>                 URL: https://issues.apache.org/jira/browse/FLINK-17012
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / Task
>            Reporter: Wenlong Lyu
>            Priority: Major
>
> Currently a task switches to running before it is fully initialized. This 
> does not take state initialization and operator initialization (#open) into 
> account, which may take a long time to finish. As a result, there can be a 
> weird phenomenon where all tasks are running but throughput is 0. 
> I think it would be good if we could expose the initialization stage of tasks. 
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
