[jira] [Commented] (FLINK-21136) Reactive Mode: Adjust timeout behavior in adaptive scheduler

Till Rohrmann (Jira) Fri, 05 Mar 2021 07:38:07 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-21136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296105#comment-17296105
 ]


Till Rohrmann commented on FLINK-21136:
---------------------------------------

I like the ideas. The reactive mode needs slightly different behaviour than 
normal execution. Maybe we could introduce two timeouts:

1) Not enough resource timeout
2) Resource stabilization period

The not enough resource timeout defines after what time we fail if we haven't 
obtained enough resources. For the reactive mode this could be set to 
"indefinitely".

The resource stabilization period is the time we wait after we have received 
enough resources to run the job but fewer than we have requested. For the 
reactive mode this value could be set to 0.
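
A minimal sketch of how the two timeouts could interact. All names here ({{ResourceTimeouts}}, {{Decision}}, {{decide}} and its parameters) are hypothetical and not existing Flink API. It also assumes that the stabilization period is measured from the moment enough resources first became available, and that an empty timeout means "indefinitely".

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Flink's API. */
final class ResourceTimeouts {

    enum Decision { WAIT, START_JOB, FAIL }

    // Empty means "wait indefinitely" (the proposed reactive mode setting).
    private final Optional<Duration> notEnoughResourceTimeout;
    // How long to keep waiting for more resources once the minimum is met.
    private final Duration stabilizationPeriod;

    ResourceTimeouts(Optional<Duration> notEnoughResourceTimeout, Duration stabilizationPeriod) {
        this.notEnoughResourceTimeout = notEnoughResourceTimeout;
        this.stabilizationPeriod = stabilizationPeriod;
    }

    /** Reactive mode as proposed above: never fail, never wait for stabilization. */
    static ResourceTimeouts reactive() {
        return new ResourceTimeouts(Optional.empty(), Duration.ZERO);
    }

    /**
     * @param available   slots currently available
     * @param minimum     slots needed to run the job at all
     * @param requested   slots the scheduler asked for
     * @param waited      time spent waiting for resources so far
     * @param sinceEnough time since 'available' first reached 'minimum' (assumption)
     */
    Decision decide(int available, int minimum, int requested,
                    Duration waited, Duration sinceEnough) {
        if (available < minimum) {
            // Timeout 1: fail only if a finite timeout is configured and has expired.
            boolean expired = notEnoughResourceTimeout.isPresent()
                    && waited.compareTo(notEnoughResourceTimeout.get()) >= 0;
            return expired ? Decision.FAIL : Decision.WAIT;
        }
        if (available >= requested) {
            // Everything we asked for has arrived; nothing left to wait for.
            return Decision.START_JOB;
        }
        // Timeout 2: enough to run, but fewer than requested.
        return sinceEnough.compareTo(stabilizationPeriod) >= 0
                ? Decision.START_JOB
                : Decision.WAIT;
    }
}
```

With {{ResourceTimeouts.reactive()}}, the job starts as soon as the minimum is met and never fails for lack of resources. With finite values it behaves like the normal execution described above.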

Alternatively, introducing a {{RequiredResourceController}} with a 
{{NormalRequiredResourceController}} and a 
{{ReactiveRequiredResourceController}} implementation also sounds like a good 
solution.
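
A rough sketch of what such a controller could look like. Only the three type names come from the discussion. The method names, their signatures, and the constructor are assumptions made for illustration, not an agreed design.

```java
import java.time.Duration;
import java.util.Optional;

/** Sketch only: the methods on this interface are assumed, not decided. */
interface RequiredResourceController {
    /** How long to wait for enough resources before failing; empty means indefinitely. */
    Optional<Duration> notEnoughResourceTimeout();

    /** How long to wait for more resources once there are enough to run the job. */
    Duration resourceStabilizationPeriod();
}

/** Normal execution: both values are finite and supplied by the caller. */
final class NormalRequiredResourceController implements RequiredResourceController {
    private final Duration notEnoughResourceTimeout;
    private final Duration resourceStabilizationPeriod;

    NormalRequiredResourceController(Duration notEnoughResourceTimeout,
                                     Duration resourceStabilizationPeriod) {
        this.notEnoughResourceTimeout = notEnoughResourceTimeout;
        this.resourceStabilizationPeriod = resourceStabilizationPeriod;
    }

    @Override
    public Optional<Duration> notEnoughResourceTimeout() {
        return Optional.of(notEnoughResourceTimeout);
    }

    @Override
    public Duration resourceStabilizationPeriod() {
        return resourceStabilizationPeriod;
    }
}

/** Reactive mode: wait indefinitely for resources, start as soon as there are enough. */
final class ReactiveRequiredResourceController implements RequiredResourceController {
    @Override
    public Optional<Duration> notEnoughResourceTimeout() {
        return Optional.empty();
    }

    @Override
    public Duration resourceStabilizationPeriod() {
        return Duration.ZERO;
    }
}
```

The scheduler would then depend only on the interface, and the choice between normal and reactive behaviour would be made once when the controller is constructed.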

> Reactive Mode: Adjust timeout behavior in adaptive scheduler
> ------------------------------------------------------------
>
>                 Key: FLINK-21136
>                 URL: https://issues.apache.org/jira/browse/FLINK-21136
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>            Priority: Major
>             Fix For: 1.13.0
>
>
> The FLIP states the following timeout and resource registration behavior: 
> On initial startup, the declarative scheduler will wait indefinitely for 
> TaskManagers to show up. Once there are enough TaskManagers available to 
> start the job, and the set of resources is stable (see FLIP-160 for a 
> definition), the job will start running.
> Once the job has started running, and a TaskManager is lost, it will wait for 
> 10 seconds for the TaskManager to re-appear. Otherwise, the job will be 
> scheduled again with the available resources. If no TaskManagers are 
> available anymore, the declarative scheduler will wait indefinitely again for 
> new resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
