[jira] [Commented] (FLINK-20332) Add workers recovered from previous attempt to pending resources

Till Rohrmann (Jira) Sat, 20 Feb 2021 03:06:06 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287644#comment-17287644
 ]


Till Rohrmann commented on FLINK-20332:
---------------------------------------

Does this mean that in the Yarn case we might request some additional 
containers? And once the recovered workers register at the RM and announce 
their resource spec, would the {{YarnResourceManager}} cancel some of the 
newly requested resources if the recovered workers can be used for the job?

> Add workers recovered from previous attempt to pending resources
> ----------------------------------------------------------------
>
>                 Key: FLINK-20332
>                 URL: https://issues.apache.org/jira/browse/FLINK-20332
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Xintong Song
>            Assignee: Xintong Song
>            Priority: Major
>
> For active deployments (Native K8s/Yarn/Mesos), after a JM failover, workers 
> from the previous attempt should register with the new JM. Depending on the 
> order in which slot requests and TM registrations arrive at the RM, the RM 
> may allocate unnecessary new resources even though there are recovered 
> resources that could be reused.
> A potential improvement is to add recovered workers to pending resources, so 
> that the RM knows which resources are expected to become available soon and 
> can decide whether to allocate new resources accordingly.
> See also the discussion in FLINK-20249.
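
The bookkeeping proposed in the description could be sketched roughly as
follows. This is only an illustration under simplifying assumptions (one slot
per worker, resource specs identified by a plain string); the class
PendingWorkerTracker and all of its methods are hypothetical and are not part
of Flink's ResourceManager API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bookkeeping sketch; not Flink's actual ResourceManager code. */
final class PendingWorkerTracker {
    // Workers expected to register soon, keyed by an illustrative spec name.
    private final Map<String, Integer> pending = new HashMap<>();

    /** After a JM failover: treat workers recovered from the previous attempt as pending. */
    void addRecoveredWorkers(String spec, int count) {
        pending.merge(spec, count, Integer::sum);
    }

    /**
     * Returns how many new workers have to be requested so that at least
     * {@code required} workers of this spec are pending, and records them.
     */
    int requestMissingWorkers(String spec, int required) {
        int alreadyPending = pending.getOrDefault(spec, 0);
        int toRequest = Math.max(0, required - alreadyPending);
        if (toRequest > 0) {
            pending.merge(spec, toRequest, Integer::sum);
        }
        return toRequest;
    }

    /** A pending worker (recovered or newly requested) has registered. */
    void onWorkerRegistered(String spec) {
        // Decrement the count and drop the entry once it reaches zero.
        pending.computeIfPresent(spec, (key, count) -> count > 1 ? count - 1 : null);
    }

    int pendingCount(String spec) {
        return pending.getOrDefault(spec, 0);
    }
}
```

With three recovered workers recorded as pending, a demand for two workers
triggers no new request in this sketch, while a demand for five triggers a
request for only the two that are missing.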



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
