[ 
https://issues.apache.org/jira/browse/FLINK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950408#comment-17950408
 ] 

RocMarshal commented on FLINK-33977:
------------------------------------

Hi, [~fcsaky] [~ferenc-csaky] 

Could you please help push forward the follow-up work for this feature?
There are still a few outstanding tasks on this ticket:
 * {*}Fix Version/s{*}: It should be changed to {_}1.20.2{_}, but I currently 
don't have edit permissions, so I need a developer with the right access to 
help update it.

 * {*}Progressing the PR review and merge{*}: BTW, the PR has already been 
updated based on the outcome of the email discussion and has received two 
approvals. We’re now hoping to get feedback from Matthias.

Thank you very much.

> Adaptive scheduler may not minimize the number of TMs during downscaling
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33977
>                 URL: https://issues.apache.org/jira/browse/FLINK-33977
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Runtime / Coordination
>    Affects Versions: 1.18.0, 1.19.0, 1.20.0
>            Reporter: Zhanghao Chen
>            Assignee: RocMarshal
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: screenshot-1.png
>
>
> Adaptive Scheduler uses SlotAssigner to assign free slots to slot sharing 
> groups. Currently, there are two implementations of SlotAssigner: the 
> DefaultSlotAssigner, which treats all slots and slot sharing groups equally, 
> and the StateLocalitySlotAssigner, which assigns slots based on the number 
> of local key groups to make use of local state recovery. The scheduler uses 
> the DefaultSlotAssigner when no key group assignment info is available and 
> the StateLocalitySlotAssigner otherwise.
>  
> However, neither SlotAssigner implementation aims to minimize the number of 
> TMs, which may produce suboptimal slot assignments in Application Mode. For 
> example, when a job with 8 slot sharing groups and 2 TMs (each with 4 slots) 
> is downscaled through the FLIP-291 API to 4 slot sharing groups, the cluster 
> may still have 2 TMs, one with 1 free slot and the other with 3 free slots. 
> For end users, this means the downscaling is ineffective, as the total 
> cluster resources are not reduced.
>  
> We should take minimizing the number of TMs into consideration as well. A 
> possible approach is to enhance the StateLocalitySlotAssigner: when the 
> number of free slots exceeds the need, sort all TMs by a score that sums the 
> allocation scores of all slots on each TM, remove the slots of the excess 
> TMs with the lowest scores, and proceed with the remaining slot assignment.
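
The approach proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change in the PR and not Flink's actual SlotAssigner API: the class TmMinimizingSketch, the method selectTaskManagers, and the representation of each free slot as a plain allocation score are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in Flink. */
class TmMinimizingSketch {

    /**
     * Picks the TaskManagers whose free slots are kept for assignment.
     *
     * @param slotScoresByTm free slots per TM, each slot given as its allocation score (assumed input)
     * @param requiredSlots number of slots needed after downscaling
     * @return IDs of the TMs to keep, highest total score first
     */
    static List<String> selectTaskManagers(
            Map<String, List<Double>> slotScoresByTm, int requiredSlots) {
        List<Map.Entry<String, List<Double>>> tms =
                new ArrayList<>(slotScoresByTm.entrySet());

        // Score of a TM = sum of the allocation scores of all slots on it.
        // Sort descending so the lowest-scoring TMs end up being the excess ones.
        tms.sort(Comparator.comparingDouble(
                        (Map.Entry<String, List<Double>> e) ->
                                e.getValue().stream()
                                        .mapToDouble(Double::doubleValue)
                                        .sum())
                .reversed());

        List<String> kept = new ArrayList<>();
        int slots = 0;
        for (Map.Entry<String, List<Double>> tm : tms) {
            if (slots >= requiredSlots) {
                break; // remaining TMs are excess; their slots are not used
            }
            kept.add(tm.getKey());
            slots += tm.getValue().size();
        }
        return kept;
    }
}
```

With the example from the description (2 TMs with 4 slots each, 4 slots needed after downscaling), such a selection would keep only the higher-scoring TM, so the other TM holds no assigned slots and can be released. The remaining slot assignment would then run only on the slots of the kept TMs.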



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
