Thanks for the update, Rui. +1 for the latest version of the FLIP.
Best,
Yangze Guo

On Tue, Oct 17, 2023 at 11:45 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi all,
>
> I discussed offline with Zhu Zhu, Yangze Guo, and Yuepeng Pan.
> We reached consensus on slot.request.max-interval and
> taskmanager.load-balance.mode, and I have updated the FLIP.
>
> For a detailed introduction to taskmanager.load-balance.mode,
> please refer to FLIP section 3.1 Public Interfaces[1].
>
> The strategy for slot.request.max-interval has also been improved.
> The latest strategy is described in FLIP section 2.2.2 Waiting mechanism[2].
> For a comparison of the old and new strategies, please refer to
> Rejected Alternatives[3].
>
> Thanks again to everyone who participated in the discussion.
> Looking forward to your continued feedback.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-3.1PublicInterfaces
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-2.2.2Waitingmechanism
> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-RejectedAlternatives
>
> Best,
> Rui
>
> On Thu, Oct 12, 2023 at 9:49 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>>
>> Hi, Shammon.
>> Thanks for your feedback.
>>
>> > 1. This mechanism will be only supported in `SlotPool` or both `SlotPool`
>> > and `DeclarativeSlotPool`?
>>
>> As described on the FLIP page, the current design plans to introduce the
>> waiting mechanism only in the `SlotPool`, because the existing
>> `WaitingForResources` can already achieve this effect.
>>
>> > Currently the two slot pools are used in different schedulers.
>>
>> Yes, that's indeed the case.
>>
>> > I think this will also bring value to `DeclarativeSlotPool`, but currently
>> > FLIP content seems to be based on `SlotPool`, right?
>>
>> Yes, your expectations are indeed reasonable.
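The batching behavior behind `slot.request.max-interval` discussed above can be sketched roughly as follows. This is an illustrative simplification in plain Java, not Flink's actual implementation; the class and method names (`BatchingSlotRequests`, `onSlotRequest`, `pollStableBatch`) are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the "waiting mechanism": instead of assigning each slot request
 * immediately, the pool batches requests and only assigns once no new
 * request has arrived for at least maxInterval.
 */
class BatchingSlotRequests {
    private final Duration maxInterval;
    private final List<String> pending = new ArrayList<>();
    private long lastRequestMillis = -1;

    BatchingSlotRequests(Duration maxInterval) {
        this.maxInterval = maxInterval;
    }

    /** Record a new slot request at the given (simulated) time. */
    void onSlotRequest(String requestId, long nowMillis) {
        pending.add(requestId);
        lastRequestMillis = nowMillis;
    }

    /**
     * Called periodically; returns the full batch once the pending requests
     * have been stable for at least maxInterval, otherwise an empty list.
     */
    List<String> pollStableBatch(long nowMillis) {
        if (pending.isEmpty() || nowMillis - lastRequestMillis < maxInterval.toMillis()) {
            return List.of();
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

Once a stable batch has been collected, the pool has a global view of all pending requests and can compute a balanced assignment, rather than greedily placing each request as it arrives.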
>> In theory, the `DeclarativeSlotPool` could also benefit from a waiting
>> mechanism, as discussed. The purpose of introducing the waiting mechanism
>> is to give the `SlotPool` a global view from which to calculate the
>> globally optimal assignment. I've rechecked the relevant logic in the
>> `AdaptiveScheduler`, and as I understand it, the existing mechanisms
>> already fulfill the current feature requirements. You can find more
>> conclusions on this in FLIP section 3.2.5. Of course, I'd appreciate your
>> confirmation. If there's any misunderstanding on my part, please correct me.
>>
>> > 2. ... What should be done when the slot selected by the round-robin
>> > strategy cannot meet the resource requirements?
>>
>> Is this referring to the phase of task-to-slot allocation? I'm not quite
>> sure; would you mind explaining it? Thanks~
>>
>> > 3. Is the assignment of tasks to slots balanced based on region or job
>> > level?
>>
>> Currently, there is no specific handling based on regions, and there is
>> no job-level balancing. The target effect of the current feature is to
>> achieve load balancing based on the number of tasks at the TaskManager
>> (TM) level. Looking forward to any suggestions regarding the item you
>> mentioned.
>>
>> > When multiple TMs fail over, will it cause the balancing strategy to
>> > fail or even worse?
>>
>> IIUC, when multiple TaskManagers undergo failover, the results after
>> successful recovery will still be maintained in a relatively balanced
>> state.
>>
>> > What is the current processing strategy?
>>
>> The slot-to-TM strategy does not change after a TaskManager undergoes
>> failover.
>>
>> Best regards,
>> Yuepeng Pan
>>
>> On 2023/09/28 05:10:13 Shammon FY wrote:
>> > Thanks Yuepeng for initiating this discussion.
>> >
>> > +1 in general too; in fact, we have implemented a similar mechanism
>> > internally to ensure a balanced allocation of tasks to slots, and it
>> > works well.
>> >
>> > Some comments about the mechanism:
>> >
>> > 1.
>> > This mechanism will be only supported in `SlotPool` or both `SlotPool`
>> > and `DeclarativeSlotPool`? Currently the two slot pools are used in
>> > different schedulers. I think this will also bring value to
>> > `DeclarativeSlotPool`, but currently the FLIP content seems to be based
>> > on `SlotPool`, right?
>> >
>> > 2. In fine-grained resource management, we can set different resource
>> > requirements for different nodes, which means that the resources of
>> > each slot are different. What should be done when the slot selected by
>> > the round-robin strategy cannot meet the resource requirements? Will
>> > this lead to the failure of the balance strategy?
>> >
>> > 3. Is the assignment of tasks to slots balanced based on region or job
>> > level? When multiple TMs fail over, will it cause the balancing
>> > strategy to fail or even worse? What is the current processing strategy?
>> >
>> > For Zhu Zhu and Rui:
>> >
>> > IIUC, the overall balance is divided into two parts: slot to TM and
>> > task to slot.
>> > 1. Slot to TM is guaranteed by the SlotManager in the ResourceManager.
>> > 2. Task to slot is guaranteed by the slot pool in the JM.
>> >
>> > These two are completely independent, so what are the benefits of
>> > unifying them into one option? Also, do we want to share the same
>> > option between the SlotPool in the JM and the SlotManager in the RM?
>> > This sounds a bit strange.
>> >
>> > Best,
>> > Shammon FY
>> >
>> > On Thu, Sep 28, 2023 at 12:08 PM Rui Fan <1996fan...@gmail.com> wrote:
>> > >
>> > > Hi Zhu Zhu,
>> > >
>> > > Thanks for your feedback here!
>> > >
>> > > You are right, the user needs to set 2 options:
>> > > - cluster.evenly-spread-out-slots=true
>> > > - slot.sharing-strategy=TASK_BALANCED_PREFERRED
>> > >
>> > > Merging them into one option is useful on the user side, so
>> > > `taskmanager.load-balance.mode` sounds good to me.
>> > > I want to check some points and behaviors about this option:
>> > >
>> > > 1. The default value is None, right?
>> > > 2.
>> > > When it's set to Tasks, how do we assign slots to TMs?
>> > > - Option 1: Just check the task number.
>> > > - Option 2: Check the slot number first, then check the task number
>> > >   when the slot numbers are the same.
>> > >
>> > > Here is an example to explain the difference between them:
>> > >
>> > > - A session cluster has 2 Flink jobs: jobA and jobB.
>> > > - Each TM has 4 slots.
>> > > - The task number of one slot of jobA is 3.
>> > > - The task number of one slot of jobB is 1.
>> > > - We have 2 TaskManagers:
>> > >   - tm1 runs 3 slots of jobB, so tm1 runs 3 tasks.
>> > >   - tm2 runs 1 slot of jobA and 1 slot of jobB, so tm2 runs 4 tasks.
>> > >
>> > > Now, we need to run a new slot. Which TM should offer it?
>> > > - Option 1: If we just check the task number, tm1 is better.
>> > > - Option 2: If we check the slot number first and then the task
>> > >   number, tm2 is better.
>> > >
>> > > The original FLIP selected option 2; that's why we didn't add a third
>> > > option. Option 2 doesn't break the semantics when
>> > > `cluster.evenly-spread-out-slots` is true; it just improves the
>> > > behavior without changing the semantics.
>> > >
>> > > On the other hand, if we choose option 2, then when the user sets
>> > > `taskmanager.load-balance.mode` to Tasks, it also achieves the goal
>> > > of the Slots mode.
>> > >
>> > > So I think the `Slots` enum value isn't needed if we choose option 2.
>> > > Of course, if we choose option 1, the enum value is needed.
>> > >
>> > > Looking forward to your feedback, thanks~
>> > >
>> > > Best,
>> > > Rui
>> > >
>> > > On Wed, Sep 27, 2023 at 9:11 PM Zhu Zhu <reed...@gmail.com> wrote:
>> > > >
>> > > > Thanks Yuepeng and Rui for creating this FLIP.
>> > > >
>> > > > +1 in general.
>> > > > The idea is straightforward: best-effort gather all the slot
>> > > > requests and offered slots to form an overview before assigning
>> > > > slots, trying to balance the loads of the task managers when
>> > > > assigning slots.
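The two TM-selection strategies in Rui's example above can be sketched as comparators. This is a hypothetical illustration; the `Tm` record and the names below are not Flink classes.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch of the two candidate strategies for picking a TM to offer a slot. */
class TmSelection {
    /** A TaskManager candidate with its current slot and task counts. */
    record Tm(String name, int slots, int tasks) {}

    // Option 1: pick the TM running the fewest tasks.
    static final Comparator<Tm> BY_TASKS = Comparator.comparingInt(Tm::tasks);

    // Option 2: pick the TM with the fewest occupied slots;
    // break ties by the task count.
    static final Comparator<Tm> BY_SLOTS_THEN_TASKS =
            Comparator.comparingInt(Tm::slots).thenComparingInt(Tm::tasks);

    /** Return the best TM under the given strategy. */
    static Tm pick(List<Tm> tms, Comparator<Tm> strategy) {
        return tms.stream().min(strategy).orElseThrow();
    }
}
```

With the example numbers (tm1: 3 slots / 3 tasks, tm2: 2 slots / 4 tasks), option 1 selects tm1 while option 2 selects tm2, matching the discussion above.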
>> > > >
>> > > > I have one comment regarding the configuration for ease of use:
>> > > >
>> > > > IIUC, this FLIP uses an existing config option,
>> > > > 'cluster.evenly-spread-out-slots', as the main switch of the new
>> > > > feature. That is, from the user's perspective, with this
>> > > > improvement, the 'cluster.evenly-spread-out-slots' feature not
>> > > > only balances the number of slots on task managers, but also
>> > > > balances the number of tasks. This is a behavior change anyway.
>> > > > Besides that, it also requires users to set 'slot.sharing-strategy'
>> > > > to 'TASK_BALANCED_PREFERRED' to balance the tasks in each slot.
>> > > >
>> > > > I think we can introduce a new config option,
>> > > > `taskmanager.load-balance.mode`, which accepts
>> > > > "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots` can be
>> > > > superseded by the "Slots" mode and get deprecated. In the future it
>> > > > can support more modes, e.g. "CpuCores", to work better for jobs
>> > > > with fine-grained resources. The proposed config option
>> > > > `slot.request.max-interval` can then be renamed to
>> > > > `taskmanager.load-balance.request-stablizing-timeout` to show its
>> > > > relation to the feature. The proposed `slot.sharing-strategy`
>> > > > option is not needed, because the configured "Tasks" mode will do
>> > > > the work.
>> > > >
>> > > > WDYT?
>> > > >
>> > > > Thanks,
>> > > > Zhu Zhu
>> > > >
>> > > > Yuepeng Pan <panyuep...@apache.org> wrote on Mon, Sep 25, 2023 at 16:26:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> Fan Rui (CC'ed) and I created FLIP-370[1] to support balanced
>> > > >> task scheduling.
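Zhu Zhu's proposal above could be sketched as an enum plus a mapping from the two legacy options it is meant to supersede. The names and the mapping below are illustrative assumptions for discussion, not actual FLIP or Flink code.

```java
/**
 * Sketch of the proposed `taskmanager.load-balance.mode` option and how it
 * could be derived from the two legacy options during a deprecation period.
 */
class LoadBalanceConfig {
    enum Mode { NONE, SLOTS, TASKS }

    /**
     * Derive the new mode from the legacy options. Per the discussion,
     * the Tasks mode also balances slots (option 2), so it subsumes Slots;
     * the strategy value "TASK_BALANCED_PREFERRED" is the one proposed in
     * the thread, other values are treated as the non-balanced default.
     */
    static Mode fromLegacyOptions(boolean evenlySpreadOutSlots, String slotSharingStrategy) {
        if ("TASK_BALANCED_PREFERRED".equals(slotSharingStrategy)) {
            return Mode.TASKS;
        }
        return evenlySpreadOutSlots ? Mode.SLOTS : Mode.NONE;
    }
}
```

A single mode option keeps the JM-side (task-to-slot) and RM-side (slot-to-TM) behavior consistent, which is the ease-of-use benefit Zhu Zhu describes.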
>> > > >>
>> > > >> The current strategy of Flink to deploy tasks sometimes leads
>> > > >> some TMs (TaskManagers) to have more tasks while others have
>> > > >> fewer, resulting in excessive resource utilization at the TMs
>> > > >> that contain more tasks, which become a bottleneck for the
>> > > >> entire job. Developing strategies to achieve task load balancing
>> > > >> across TMs and reduce job bottlenecks is therefore very
>> > > >> meaningful.
>> > > >>
>> > > >> The raw design and discussions can be found in the Flink JIRA[2]
>> > > >> and the Google doc[3]. We really appreciate Zhu Zhu (CC'ed) for
>> > > >> providing valuable help and suggestions in advance.
>> > > >>
>> > > >> Please refer to the FLIP[1] document for more details about the
>> > > >> proposed design and implementation. We welcome any feedback and
>> > > >> opinions on this proposal.
>> > > >>
>> > > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
>> > > >>
>> > > >> [2] https://issues.apache.org/jira/browse/FLINK-31757
>> > > >>
>> > > >> [3] https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
>> > > >>
>> > > >> Best,
>> > > >>
>> > > >> Yuepeng Pan