Hi, Shammon.
Thanks for your feedback.

>1. This mechanism will be only supported in `SlotPool` or both `SlotPool` and 
>`DeclarativeSlotPool`? 

As described on the FLIP page, the current design plans to introduce the
waiting mechanism only in the `SlotPool`, because the existing
`WaitingForResources` state of the `AdaptiveScheduler` can already achieve
the same effect.
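
For intuition, here is a minimal sketch of that waiting idea (hypothetical
names only, not the actual `SlotPool` code): slot requests are buffered
until no new request arrives within the configured interval, and only then
is the whole batch handed to the assigner, so it has a global view.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not Flink code: buffer incoming slot requests and
// flush them as one batch once no new request has arrived for
// maxIntervalMillis (the idea behind the proposed `slot.request.max-interval`).
final class BatchingSlotRequestBuffer<R> {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final List<R> pending = new ArrayList<>();
    private final long maxIntervalMillis;
    private final Consumer<List<R>> batchAssigner;
    private ScheduledFuture<?> flushTask;

    BatchingSlotRequestBuffer(long maxIntervalMillis,
                              Consumer<List<R>> batchAssigner) {
        this.maxIntervalMillis = maxIntervalMillis;
        this.batchAssigner = batchAssigner;
    }

    synchronized void onSlotRequest(R request) {
        pending.add(request);
        if (flushTask != null) {
            // A new request arrived: restart the stabilization window.
            flushTask.cancel(false);
        }
        flushTask = timer.schedule(
                this::flushBatch, maxIntervalMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void flushBatch() {
        if (pending.isEmpty()) {
            return;
        }
        // The assigner now sees the whole batch, i.e. a global view of the
        // pending requests, and can compute a balanced assignment.
        batchAssigner.accept(new ArrayList<>(pending));
        pending.clear();
    }
}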

>Currently the two slot pools are used in different schedulers. 

Yes, that's indeed the case.

>I think this will also bring value to `DeclarativeSlotPool`, but currently 
>FLIP content seems to be based on `SlotPool`, right?

Yes, your expectation is indeed reasonable. In theory, the
`DeclarativeSlotPool` could also benefit from a waiting mechanism, as
discussed. The purpose of introducing the waiting mechanism is to give the
`SlotPool` a global view so that it can compute a globally optimal
assignment. I've rechecked the relevant logic in the `AdaptiveScheduler`,
and as I understand it, the existing mechanisms already fulfill the current
feature requirements. You can find more conclusions on this in FLIP section
`3.2.5`. Of course, I'd appreciate your confirmation; if there's any
misunderstanding on my part, please correct me.

>2. ... What should be done when the slot selected by the round-robin strategy 
>cannot meet the resource requirements?

Is this referring to the task-to-slot allocation phase? I'm not quite sure
I follow; would you mind elaborating? Thanks~

>3. Is the assignment of tasks to slots balanced based on region or job level? 

Currently, there is no specific handling based on regions, and there is no
job-level balancing. The target effect of the current feature is load
balancing based on the number of tasks at the TaskManager (TM) level.
I'm looking forward to any suggestions regarding the items you mentioned.
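
As a rough illustration of that target effect (purely a sketch with
made-up names, not the FLIP's actual implementation), choosing the TM for
a new slot could boil down to picking the candidate that currently carries
the fewest tasks:

import java.util.Map;

// Illustrative only: pick the TaskManager with the fewest running tasks so
// that per-TM task counts stay balanced. The real logic must also respect
// slot availability and resource requirements.
final class TaskCountBalancedSelector {

    static String selectTaskManager(Map<String, Integer> tasksPerTm) {
        return tasksPerTm.entrySet().stream()
                .min(Map.Entry.comparingByValue()) // fewest tasks wins
                .map(Map.Entry::getKey)
                .orElseThrow(() ->
                        new IllegalStateException("no TaskManager registered"));
    }

    public static void main(String[] args) {
        // tm1 carries 3 tasks, tm2 carries 4, so tm1 is selected.
        System.out.println(selectTaskManager(Map.of("tm1", 3, "tm2", 4)));
    }
}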

>When multiple TMs fail over, will it cause the balancing strategy to fail or 
>even worse? 

IIUC, when multiple TaskManagers undergo failover, the task assignment
after successful recovery will still remain relatively balanced.

>What is the current processing strategy?

The Slot-to-TM strategy does not change after a TaskManager undergoes
failover.

Best regards,
Yuepeng Pan

On 2023/09/28 05:10:13 Shammon FY wrote:
> Thanks Yuepeng for initiating this discussion.
> 
> +1 in general too; in fact, we have implemented a similar mechanism
> internally to ensure a balanced allocation of tasks to slots, and it works
> well.
> 
> Some comments about the mechanism
> 
> 1. This mechanism will be only supported in `SlotPool` or both `SlotPool`
> and `DeclarativeSlotPool`? Currently the two slot pools are used in
> different schedulers. I think this will also bring value to
> `DeclarativeSlotPool`, but currently FLIP content seems to be based on
> `SlotPool`, right?
> 
> 2. In fine-grained resource management, we can set different resource
> requirements for different nodes, which means that the resources of each
> slot are different. What should be done when the slot selected by the
> round-robin strategy cannot meet the resource requirements? Will this lead
> to the failure of the balance strategy?
> 
> 3. Is the assignment of tasks to slots balanced based on region or job
> level? When multiple TMs fail over, will it cause the balancing strategy to
> fail or even worse? What is the current processing strategy?
> 
> For Zhuzhu and Rui:
> 
> IIUC, the overall balance is divided into two parts: slot to TM and task to
> slot.
> 1. Slot to TM is guaranteed by SlotManager in ResourceManager
> 2. Task to slot is guaranteed by the slot pool in JM
> 
> These two are completely independent; what are the benefits of unifying
> them into one option? Also, do we want to share the same
> option between SlotPool in JM and SlotManager in RM? This sounds a bit
> strange.
> 
> Best,
> Shammon FY
> 
> 
> 
> On Thu, Sep 28, 2023 at 12:08 PM Rui Fan <1996fan...@gmail.com> wrote:
> 
> > Hi Zhu Zhu,
> >
> > Thanks for your feedback here!
> >
> > You are right, a user needs to set 2 options:
> > - cluster.evenly-spread-out-slots=true
> > - slot.sharing-strategy=TASK_BALANCED_PREFERRED
> >
> > Merging them into one option is useful on the user side, so
> > `taskmanager.load-balance.mode` sounds good to me.
> > I want to check some points and behaviors about this option:
> >
> > 1. The default value is None, right?
> > 2. When it's set to Tasks, how do we assign slots to a TM?
> > - Option1: just check the task number
> > - Option2: check the slot number first, then check the
> > task number when the slot numbers are the same.
> >
> > Here is an example to explain the difference between them:
> >
> > - A session cluster has 2 Flink jobs: jobA and jobB
> > - Each TM has 4 slots.
> > - The task number of one slot of jobA is 3
> > - The task number of one slot of jobB is 1
> > - We have 2 TaskManagers:
> >   - tm1 runs 3 slots of jobB, so tm1 runs 3 tasks
> >   - tm2 runs 1 slot of jobA, and 1 slot of jobB, so tm2 runs 4 tasks.
> >
> > Now we need to run a new slot; which TM should offer it?
> > - Option1: If we just check the task number, tm1 is better.
> > - Option2: If we check the slot number first and then the task number,
> > tm2 is better.
> >
> > The original FLIP selected option2; that's why we didn't add a
> > third option. Option2 doesn't break the semantics when
> > `cluster.evenly-spread-out-slots` is true; it just improves the
> > behavior without changing the semantics.
> >
> > On the other hand, if we choose option2, then when a user sets
> > `taskmanager.load-balance.mode` to Tasks, it can also achieve
> > the goal of the Slots mode.
> >
> > So I think the `Slots` enum isn't needed if we choose option2.
> > Of course, if we choose option1, the enum is needed.
> >
> > Looking forward to your feedback, thanks~
> >
> > Best,
> > Rui
> >
> > On Wed, Sep 27, 2023 at 9:11 PM Zhu Zhu <reed...@gmail.com> wrote:
> >
> > > Thanks Yuepeng and Rui for creating this FLIP.
> > >
> > > +1 in general
> > > The idea is straightforward: best-effort gather all the slot requests
> > > and offered slots to form an overview before assigning slots, trying to
> > > balance the load of the task managers when assigning slots.
> > >
> > > I have one comment regarding the configuration for ease of use:
> > >
> > > IIUC, this FLIP uses an existing config 'cluster.evenly-spread-out-slots'
> > > as the main switch of the new feature. That is, from the user's perspective,
> > > with this improvement, the 'cluster.evenly-spread-out-slots' feature not
> > > only balances the number of slots on task managers, but also balances the
> > > number of tasks. This is a behavior change anyway. Besides that, it also
> > > requires users to set 'slot.sharing-strategy' to
> > > 'TASK_BALANCED_PREFERRED' to balance the tasks in each slot.
> > >
> > > I think we can introduce a new config option
> > > `taskmanager.load-balance.mode`,
> > > which accepts "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots`
> > > can be superseded by the "Slots" mode and get deprecated. In the future
> > > it can support more modes, e.g. "CpuCores", to work better for jobs with
> > > fine-grained resources. The proposed config option
> > > `slot.request.max-interval` can then be renamed to
> > > `taskmanager.load-balance.request-stablizing-timeout`
> > > to show its relation with the feature. The proposed `slot.sharing-strategy`
> > > is not needed, because the configured "Tasks" mode will do the work.
> > >
> > > WDYT?
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Yuepeng Pan <panyuep...@apache.org> wrote on Mon, Sep 25, 2023, 16:26:
> > >
> > >> Hi all,
> > >>
> > >>
> > >> Fan Rui (CC’ed) and I created FLIP-370[1] to support balanced task
> > >> scheduling.
> > >>
> > >>
> > >> The current strategy of Flink for deploying tasks sometimes leads some
> > >> TMs (TaskManagers) to have more tasks while others have fewer. This
> > >> results in excessive resource utilization at the TMs that contain more
> > >> tasks, which then become a bottleneck for the entire job. Developing
> > >> strategies to achieve task load balancing across TMs and to reduce job
> > >> bottlenecks is therefore very meaningful.
> > >>
> > >>
> > >> The raw design and discussions can be found in the Flink JIRA[2] and
> > >> the Google doc[3]. We really appreciate Zhu Zhu (CC’ed) providing
> > >> valuable help and suggestions in advance.
> > >>
> > >>
> > >> Please refer to the FLIP[1] document for more details about the proposed
> > >> design and implementation. We welcome any feedback and opinions on this
> > >> proposal.
> > >>
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
> > >>
> > >> [2] https://issues.apache.org/jira/browse/FLINK-31757
> > >>
> > >> [3]
> > >> https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
> > >>
> > >>
> > >> Best,
> > >>
> > >> Yuepeng Pan
> > >>
> > >
> >
> 
