Thanks for the update, Rui. +1 for the latest version of the FLIP.
Best,
Yangze Guo

On Tue, Oct 17, 2023 at 11:45 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi all,
>
> I discussed offline with Zhu Zhu, Yangze Guo, and Yuepeng Pan.
> We reached consensus on slot.request.max-interval and
> taskmanager.load-balance.mode, and I have updated the FLIP.
>
> For a detailed introduction to taskmanager.load-balance.mode,
> please refer to FLIP section 3.1 Public Interfaces[1].
>
> The strategy for slot.request.max-interval has also been improved.
> The latest strategy is described in FLIP section 2.2.2 Waiting mechanism[2].
> For a comparison of the old and new strategies, please refer to
> Rejected Alternatives[3].
>
> Thanks again to everyone who participated in the discussion.
> Looking forward to your continued feedback.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-3.1PublicInterfaces
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-2.2.2Waitingmechanism
> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling#FLIP370:SupportBalancedTasksScheduling-RejectedAlternatives
>
> Best,
> Rui
>
> On Thu, Oct 12, 2023 at 9:49 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>>
>> Hi, Shammon.
>> Thanks for your feedback.
>>
>> > 1. This mechanism will be only supported in `SlotPool` or both `SlotPool`
>> > and `DeclarativeSlotPool`?
>>
>> As described on the FLIP page, the current design plans to introduce the
>> waiting mechanism only in the `SlotPool`, because the existing
>> `WaitingForResources` can already achieve this effect.
>>
>> > Currently the two slot pools are used in different schedulers.
>>
>> Yes, that's indeed the case.
>>
>> > I think this will also bring value to `DeclarativeSlotPool`, but currently
>> > FLIP content seems to be based on `SlotPool`, right?
>>
>> Yes, your expectations are indeed reasonable.
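The batching behavior behind `slot.request.max-interval` discussed above can be sketched roughly as follows. This is an illustrative simplification in plain Java, not Flink's actual implementation; the class and method names (`BatchingSlotRequests`, `onSlotRequest`, `pollStableBatch`) are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the "waiting mechanism": instead of assigning each slot request
 * immediately, the pool batches requests and only assigns once no new
 * request has arrived for at least maxInterval.
 */
class BatchingSlotRequests {
    private final Duration maxInterval;
    private final List<String> pending = new ArrayList<>();
    private long lastRequestMillis = -1;

    BatchingSlotRequests(Duration maxInterval) {
        this.maxInterval = maxInterval;
    }

    /** Record a new slot request at the given (simulated) time. */
    void onSlotRequest(String requestId, long nowMillis) {
        pending.add(requestId);
        lastRequestMillis = nowMillis;
    }

    /**
     * Called periodically; returns the full batch once the pending requests
     * have been stable for at least maxInterval, otherwise an empty list.
     */
    List<String> pollStableBatch(long nowMillis) {
        if (pending.isEmpty() || nowMillis - lastRequestMillis < maxInterval.toMillis()) {
            return List.of();
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

Once a stable batch has been collected, the pool has a global view of all pending requests and can compute a balanced assignment, rather than greedily placing each request as it arrives.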
>> In theory, the `DeclarativeSlotPool` could also benefit from a waiting
>> mechanism, as discussed. The purpose of introducing the waiting mechanism
>> is to give the `SlotPool` a global view from which to calculate the
>> globally optimal assignment. I've rechecked the relevant logic in the
>> `AdaptiveScheduler`, and as I understand it, the existing mechanisms
>> already fulfill the current feature requirements. You can find more
>> conclusions on this in FLIP section 3.2.5. Of course, I'd appreciate your
>> confirmation. If there's any misunderstanding on my part, please correct me.
>>
>> > 2. ... What should be done when the slot selected by the round-robin
>> > strategy cannot meet the resource requirements?
>>
>> Is this referring to the phase of task-to-slot allocation? I'm not quite
>> sure; would you mind explaining it? Thanks~
>>
>> > 3. Is the assignment of tasks to slots balanced based on region or job
>> > level?
>>
>> Currently, there is no specific handling based on regions, and there is
>> no job-level balancing. The target effect of the current feature is to
>> achieve load balancing based on the number of tasks at the TaskManager
>> (TM) level. Looking forward to any suggestions regarding the item you
>> mentioned.
>>
>> > When multiple TMs fail over, will it cause the balancing strategy to
>> > fail or even worse?
>>
>> IIUC, when multiple TaskManagers undergo failover, the results after
>> successful recovery will still be maintained in a relatively balanced
>> state.
>>
>> > What is the current processing strategy?
>>
>> The slot-to-TM strategy does not change after a TaskManager undergoes
>> failover.
>>
>> Best regards,
>> Yuepeng Pan
>>
>> On 2023/09/28 05:10:13 Shammon FY wrote:
>> > Thanks Yuepeng for initiating this discussion.
>> >
>> > +1 in general too; in fact, we have implemented a similar mechanism
>> > internally to ensure a balanced allocation of tasks to slots, and it
>> > works well.
>> >
>> > Some comments about the mechanism:
>> >
>> > 1.
>> > This mechanism will be only supported in `SlotPool` or both `SlotPool`
>> > and `DeclarativeSlotPool`? Currently the two slot pools are used in
>> > different schedulers. I think this will also bring value to
>> > `DeclarativeSlotPool`, but currently the FLIP content seems to be based
>> > on `SlotPool`, right?
>> >
>> > 2. In fine-grained resource management, we can set different resource
>> > requirements for different nodes, which means that the resources of
>> > each slot are different. What should be done when the slot selected by
>> > the round-robin strategy cannot meet the resource requirements? Will
>> > this lead to the failure of the balance strategy?
>> >
>> > 3. Is the assignment of tasks to slots balanced based on region or job
>> > level? When multiple TMs fail over, will it cause the balancing
>> > strategy to fail or even worse? What is the current processing strategy?
>> >
>> > For Zhu Zhu and Rui:
>> >
>> > IIUC, the overall balance is divided into two parts: slot to TM and
>> > task to slot.
>> > 1. Slot to TM is guaranteed by the SlotManager in the ResourceManager.
>> > 2. Task to slot is guaranteed by the slot pool in the JM.
>> >
>> > These two are completely independent, so what are the benefits of
>> > unifying them into one option? Also, do we want to share the same
>> > option between the SlotPool in the JM and the SlotManager in the RM?
>> > This sounds a bit strange.
>> >
>> > Best,
>> > Shammon FY
>> >
>> > On Thu, Sep 28, 2023 at 12:08 PM Rui Fan <1996fan...@gmail.com> wrote:
>> > >
>> > > Hi Zhu Zhu,
>> > >
>> > > Thanks for your feedback here!
>> > >
>> > > You are right, the user needs to set 2 options:
>> > > - cluster.evenly-spread-out-slots=true
>> > > - slot.sharing-strategy=TASK_BALANCED_PREFERRED
>> > >
>> > > Merging them into one option is useful on the user side, so
>> > > `taskmanager.load-balance.mode` sounds good to me.
>> > > I want to check some points and behaviors about this option:
>> > >
>> > > 1. The default value is None, right?
>> > > 2.
>> > > When it's set to Tasks, how do we assign slots to TMs?
>> > > - Option 1: Just check the task number.
>> > > - Option 2: Check the slot number first, then check the task number
>> > >   when the slot numbers are the same.
>> > >
>> > > Here is an example to explain the difference between them:
>> > >
>> > > - A session cluster has 2 Flink jobs: jobA and jobB.
>> > > - Each TM has 4 slots.
>> > > - The task number of one slot of jobA is 3.
>> > > - The task number of one slot of jobB is 1.
>> > > - We have 2 TaskManagers:
>> > >   - tm1 runs 3 slots of jobB, so tm1 runs 3 tasks.
>> > >   - tm2 runs 1 slot of jobA and 1 slot of jobB, so tm2 runs 4 tasks.
>> > >
>> > > Now, we need to run a new slot. Which TM should offer it?
>> > > - Option 1: If we just check the task number, tm1 is better.
>> > > - Option 2: If we check the slot number first and then the task
>> > >   number, tm2 is better.
>> > >
>> > > The original FLIP selected option 2; that's why we didn't add a third
>> > > option. Option 2 doesn't break the semantics when
>> > > `cluster.evenly-spread-out-slots` is true; it just improves the
>> > > behavior without changing the semantics.
>> > >
>> > > On the other hand, if we choose option 2, then when the user sets
>> > > `taskmanager.load-balance.mode` to Tasks, it also achieves the goal
>> > > of the Slots mode.
>> > >
>> > > So I think the `Slots` enum value isn't needed if we choose option 2.
>> > > Of course, if we choose option 1, the enum value is needed.
>> > >
>> > > Looking forward to your feedback, thanks~
>> > >
>> > > Best,
>> > > Rui
>> > >
>> > > On Wed, Sep 27, 2023 at 9:11 PM Zhu Zhu <reed...@gmail.com> wrote:
>> > > >
>> > > > Thanks Yuepeng and Rui for creating this FLIP.
>> > > >
>> > > > +1 in general.
>> > > > The idea is straightforward: best-effort gather all the slot
>> > > > requests and offered slots to form an overview before assigning
>> > > > slots, trying to balance the loads of the task managers when
>> > > > assigning slots.
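The two TM-selection strategies in Rui's example above can be sketched as comparators. This is a hypothetical illustration; the `Tm` record and the names below are not Flink classes.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch of the two candidate strategies for picking a TM to offer a slot. */
class TmSelection {
    /** A TaskManager candidate with its current slot and task counts. */
    record Tm(String name, int slots, int tasks) {}

    // Option 1: pick the TM running the fewest tasks.
    static final Comparator<Tm> BY_TASKS = Comparator.comparingInt(Tm::tasks);

    // Option 2: pick the TM with the fewest occupied slots;
    // break ties by the task count.
    static final Comparator<Tm> BY_SLOTS_THEN_TASKS =
            Comparator.comparingInt(Tm::slots).thenComparingInt(Tm::tasks);

    /** Return the best TM under the given strategy. */
    static Tm pick(List<Tm> tms, Comparator<Tm> strategy) {
        return tms.stream().min(strategy).orElseThrow();
    }
}
```

With the example numbers (tm1: 3 slots / 3 tasks, tm2: 2 slots / 4 tasks), option 1 selects tm1 while option 2 selects tm2, matching the discussion above.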
>> > > >
>> > > > I have one comment regarding the configuration for ease of use:
>> > > >
>> > > > IIUC, this FLIP uses an existing config option,
>> > > > 'cluster.evenly-spread-out-slots', as the main switch of the new
>> > > > feature. That is, from the user's perspective, with this
>> > > > improvement, the 'cluster.evenly-spread-out-slots' feature not
>> > > > only balances the number of slots on task managers, but also
>> > > > balances the number of tasks. This is a behavior change anyway.
>> > > > Besides that, it also requires users to set 'slot.sharing-strategy'
>> > > > to 'TASK_BALANCED_PREFERRED' to balance the tasks in each slot.
>> > > >
>> > > > I think we can introduce a new config option,
>> > > > `taskmanager.load-balance.mode`, which accepts
>> > > > "None"/"Slots"/"Tasks". `cluster.evenly-spread-out-slots` can be
>> > > > superseded by the "Slots" mode and get deprecated. In the future it
>> > > > can support more modes, e.g. "CpuCores", to work better for jobs
>> > > > with fine-grained resources. The proposed config option
>> > > > `slot.request.max-interval` can then be renamed to
>> > > > `taskmanager.load-balance.request-stablizing-timeout` to show its
>> > > > relation to the feature. The proposed `slot.sharing-strategy`
>> > > > option is not needed, because the configured "Tasks" mode will do
>> > > > the work.
>> > > >
>> > > > WDYT?
>> > > >
>> > > > Thanks,
>> > > > Zhu Zhu
>> > > >
>> > > > Yuepeng Pan <panyuep...@apache.org> wrote on Mon, Sep 25, 2023 at 16:26:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> Fan Rui (CC'ed) and I created FLIP-370[1] to support balanced
>> > > >> task scheduling.
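Zhu Zhu's proposal above could be sketched as an enum plus a mapping from the two legacy options it is meant to supersede. The names and the mapping below are illustrative assumptions for discussion, not actual FLIP or Flink code.

```java
/**
 * Sketch of the proposed `taskmanager.load-balance.mode` option and how it
 * could be derived from the two legacy options during a deprecation period.
 */
class LoadBalanceConfig {
    enum Mode { NONE, SLOTS, TASKS }

    /**
     * Derive the new mode from the legacy options. Per the discussion,
     * the Tasks mode also balances slots (option 2), so it subsumes Slots;
     * the strategy value "TASK_BALANCED_PREFERRED" is the one proposed in
     * the thread, other values are treated as the non-balanced default.
     */
    static Mode fromLegacyOptions(boolean evenlySpreadOutSlots, String slotSharingStrategy) {
        if ("TASK_BALANCED_PREFERRED".equals(slotSharingStrategy)) {
            return Mode.TASKS;
        }
        return evenlySpreadOutSlots ? Mode.SLOTS : Mode.NONE;
    }
}
```

A single mode option keeps the JM-side (task-to-slot) and RM-side (slot-to-TM) behavior consistent, which is the ease-of-use benefit Zhu Zhu describes.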
>> > > >>
>> > > >> The current strategy of Flink to deploy tasks sometimes leads
>> > > >> some TMs (TaskManagers) to have more tasks while others have
>> > > >> fewer, resulting in excessive resource utilization at the TMs
>> > > >> that contain more tasks, which become a bottleneck for the
>> > > >> entire job. Developing strategies to achieve task load balancing
>> > > >> across TMs and reduce job bottlenecks is therefore very
>> > > >> meaningful.
>> > > >>
>> > > >> The raw design and discussions can be found in the Flink JIRA[2]
>> > > >> and the Google doc[3]. We really appreciate Zhu Zhu (CC'ed) for
>> > > >> providing valuable help and suggestions in advance.
>> > > >>
>> > > >> Please refer to the FLIP[1] document for more details about the
>> > > >> proposed design and implementation. We welcome any feedback and
>> > > >> opinions on this proposal.
>> > > >>
>> > > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
>> > > >>
>> > > >> [2] https://issues.apache.org/jira/browse/FLINK-31757
>> > > >>
>> > > >> [3] https://docs.google.com/document/d/14WhrSNGBdcsRl3IK7CZO-RaZ5KXU2X1dWqxPEFr3iS8
>> > > >>
>> > > >> Best,
>> > > >>
>> > > >> Yuepeng Pan