Thanks Yuepeng and Rui for driving this discussion.

Internally, when running Flink 1.17.1 in production, we also suffer from
unbalanced task distribution for jobs with high QPS and complex DAGs. So +1
for the overall proposal.

Some questions about the details:

1. About the waiting mechanism: will the waiting happen only in the second
level, 'assigning slots to TMs'? IIUC, the first level, 'assigning tasks to
slots', only needs the asynchronous slot results from the slot pool.
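
To make question 1 concrete, this is roughly how I picture the waiting step, as a minimal sketch rather than the FLIP's actual design: level-1 results arrive as slot requests, and level-2 TM assignment only starts once no new request has arrived within a stabilizing timeout (assumed here to be what the proposed `taskmanager.load-balance.request-stablizing-timeout` would control). `SlotRequestStabilizer` and its methods are hypothetical names, not Flink classes, and time is passed in explicitly to keep the example deterministic.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: buffer slot requests produced by level 1 and release
 * them for level-2 (TM) assignment only after no new request has arrived for
 * the stabilizing timeout. Not a Flink class.
 */
final class SlotRequestStabilizer {
    private final long timeoutMillis;
    private final List<String> pending = new ArrayList<>();
    private long lastRequestAtMillis = -1;

    SlotRequestStabilizer(Duration stabilizingTimeout) {
        this.timeoutMillis = stabilizingTimeout.toMillis();
    }

    /** Level 1 hands over a request; no TaskManager is chosen yet. */
    void onSlotRequest(String requestId, long nowMillis) {
        pending.add(requestId);
        lastRequestAtMillis = nowMillis;
    }

    /** Level 2: returns the whole batch once requests have stabilized, else an empty list. */
    List<String> pollStableBatch(long nowMillis) {
        if (pending.isEmpty() || nowMillis - lastRequestAtMillis < timeoutMillis) {
            return List.of();
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

For example, with a 100 ms timeout and requests at t=0 and t=50, polling at t=120 returns nothing, while polling at t=150 returns both requests as one batch that can then be spread across TMs.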

2. About the slot LoadingWeight: it is reasonable to use the number of
tasks by default at the beginning, but it would be better if this could
easily be extended in the future to distinguish between CPU-intensive and
IO-intensive workloads. In some cases, some TMs may have IO bottlenecks
while others have CPU bottlenecks.
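
A minimal sketch of the kind of extension point I have in mind (Java 17; `LoadingWeight`, `TaskCountWeight`, `CpuIoWeight` and `LeastLoaded` are hypothetical names, not existing Flink types, and the CPU/IO factors are made-up knobs): the balancer compares candidates only through a scalar weight, so the task count can be the default and a CPU/IO-aware weight can be plugged in later without touching the balancing logic.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical: the load of a slot expressed as one comparable number. */
interface LoadingWeight {
    double value();
}

/** Default: load is simply the number of tasks in the slot. */
record TaskCountWeight(int taskCount) implements LoadingWeight {
    public double value() {
        return taskCount;
    }
}

/** Possible future extension: weigh CPU-bound and IO-bound tasks differently. */
record CpuIoWeight(int cpuTasks, int ioTasks, double cpuFactor, double ioFactor)
        implements LoadingWeight {
    public double value() {
        return cpuTasks * cpuFactor + ioTasks * ioFactor;
    }
}

final class LeastLoaded {
    /** Picks the least-loaded candidate; only the LoadingWeight interface is used. */
    static <T extends LoadingWeight> T pick(List<T> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(LoadingWeight::value))
                .orElseThrow();
    }
}
```

With factors 2.0 (CPU) and 0.5 (IO), a slot with one CPU-bound task (weight 2.0) counts as more loaded than a slot with two IO-bound tasks (weight 1.0), which a pure task count would rank the other way round.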

Regards,
Xiangyu


Yuepeng Pan <panyuep...@apache.org> wrote on Thu, Oct 5, 2023 at 18:34:

> Hi, Zhu Zhu,
>
> Thanks for your feedback!
>
> > I think we can introduce a new config option
> > `taskmanager.load-balance.mode`, which accepts "None"/"Slots"/"Tasks".
> > `cluster.evenly-spread-out-slots` can be superseded by the "Slots" mode
> > and be deprecated. In the future it can support more modes, e.g.
> > "CpuCores", to work better for jobs with fine-grained resources. The
> > proposed config option `slot.request.max-interval` can then be renamed to
> > `taskmanager.load-balance.request-stablizing-timeout` to show its relation
> > with the feature. The proposed `slot.sharing-strategy` is not needed,
> > because the configured "Tasks" mode will do the work.
>
> The new proposed configuration option sounds good to me.
>
> I have a small question: if we set the option to 'Tasks', it will
> initiate two processes: balancing the number of tasks across slots and
> balancing the number of tasks across TaskManagers (TMs).
> Alternatively, if we set it to 'Slots', the system will use the
> LocalPreferred allocation policy (the default) when assigning tasks to
> slots, and it will ensure that the number of slots used is balanced
> across TMs.
> So this configuration essentially fixes the combination of balancing
> strategies along these two dimensions into a few predefined values, right?
>
> Please correct me if I've misunderstood anything.
>
> Best,
> Yuepeng.
>
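
Regarding Yuepeng's question above, here is a hypothetical sketch of how I read it: each proposed `taskmanager.load-balance.mode` value would fix one strategy for each of the two levels. The enum and constant names are illustrative only, and the behaviour of "None" (keeping today's default on both levels) is my assumption.

```java
import java.util.Locale;

/** Level 1: how tasks are placed into slots (illustrative names). */
enum TaskToSlot { LOCAL_PREFERRED, BALANCE_TASK_COUNT }

/** Level 2: how slots are chosen across TaskManagers (illustrative names). */
enum SlotToTm { NO_BALANCING, BALANCE_SLOT_COUNT, BALANCE_TASK_COUNT }

/** Hypothetical mapping of the proposed taskmanager.load-balance.mode values. */
enum LoadBalanceMode {
    // Assumption: "None" keeps the current default behaviour on both levels.
    NONE(TaskToSlot.LOCAL_PREFERRED, SlotToTm.NO_BALANCING),
    // Would supersede cluster.evenly-spread-out-slots.
    SLOTS(TaskToSlot.LOCAL_PREFERRED, SlotToTm.BALANCE_SLOT_COUNT),
    // Balances task counts across slots and across TMs.
    TASKS(TaskToSlot.BALANCE_TASK_COUNT, SlotToTm.BALANCE_TASK_COUNT);

    private final TaskToSlot taskToSlot;
    private final SlotToTm slotToTm;

    LoadBalanceMode(TaskToSlot taskToSlot, SlotToTm slotToTm) {
        this.taskToSlot = taskToSlot;
        this.slotToTm = slotToTm;
    }

    TaskToSlot taskToSlot() {
        return taskToSlot;
    }

    SlotToTm slotToTm() {
        return slotToTm;
    }

    /** Parses config values such as "Tasks" case-insensitively. */
    static LoadBalanceMode fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

If this reading is right, a future "CpuCores" mode would simply be another constant with its own pair of strategies.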
