The design looks good to me.

But the images you provided don't seem to display properly.

Jia Fan <[email protected]> wrote on Wed, Dec 4, 2024 at 09:45:

> Thanks, Shenghang!
> The design looks good to me.
>
> zhangshenghang <[email protected]> wrote on Tue, Dec 3, 2024 at 20:52:
>
> > Hi SeaTunnel members,
> >
> > I would like to discuss a plan for optimizing the task scheduling strategy
> > of the SeaTunnel engine:
> >
> > Currently, our task slot allocation strategy is random.
> >
> > We plan to add two new scheduling strategies:
> >
> >    1. SLOT_RATIO
> >    2. SYSTEM_LOAD
> >
> > Detailed Plan
> >
> > SLOT_RATIO
> >
> > This strategy schedules tasks based on the slot usage rate of each worker.
> > Workers with lower slot usage rates have higher priority.
> >
> > *Calculation Logic*:
> >
> >    1. Obtain the total number of slots on the worker.
> >    2. Get the number of unallocated slots.
> >    3. Usage rate = (Total slots - Unallocated slots) / Total slots.
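The SLOT_RATIO calculation above could be sketched in Java as shown below. This is a minimal illustration, not SeaTunnel's actual code. The `WorkerSlots` record and `pickWorker` method are hypothetical names. Two behaviors are also assumptions: workers with no free slot are skipped, and a worker reporting zero total slots is treated as fully used.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of SLOT_RATIO selection; not SeaTunnel's actual API. */
public class SlotRatioSketch {

    /** Hypothetical snapshot of one worker's slot counts. */
    record WorkerSlots(String workerId, int totalSlots, int unallocatedSlots) {

        /** Usage rate = (total - unallocated) / total; zero total slots counts as fully used. */
        double usageRate() {
            if (totalSlots <= 0) {
                return 1.0;
            }
            return (double) (totalSlots - unallocatedSlots) / totalSlots;
        }
    }

    /** Picks the worker with the lowest usage rate among workers that still have a free slot. */
    static Optional<WorkerSlots> pickWorker(List<WorkerSlots> workers) {
        return workers.stream()
                .filter(w -> w.unallocatedSlots() > 0)
                .min(Comparator.comparingDouble(WorkerSlots::usageRate));
    }

    public static void main(String[] args) {
        List<WorkerSlots> workers = List.of(
                new WorkerSlots("worker-1", 10, 2), // usage 0.8
                new WorkerSlots("worker-2", 8, 6),  // usage 0.25
                new WorkerSlots("worker-3", 4, 0)); // usage 1.0, no free slot
        // prints worker-2
        System.out.println(pickWorker(workers).map(WorkerSlots::workerId).orElse("none"));
    }
}
```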
> >
> > SYSTEM_LOAD
> >
> > *Weight Distribution and Calculation Explanation*
> >
> >    - *Time Weight Design*: The time weights are distributed as 4, 2, 2, 1,
> >      1 and normalized so that they sum to 1. The weight for each time
> >      period is calculated as:
> >      [image: image.png]
> >       - The weight for the most recent sample is 0.4, for the sample from
> >         three minutes ago 0.2, and so on.
> >    - *CPU and Memory Resource Contribution*: The CPU and memory utilization
> >      rates are combined with their respective weights to calculate the
> >      comprehensive system resource utilization. The formula is:
> >      [image: image.png]
> >    - *Time Decay Factor*: After each calculation, the comprehensive resource
> >      utilization rate is multiplied by the corresponding time weight, and the
> >      results are summed to obtain a time-weighted average.
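The three weighting steps above can be sketched in Java as follows. The sketch assumes samples are ordered from most recent to oldest. The class and method names (`TimeWeightSketch`, `normalize`, `combined`, `weightedLoad`) are hypothetical and not part of SeaTunnel.

```java
import java.util.Arrays;

/** Hypothetical sketch of the SYSTEM_LOAD weighting steps; not SeaTunnel's actual API. */
public class TimeWeightSketch {

    /** Normalizes raw weights (e.g. 4, 2, 2, 1, 1) so that they sum to 1. */
    static double[] normalize(double[] raw) {
        double sum = Arrays.stream(raw).sum();
        return Arrays.stream(raw).map(r -> r / sum).toArray();
    }

    /** Comprehensive utilization of one sample: cpuWeight * cpu + memWeight * mem. */
    static double combined(double cpu, double mem, double cpuWeight, double memWeight) {
        return cpuWeight * cpu + memWeight * mem;
    }

    /**
     * Time-weighted average of the comprehensive utilization.
     * Index 0 is the most recent sample; cpu, mem and timeWeights must have equal length.
     */
    static double weightedLoad(double[] cpu, double[] mem, double[] timeWeights,
                               double cpuWeight, double memWeight) {
        double load = 0.0;
        for (int i = 0; i < timeWeights.length; i++) {
            load += timeWeights[i] * combined(cpu[i], mem[i], cpuWeight, memWeight);
        }
        return load;
    }

    public static void main(String[] args) {
        double[] weights = normalize(new double[] {4, 2, 2, 1, 1});
        System.out.println(Arrays.toString(weights)); // [0.4, 0.2, 0.2, 0.1, 0.1]
    }
}
```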
> >
> > *Overall Scheduling Formula*: Combining the above, the overall scheduling
> > priority is calculated as follows:
> >
> > [image: image.png]
> > [image: image.png]
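Because the images do not render, here is a hedged reconstruction of the overall formula from the prose alone. The symbols are introduced for illustration only, and the original images may define the formula differently, in particular how the resulting load value is mapped to a priority. Here $i = 1$ is the most recent sample, $r_i$ are the raw time weights, $C_i$ and $M_i$ are the CPU and memory utilization of sample $i$, and $\alpha$, $\beta$ are the CPU and memory weights (both 0.5 in the example below).

```latex
\begin{aligned}
w_i &= \frac{r_i}{\sum_{j=1}^{5} r_j}, \qquad r = (4, 2, 2, 1, 1) \;\Rightarrow\; w = (0.4, 0.2, 0.2, 0.1, 0.1) \\
U_i &= \alpha\, C_i + \beta\, M_i \\
L &= \sum_{i=1}^{5} w_i\, U_i = \sum_{i=1}^{5} w_i \left(\alpha\, C_i + \beta\, M_i\right)
\end{aligned}
```

Under this reading, a lower $L$ means a less loaded worker. Treating a lower $L$ as a higher scheduling priority is an assumption.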
> > *Implementation Logic*
> >
> >    - *Data Collection*:
> >       - Collect CPU and memory utilization every 3 minutes, keeping the
> >         last 5 samples.
> >       - Bind each collected sample to its corresponding time weight.
> >    - *Priority Calculation*:
> >       - Based on the collected CPU and memory utilization, calculate the
> >         scheduling priority of each instance using the formula.
> >       - Use the calculated result as the core basis for load distribution.
> >    - *Dynamic Adjustment*:
> >       - Use a sliding window to keep the 5 most recent samples.
> >       - Reduce the weight of older data so that scheduling adapts to the
> >         latest load changes.
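The data collection and sliding-window steps might look like the following Java sketch. It assumes a fixed window of 5 samples, with the newest sample bound to weight 0.4. The class `LoadWindowSketch` and its methods are hypothetical. The scheduled 3-minute collection task itself is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window collector for SYSTEM_LOAD; not SeaTunnel's actual API. */
public class LoadWindowSketch {

    /** Normalized time weights from the proposal, most recent sample first. */
    static final double[] TIME_WEIGHTS = {0.4, 0.2, 0.2, 0.1, 0.1};

    /** Each entry is {cpu, mem}; the head of the deque is the newest sample. */
    private final Deque<double[]> samples = new ArrayDeque<>();
    private final double cpuWeight;
    private final double memWeight;

    LoadWindowSketch(double cpuWeight, double memWeight) {
        this.cpuWeight = cpuWeight;
        this.memWeight = memWeight;
    }

    /** Called once per collection interval; keeps only the most recent 5 samples. */
    void record(double cpu, double mem) {
        samples.addFirst(new double[] {cpu, mem});
        if (samples.size() > TIME_WEIGHTS.length) {
            samples.removeLast(); // evict the oldest sample
        }
    }

    /**
     * Time-weighted load: the newest sample gets TIME_WEIGHTS[0], the next TIME_WEIGHTS[1], etc.
     * With fewer than 5 samples the applied weights sum to less than 1 (no renormalization).
     */
    double weightedLoad() {
        double load = 0.0;
        int i = 0;
        for (double[] s : samples) { // ArrayDeque iterates from head (newest) to tail (oldest)
            load += TIME_WEIGHTS[i++] * (cpuWeight * s[0] + memWeight * s[1]);
        }
        return load;
    }

    int size() {
        return samples.size();
    }

    public static void main(String[] args) {
        LoadWindowSketch window = new LoadWindowSketch(0.5, 0.5);
        for (int i = 0; i < 6; i++) {
            window.record(0.5, 0.5); // hypothetical samples
        }
        System.out.println(window.size()); // 5
    }
}
```

The proposal does not say what should happen before the window is full. Renormalizing the weights over the available samples is one alternative to the behavior shown here.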
> >
> > *Example Data Calculation*
> >
> >    - Assume the CPU and memory utilization rates of 5 instances are as
> >      follows:
> >      [image: image.png]
> >    - The CPU and memory weights are both 0.5, and the time weights are
> >      [0.4, 0.2, 0.2, 0.1, 0.1].
> >    - The corresponding scheduling priority is calculated as:
> >
> >      [image: image.png]
> >
> >    - The final result is the scheduling priority value, which can be used
> >      for load distribution.
> >
> > Looking forward to your suggestions.
> >
> > You can also discuss it in the issue:
> > https://github.com/apache/seatunnel/issues/8205
> >
> >
> >
> > Regards,
> > Jast (Shenghang)
> >
>


-- 
Warm Regards,

Leonard (LiFeng Nie)
