Thanks shenghang!
The design looks good to me.

zhangshenghang <[email protected]> wrote on Tue, Dec 3, 2024 at 20:52:

> Hi SeaTunnel members,
>
> I would like to discuss the optimization plan for the SeaTunnel engine
> task scheduling strategy:
>
> Currently, our task slot allocation strategy is: Random.
>
> We plan to add two new scheduling strategies:
>
>    1. SLOT_RATIO
>    2. SYSTEM_LOAD
>
> Detailed Plan
>
> SLOT_RATIO
>
> This strategy schedules based on each worker's slot usage rate.
> Workers with lower usage rates get higher priority.
>
> *Calculation Logic*:
>
>    1. Obtain the total number of worker slots.
>    2. Get the number of unallocated slots.
>    3. Usage rate = (Total slots - Unallocated slots) / Total slots.
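
The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the `Worker` class, its field names, and the `pick_worker` helper are hypothetical and are not taken from the SeaTunnel code base. Treating a worker with zero slots as fully used, and skipping workers with no free slot, are assumptions as well.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical view of a worker's slot counts (not a SeaTunnel class)."""
    name: str
    total_slots: int
    unallocated_slots: int


def slot_usage_rate(worker: Worker) -> float:
    """Usage rate = (total slots - unallocated slots) / total slots."""
    if worker.total_slots <= 0:
        # Assumption: a worker with no slots is treated as fully used.
        return 1.0
    return (worker.total_slots - worker.unallocated_slots) / worker.total_slots


def pick_worker(workers: list[Worker]) -> Worker:
    """Choose the worker with the lowest usage rate that still has a free slot."""
    candidates = [w for w in workers if w.unallocated_slots > 0]
    if not candidates:
        raise RuntimeError("no worker has an unallocated slot")
    return min(candidates, key=slot_usage_rate)


workers = [
    Worker("worker-a", total_slots=10, unallocated_slots=2),  # usage 0.8
    Worker("worker-b", total_slots=10, unallocated_slots=7),  # usage 0.3
    Worker("worker-c", total_slots=4, unallocated_slots=2),   # usage 0.5
]
print(pick_worker(workers).name)  # worker-b
```

Because the rate is a ratio, a small worker with half of its slots used ranks the same as a large worker with half of its slots used, regardless of the absolute number of free slots.
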
>
> SYSTEM_LOAD
>
> *Weight Distribution and Calculation Explanation*
>
>    - *Time Weight Design*: The raw time weights are 4, 2, 2, 1, 1, and
>      they are normalized so that they sum to 1. The weight for each time
>      period is calculated as:
>      [image: image.png]
>       - The weight for the most recent sample is 0.4, the weight for the
>         sample from three minutes ago is 0.2, and so on.
>    - *CPU and Memory Resource Contribution*: The CPU and memory
>      utilization rates are combined with their respective weights to
>      calculate the combined system resource utilization. The formula is:
>      [image: image.png]
>    - *Time Decay Factor*: Each sample's combined resource utilization is
>      multiplied by the corresponding time weight, and the products are
>      summed to obtain a time-weighted average.
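
Since the formula images did not survive in the plain-text copy of this mail, here is a minimal Python sketch of the three pieces as the text describes them: weight normalization, the CPU/memory mix, and the time-weighted sum. The function names and the sample readings are made up for illustration, and the 0.5/0.5 defaults follow the example later in the proposal.

```python
def normalize(raw_weights):
    """Scale raw weights so they sum to 1, e.g. [4, 2, 2, 1, 1] -> [0.4, 0.2, 0.2, 0.1, 0.1]."""
    total = sum(raw_weights)
    return [w / total for w in raw_weights]


def combined_utilization(cpu, mem, cpu_weight=0.5, mem_weight=0.5):
    """Weighted mix of CPU and memory utilization, both given in [0, 1]."""
    return cpu * cpu_weight + mem * mem_weight


def time_weighted_load(samples, time_weights):
    """samples: (cpu, mem) pairs, most recent first, one per time weight."""
    return sum(
        combined_utilization(cpu, mem) * w
        for (cpu, mem), w in zip(samples, time_weights)
    )


time_weights = normalize([4, 2, 2, 1, 1])
# Hypothetical readings for one worker, most recent first.
samples = [(0.2, 0.4), (0.4, 0.4), (0.6, 0.4), (0.8, 0.4), (0.8, 0.4)]
load = time_weighted_load(samples, time_weights)
print(round(load, 4))  # 0.42
```

In this made-up series the CPU load has been falling, so the heavily weighted recent samples pull the result (0.42) below the plain average of the five combined values (0.48).
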
>
> *Overall Scheduling Formula*
>
> The calculation formula for the overall scheduling priority is
> integrated as follows:
>
> [image: image.png]
> [image: image.png]
>
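
The two formula images are not readable in the plain-text archive. One possible reading, consistent with the description in this mail, is written out below. The symbols are my own notation, not the proposal's: r_i is the raw time weight (4, 2, 2, 1, 1), c_i and m_i are the CPU and memory utilization of the i-th most recent sample, and alpha and beta are the CPU and memory weights. The last step, taking the priority as one minus the weighted load so that less loaded workers rank higher, is an assumption; the exact form should be checked against the linked issue.

```latex
\[
  w_i = \frac{r_i}{\sum_{j=1}^{5} r_j}, \qquad
  U_i = \alpha\, c_i + \beta\, m_i, \qquad
  L = \sum_{i=1}^{5} w_i\, U_i, \qquad
  P = 1 - L
\]
```
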
> *Implementation Logic*
>
>    - *Data Collection*:
>       - Collect CPU and memory utilization every 3 minutes, storing the
>         last 5 statistics.
>       - Bind each collected sample to its corresponding time weight.
>    - *Priority Calculation*:
>       - Based on the collected CPU and memory utilization, calculate the
>         scheduling priority for each instance using the formula.
>       - Use the calculated result as the core basis for load distribution.
>    - *Dynamic Adjustment*:
>       - Use a sliding window to update the most recent 5 statistics.
>       - Reduce the weight of older data to better adapt to the latest load
>         changes.
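
A sliding window of this kind could look roughly like the Python sketch below. The `LoadWindow` class is hypothetical and not part of SeaTunnel. Two behaviours are assumptions that the proposal does not specify: a worker with no samples yet counts as idle, and the time weights are renormalized while the window holds fewer than 5 samples.

```python
from collections import deque

TIME_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]  # most recent sample first


class LoadWindow:
    """Hypothetical per-worker window holding the last 5 (cpu, mem) samples."""

    def __init__(self, size=5):
        # A deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def record(self, cpu, mem):
        """Called once per collection interval (every 3 minutes in the proposal)."""
        self.samples.appendleft((cpu, mem))

    def weighted_load(self, cpu_weight=0.5, mem_weight=0.5):
        """Time-weighted utilization; weights are renormalized while the window fills."""
        if not self.samples:
            return 0.0  # Assumption: a worker with no data yet counts as idle.
        weights = TIME_WEIGHTS[: len(self.samples)]
        total = sum(weights)
        return sum(
            (cpu * cpu_weight + mem * mem_weight) * w / total
            for (cpu, mem), w in zip(self.samples, weights)
        )


window = LoadWindow()
for cpu in [0.9, 0.9, 0.9, 0.9, 0.9, 0.1]:  # six readings, the first one is evicted
    window.record(cpu, 0.5)
print(len(window.samples), round(window.weighted_load(), 4))  # 5 0.54
```

With equal weights the same window would give 0.62, so the 0.4 weight on the newest sample is what lets the score react quickly to the drop in CPU load.
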
>
> *Example Data Calculation*
>
>    - Assume the CPU and memory utilization rates for 5 instances are as
>      follows:
>      [image: image.png]
>    - The CPU and memory weight configurations are both 0.5, and the time
>      weights are [0.4, 0.2, 0.2, 0.1, 0.1].
>    - The corresponding scheduling priority is calculated as:
>      [image: image.png]
>    - The final result is the scheduling priority value, which can be used
>      for load distribution.
>
> Looking forward to your suggestions.
>
> You can also discuss it in the issue:
> https://github.com/apache/seatunnel/issues/8205
>
>
>
> Regards,
> Jast (Shenghang)
>
