Hi SeaTunnel members,
I would like to discuss a plan to optimize the task scheduling strategy of
the SeaTunnel engine.
Currently, task slots are allocated with a single strategy: Random.
We plan to add two new scheduling strategies:
1. SLOT_RATIO
2. SYSTEM_LOAD
Detailed Plan
SLOT_RATIO
This strategy schedules tasks based on the slot usage rate of each worker.
Workers with a lower slot usage rate have higher priority.
*Calculation Logic*:
1. Obtain the total number of slots on the worker.
2. Get the number of unallocated slots.
3. Usage rate = (Total slots - Unallocated slots) / Total slots.
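The three steps above can be sketched as follows. This is only an
illustration: the `WorkerSlots` class, its field names, and `pick_worker` are
hypothetical and are not taken from the SeaTunnel code base. Treating a worker
that has no slots as fully used is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkerSlots:
    """Hypothetical snapshot of one worker's slot counts."""
    name: str
    total_slots: int
    unallocated_slots: int


def slot_usage_rate(worker: WorkerSlots) -> float:
    """Usage rate = (total slots - unallocated slots) / total slots."""
    if worker.total_slots <= 0:
        # Assumption: a worker without any slots counts as fully used.
        return 1.0
    used = worker.total_slots - worker.unallocated_slots
    return used / worker.total_slots


def pick_worker(workers: list[WorkerSlots]) -> WorkerSlots:
    """Return the worker with the lowest slot usage rate."""
    return min(workers, key=slot_usage_rate)


# Made-up workers, for illustration only.
workers = [
    WorkerSlots("worker-a", total_slots=10, unallocated_slots=2),
    WorkerSlots("worker-b", total_slots=10, unallocated_slots=7),
    WorkerSlots("worker-c", total_slots=4, unallocated_slots=1),
]
chosen = pick_worker(workers)
```

With these made-up numbers the usage rates are 0.8, 0.3 and 0.75, so
`worker-b` is chosen.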
SYSTEM_LOAD
*Weight Distribution and Calculation Explanation*
- *Time Weight Design*: The raw time weights are 4, 2, 2, 1, 1, and they
  can be normalized so that they sum to 1. The weight for each time period
  is calculated as:
  [image: image.png]
  - The weight for the most recent sample is 0.4, the weight for the
    sample from three minutes ago is 0.2, and so on.
- *CPU and Memory Resource Contribution*: The CPU and memory utilization
  rates are combined with their respective weights to calculate the
  comprehensive resource utilization rate of the system. The formula is:
  [image: image.png]
- *Time Decay Factor*: The comprehensive resource utilization rate of each
  sample is multiplied by its corresponding time weight, and the results
  are summed to obtain a time-weighted average.
*Overall Scheduling Formula*
The formula for the overall scheduling priority combines the parts above as
follows:
[image: image.png]
[image: image.png]
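One possible reading of the description above, written out as code. The
function names are hypothetical. The exact form of the combination (a weighted
average of CPU and memory utilization, then a time-weighted sum) is an
assumption based on the prose, and the original formulas may differ, for
example by using idle rates (1 - utilization) instead.

```python
RAW_TIME_WEIGHTS = [4, 2, 2, 1, 1]  # newest sample first, as in the proposal


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def combined_utilization(cpu, mem, cpu_weight=0.5, mem_weight=0.5):
    """Assumed form: weighted average of CPU and memory utilization (0..1)."""
    return (cpu * cpu_weight + mem * mem_weight) / (cpu_weight + mem_weight)


def time_weighted_load(samples, raw_weights=RAW_TIME_WEIGHTS):
    """samples: list of (cpu, mem) pairs, newest first."""
    # Assumption: when fewer samples than weights exist, only the first
    # len(samples) weights are used and renormalized.
    weights = normalize(raw_weights[:len(samples)])
    return sum(w * combined_utilization(cpu, mem)
               for w, (cpu, mem) in zip(weights, samples))


time_weights = normalize(RAW_TIME_WEIGHTS)  # 0.4, 0.2, 0.2, 0.1, 0.1
```

How this load value maps to a scheduling priority is also an assumption. A
simple choice would be to prefer the worker with the lowest time-weighted
load.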
*Implementation Logic*
- *Data Collection*:
  - Collect CPU and memory utilization every 3 minutes, storing the last
    5 samples.
  - Each collected sample is bound to its corresponding time weight.
- *Priority Calculation*:
  - Based on the collected CPU and memory utilization, calculate the
    scheduling priority for each instance using the formula.
  - Use the calculated result as the core basis for load distribution.
- *Dynamic Adjustment*:
  - Use a sliding window to update the most recent 5 samples.
  - Reduce the weight of older data to better adapt to the latest load
    changes.
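A minimal sketch of the sliding window described above, assuming one window
per worker. The `LoadWindow` class and its method names are hypothetical.
Binding weights by position (newest sample gets 0.4) follows the weight list
given in this mail.

```python
from collections import deque

TIME_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]  # newest sample first


class LoadWindow:
    """Hypothetical per-worker holder for the last 5 load samples."""

    def __init__(self, size=5):
        # A deque with maxlen drops the entry at the opposite end when a
        # new one is added to a full window, so the oldest sample falls out.
        self.samples = deque(maxlen=size)

    def record(self, cpu, mem):
        """Store one sample; meant to be called every 3 minutes."""
        self.samples.appendleft((cpu, mem))

    def weighted_samples(self):
        """Pair each sample with its time weight, newest first."""
        return [(w, cpu, mem)
                for w, (cpu, mem) in zip(TIME_WEIGHTS, self.samples)]


window = LoadWindow()
for i in range(1, 7):          # six made-up samples; the first one falls out
    window.record(cpu=i / 10, mem=i / 20)
pairs = window.weighted_samples()
```

After six calls the window holds only the last five samples, and the newest
one (cpu 0.6) carries the weight 0.4.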
*Example Data Calculation*
- Assume the CPU and memory utilization rates for 5 instances are as
  follows:
  [image: image.png]
- The CPU and memory weights are both 0.5, and the time weights are
  [0.4, 0.2, 0.2, 0.1, 0.1].
- The corresponding scheduling priority is calculated as:
  [image: image.png]
- The final result is the scheduling priority value, which can be used for
  load distribution.
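As a worked illustration of this kind of calculation, the snippet below uses
the weights stated above (0.5 for CPU, 0.5 for memory, and the time weights
[0.4, 0.2, 0.2, 0.1, 0.1]). The five samples are made up for illustration and
are not the values from the proposal. The formula is an assumed reading of the
description: a weighted CPU/memory utilization per sample, summed with the
time weights.

```python
CPU_WEIGHT = 0.5
MEM_WEIGHT = 0.5
TIME_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]  # newest sample first

# Hypothetical (cpu, mem) utilization samples, newest first.
samples = [(0.8, 0.6), (0.6, 0.4), (0.5, 0.5), (0.4, 0.2), (0.2, 0.4)]

# Per-sample combined utilization: 0.7, 0.5, 0.5, 0.3, 0.3
combined = [cpu * CPU_WEIGHT + mem * MEM_WEIGHT for cpu, mem in samples]

# Time-weighted average: 0.28 + 0.10 + 0.10 + 0.03 + 0.03
load = sum(w * c for w, c in zip(TIME_WEIGHTS, combined))

print(round(load, 2))
```

With these made-up samples the time-weighted load is about 0.54. A scheduler
following this sketch would compare that value across workers and prefer the
least loaded one.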
Looking forward to your suggestions.
You can also discuss it in the issue:
https://github.com/apache/seatunnel/issues/8205
Regards,
Jast (Shenghang)