Thanks, Shenghang! The design looks good to me.

zhangshenghang <[email protected]> wrote on Tue, Dec 3, 2024 at 20:52:
> Hi Seatunnel member,
>
> I would like to discuss the optimization plan for the Seatunnel engine
> task scheduling strategy:
>
> Currently, our task slot allocation strategy is: Random.
>
> We plan to add two new scheduling strategies:
>
> 1. SLOT_RATIO
> 2. SYSTEM_LOAD
>
> Detailed Plan
>
> SLOT_RATIO
>
> This strategy schedules based on the usage rate of the worker's slots.
> Slots with lower usage rates will have higher priority.
>
> *Calculation Logic*:
>
> 1. Obtain the total number of worker slots.
> 2. Get the number of unallocated slots.
> 3. Usage rate = (Total slots - Unallocated slots) / Total slots.
>
> SYSTEM_LOAD
>
> *Weight Distribution and Calculation Explanation*
>
> - *Time Weight Design*: The time weight distribution is 4, 2, 2, 1, 1,
>   and it can be normalized to maintain consistency in the total. The weight
>   for each time period is calculated as:
>   [image: image.png]
>   - The weight for the most recent time is 0.4, 0.2 for three minutes
>     ago, and so on.
> - *CPU and Memory Resource Contribution*: The CPU and memory utilization
>   rates are combined with their respective weights to calculate the
>   credibility of the system resource utilization. The formula is:
>   [image: image.png]
> - *Time Decay Factor*: The comprehensive resource utilization rate is
>   multiplied by the corresponding time weight after each calculation to
>   obtain a time-weighted average.
>
> *Overall Scheduling Formula*
>
> The calculation formula for the overall scheduling priority is integrated
> as follows:
>
> [image: image.png]
> [image: image.png]
>
> *Implementation Logic*
>
> - *Data Collection*:
>   - Collect CPU and memory utilization every 3 minutes, storing the
>     last 5 statistics.
>   - Each time collection binds the data to the corresponding time
>     weight.
> - *Priority Calculation*:
>   - Based on the collected CPU and memory utilization, calculate the
>     scheduling priority for each instance using the formula.
>   - Use the calculated result as the core basis for load distribution.
> - *Dynamic Adjustment*:
>   - Use a sliding window to update the most recent 5 statistics.
>   - Reduce the weight of older data to better adapt to the latest load
>     changes.
>
> *Example Data Calculation*
>
> - Assume the CPU and memory utilization rates for 5 instances are as
>   follows:
>   [image: image.png]
> - The CPU and memory weight configurations are both 0.5, and the time
>   weights are [0.4, 0.2, 0.2, 0.1, 0.1].
> - The corresponding scheduling priority is calculated as:
>   [image: image.png]
> - The final result is the scheduling priority value, which can be used
>   for load distribution.
>
> Looking forward to your suggestions.
>
> You can also discuss it in the issue:
> https://github.com/apache/seatunnel/issues/8205
>
> Regards,
> Jast (Shenghang)
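
For anyone following the thread, the two calculations described above could be sketched roughly as follows. This is only an illustration in Python, not the SeaTunnel implementation: the formulas in the proposal were attached as images and are not reproduced here, so the way CPU and memory are combined (a weighted sum, then a time-weighted average) is an assumption based on the prose. The names `slot_usage_rate`, `LoadWindow`, and `weighted_utilization` are hypothetical, and renormalizing the weights when fewer than 5 samples exist is also an assumption.

```python
from collections import deque

# Time weights 4:2:2:1:1 normalized, newest sample first (from the proposal).
TIME_WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]


def slot_usage_rate(total_slots, unallocated_slots):
    """SLOT_RATIO: usage rate = (total - unallocated) / total."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return (total_slots - unallocated_slots) / total_slots


class LoadWindow:
    """Sliding window over the last 5 (cpu, mem) samples of one worker."""

    def __init__(self, size=5):
        # With maxlen set, appendleft() drops the oldest sample on the right.
        self.samples = deque(maxlen=size)

    def add(self, cpu, mem):
        """Record a new sample; cpu and mem are utilization rates in [0, 1]."""
        self.samples.appendleft((cpu, mem))  # newest sample sits at index 0

    def weighted_utilization(self, cpu_weight=0.5, mem_weight=0.5):
        """Time-weighted average of the combined CPU/memory utilization.

        Assumption: combined utilization = cpu_weight*cpu + mem_weight*mem.
        Assumption: weights are renormalized while the window is not full.
        """
        if not self.samples:
            return 0.0
        weights = TIME_WEIGHTS[:len(self.samples)]
        weighted = sum(
            w * (cpu_weight * cpu + mem_weight * mem)
            for w, (cpu, mem) in zip(weights, self.samples)
        )
        return weighted / sum(weights)
```

Under these assumptions a scheduler would prefer the worker with the lowest `slot_usage_rate` (SLOT_RATIO) or the lowest `weighted_utilization` (SYSTEM_LOAD). Whether the proposal's priority is the utilization itself or its complement is not recoverable from the text, so the sketch stops at the utilization value.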
