The design looks good to me. But the picture you provided doesn't seem to display properly.
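
For reference, one reading of the SLOT_RATIO rule in the proposal quoted below, as a minimal Python sketch. The `Worker` class, its field names, and `pick_worker` are hypothetical names used for illustration only, not Seatunnel APIs. The slot counts are made-up numbers, and the handling of a worker with zero total slots and of ties are assumptions the proposal does not cover.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    # Hypothetical stand-in for a worker's slot profile.
    name: str
    total_slots: int
    unallocated_slots: int


def slot_usage_rate(worker: Worker) -> float:
    """Usage rate = (total slots - unallocated slots) / total slots."""
    if worker.total_slots <= 0:
        # Assumption: a worker without any slots is treated as fully used.
        return 1.0
    return (worker.total_slots - worker.unallocated_slots) / worker.total_slots


def pick_worker(workers: List[Worker]) -> Worker:
    """Pick the worker with the lowest usage rate among those with a free slot."""
    candidates = [w for w in workers if w.unallocated_slots > 0]
    if not candidates:
        raise ValueError("no worker has an unallocated slot")
    # min() returns the first minimum, so ties go to the earlier worker (assumption).
    return min(candidates, key=slot_usage_rate)


# Made-up slot counts, for illustration only.
workers = [
    Worker("worker-a", total_slots=10, unallocated_slots=2),
    Worker("worker-b", total_slots=8, unallocated_slots=6),
    Worker("worker-c", total_slots=4, unallocated_slots=0),
]
chosen = pick_worker(workers)
print(chosen.name, slot_usage_rate(chosen))
```

Filtering out full workers before comparing rates keeps the rule from ever returning a worker that cannot accept a task, which the usage-rate formula alone does not guarantee.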
Jia Fan <[email protected]> wrote on Wed, Dec 4, 2024 at 09:45:

> Thanks shenghang!
> The design looks good to me.
>
> zhangshenghang <[email protected]> wrote on Tue, Dec 3, 2024 at 20:52:
>
> > Hi Seatunnel member,
> >
> > I would like to discuss the optimization plan for the Seatunnel engine
> > task scheduling strategy:
> >
> > Currently, our task slot allocation strategy is: Random.
> >
> > We plan to add two new scheduling strategies:
> >
> > 1. SLOT_RATIO
> > 2. SYSTEM_LOAD
> >
> > Detailed Plan
> >
> > SLOT_RATIO
> >
> > This strategy schedules based on the usage rate of the worker's slots.
> > Workers with lower slot usage rates will have higher priority.
> >
> > *Calculation Logic*:
> >
> > 1. Obtain the total number of worker slots.
> > 2. Get the number of unallocated slots.
> > 3. Usage rate = (Total slots - Unallocated slots) / Total slots.
> >
> > SYSTEM_LOAD
> >
> > *Weight Distribution and Calculation Explanation*
> >
> > - *Time Weight Design*: The time weight distribution is 4, 2, 2, 1, 1,
> >   and it can be normalized to maintain consistency in the total. The
> >   weight for each time period is calculated as:
> >   [image: image.png]
> >   - The weight for the most recent time is 0.4, 0.2 for three minutes
> >     ago, and so on.
> > - *CPU and Memory Resource Contribution*: The CPU and memory utilization
> >   rates are combined with their respective weights to calculate the
> >   credibility of the system resource utilization. The formula is:
> >   [image: image.png]
> > - *Time Decay Factor*: The comprehensive resource utilization rate is
> >   multiplied by the corresponding time weight after each calculation to
> >   obtain a time-weighted average.
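
Because the formula images did not come through, here is one way the three bullets above might fit together, as a minimal Python sketch rather than the proposal's actual formula. Only the raw time weights 4, 2, 2, 1, 1, the 0.5/0.5 CPU and memory weights, and the window of 5 samples come from the proposal. The function names, the sample values, the handling of a partly filled window, and the choice to score utilization directly (lower score means less loaded) are assumptions; the formula in the issue may differ, for example by using idle rates instead.

```python
from collections import deque

RAW_TIME_WEIGHTS = [4, 2, 2, 1, 1]  # from the proposal, most recent sample first


def normalize(weights):
    """Scale weights so they sum to 1, e.g. 4, 2, 2, 1, 1 -> 0.4, 0.2, 0.2, 0.1, 0.1."""
    total = sum(weights)
    return [w / total for w in weights]


def combined_utilization(cpu, mem, cpu_weight=0.5, mem_weight=0.5):
    """Blend CPU and memory utilization (both in 0..1) using their weights."""
    return (cpu * cpu_weight + mem * mem_weight) / (cpu_weight + mem_weight)


def load_score(samples, raw_time_weights=RAW_TIME_WEIGHTS):
    """Time-weighted average of combined utilization; samples are newest first.

    Assumption: with fewer samples than weights, only the leading weights
    are used, renormalized so they still sum to 1.
    """
    if not samples:
        raise ValueError("need at least one sample")
    weights = normalize(raw_time_weights[: len(samples)])
    return sum(
        w * combined_utilization(cpu, mem)
        for w, (cpu, mem) in zip(weights, samples)
    )


# Sliding window of the 5 most recent (cpu, mem) samples, newest first.
# The sample values are made up for illustration.
window = deque(maxlen=5)
for sample in [(0.2, 0.4), (0.3, 0.5), (0.5, 0.5), (0.6, 0.6), (0.7, 0.7), (0.9, 0.9)]:
    window.appendleft(sample)  # once full, the oldest sample drops off the right end

print(round(load_score(list(window)), 4))
```

Under this reading a worker with a lower score would be preferred; if the intended priority is "higher is better", the score would need to be inverted, for example as `1 - load_score(...)`.
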
> >
> > *Overall Scheduling Formula*
> >
> > The calculation formula for the overall scheduling priority is
> > integrated as follows:
> >
> > [image: image.png]
> > [image: image.png]
> >
> > *Implementation Logic*
> >
> > - *Data Collection*:
> >   - Collect CPU and memory utilization every 3 minutes, storing the
> >     last 5 statistics.
> >   - Each time collection binds the data to the corresponding time
> >     weight.
> > - *Priority Calculation*:
> >   - Based on the collected CPU and memory utilization, calculate the
> >     scheduling priority for each instance using the formula.
> >   - Use the calculated result as the core basis for load distribution.
> > - *Dynamic Adjustment*:
> >   - Use a sliding window to update the most recent 5 statistics.
> >   - Reduce the weight of older data to better adapt to the latest load
> >     changes.
> >
> > *Example Data Calculation*
> >
> > - Assume the CPU and memory utilization rates for 5 instances are as
> >   follows:
> >   [image: image.png]
> > - The CPU and memory weight configurations are both 0.5, and the time
> >   weights are [0.4, 0.2, 0.2, 0.1, 0.1].
> > - The corresponding scheduling priority is calculated as:
> >
> >   [image: image.png]
> >
> > - The final result is the scheduling priority value, which can be used
> >   for load distribution.
> >
> > Looking forward to your suggestions.
> >
> > You can also discuss it in the issue:
> > https://github.com/apache/seatunnel/issues/8205
> >
> > Regards,
> > Jast (Shenghang)

--
Warm Regards,
Leonard(LiFeng Nie)
