On 14/09/20 11:03, Vincent Guittot wrote:
> sched domains tend to trigger simultaneously the load balance loop but
> the larger domains often need more time to collect statistics. This
> slowness makes the larger domain trying to detach tasks from a rq whereas
> tasks already migrated somewhere else at a sub-domain level. This is not
> a real problem for idle LB because the period of smaller domains will
> increase with its CPUs being busy and this will let time for higher ones
> to pulled tasks. But this becomes a problem when all CPUs are already busy
> because all domains stay synced when they trigger their LB.
>
> A simple way to minimize simultaneous LB of all domains is to decrement the
> the busy interval by 1 jiffies. Because of the busy_factor, the interval of
> larger domain will not be a multiple of smaller ones anymore.
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
>  kernel/sched/fair.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 765be8273292..7d7eefd8e2d4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9780,6 +9780,9 @@ get_sd_balance_interval(struct sched_domain *sd, int
> cpu_busy)
>
>       /* scale ms to jiffies */
>       interval = msecs_to_jiffies(interval);

A comment here would be nice, I think. What about:

/*
 * Reduce likelihood of (busy) balancing at higher domains racing with
 * balancing at lower domains by preventing their balancing periods from being
 * multiples of each other.
 */

> +     if (cpu_busy)
> +             interval -= 1;
> +
>       interval = clamp(interval, 1UL, max_load_balance_interval);
>
>       return interval;
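
To illustrate the arithmetic behind the suggested comment, here is a minimal userspace sketch, not kernel code. It assumes HZ=1000 (so 1 ms is 1 jiffy), and busy_interval() and periods_align() are hypothetical helper names made up for this example. The base intervals and busy_factor used in the note below are example values only, not taken from the patch.

```c
#include <stdbool.h>

/*
 * Userspace sketch of the busy balance interval computation.
 * Assumption: HZ=1000, so milliseconds map 1:1 to jiffies.
 * The upper clamp to max_load_balance_interval is left out for brevity.
 */
static unsigned long busy_interval(unsigned long base_ms,
                                   unsigned long busy_factor,
                                   bool decrement)
{
	unsigned long interval = base_ms * busy_factor;

	/* Mirrors the "interval -= 1" from the patch; guarded against underflow. */
	if (decrement && interval > 1)
		interval -= 1;

	/* Lower clamp: never return an interval below 1 jiffy. */
	return interval ? interval : 1;
}

/* True if the larger period is an exact multiple of the smaller one. */
static bool periods_align(unsigned long small, unsigned long large)
{
	return large % small == 0;
}
```

With example base intervals of 4, 8 and 16 ms and a busy_factor of 32, the undecremented busy intervals are 128, 256 and 512 jiffies, so every balance at a larger domain lands on the same tick as one at the smaller domains. After the decrement they become 127, 255 and 511, none of which is a multiple of another. The periods can still coincide now and then (127 and 255 are coprime, so only every 127 * 255 jiffies in this example), which is why the comment says "reduce likelihood" rather than "prevent".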

