On Wed, 6 May 2020 at 12:29, Valentin Schneider <valentin.schnei...@arm.com> wrote:
>
>
> On 05/05/20 15:27, Vincent Guittot wrote:
> > So I would be in favor of something as simple as :
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04098d678f3b..e028bc1c4744 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10457,6 +10457,14 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
> >                 }
> >         }
> >
> > +       /*
> > +        * next_balance will be updated only when there is a need.
> > +        * When the CPU is attached to null domain for ex, it will not be
> > +        * updated.
> > +        */
> > +       if (likely(update_next_balance))
> > +               nohz.next_balance = next_balance;
> > +
> >         /* Newly idle CPU doesn't need an update */
> >         if (idle != CPU_NEWLY_IDLE) {
> >                 update_blocked_averages(this_cpu);
> > @@ -10477,14 +10485,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
> >         if (has_blocked_load)
> >                 WRITE_ONCE(nohz.has_blocked, 1);
> >
> > -       /*
> > -        * next_balance will be updated only when there is a need.
> > -        * When the CPU is attached to null domain for ex, it will not be
> > -        * updated.
> > -        */
> > -       if (likely(update_next_balance))
> > -               nohz.next_balance = next_balance;
> > -
> >         return ret;
> >  }
> >
>
> But then we may skip an update if we goto abort, no? Imagine we have just
> NOHZ_STATS_KICK, so we don't call any rebalance_domains(), and then as we
> go through the last NOHZ CPU in the loop we hit need_resched(). We would
> end in the abort part without any update to nohz.next_balance, despite
> having accumulated relevant data in the local next_balance variable.

Yes, but on the other hand, the last CPU has not been able to run
rebalance_domains(), so we must not move nohz.next_balance; otherwise
that CPU will have to wait for at least another full period.

In fact, I think that we have a problem with the current implementation:
if we abort because the local CPU becomes busy, we might end up skipping
the idle load balance for a lot of idle CPUs.

As an example, imagine that we have 10 idle CPUs with the same
rq->next_balance, which equals nohz.next_balance. _nohz_idle_balance()
starts on CPU0 and processes the idle lb for CPU1, but then has to abort
because of need_resched(). If we update nohz.next_balance as we currently
do, the next idle load balance will happen only after a full balance
interval, whereas we still have 8 CPUs waiting to run an idle load
balance.

My proposal also fixes this problem.

>
> Also note that in this case, nohz_idle_balance() will still return true.
>
> If we rip out just the one update we need from rebalance_domains(), then
> perhaps we could go with what Peng was initially suggesting? i.e. something
> like the below.
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 46b7bd41573f..0a292e0a0731 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9934,22 +9934,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>          * When the cpu is attached to null domain for ex, it will not be
>          * updated.
>          */
> -       if (likely(update_next_balance)) {
> +       if (likely(update_next_balance))
>                 rq->next_balance = next_balance;
> -
> -#ifdef CONFIG_NO_HZ_COMMON
> -               /*
> -                * If this CPU has been elected to perform the nohz idle
> -                * balance. Other idle CPUs have already rebalanced with
> -                * nohz_idle_balance() and nohz.next_balance has been
> -                * updated accordingly. This CPU is now running the idle load
> -                * balance for itself and we need to update the
> -                * nohz.next_balance accordingly.
> -                */
> -               if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
> -                       nohz.next_balance = rq->next_balance;
> -#endif
> -       }
>  }
>
>  static inline int on_null_domain(struct rq *rq)
> @@ -10315,6 +10301,11 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
>         if (flags & NOHZ_BALANCE_KICK)
>                 rebalance_domains(this_rq, CPU_IDLE);
>
> +       if (time_after(next_balance, this_rq->next_balance)) {
> +               next_balance = this_rq->next_balance;
> +               update_next_balance = 1;
> +       }
> +
>         WRITE_ONCE(nohz.next_blocked,
>                 now + msecs_to_jiffies(LOAD_AVG_PERIOD));
>
> @@ -10551,6 +10542,17 @@ static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
>         /* normal load balance */
>         update_blocked_averages(this_rq->cpu);
>         rebalance_domains(this_rq, idle);
> +
> +#ifdef CONFIG_NO_HZ_COMMON
> +       /*
> +        * NOHZ idle CPUs will be rebalanced with nohz_idle_balance() and thus
> +        * nohz.next_balance will be updated accordingly. If there was no NOHZ
> +        * kick, then we just need to update nohz.next_balance wrt *this* CPU.
> +        */
> +       if ((idle == CPU_IDLE) &&
> +           time_after(nohz.next_balance, this_rq->next_balance))
> +               nohz.next_balance = this_rq->next_balance;
> +#endif
>  }
>
>  /*
> ---
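
To make the 10-CPU example concrete, here is a rough userspace sketch of the two update policies. It is not kernel code and only a toy model under stated assumptions: struct sim, ilb_pass(), stranded(), sim_init(), INTERVAL and the abort_after parameter are all hypothetical names invented for illustration, need_resched() is modelled as "becomes true after abort_after remote CPUs", and rebalance_domains() is reduced to pushing that CPU's next_balance one interval ahead.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  10
#define INTERVAL 8UL /* hypothetical balance interval, in jiffies */

/* Wraparound-safe "a is after b", same idea as the kernel's time_after(). */
static bool after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct sim {
	unsigned long rq_next[NR_CPUS]; /* stands in for rq->next_balance */
	unsigned long nohz_next;        /* stands in for nohz.next_balance */
};

/* Every CPU, and the global nohz deadline, are due at the same time t. */
static void sim_init(struct sim *s, unsigned long t)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		s->rq_next[cpu] = t;
	s->nohz_next = t;
}

/*
 * One idle-load-balance pass run by CPU0 at time 'now'.
 * The pass aborts once 'abort_after' remote CPUs have been handled.
 * update_on_abort == true  : publish the deadline even after an abort.
 * update_on_abort == false : publish it only if the loop completed.
 * Returns true when the whole loop completed.
 */
static bool ilb_pass(struct sim *s, unsigned long now, int abort_after,
		     bool update_on_abort)
{
	unsigned long next = now + 60 * INTERVAL; /* far-future placeholder */
	bool update = false, done = true;
	int handled = 0;

	assert(abort_after >= 0);

	for (int cpu = 1; cpu < NR_CPUS; cpu++) {
		if (handled == abort_after) { /* modelled need_resched() */
			done = false;
			break;
		}
		if (!after(s->rq_next[cpu], now)) /* balance is due */
			s->rq_next[cpu] = now + INTERVAL;
		handled++;
		if (after(next, s->rq_next[cpu])) {
			next = s->rq_next[cpu];
			update = true;
		}
	}

	if (update && (done || update_on_abort))
		s->nohz_next = next;
	return done;
}

/* Remote CPUs that are due at 'now' but will not be kicked yet. */
static int stranded(const struct sim *s, unsigned long now)
{
	int n = 0;

	if (!after(s->nohz_next, now))
		return 0; /* global deadline reached: a kick can happen */
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (!after(s->rq_next[cpu], now))
			n++;
	return n;
}
```

In this model, with everything due at jiffy 100 and an abort after one remote CPU, publishing the deadline on abort moves nohz_next to 108 and stranded() reports 8 CPUs at jiffy 101. Publishing it only on completion leaves nohz_next at 100, so nothing is stranded and the next tick can kick again.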