On 02/16/2018 01:44 PM, Vincent Guittot wrote: > On 16 February 2018 at 13:13, Valentin Schneider > <valentin.schnei...@arm.com> wrote: >> On 02/14/2018 03:26 PM, Vincent Guittot wrote: >>> Stopped the periodic update of blocked load when all idle CPUs have fully >>> decayed. We introduce a new nohz.has_blocked that reflect if some idle >>> CPUs has blocked load that have to be periodiccally updated. >>> nohz.has_blocked >>> is set everytime that a Idle CPU can have blocked load and it is then clear >>> when no more blocked load has been detected during an update. We don't need >>> atomic operation but only to make cure of the right ordering when updating >>> nohz.idle_cpus_mask and nohz.has_blocked. >>> >>> Suggested-by: Peter Zijlstra (Intel) <pet...@infradead.org> >>> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org> >>> --- >>> kernel/sched/fair.c | 122 >>> ++++++++++++++++++++++++++++++++++++++++++--------- >>> kernel/sched/sched.h | 1 + >>> 2 files changed, 102 insertions(+), 21 deletions(-) >>> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>> index 7af1fa9..5a6835e 100644 >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> >>> [...] >>> @@ -9383,11 +9452,16 @@ static bool nohz_idle_balance(struct rq *this_rq, >>> enum cpu_idle_type idle) >>> * work being done for other cpus. Next load >>> * balancing owner will pick it up. >>> */ >>> - if (need_resched()) >>> - break; >>> + if (need_resched()) { >>> + has_blocked_load = true; >>> + goto abort; >>> + } >>> >>> rq = cpu_rq(balance_cpu); >>> >> >> I'd say it's safe to do the following here. The flag is raised in >> nohz_balance_enter_idle() before the smp_mb(), so we won't skip a CPU >> that just got added to nohz.idle_cpus_mask. > > rq->has_blocked_load will be set before the barrier only if > nohz_tick_stopped is not already set, > Otherwise, we skip cpumask update and the barrier in nohz_balance_enter_idle >
Right, forgot about that bit. I think it's still fine because: - nohz_balance_enter_idle() can't be called before the last running task is dequeued, which requires the rq lock. - update_blocked_averages takes the rq lock and clears rq->has_blocked_load with the lock held. So even though we could have some very unlikely scenario where a CPU quickly goes out/in of idle after nohz.idle_cpus_mask has been read, the blocked load itself is safe so rq->has_blocked_load will end up being set correctly. (Took me a while to see it that way) BTW, with the current set on Peter's sched/testing, update_nohz_stats() is called here, which doesn't do the update if !rq->has_blocked_load (Although that check is done without lock/barrier, so maybe we could not see a CPU that just went idle ?) I have one more question on that bit: has_blocked_load |= update_nohz_stats(rq, true); /* * If time for next balance is due, * do the balance. */ if (time_after_eq(jiffies, rq->next_balance)) { struct rq_flags rf; rq_lock_irqsave(rq, &rf); update_rq_clock(rq); cpu_load_update_idle(rq); rq_unlock_irqrestore(rq, &rf); if (flags & NOHZ_BALANCE_KICK) rebalance_domains(rq, CPU_IDLE); } if (time_after(next_balance, rq->next_balance)) { next_balance = rq->next_balance; update_next_balance = 1; } Now that I think about it, shouldn't we always have a 'continue' after the blocked load update if (flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK ? AFAICT we don't want to push the next_balance forward, only the next_blocked. That would also take care of not doing the load balance. >> >> /* >> * This cpu doesn't have any remaining blocked load, skip it. >> * It's sane to do this because this flag is raised in >> * nohz_balance_enter_idle() >> */ >> if ((flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK && >> !rq->has_blocked_load) >> continue; >> >>> + update_blocked_averages(rq->cpu); >>> + has_blocked_load |= rq->has_blocked_load; >>> +