On 02/16/2018 01:44 PM, Vincent Guittot wrote:
> On 16 February 2018 at 13:13, Valentin Schneider
> <valentin.schnei...@arm.com> wrote:
>> On 02/14/2018 03:26 PM, Vincent Guittot wrote:
>>> Stopped the periodic update of blocked load when all idle CPUs have fully
>>> decayed. We introduce a new nohz.has_blocked that reflect if some idle
>>> CPUs has blocked load that have to be periodiccally updated. 
>>> nohz.has_blocked
>>> is set everytime that a Idle CPU can have blocked load and it is then clear
>>> when no more blocked load has been detected during an update. We don't need
>>> atomic operation but only to make cure of the right ordering when updating
>>> nohz.idle_cpus_mask and nohz.has_blocked.
>>>
>>> Suggested-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
>>> ---
>>>  kernel/sched/fair.c  | 122 
>>> ++++++++++++++++++++++++++++++++++++++++++---------
>>>  kernel/sched/sched.h |   1 +
>>>  2 files changed, 102 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 7af1fa9..5a6835e 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>>
>>> [...]
>>> @@ -9383,11 +9452,16 @@ static bool nohz_idle_balance(struct rq *this_rq, 
>>> enum cpu_idle_type idle)
>>>                * work being done for other cpus. Next load
>>>                * balancing owner will pick it up.
>>>                */
>>> -             if (need_resched())
>>> -                     break;
>>> +             if (need_resched()) {
>>> +                     has_blocked_load = true;
>>> +                     goto abort;
>>> +             }
>>>
>>>               rq = cpu_rq(balance_cpu);
>>>
>>
>> I'd say it's safe to do the following here. The flag is raised in
>> nohz_balance_enter_idle() before the smp_mb(), so we won't skip a CPU
>> that just got added to nohz.idle_cpus_mask.
> 
> rq->has_blocked_load will be set before the barrier only if
> nohz_tick_stopped is not already set,
> Otherwise, we skip cpumask update and the barrier in  nohz_balance_enter_idle
> 

Right, forgot about that bit. I think it's still fine because:
- nohz_balance_enter_idle() can't be called before the last running task is
dequeued, which requires the rq lock.
- update_blocked_averages takes the rq lock and clears rq->has_blocked_load
with the lock held.

So even though we could have some very unlikely scenario where a CPU quickly
goes out/in of idle after nohz.idle_cpus_mask has been read, the blocked load
itself is safe so rq->has_blocked_load will end up being set correctly.
(Took me a while to see it that way)


BTW, with the current set on Peter's sched/testing, update_nohz_stats()
is called here, which doesn't do the update if !rq->has_blocked_load
(Although that check is done without lock/barrier, so maybe we could not see
a CPU that just went idle ?)

I have one more question on that bit:


                has_blocked_load |= update_nohz_stats(rq, true);

                /*
                 * If time for next balance is due,
                 * do the balance.
                 */
                if (time_after_eq(jiffies, rq->next_balance)) {
                        struct rq_flags rf;

                        rq_lock_irqsave(rq, &rf);
                        update_rq_clock(rq);
                        cpu_load_update_idle(rq);
                        rq_unlock_irqrestore(rq, &rf);

                        if (flags & NOHZ_BALANCE_KICK)
                                rebalance_domains(rq, CPU_IDLE);
                }

                if (time_after(next_balance, rq->next_balance)) {
                        next_balance = rq->next_balance;
                        update_next_balance = 1;
                }


Now that I think about it, shouldn't we always have a 'continue' after
the blocked load update if (flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK ?
AFAICT we don't want to push the next_balance forward, only the next_blocked.
That would also take care of not doing the load balance.
>>
>>                 /*
>>                  * This cpu doesn't have any remaining blocked load, skip it.
>>                  * It's sane to do this because this flag is raised in
>>                  * nohz_balance_enter_idle()
>>                  */
>>                 if ((flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK &&
>>                     !rq->has_blocked_load)
>>                         continue;
>>
>>> +             update_blocked_averages(rq->cpu);
>>> +             has_blocked_load |= rq->has_blocked_load;
>>> +

Reply via email to