On Thu, 7 May 2020 at 14:41, Peng Liu <iwtba...@gmail.com> wrote:
>
> On Wed, May 06, 2020 at 05:02:56PM +0100, Valentin Schneider wrote:
> >
> > On 06/05/20 14:45, Vincent Guittot wrote:
> > >> But then we may skip an update if we goto abort, no? Imagine we have just
> > >> NOHZ_STATS_KICK, so we don't call any rebalance_domains(), and then as we
> > >> go through the last NOHZ CPU in the loop we hit need_resched(). We would
> > >> end up in the abort path without any update to nohz.next_balance, despite
> > >> having accumulated relevant data in the local next_balance variable.
> > >
> > > Yes, but on the other hand, the last CPU has not been able to run
> > > rebalance_domains(), so we must not move nohz.next_balance; otherwise
> > > it will have to wait for at least another full period.
> > > In fact, I think that we have a problem with the current
> > > implementation, because if we abort because the local cpu is busy, we
> > > might end up skipping the idle load balance for a lot of idle CPUs.
> > >
> > > As an example, imagine that we have 10 idle CPUs with the same
> > > rq->next_balance, which equals nohz.next_balance. _nohz_idle_balance
> > > starts on CPU0; it processes the idle lb for CPU1 but then has to
> > > abort because of need_resched. If we update nohz.next_balance as we
> > > do currently, the next idle load balance will happen after a full
> > > balance interval whereas we still have 8 CPUs waiting to run an
> > > idle load balance.
> > >
> > > My proposal also fixes this problem.
> > >
> >
> > That's a very good point; so with NOHZ_BALANCE_KICK we can reduce
> > nohz.next_balance via rebalance_domains(), and otherwise we would only
> > increase it if we go through a complete for_each_cpu() loop in
> > _nohz_idle_balance().
> >
> > That said, if for some reason we keep bailing out of the loop, we won't
> > push nohz.next_balance forward and thus may repeatedly nohz-balance only
> > the first few CPUs in the NOHZ mask. I think that can happen if we have
> > say, 2 tasks pinned to a single rq; in that case nohz_balancer_kick() will
> > kick a NOHZ balance whenever now >= nohz.next_balance.
>
> If we face the risk of repeatedly nohz-balancing only the first few
> CPUs, maybe we could remember the interrupted CPU and start the nohz
> balance from it next time, i.e. replace the loop in
> _nohz_idle_balance() with something like:
>
> for_each_cpu_wrap(cpu, nohz.idle_cpus_mask, nohz.anchor) {
>         ...
>         if (need_resched()) {
>                 ...
>                 /* remember where we stopped; resume from here next time */
>                 nohz.anchor = cpu;
>                 ...
>         }
>         ...
> }
>
> This can mitigate the problem, but it can't help the extreme situation

If we rerun _nohz_idle_balance() before the balance interval, the first
idle CPUs that have already been balanced will be skipped, because their
rq->next_balance will be after jiffies, and we will start calling
rebalance_domains() for the idle CPUs which have not yet been balanced.
So I'm not sure that this will help a lot, because we still have to go
through all idle CPUs to set nohz.next_balance at the end.
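
To illustrate, here is a trimmed paraphrase of the loop in
_nohz_idle_balance() (not the exact mainline code; locking and blocked
load updates are omitted). A CPU whose balance time has not expired is
skipped for rebalancing, but the loop still has to visit every idle CPU
to compute the final nohz.next_balance:

	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
			continue;

		/* Bail out if the local CPU got work to do. */
		if (need_resched())
			goto abort;

		rq = cpu_rq(balance_cpu);

		/* Skip CPUs whose balance interval has not expired yet. */
		if (time_after_eq(jiffies, rq->next_balance)) {
			if (flags & NOHZ_BALANCE_KICK)
				rebalance_domains(rq, CPU_IDLE);
		}

		/* Track the earliest next_balance across all idle CPUs. */
		if (time_after(next_balance, rq->next_balance)) {
			next_balance = rq->next_balance;
			update_next_balance = 1;
		}
	}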

> as @Vincent put it, where it always fails on the same CPU.

In the case that I described above, the problem comes from the CPU that
is selected to run the ilb, but if we kick the ilb again, it will not
be selected again.
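
That is because kick_ilb() picks the ilb CPU via find_new_ilb(), which
only returns a CPU that is currently idle; a CPU that aborted because of
need_resched() has pending work and is no longer idle, so it won't be
picked next time. Roughly (paraphrased from kernel/sched/fair.c of this
era; the housekeeping detail may vary between versions):

	static inline int find_new_ilb(void)
	{
		int ilb;

		/* Only consider CPUs that are in the nohz-idle mask ... */
		for_each_cpu_and(ilb, nohz.idle_cpus_mask,
				 housekeeping_cpumask(HK_FLAG_MISC)) {
			/* ... and that are still actually idle right now. */
			if (idle_cpu(ilb))
				return ilb;
		}

		return nr_cpu_ids;
	}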
