On Thu, 29 Oct 2020 at 15:30, Vincent Guittot
<vincent.guit...@linaro.org> wrote:
>
> On Thu, 29 Oct 2020 at 15:19, Vincent Guittot
> <vincent.guit...@linaro.org> wrote:
> >
> > On Thu, 29 Oct 2020 at 12:16, Valentin Schneider
> > <valentin.schnei...@arm.com> wrote:
> > >
> > >
> > > Hi Vincent,
> > >
> > > On 28/10/20 17:44, Vincent Guittot wrote:
> > > > During the fast wakeup path, the scheduler always checks whether the
> > > > local or prev cpus are good candidates for the task before looking for
> > > > other cpus in the domain. With
> > > >   commit b7a331615d25 ("sched/fair: Add asymmetric CPU capacity wakeup scan")
> > > > the heterogeneous system gains a dedicated path but doesn't try to reuse
> > > > the prev cpu whenever possible. If the previous cpu is idle and belongs
> > > > to the LLC domain, we should check it first before looking for another
> > > > cpu, because it remains one of the best candidates and this also
> > > > stabilizes task placement on the system.
> > > >
> > > > This change aligns the asymmetric path behavior with the symmetric one
> > > > and reduces the cases where the task migrates across all cpus of the
> > > > sd_asym_cpucapacity domains at wakeup.
> > > >
> > > > This change does not impact the normal EAS mode, only the overloaded
> > > > case or when EAS is not used.
> > > >
> > > > - On hikey960 with performance governor (EAS disabled)
> > > >
> > > > ./perf bench sched pipe -T -l 50000
> > > >              mainline           w/ patch
> > > > # migrations   999364                  0
> > > > ops/sec        149313(+/-0.28%)   182587(+/-0.40%)  +22%
> > > >
> > > > - On hikey with performance governor
> > > >
> > > > ./perf bench sched pipe -T -l 50000
> > > >              mainline           w/ patch
> > > > # migrations        0                  0
> > > > ops/sec         47721(+/-0.76%)    47899(+/-0.56%)  +0.4%
> > > >
> > > > According to the tests on hikey, the patch doesn't impact symmetric
> > > > systems compared to the current implementation (only tested on arm64).
> > > >
> > > > Also read the uclamped value of the task's utilization at most twice,
> > > > instead of each time we compare the task's utilization with a cpu's
> > > > capacity.
> > > >
> > > > Fixes: b7a331615d25 ("sched/fair: Add asymmetric CPU capacity wakeup scan")
> > > > Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
> > >
> > > Other than the below, I quite like this!
> > >
> > > > ---
> > > > Changes in v2:
> > > > - merge the asymmetric and symmetric paths instead of duplicating the
> > > >   tests on target, prev and other special cases.
> > > >
> > > > - factorize the call to uclamp_task_util(p) and use fits_capacity().
> > > >   This could explain part of the additional improvement compared to v1
> > > >   (+22% instead of +17% with v1).
> > > >
> > > > - Keep using the LLC instead of the asym domain for the early check of
> > > >   target, prev and recent_used_cpu, to ensure cache sharing between the
> > > >   tasks. This doesn't change anything for DynamIQ but will ensure the
> > > >   same cache for legacy big.LITTLE and also simplify the changes.
> > > >
> > >
> > > On legacy big.LITTLE systems, sd_asym_cpucapacity spans all CPUs, so we
> > > would iterate over those in select_idle_capacity() anyway - the policy
> > > we've been going for is that capacity fitness trumps cache use.
> > >
> > > This does require the system to have a decent interconnect, cache snooping
> > > & co, but that is IMO a requirement of any sane asymmetric system.
> > >
> > > To put words into code, this is the kind of check I would see:
> > >
> > >   if (static_branch_unlikely(&sched_asym_cpucapacity))
> > >         return fits_capacity(task_util, capacity_of(cpu));
> > >   else
> >
> > You can't make the shortcut that prev will always belong to the domain,
> > so you have to check that prev belongs to sd_asym_cpucapacity. Even if
> > that's true with current mobile SoCs, this code is generic core code and
> > must handle any kind of funny topology that HW folks could imagine.
>
> We would have something like this:
>
> static inline bool cpus_share_domain(int this_cpu, int that_cpu)
> {
>     if (static_branch_unlikely(&sched_asym_cpucapacity))
>         return per_cpu(sd_asym_cpucapacity, this_cpu) ==
>                per_cpu(sd_asym_cpucapacity, that_cpu);

hmm this doesn't work

>
>     return cpus_share_cache(this_cpu, that_cpu);
> }
>
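
Presumably the pointer comparison fails because each cpu carries its own
sched_domain instance, so the per-cpu sd_asym_cpucapacity pointers differ
even for cpus covered by the same asymmetric domain, and the pointer can
also be NULL (e.g. when an exclusive cpuset carves out a symmetric
island). A minimal, untested sketch of a span-based check instead,
assuming the caller holds the RCU read lock as the wakeup path does:

  static inline bool cpus_share_domain(int this_cpu, int that_cpu)
  {
      if (static_branch_unlikely(&sched_asym_cpucapacity)) {
          struct sched_domain *sd;

          /* per-cpu shortcut to the lowest domain covering all capacities */
          sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, this_cpu));
          if (!sd)
              return false;

          /* test span membership rather than pointer equality */
          return cpumask_test_cpu(that_cpu, sched_domain_span(sd));
      }

      return cpus_share_cache(this_cpu, that_cpu);
  }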
> >
> > >         return cpus_share_cache(cpu, other);
> > >
> > > > - don't check capacity for the per-cpu kthread use case, because the
> > > >   assumption is that the wakee queued work for the per-cpu kthread
> > > >   that is now complete, and the task was already on this cpu.
> > > >
> > > > - On an asymmetric system where an exclusive cpuset defines a
> > > >   symmetric island, the task's load is synced and tested although it's
> > > >   not needed. But taking care of this special case, by testing whether
> > > >   sd_asym_cpucapacity is not NULL, impacts the performance of the
> > > >   default sched_asym_cpucapacity path by more than 4%.
> > > >
> > > > - The huge increase in the number of migrations for hikey960 mainline
> > > >   comes from the fact that the ftrace buffer was overloaded by events
> > > >   in the tests done with v1.
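
Since the v2 diff itself isn't quoted above, here is a rough sketch of
the shape being discussed: read the uclamped utilization once, then gate
the prev check on idleness, cache sharing and capacity fitness. The
helper name asym_fits_capacity() and its exact placement are
illustrative, not necessarily what the patch uses:

  /* Illustrative helper: on an asymmetric system, check that @cpu's
   * capacity fits @task_util; trivially true on a symmetric system. */
  static inline bool asym_fits_capacity(unsigned long task_util, int cpu)
  {
      if (static_branch_unlikely(&sched_asym_cpucapacity))
          return fits_capacity(task_util, capacity_of(cpu));

      return true;
  }

  /* In select_idle_sibling(): read the uclamped utilization once... */
  task_util = uclamp_task_util(p);

  /* ...then reuse prev when it is idle, shares the LLC with target and
   * its capacity fits the task. */
  if (prev != target && cpus_share_cache(prev, target) &&
      (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
      asym_fits_capacity(task_util, prev))
      return prev;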
