On 07/26/2018 07:14 PM, Valentin Schneider wrote:
Hi,

On 09/07/18 16:08, Morten Rasmussen wrote:
On Fri, Jul 06, 2018 at 12:18:27PM +0200, Vincent Guittot wrote:
Hi Morten,

On Wed, 4 Jul 2018 at 12:18, Morten Rasmussen <[email protected]> wrote:

[...]

With that out of the way, I did some lmbench runs:
lat_mem_rd 10 1024

With ASYM_PACKING, I still see lmbench tasks remaining on LITTLE CPUs while
bigs are free, because ASYM_PACKING only does explicit active balancing on
CPU_NEWLY_IDLE balancing - otherwise it'll rely on the nr_balance_failed 
counter.

However, that counter can be reset before it reaches the threshold at which
active balance is done, which can lead to huge upmigration delays (almost a
full second). I also see the same kind of issues on Juno r0.
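
To illustrate why a resettable counter can delay upmigration for so long, here is a minimal toy model. It is not the kernel's load-balance code: the function name, the `threshold` and `reset_every` parameters, and the assumption that every balance attempt fails are all hypothetical and only serve to show the effect described above.

```python
def attempts_until_active_balance(threshold, reset_every, max_attempts=1000):
    """Return the balance attempt at which active balance would fire, or None.

    Toy model (assumption, not kernel code): every attempt fails and bumps
    the counter; the counter is cleared after every `reset_every` attempts
    (0 means it is never cleared).
    """
    nr_balance_failed = 0
    for attempt in range(1, max_attempts + 1):
        nr_balance_failed += 1
        if nr_balance_failed > threshold:
            return attempt          # threshold crossed: active balance
        if reset_every and attempt % reset_every == 0:
            nr_balance_failed = 0   # counter reset before the threshold
    return None                     # never reached the threshold


print(attempts_until_active_balance(threshold=3, reset_every=0))
print(attempts_until_active_balance(threshold=3, reset_every=3))
```

In this model, a reset that arrives at least as often as the threshold keeps the counter from ever crossing it, so the misplaced task only moves when something else (such as a NEWLY_IDLE balance) picks it up.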

This could be resolved by extending ASYM_PACKING active balancing to
non NEWLY_IDLE cases, but then we'd be thrashing everything. That's another
argument for basing upmigration on task load-tracking signals, as we can
determine which tasks need active balancing much faster than the
nr_balance_failed counter way while not active balancing the world.

The task layout of the test looks like n=85 always-running tasks (each running for ~1.25ms on big or little), created and run one after the other. So on a big CPU their util values go from 512 to 1024, and on a little CPU from 223 to 446 (Juno board). The latter is thanks to Quentin's 'sched/fair: Fix util_avg of new tasks for asymmetric systems'.
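
The util numbers above are consistent with a new task starting at half the capacity of the CPU it is created on and ramping up to that CPU's full capacity. A minimal sketch of that arithmetic, assuming an otherwise idle CPU (the `util_range` helper is hypothetical, not a kernel function):

```python
def util_range(cpu_capacity):
    """Return (initial, maximum) util for a new always-running task.

    Assumption: the task starts at half the CPU's capacity and can grow
    up to the full capacity of that CPU.
    """
    return cpu_capacity // 2, cpu_capacity


# Capacities as reported by cpu_capacity on the Juno board.
for name, capacity in (("big", 1024), ("little", 446)):
    start, end = util_range(capacity)
    print(f"{name}: util goes from {start} to {end}")
```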

root@juno:~# cat /sys/devices/system/cpu/cpu[01]/cpu_capacity
446
1024

(lat_mem_rd 10 1024) with ASYM_PACKING:

...
4.0 148.66   <-----
4.5 10.191
...
7.5 10.203
8.0 154.354   <-----

I ran the test affine to big, little and all cpus on tip/sched/core w/o ASYM_PACKING or Misfit:

cputype:     big  little     all
cpumask:    0x06    0x39    0xff

mem size   <---- latency  ---->

 0.00098   3.668   3.595   3.669
 0.00195   3.668   3.594   3.594
 0.00293   3.668   3.593   3.595
 0.00391   3.669   3.596   3.595
 ...
 3.75000  58.687 121.934 122.293
 4.00000  57.054 121.771 120.489
 4.50000  56.914 121.851  56.729
 5.00000  57.347 121.777  56.975
 5.50000  57.705 121.738  68.981
 6.00000  57.935 121.728  57.542
 6.50000  58.119 121.694 121.799
 7.00000  58.194 121.502  57.844
 7.50000  58.258 121.684  58.050
 8.00000  58.293 121.725  58.030
 9.00000  58.309 121.793  58.188
10.00000  58.561 122.252 122.078
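
For reference, the cpumask row in the table header can be decoded into CPU numbers with a few lines. This is only a sketch; `cpus_in_mask` is a hypothetical helper, not part of any tool used in the test.

```python
def cpus_in_mask(mask):
    """Return the list of CPU numbers whose bit is set in `mask`."""
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


print("big:   ", cpus_in_mask(0x06))
print("little:", cpus_in_mask(0x39))
```

The two masks do not overlap, and they match the cpu_capacity output shown earlier, where cpu0 is a little CPU (446) and cpu1 is a big CPU (1024).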

There is no difference between big and little CPUs for small memory sizes, only in the MB range. If I look into the trace for 'all', it turns out that there are cases in which, even if the task only ran for ~15% of the time on big, the latency value is printed as if it had been running affine to big. So using the latency value as an indicator of where the task was scheduled is IMHO not really possible.
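
The MB-range rows of the table can be summarised by labelling each 'all' latency with the affine column it is numerically closest to. The sketch below only describes the reported numbers (copied from the table); given the trace observation above, the labels should not be read as where the task actually ran. The `closest_column` helper is hypothetical.

```python
# (mem size in MB, big, little, all) latencies copied from the table.
ROWS = [
    (3.75, 58.687, 121.934, 122.293),
    (4.00, 57.054, 121.771, 120.489),
    (4.50, 56.914, 121.851, 56.729),
    (5.00, 57.347, 121.777, 56.975),
    (5.50, 57.705, 121.738, 68.981),
    (6.00, 57.935, 121.728, 57.542),
    (6.50, 58.119, 121.694, 121.799),
    (7.00, 58.194, 121.502, 57.844),
    (7.50, 58.258, 121.684, 58.050),
    (8.00, 58.293, 121.725, 58.030),
    (9.00, 58.309, 121.793, 58.188),
    (10.00, 58.561, 122.252, 122.078),
]


def closest_column(big, little, observed):
    """Return 'big' or 'little', whichever latency is nearer to `observed`."""
    return "big" if abs(observed - big) <= abs(observed - little) else "little"


labels = {size: closest_column(big, little, seen)
          for size, big, little, seen in ROWS}
for size, label in labels.items():
    print(f"{size:5.2f} MB: 'all' latency looks like {label}")
```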
