On Wed, Sep 20, 2017 at 2:33 AM, Brendan Jackman <brendan.jack...@arm.com> wrote: > > On Wed, Sep 20 2017 at 05:06, Joel Fernandes wrote: >>> On Tue, Sep 19, 2017 at 3:05 AM, Brendan Jackman >>> <brendan.jack...@arm.com> wrote: >>>> On Mon, Sep 18 2017 at 22:15, Joel Fernandes wrote: >> [..] >>>>>> IIUC, if wake_affine() behaves correctly this trick wouldn't be >>>>>> necessary on SMP systems, so it might be best guarded by the presence >>>>> >>>>> Actually wake_affine doesn't check for balance if previous/next cpu >>>>> are within the same shared cache domain. The difference is some time >>>>> ago it would return true for shared cache but now it returns false as >>>>> of 4.14-rc1: >>>>> http://elixir.free-electrons.com/linux/v4.14-rc1/source/kernel/sched/fair.c#L5466 >>>>> >>>>> Since it would return false in the above wake up cases for task 1 and >>>>> 2, it would then run select_idle_sibling on the previous CPU which is >>>>> also within the big cluster, so I don't think it will make a >>>>> difference in this case... Infact what it returns probably doesn't >>>>> matter. >>>> >>>> So my paragraph here was making a leap in reasoning, let me try to fill >>>> the gap: On SMP these tasks never need to move around. If by some chance >>>> they did get coscheduled, the first load balance would spread them out and >>>> then every time they wake up from then on, prev_cpu is the sensible >>>> choice. So it will look something like: >>>> >>>> v CPU v ->time-> >>>> >>>> ------------- >>>> { 0 (SAME) 11111111111 >>>> cache { ------------- >>>> { 1 (SAME) 222222222222| >>>> ------------- >>>> { 2 (SAME) 33333333333 >>>> cache { ------------- >>>> { 3 (SAME) 44444444444 >>>> ------------- >>>> >>>> So here, task 2 wakes up the other guys and when it's doing tasks 3 and >>>> 4, prev_cpu and smp_processor_id() don't share a cache, so IIUC its' >>>> basically wake_affine's job to decide between prev_cpu and >>>> smp_processor_id(). So "if wake_affine is behaving correctly" the >>>> problem that this patch aims to solve (i.e. the fact that we overload >>>> the waker's LLC domain because of bias towards prev_cpu) does not arise >>>> on SMP. >>>> >>> >>> Yes SMP, but your patch is for solving a problem for non-SMP. So your >>> original statement about wake_affine solving any problem for SMP is >>> not relevant I feel :-P. I guess you can just kill this para from the >>> commit message to prevent confusion. >> >> Ok I take that back, you were talking about guarding this feature by >> the SD_ASYM_CPUCAPACITY flag. >> >> I don't think that protection would be helpful because you can have >> the same issue if the tasks do different amount of work on SMP. So in >> that case some threads might still complete before the others and you >> run into the same thing. > > Well assuming we're still talking about one task per CPU, if you have > tasks doing different amount of work there's still no reason to move the > longer-running threads around. The only reason that happens in my > example is because of the asym capacity.
Yes but you can very well have RT pressure and things that temporarily change the capacity equality. Also this is a simple benchmark and for any reason you have more than 1 task running on those other CPUs and then the idle CPUs run some of the tasks and you run into a similar situation that might need your patch.. Also one more note, the SD_ASYM_CPUCAPACITY protection is still not needed because SD_BALANCE_WAKE isn't turned on for !SD_ASYM_CPUCAPACITY from what I learnt from discussions with Mike, I believe its this piece of code in sd_init that actually enables it: if (sd->flags & SD_ASYM_CPUCAPACITY) { struct sched_domain *t = sd; for_each_lower_domain(t) t->flags |= SD_BALANCE_WAKE; } thanks, - Joel