On Fri, Mar 22, 2019 at 9:34 AM Peter Zijlstra <[email protected]> wrote:

> On Thu, Mar 21, 2019 at 05:20:17PM -0400, Julien Desfossez wrote:
> > On further investigation, we could see that the contention is mostly in the
> > way rq locks are taken. With this patchset, we lock the whole core if
> > cpu.tag is set for at least one cgroup. Due to this, __schedule() is more or
> > less serialized for the core and that contributes to the performance loss
> > that we are seeing. We also saw that newidle_balance() takes a considerably
> > long time in load_balance() due to the rq spinlock contention. Do you think
> > it would help if the core-wide locking was only performed when absolutely
> > needed?
>
> Something like that could be done, but then you end up with 2 locks,
> something which I was hoping to avoid.
>
> Basically you keep rq->lock as it exists today, but add something like
> rq->core->core_lock, you then have to take that second lock (nested
> under rq->lock) for every scheduling action involving a tagged task.
>
> It makes things complicated though; because now my head hurts thinking
> about pick_next_task().
>
> (this can obviously do away with the whole rq->lock wrappery)
>
> Also, completely untested..

We tried it and it dies within 30ms of enabling the tag on 2 VMs :-)
Now after trying to debug this my head hurts as well!

We'll continue trying to figure this out, but if you want to take a look,
the full dmesg is here:
https://paste.debian.net/plainh/0b8f87f3

Thanks,

Julien
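
For reference, the lock nesting described in the quoted mail (keep rq->lock, take a core-wide lock nested under it only for tagged tasks) can be sketched in userspace C. This is only an illustration under assumptions, not the patch that was tested: struct runqueue, struct core_state, tagged_picks, nr_switches and schedule_action() are hypothetical names, and pthread mutexes stand in for the kernel's raw spinlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-core state shared by SMT siblings. */
struct core_state {
    pthread_mutex_t core_lock;  /* second lock, always nested under rq lock */
    int tagged_picks;           /* core-wide data, only touched under core_lock */
};

/* Hypothetical stand-in for a per-CPU runqueue (not the kernel's struct rq). */
struct runqueue {
    pthread_mutex_t lock;       /* per-CPU lock, always taken first */
    struct core_state *core;    /* siblings point at the same core_state */
    int nr_switches;            /* per-CPU data, protected by lock */
};

static void core_init(struct core_state *core)
{
    pthread_mutex_init(&core->core_lock, NULL);
    core->tagged_picks = 0;
}

static void rq_init(struct runqueue *rq, struct core_state *core)
{
    pthread_mutex_init(&rq->lock, NULL);
    rq->core = core;
    rq->nr_switches = 0;
}

/*
 * One scheduling action. Untagged tasks only pay for the per-CPU lock;
 * tagged tasks additionally take the core-wide lock, nested inside it.
 * Because core_lock is always the innermost lock, two siblings that each
 * hold their own rq lock cannot deadlock on each other here.
 */
static void schedule_action(struct runqueue *rq, bool tagged)
{
    pthread_mutex_lock(&rq->lock);
    if (tagged) {
        pthread_mutex_lock(&rq->core->core_lock);
        rq->core->tagged_picks++;
        pthread_mutex_unlock(&rq->core->core_lock);
    }
    rq->nr_switches++;
    pthread_mutex_unlock(&rq->lock);
}
```

The point of the sketch is the ordering only: contention on the shared lock is limited to actions involving tagged tasks, at the cost of having two locks to reason about in the pick path.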

