On Aug 25, 2016 9:06 PM, "Rik van Riel" <r...@redhat.com> wrote:
>
> Subject: x86,mm,sched: make lazy TLB mode even lazier
>
> Lazy TLB mode can result in an idle CPU being woken up for a TLB
> flush, when all it really needed to do was flush %cr3 before the
> next context switch.
>
> This is mostly fine on bare metal, though sub-optimal from a power
> saving point of view, and deeper C states could make TLB flushes
> take a little longer than desired.
>
> On virtual machines, the pain can be much worse, especially if a
> currently non-running VCPU is woken up for a TLB invalidation
> IPI, on a CPU that is busy running another task. It could take
> a while before that IPI is handled, leading to performance issues.
>
> This patch is still ugly, and the sched.h include needs to be cleaned
> up a lot (how would the scheduler people like to see the context switch
> blocking abstracted?)
>
> This patch deals with the issue by introducing a third tlb state,
> TLBSTATE_FLUSH, which causes %cr3 to be flushed at the next
> context switch. A CPU is transitioned from TLBSTATE_LAZY to
> TLBSTATE_FLUSH with the rq lock held, to prevent context switches.
>
> Nothing is done for a CPU that is already in TLBSTATE_FLUSH mode.
>
> This patch is totally untested, because I am at a conference right
> now, and Benjamin has the test case :)
>
I haven't had a chance to seriously read the code yet, but what happens when the mm is deleted outright? Or is the idea that a reference is held until all the lazy users are gone, too?

On PCID systems (still need to get that code upstream...), I wonder if we could go the other way and stop being lazy, as cr3 writes can be much faster.

--Andy