On August 29, 2016 4:55:02 PM PDT, Andy Lutomirski <l...@amacapital.net> wrote:
>On Aug 29, 2016 7:54 AM, "Rik van Riel" <r...@redhat.com> wrote:
>>
>> On Sun, 2016-08-28 at 01:11 -0700, Andy Lutomirski wrote:
>> > On Aug 25, 2016 9:06 PM, "Rik van Riel" <r...@redhat.com> wrote:
>> > >
>> > > Subject: x86,mm,sched: make lazy TLB mode even lazier
>> > >
>> > > Lazy TLB mode can result in an idle CPU being woken up for a TLB
>> > > flush, when all it really needed to do was flush %cr3 before the
>> > > next context switch.
>> > >
>> > > This is mostly fine on bare metal, though sub-optimal from a power
>> > > saving point of view, and deeper C states could make TLB flushes
>> > > take a little longer than desired.
>> > >
>> > > On virtual machines, the pain can be much worse, especially if a
>> > > currently non-running VCPU is woken up for a TLB invalidation
>> > > IPI, on a CPU that is busy running another task. It could take
>> > > a while before that IPI is handled, leading to performance issues.
>> > >
>> > > This patch is still ugly, and the sched.h include needs to be
>> > > cleaned up a lot (how would the scheduler people like to see the
>> > > context switch blocking abstracted?)
>> > >
>> > > This patch deals with the issue by introducing a third TLB state,
>> > > TLBSTATE_FLUSH, which causes %cr3 to be flushed at the next
>> > > context switch. A CPU is transitioned from TLBSTATE_LAZY to
>> > > TLBSTATE_FLUSH with the rq lock held, to prevent context switches.
>> > >
>> > > Nothing is done for a CPU that is already in TLBSTATE_FLUSH mode.
>> > >
>> > > This patch is totally untested, because I am at a conference right
>> > > now, and Benjamin has the test case :)
>> > >
>> >
>> > I haven't had a chance to seriously read the code yet, but what
>> > happens when the mm is deleted outright? Or is the idea that a
>> > reference is held until all the lazy users are gone, too?
>>
>> Worst case we send a TLB flush to a CPU that does
>> not need it.
>>
>> As not sending an IPI will be faster than sending
>> one, I do not think the tradeoff will be much
>> different for a system with PCID.
>
>If we were fully non-lazy, we wouldn't need to send these IPIs at all,
>right? We would just keep cr3 pointing at swapper_pg_dir when not
>actively using the mm. The problem with doing that without PCID is
>that cr3 writes are really slow. Or am I missing something?

Writing cr3 on a PCID system doesn't (necessarily) flush the TLB context. The whole reason for PCIDs is to *enable* lazy TLB, by making it unnecessary to flush a TLB context while another process is running.

As such, this methodology should help a PCID system even more: we can remember whether a TLB context needs to be flushed and do it when that task is next scheduled, without needing any IPI.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.