On Mon, Jul 16, 2018 at 12:03 PM, Rik van Riel <r...@surriel.com> wrote:
> Lazy TLB mode can result in an idle CPU being woken up by a TLB flush,
> when all it really needs to do is reload %CR3 at the next context switch,
> assuming no page table pages got freed.
>
> Memory ordering is used to prevent race conditions between switch_mm_irqs_off,
> which checks whether .tlb_gen changed, and the TLB invalidation code, which
> increments .tlb_gen whenever page table entries get invalidated.
>
> The atomic increment in inc_mm_tlb_gen is its own barrier; the context
> switch code adds an explicit barrier between reading tlbstate.is_lazy and
> next->context.tlb_gen.
>
> Unlike the 2016 version of this patch, CPUs with cpu_tlbstate.is_lazy set
> are not removed from the mm_cpumask(mm), since that would prevent the TLB
> flush IPIs at page table free time from being sent to all the CPUs
> that need them.
>
> This patch reduces total CPU use in the system by about 1-2% for a
> memcache workload on two-socket systems, and by about 1% for a heavily
> multi-process netperf between two systems.
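(For anyone following along, the pairing described above looks roughly
like this -- a sketch reconstructed from the commit message, not the
literal patch; inc_mm_tlb_gen() lives in arch/x86/include/asm/tlbflush.h
and the reader side is in switch_mm_irqs_off() in arch/x86/mm/tlb.c:

    /* Writer side (TLB invalidation): the bump is its own barrier. */
    static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
    {
            /*
             * atomic64_inc_return() is fully ordered, so the page
             * table updates made before the increment are visible
             * to anyone who observes the new tlb_gen.
             */
            return atomic64_inc_return(&mm->context.tlb_gen);
    }

    /* Reader side (context switch): an explicit barrier between the
     * tlbstate.is_lazy access and the tlb_gen read, as described. */
    this_cpu_write(cpu_tlbstate.is_lazy, false);
    smp_mb();  /* pairs with the full barrier in inc_mm_tlb_gen() */
    next_tlb_gen = atomic64_read(&next->context.tlb_gen);
)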
I'm not 100% certain I'm replying to the right email, and I haven't
gotten the tip-bot notification at all, but: I think you've introduced
a minor-ish performance regression by changing the old (admittedly
terribly documented) control flow a bit.

Before, if real_prev == next, we would skip:

    load_mm_cr4(next);
    switch_ldt(real_prev, next);

Now we don't any more. I think you should reinstate that optimization.
It's probably as simple as wrapping them in an if (real_prev != next),
with a comment like:

    /*
     * Remote changes that would require a cr4 or ldt reload will
     * unconditionally send an IPI even to lazy CPUs.  So, if we
     * aren't changing our mm, we don't need to refresh cr4 or
     * the ldt.
     */

Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
fix that part or should I?
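Something like this early return, perhaps (a sketch only --
load_mm_cr4() lives in arch/x86/include/asm/mmu_context.h, and the
rdpmc logic shown is just its current body from memory):

    static inline void load_mm_cr4(struct mm_struct *mm)
    {
            /* Kernel threads stay on init_mm; no cr4.PCE refresh needed. */
            if (mm == &init_mm)
                    return;

            if (static_branch_unlikely(&rdpmc_always_available) ||
                atomic_read(&mm->context.perf_rdpmc_allowed))
                    cr4_set_bits(X86_CR4_PCE);
            else
                    cr4_clear_bits(X86_CR4_PCE);
    }

--Andy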