On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
>> static void x86_pmu_event_mapped(struct perf_event *event)
>> {
>>         if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>>                 return;
>>
>>         if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
>>
>> <-- thread 1 stalls here
>>
>>                 on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
>> }
>>
>> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
>> marked.  Then thread 2 runs the whole function, does *not* update CR4,
>> returns to userspace, and GPFs.
>>
>> The big hammer solution is to stick a per-mm mutex around it.  Let me
>> ponder whether a smaller hammer is available.
>
> Reminds me a bit of what we ended up with in
> kernel/jump_label.c:static_key_slow_inc().
>
One thing I don't get: isn't mmap_sem held for write the whole time?
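
For concreteness, the "big hammer" per-mm mutex mentioned above might look
roughly like the sketch below.  The perf_rdpmc_lock field is a hypothetical
addition to mm_context_t (initialized at mm creation), not an existing member;
this is only an illustration of the serialization, not a proposed patch.

/*
 * Sketch only: serialize the count update and the IPI broadcast so a
 * second thread cannot see perf_rdpmc_allowed > 1 and return to
 * userspace before the first thread has actually updated CR4 on all
 * CPUs in the mm's cpumask.
 */
static void x86_pmu_event_mapped(struct perf_event *event)
{
	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
		return;

	mutex_lock(&current->mm->context.perf_rdpmc_lock);	/* hypothetical field */
	if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
		on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
	mutex_unlock(&current->mm->context.perf_rdpmc_lock);
}

If mmap_sem is indeed held for write across both map and unmap callbacks, the
same serialization falls out for free and the extra mutex would be redundant.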