On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
> static void x86_pmu_event_mapped(struct perf_event *event)
> {
>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
>         return;
> 
>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
> 
> <-- thread 1 stalls here
> 
>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
> }
> 
> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
> marked.  Then thread 2 runs the whole function, does *not* update CR4,
> returns to userspace, and GPFs.
> 
> The big hammer solution is to stick a per-mm mutex around it.  Let me
> ponder whether a smaller hammer is available.

Reminds me a bit of what we ended up with in 
kernel/jump_label.c:static_key_slow_inc().
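
Roughly, that shape applied to this path would be something like the below
(untested sketch only; the per-mm mutex -- call it context.perf_rdpmc_mutex --
is made up here, and the ordering details would need more care):

static void x86_pmu_event_mapped(struct perf_event *event)
{
    struct mm_struct *mm = current->mm;

    if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
        return;

    /*
     * Fast path: a nonzero count means whoever took it 0 -> 1 has
     * already finished the IPI broadcast, because the slow path
     * below only publishes the count after on_each_cpu_mask().
     */
    if (atomic_inc_not_zero(&mm->context.perf_rdpmc_allowed))
        return;

    mutex_lock(&mm->context.perf_rdpmc_mutex);
    if (atomic_read(&mm->context.perf_rdpmc_allowed) == 0) {
        /* Do the expensive part first, then publish the count. */
        on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
        atomic_set(&mm->context.perf_rdpmc_allowed, 1);
    } else {
        atomic_inc(&mm->context.perf_rdpmc_allowed);
    }
    mutex_unlock(&mm->context.perf_rdpmc_mutex);
}

The point being that the count only becomes visible to the fast path after
the CR4 update has gone out, so a second mapper can never return to
userspace ahead of the IPIs.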

