On Tue, 14 Mar 2017, Andy Lutomirski wrote:

> On Mon, Mar 13, 2017 at 2:05 PM, Andy Lutomirski <l...@kernel.org> wrote:
> > On Mon, Mar 13, 2017 at 9:55 AM, Peter Zijlstra <pet...@infradead.org> 
> > wrote:
> >> On Mon, Mar 13, 2017 at 09:44:02AM -0700, Andy Lutomirski wrote:
> >>> static void x86_pmu_event_mapped(struct perf_event *event)
> >>> {
> >>>     if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
> >>>         return;
> >>>
> >>>     if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
> >>>
> >>> <-- thread 1 stalls here
> >>>
> >>>         on_each_cpu_mask(mm_cpumask(current->mm), refresh_pce, NULL, 1);
> >>> }
> >>>
> >>> Suppose you start with perf_rdpmc_allowed == 0.  Thread 1 runs
> >>> x86_pmu_event_mapped and gets preempted (or just runs slowly) where I
> >>> marked.  Then thread 2 runs the whole function; its atomic_inc_return()
> >>> sees 2, so it skips the on_each_cpu_mask() call, does *not* update
> >>> CR4.PCE, returns to userspace, and GPFs on the first rdpmc.
> >>>
> >>> The big hammer solution is to stick a per-mm mutex around it.  Let me
> >>> ponder whether a smaller hammer is available.
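
For reference, the big-hammer version would presumably look something
like the sketch below.  Untested, and the rdpmc_mutex field is made up
here:

    static void x86_pmu_event_mapped(struct perf_event *event)
    {
        if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
            return;

        /*
         * Serialize the 0 -> 1 transition with the IPI, so that no
         * thread can observe the count as nonzero before CR4.PCE has
         * been set on every CPU currently running this mm.
         */
        mutex_lock(&current->mm->context.rdpmc_mutex);
        if (atomic_inc_return(&current->mm->context.perf_rdpmc_allowed) == 1)
            on_each_cpu_mask(mm_cpumask(current->mm),
                             refresh_pce, NULL, 1);
        mutex_unlock(&current->mm->context.rdpmc_mutex);
    }

The unmapped path would have to take the same mutex around its
1 -> 0 transition, otherwise the same race exists in reverse.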
> >>
> >> Reminds me a bit of what we ended up with in 
> >> kernel/jump_label.c:static_key_slow_inc().
> >>
> >>
> >
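
For anyone following along, the static_key_slow_inc() pattern Peter
mentions would map onto this case roughly as follows (sketch only,
with a hypothetical lock):

    /*
     * Fast path: if the count is already nonzero, the enable work has
     * already been done and a plain increment suffices.  The 0 -> 1
     * transition (and the matching 1 -> 0 on unmap) happens under the
     * lock, so nobody can see a nonzero count before the IPI has run.
     */
    if (atomic_inc_not_zero(&mm->context.perf_rdpmc_allowed))
        return;

    mutex_lock(&rdpmc_mutex);        /* hypothetical lock */
    if (atomic_read(&mm->context.perf_rdpmc_allowed) == 0)
        on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
    atomic_inc(&mm->context.perf_rdpmc_allowed);
    mutex_unlock(&rdpmc_mutex);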
> > One thing I don't get: isn't mmap_sem held for write the whole time?
> 
> mmap_sem is indeed held, so my theory is wrong.  I can reproduce it,
> but I don't see the bug yet...

It could still be a PAPI bug, as I'm having absolutely no luck coming 
up with a plain perf_event reproducer.
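
For concreteness, the sort of plain test I'd expect to trigger it is
two threads in the same process each mmap()ing a fresh event and then
executing rdpmc.  Sketch only, error handling and stress looping
omitted (build with -pthread):

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/mman.h>
    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    static void *mapper(void *arg)
    {
        struct perf_event_attr attr;
        struct perf_event_mmap_page *pc;
        unsigned int lo, hi;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

        /* mmap()ing the event is what bumps perf_rdpmc_allowed */
        pc = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ,
                  MAP_SHARED, fd, 0);

        /* GPFs here if this CPU never got CR4.PCE set */
        if (pc->cap_user_rdpmc && pc->index)
            asm volatile("rdpmc"
                         : "=a" (lo), "=d" (hi)
                         : "c" (pc->index - 1));

        munmap(pc, sysconf(_SC_PAGESIZE));
        close(fd);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        int i;

        for (i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, mapper, NULL);
        for (i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }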

Let me dig through the PAPI code again and make sure I'm not missing 
something.

Vince
