On Thu, Nov 17, 2016 at 05:07:00PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 09:18:48AM -0600, Josh Poimboeuf wrote:

> > Thanks, I just waded through this and it turned up some good clues.  And
> > according to 'git blame', you might be able to help :-)
> > 
> > It's not stack corruption.  Instead it looks like
> > __intel_pmu_pebs_event() is creating a bad or stale pt_regs which gets
> > passed to the unwinder.  Specifically, regs->bp points to a seemingly
> > random address on the NMI stack.  Which seems odd, considering the code
> > itself is running on the same NMI stack.
> > 
> > I don't know much about the PEBS code but it seems like it's passing
> > some stale data.  Either that or there's some NMI nesting going on.

So the puzzle was BP,SP pointing into the NMI stack at random spots. But
I think I can explain this; if the event has a very _very_ short period,
then the tail __intel_pmu_enable_all() call from the PMI handler will
'insta' trigger a record and raise another PMI.

We then get back-to-back NMIs with a record pointing to a now
overwritten stack.

The other scenario, where there is an (i)ret between the record and the
interrupt would be less confusing but still wrong.

Solve this by always using iregs->{bp,sp} for callchains.

The below patch still copies the record BP,SP when !CALLCHAINS &&
SAMPLE_REGS; does this make sense?

The fuzzer is still running with this patch applied.. I'll let it run
for a while.

---
 arch/x86/events/intel/ds.c   | 35 +++++++++++++++++++++++------------
 arch/x86/events/perf_event.h |  2 +-
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 0319311dbdbb..be202390bbd3 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1108,20 +1108,20 @@ static void setup_pebs_sample_data(struct perf_event 
*event,
        }
 
        /*
-        * We use the interrupt regs as a base because the PEBS record
-        * does not contain a full regs set, specifically it seems to
-        * lack segment descriptors, which get used by things like
-        * user_mode().
+        * We use the interrupt regs as a base because the PEBS record does not
+        * contain a full regs set, specifically it seems to lack segment
+        * descriptors, which get used by things like user_mode().
         *
-        * In the simple case fix up only the IP and BP,SP regs, for
-        * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
-        * A possible PERF_SAMPLE_REGS will have to transfer all regs.
+        * In the simple case fix up only the IP for PERF_SAMPLE_IP.
+        *
+        * We must however always use BP,SP from iregs for the unwinder to stay
+        * sane; the record BP,SP can point into thin air when the record is
+        * from a previous PMI context or an (I)RET happend between the record
+        * and PMI.
         */
        *regs = *iregs;
        regs->flags = pebs->flags;
        set_linear_ip(regs, pebs->ip);
-       regs->bp = pebs->bp;
-       regs->sp = pebs->sp;
 
        if (sample_type & PERF_SAMPLE_REGS_INTR) {
                regs->ax = pebs->ax;
@@ -1130,10 +1130,21 @@ static void setup_pebs_sample_data(struct perf_event 
*event,
                regs->dx = pebs->dx;
                regs->si = pebs->si;
                regs->di = pebs->di;
-               regs->bp = pebs->bp;
-               regs->sp = pebs->sp;
 
-               regs->flags = pebs->flags;
+               /*
+                * Per the above; only set BP,SP if we don't need callchains.
+                *
+                * XXX: does this make sense?
+                */
+               if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) {
+                       regs->bp = pebs->bp;
+                       regs->sp = pebs->sp;
+               }
+
+               /*
+                * Preserve PERF_EFLAGS_VM from set_linear_ip().
+                */
+               regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM);
 #ifndef CONFIG_X86_32
                regs->r8 = pebs->r8;
                regs->r9 = pebs->r9;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5874d8de1f8d..a77ee026643d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -113,7 +113,7 @@ struct debug_store {
  * Per register state.
  */
 struct er_account {
-       raw_spinlock_t          lock;   /* per-core: protect structure */
+       raw_spinlock_t      lock;       /* per-core: protect structure */
        u64                 config;     /* extra MSR config */
        u64                 reg;        /* extra MSR number */
        atomic_t            ref;        /* reference count */

Reply via email to