On Tue, Sep 10, 2013 at 7:29 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Stephane Eranian <eran...@googlemail.com> wrote:
>
>> On Tue, Sep 10, 2013 at 6:38 AM, Ingo Molnar <mi...@kernel.org> wrote:
>> >
>> > * Stephane Eranian <eran...@googlemail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Ok, so I am able to reproduce the problem using a simpler
>> >> test case with a simple multithreaded program where
>> >> #threads >> #CPUs.
>> >
>> > Does it go away if you use 'perf record --all-cpus'?
>> >
>> Haven't tried that yet.
>>
>> But I verified the DS pointers:
>> init:
>> CPU6 pebs base=ffff8808262de000 index=ffff8808262de000
>> intr=ffff8808262de0c0 max=ffff8808262defc0
>> crash:
>> CPU6 pebs base=ffff8808262de000 index=ffff8808262de9c0
>> intr=ffff8808262de0c0 max=ffff8808262defc0
>>
>> Neither the base nor the max are modified.
>> The index simply goes beyond the threshold but that's not a bug.
>> It is 12 after the threshold of 1, so total 13 is my new crash report.
>>
>> Two things to try:
>> - measure only one thread/core
>> - move the threshold a bit farther away (to get 2 or 3 entries)
>>
>> The threshold is where to generate the interrupt. It does not mean where
>> to stop PEBS recording. So it is possible that in HSW, we may get into a
>> situation where it takes time to get to the handler to stop the PMU. I
>> don't know how given we use NMI. Well, unless we were already servicing
>> an NMI at the time. But given that we stop the PMU almost immediately in
>> the handler, I don't see how that would be possible. The other oddity in
>> HSW is that we clear the NMI on entry to the handler and not at the end.
>> I never got a good explanation as to why that was necessary. So
>> maybe it is related...
>
> Do you mean:
>
>         if (!x86_pmu.late_ack)
>                 apic_write(APIC_LVTPC, APIC_DM_NMI);
>
> AFAICS that means the opposite: that we clear the NMI late, i.e. shortly
> before return, after we've processed the PMU.
>
Yeah, the opposite, I got confused.
Let me try reverting that. Also curious about the influence of the LBR here.

> Do the symptoms change if you remove the x86_pmu.late_ack setting line
> from:
>
>         case 60: /* Haswell Client */
>         case 70:
>         case 71:
>         case 63:
>         case 69:
>                 x86_pmu.late_ack = true;
>
> ?
>
> Thanks,
>
>         Ingo