On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> On Fri, May 1, 2015 at 5:59 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> > >
> > > So the perf_fuzzer caught this after about a week of fuzzing on a Haswell
> > > machine running a recent git kernel (pre 4.1-rc1 though).
> > >
> > > We've seen this BUG before and various fixes were applied but apparently
> > > it wasn't enough.
> > >
> > > Sadly it doesn't seem to be reproducible.
> > >
> > > validate_group() -> x86_pmu.schedule_events() -> ???? ->
> > > variable_test_bit()
> > > (hard to tell which test bit with all the inlining going on).
> >
> > Assuming you build with debug info addr2line -i can help, but I think I
> > found it by comparing the Code section below with my objdump -D output.
> >
> > Its:
> >         /* constraint still honored */
> >         if (!test_bit(hwc->idx, c->idxmsk))
> >                 break;
> >
> > Which would seem to suggest c is NULL.
> >
> But then, you'd crash in the previous loop, because after
> get_event_contraint(), you touch
> c->weight.

Indeed so; and we can make an analogous argument for hwc.

However:

> I think it is more likely related to the bitmask (idxmsk). But then
> it is always allocated with the constraint even with the HT bug
> workaround. So most, likely the index is bogus and you touch outside
> the idxmsk[] array.

[428232.701319] BUG: unable to handle kernel NULL pointer dereference at (null)

But the thing really tried to touch NULL, not some random address that
faulted.

As always, Vince has found us a good puzzle ;-)
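
For anyone following the address argument, here is a rough user-space sketch of it. This is not the kernel code: struct model_constraint, its field order (bitmask first, weight after it), and both helper names are assumptions made up for illustration. It only computes which address each access would touch, and dereferences nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG ((long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical stand-in for the constraint object. The field order
 * (bitmask first, weight after it) is an assumption of this sketch.
 */
struct model_constraint {
	unsigned long idxmsk[1];
	int weight;
};

/*
 * Address of the word that a word-indexed test_bit(nr, bitmap) would
 * load, for a non-negative nr. Pure arithmetic; nothing is dereferenced.
 */
static uintptr_t model_test_bit_addr(uintptr_t bitmap, long nr)
{
	return bitmap + (uintptr_t)(nr / MODEL_BITS_PER_LONG) * sizeof(unsigned long);
}

/* Address that reading c->weight would touch for a constraint at 'c'. */
static uintptr_t model_weight_addr(uintptr_t c)
{
	return c + offsetof(struct model_constraint, weight);
}
```

Under those assumptions, a NULL c would already fault at a small non-zero address on the c->weight read, a NULL bitmap pointer with a small index faults at exactly 0, and a valid bitmap with a bogus index faults somewhere away from 0. Only the middle case lines up with the "at (null)" in the oops.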