On Thu, 4 Apr 2019, Cyrill Gorcunov wrote:

> On Thu, Apr 04, 2019 at 12:37:18PM -0400, Vince Weaver wrote:
>
> Oh, Vince, I suspect such kind of bisection might consume a lot of your
> time :( Maybe we could update perf fuzzer so that it would send events
> to some net-storage first then write them to the counters, iow to automatize
> this all stuff somehow?

I do have a lot of this automated already from tracking down past bugs,
but it turns out that most of the fuzzer-found bugs aren't deterministic
so it doesn't always work.

For example this bug, while I can easily repeat it, doesn't happen at the
same time each time. I suspect something corrupts things, but the crash
doesn't trigger until a context switch happens.

For what it's worth I've put code in p4_pmu_enable_all() to see what's
going on when the NULL dereference happens, and sure enough the printk is
triggered where I'd expect.

[ 138.132889] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.171380] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.212588] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.263761] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.279944] VMW: p4_pmu_enable_all: idx 4 is NULL

static void p4_pmu_enable_all(int added)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int idx;

	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
		struct perf_event *event = cpuc->events[idx];
		if (!test_bit(idx, cpuc->active_mask))
			continue;
		if (event==NULL) {
			printk("VMW: p4_pmu_enable_all: idx %d is NULL\n",idx);
		} else {
			p4_pmu_enable_event(event);
		}
	}
}

The machine still crashes after this, but not right away.

Vince