On Fri, 1 May 2015, Ingo Molnar wrote:

> 
> * Vince Weaver <vincent.wea...@maine.edu> wrote:
> 
> > So this is just a warning, and I've reported it before, but the 
> > perf_fuzzer triggers this fairly regularly on my Haswell system.
> > 
> > It looks like fixed counter 0 (retired instructions) being set to 
> > 0000fffffffffffe occasionally causes an irq loop storm and gets 
> > stuck until the PMU state is cleared.
> 
> So 0000fffffffffffe corresponds to 2 events left until overflow, 
> right? And on Haswell we don't set x86_pmu.limit_period AFAICS, so we 
> allow these super short periods.
> 
> Maybe like on Broadwell we need a quirk on Nehalem/Haswell as well, 
> one similar to bdw_limit_period()? Something like the patch below?

I spent the morning trying to get a reproducer for this.  It turns out to 
be complex.  It seems that, in addition to fixed counter 0 being set to -2 
(0000fffffffffffe), at least one general-purpose counter must also be 
about to overflow.
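
To spell out the arithmetic behind "-2" and "2 events left until overflow", 
here is a rough sketch.  It assumes the counters are 48 bits wide, which 
matches the 12 significant hex digits in the dump but is not stated 
anywhere in this thread.  The helper names are made up for illustration 
and are not kernel functions.

```python
# Assumption: counters are 48 bits wide (not stated in the thread).
COUNTER_BITS = 48
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def raw_counter_value(period):
    """Raw value to program so the counter overflows after `period` events.

    Hypothetical helper: the counter counts up, so it is loaded with
    -period truncated to the counter width.
    """
    return (-period) & COUNTER_MASK


def events_until_overflow(raw):
    """Number of events before a counter holding `raw` wraps to zero."""
    return (1 << COUNTER_BITS) - (raw & COUNTER_MASK)


if __name__ == "__main__":
    raw = raw_counter_value(2)
    # Print in the same 16-hex-digit format as the PMU dump.
    print(f"fixed-PMC0 count: {raw:016x}")
    print(f"events until overflow: {events_until_overflow(raw)}")
```

With a period of 2 this gives the same 0000fffffffffffe value that shows 
up in the fixed-PMC0 line of the warning.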

For example, in this case gen-PMC2 is also poised to overflow at the same 
time.

CPU#0:   gen-PMC2 ctrl:         00000003ff96764b
CPU#0:   gen-PMC2 count:        0000000000000001
gen-PMC2 left:                  0000ffffffffffff
...
[ 2408.612442] CPU#0: fixed-PMC0 count: 0000fffffffffffe


It's not always PMC2, but in the warnings there's at least one other 
gen-PMC about to overflow at the exact same time as the fixed one.

Vince