On Thu, 20 Feb 2014, Peter Zijlstra wrote:
> On Thu, Feb 20, 2014 at 01:03:16PM -0500, Vince Weaver wrote:
> > attached, it's not very big.
>
> This is I think the relevant bit:
>
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
> pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
> pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
> pec_1076_warn-2804 [000] d... 147.926191: x86_pmu_enable: starting: 0
>
> so it does indeed look like n_added got scrambled; we started out with 1
> event on disable; we've got 2 events on enable, but n_added is also 2,
> which would suggest we had 0 on disable.
>
> That makes us want to (re)start the NMI counter alright.

This might be relevant: check the last_cpu values. Right before the trace
above, it looks like the thread gets moved from CPU 1 to CPU 0 (possibly as
a result of the long chain started by the close() of the tracepoint event),
so the problematic NMI watchdog event being enabled is a different one from
the one that was disabled just before.

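For what it's worth, the n_added arithmetic from the quoted analysis can be
written down as a small standalone check. This is only a rough userspace
sketch, not kernel code: struct pmu_snapshot, enum pmu_check and both helper
functions are made-up names that merely mirror the fields printed by the
x86_pmu_state trace lines, and the cpu field is assumed to come from the
[000] column of the trace.

```c
/* Hypothetical userspace mirror of the fields printed by x86_pmu_state. */
struct pmu_snapshot {
	int cpu;      /* CPU the trace line was logged on, e.g. [000] */
	int n_events; /* total events collected at this point */
	int n_added;  /* events added since the PMU was disabled */
};

enum pmu_check {
	PMU_CONSISTENT,
	PMU_N_ADDED_MISMATCH,
	PMU_DIFFERENT_CPU,    /* snapshots are not comparable */
};

/* Events that n_added implies were already present before the additions. */
static int implied_prior_events(const struct pmu_snapshot *s)
{
	return s->n_events - s->n_added;
}

/* Compare the state seen at disable time with the state seen at enable. */
static enum pmu_check check_n_added(const struct pmu_snapshot *at_disable,
				    const struct pmu_snapshot *at_enable)
{
	/* The state is per CPU, so only same-CPU snapshots can be compared. */
	if (at_disable->cpu != at_enable->cpu)
		return PMU_DIFFERENT_CPU;
	if (implied_prior_events(at_enable) != at_disable->n_events)
		return PMU_N_ADDED_MISMATCH;
	return PMU_CONSISTENT;
}
```

Feeding in the values from the trace (CPU 0, n_events 1 at disable; CPU 0,
n_events 2 and n_added 2 at enable) makes check_n_added() return
PMU_N_ADDED_MISMATCH, since n_added implies 0 prior events where 1 was seen.
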
Vince