On Thu, 20 Feb 2014, Peter Zijlstra wrote:

> On Thu, Feb 20, 2014 at 01:03:16PM -0500, Vince Weaver wrote:
> > attached, it's not very big.
> 
> This is I think the relevant bit:
> 
>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: 
> x86_pmu_disable
>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R 
> config: ffffffffffffffff (          (null))
>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: 
> AR config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, 
> n_added: 0, n_txn: 1
>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 
> config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 
> 1 (ffff880119ec8800)
>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 
> 0 (ffff8800c9e01800)
>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 
> 4 (ffff8800cbab2c00)
>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R 
> config: ffffffffffffffff (          (null))
>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: 
> AR config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, 
> n_added: 2, n_txn: 2
>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 
> config: 0 (ffff88011ac99800)
>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 
> config: 1 (ffff880119ec8800)
>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 
> 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
>    pec_1076_warn-2804  [000] d...   147.926191: x86_pmu_enable: starting: 0
> 
> so it does indeed look like n_added got scrambled; we started out with 1
> event on disable; we've got 2 events on enable, but n_added is also 2,
> which would suggest we had 0 on disable.
> 
> That makes us want to (re)start the NMI counter alright.

Might be relevant: check the last_cpu values.  Right before the above
it looks like the thread gets moved from CPU 1 to CPU 0
(possibly as a result of the long chain started with the
close() of the tracepoint event),
so the problem NMI watchdog event being enabled is a different one than 
the one that was disabled just before.

Vince



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to