amd: Resolve NMI latency issues when multiple PMCs are active

Peter Zijlstra Fri, 15 Mar 2019 05:04:04 -0700

On Mon, Mar 11, 2019 at 04:48:51PM +0000, Lendacky, Thomas wrote:
> @@ -467,6 +470,45 @@ static void amd_pmu_wait_on_overflow(int idx, u64 config)
>       }
>  }
>  
> +/*
> + * Because of NMI latency, if multiple PMC counters are active we need to 
> take
> + * into account that multiple PMC overflows can generate multiple NMIs but be
> + * handled by a single invocation of the NMI handler (think PMC overflow 
> while
> + * in the NMI handler). This could result in subsequent unknown NMI messages
> + * being issued.
> + *
> + * Attempt to mitigate this by using the number of active PMCs to determine
> + * whether to return NMI_HANDLED if the perf NMI handler did not handle/reset
> + * any PMCs. The per-CPU perf_nmi_counter variable is set to a minimum of one
> + * less than the number of active PMCs or 2. The value of 2 is used in case 
> the
> + * NMI does not arrive at the APIC in time to be collapsed into an already
> + * pending NMI.


LAPIC I really do hope?!

> + */
> +static int amd_pmu_mitigate_nmi_latency(unsigned int active, int handled)
> +{
> +     /* If multiple counters are not active return original handled count */
> +     if (active <= 1)
> +             return handled;

Should we not reset perf_nmi_counter in this case?

> +
> +     /*
> +      * If a counter was handled, record the number of possible remaining
> +      * NMIs that can occur.
> +      */
> +     if (handled) {
> +             this_cpu_write(perf_nmi_counter,
> +                            min_t(unsigned int, 2, active - 1));
> +
> +             return handled;
> +     }
> +
> +     if (!this_cpu_read(perf_nmi_counter))
> +             return NMI_DONE;
> +
> +     this_cpu_dec(perf_nmi_counter);
> +
> +     return NMI_HANDLED;
> +}
> +
>  static struct event_constraint *
>  amd_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
>                         struct perf_event *event)
> @@ -689,6 +731,7 @@ static __initconst const struct x86_pmu amd_pmu = {
>  
>       .amd_nb_constraints     = 1,
>       .wait_on_overflow       = amd_pmu_wait_on_overflow,
> +     .mitigate_nmi_latency   = amd_pmu_mitigate_nmi_latency,
>  };

Again, you could just do amd_pmu_handle_irq() and avoid an extra
callback.

Anyway, we already had code to deal with spurious NMIs from AMD; see
commit:

  63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters")

And that looks to be doing something very much the same. Why then do you
still need this on top?

Re: [RFC PATCH 2/2] x86/perf/amd: Resolve NMI latency issues when multiple PMCs are active

Reply via email to