On some systems pressing the external NMI button is now failing to inject
an NMI 5-10% of the time.  This causes confusion for a user that expects
the NMI to dump the system.

Commit 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")
does not read the firmware setting of the FREEZE_WHILE_SMM bit and will
always clear it when the PMU is initialized.  As a result the performance
counters will always run and that greatly expands the race in which
external NMI will not be processed if a local NMI is already being
processed.

One option is to change default_do_nmi().  The code snippet below shows the
relevant portion of a patch that resolves the issue, but it is problematic
from a performance perspective and was dismissed.

-345,7 +345,17 @@ static void default_do_nmi(struct pt_regs *regs)
                 */
                if (handled > 1)
                        __this_cpu_write(swallow_nmi, true);
-               return;
+
+               /*
+                * Unfortunately, there is a race condition which can
+                * result in a missing an external NMI.  Typically, an
+                * external NMI is processed on cpu 0.  Therefore, on
+                * cpu 0 check for an external NMI before returning.
+                */
+               if (smp_processor_id() ||
+                   (x86_platform.get_nmi_reason() & NMI_REASON_MASK) == 0) {
+                       return;
+               }
        }

Ultimately, the issue can be resolved by storing the default firmware
setting of FREEZE_WHILE_SMM before initializing the PMU.

Fixes: 6089327f5424 ("perf/x86: Add sysfs entry to freeze counters on SMI")

Signed-off-by: David Arcari <darc...@redhat.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: Kan Liang <kan.li...@linux.intel.com>
Cc: Jiri Olsa <jo...@redhat.com>
Cc: Donald Zickus <dzic...@redhat.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Jerry Hoemann <jerry.hoem...@hpe.com>
---
 arch/x86/events/intel/core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 707b2a9..fce98df 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3325,6 +3325,18 @@ static void flip_smm_bit(void *data)
        }
 }
 
+static int read_smm_bit(void)
+{
+       u64 val;
+
+       if (!rdmsrl_safe(MSR_IA32_DEBUGCTLMSR, &val)) {
+               if (val & DEBUGCTLMSR_FREEZE_IN_SMM)
+                       return 1;
+       }
+
+       return 0;
+}
+
 static void intel_pmu_cpu_starting(int cpu)
 {
        struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
@@ -4423,6 +4435,8 @@ __init int intel_pmu_init(void)
                pr_cont("full-width counters, ");
        }
 
+       x86_pmu.attr_freeze_on_smi = read_smm_bit();
+
        kfree(to_free);
        return 0;
 }
-- 
1.8.3.1

Reply via email to