On Thu, Jun 21, 2018 at 11:18:09AM -0700, Luck, Tony wrote:
> Counter proposal. We don't need the temp mci_status because we exit the
> loop early. Nor the "ret" variable.
>
> How does this look?

Yap, better. I'll test it later or tomorrow:

---
From: Borislav Petkov <[email protected]>
Date: Thu, 21 Jun 2018 14:18:47 +0200
Subject: [PATCH] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()

mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.

However, after having saved a valid status value, the next iteration of
the loop which goes over the MCA banks on the CPU, overwrites the valid
status value because we're using struct mce as storage instead of a
temporary variable.

Which leads to MCE records with an empty status value:

  mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
  mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}

In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.

Suggested-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5c38d1f861f2..e75418096ec6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -772,23 +772,24 @@ EXPORT_SYMBOL_GPL(machine_check_poll);
 static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 			  struct pt_regs *regs)
 {
-	int i, ret = 0;
 	char *tmp;
+	int i;
 
 	for (i = 0; i < mca_cfg.banks; i++) {
 		m->status = mce_rdmsrl(msr_ops.status(i));
-		if (m->status & MCI_STATUS_VAL) {
-			__set_bit(i, validp);
-			if (quirk_no_way_out)
-				quirk_no_way_out(i, m, regs);
-		}
+		if (!(m->status & MCI_STATUS_VAL))
+			continue;
+
+		__set_bit(i, validp);
+		if (quirk_no_way_out)
+			quirk_no_way_out(i, m, regs);
 
 		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
 			*msg = tmp;
-			ret = 1;
+			return 1;
 		}
 	}
-	return ret;
+	return 0;
 }
 
 /*
-- 
2.17.0.582.gccdcbd54c

-- 
Regards/Gruss,
    Boris.
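
For anyone who wants to see the failure mode outside the kernel, here is a
minimal userspace sketch of the two loop shapes. It is not the kernel code:
struct mce_rec, fake_banks[], is_fatal(), scan_old() and scan_new() are
hypothetical names made up for illustration, the bank contents are invented,
and the "valid + PCC" test is only a stand-in for what mce_severity()
actually grades.

```c
#include <stdint.h>

#define NBANKS      4
#define STATUS_VAL  (1ULL << 63)  /* valid bit, same position as MCI_STATUS_VAL */
#define STATUS_PCC  (1ULL << 57)  /* used here as a simplified "fatal" marker */

/* Hypothetical stand-in for struct mce: only the field we care about. */
struct mce_rec {
	uint64_t status;
};

/* Invented bank contents: bank 0 holds a fatal error, the rest are empty. */
static const uint64_t fake_banks[NBANKS] = {
	STATUS_VAL | STATUS_PCC | 0x135, 0, 0, 0
};

/* Simplified severity check; the real grading is far more involved. */
static int is_fatal(uint64_t status)
{
	return (status & STATUS_VAL) && (status & STATUS_PCC);
}

/* Old shape: remembers "fatal" in ret but keeps writing m->status. */
static int scan_old(struct mce_rec *m)
{
	int i, ret = 0;

	for (i = 0; i < NBANKS; i++) {
		m->status = fake_banks[i];
		if (is_fatal(m->status))
			ret = 1;	/* later banks still overwrite m->status */
	}
	return ret;
}

/* New shape: skip invalid banks and return at the first fatal one. */
static int scan_new(struct mce_rec *m)
{
	int i;

	for (i = 0; i < NBANKS; i++) {
		m->status = fake_banks[i];
		if (!(m->status & STATUS_VAL))
			continue;
		if (is_fatal(m->status))
			return 1;	/* m->status still holds the fatal value */
	}
	return 0;
}
```

With these made-up banks, both functions return 1, but scan_old() leaves
m->status at 0 (the empty status seen in the log above), while scan_new()
leaves it holding bank 0's value.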

