Internal injection testing crashed with a console log that said:

mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134

This caused a lot of head scratching because the MCACOD (bits 15:0) of that
status is the signature of an L1 data cache error. But Linux says that it found
it in "Bank 0", which on this CPU model only reports L1 instruction cache
errors.
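As a sketch of why that status value points at the data cache, the C below
pulls MCACOD out of MCi_STATUS and splits a cache hierarchy code (the Intel
SDM compound form 000F 0001 RRRR TTLL) into its request (RRRR), transaction
type (TT) and level (LL) fields. The helper names are hypothetical, for
illustration only; they are not kernel or mcelog APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* A few IA32_MCi_STATUS flag bits (Intel SDM Vol. 3B). */
#define MCI_STATUS_UC  (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_AR  (1ULL << 55) /* action required */

/* Hypothetical helper: MCACOD is the architectural error code, bits 15:0. */
unsigned int mcacod(uint64_t status)
{
	return (unsigned int)(status & 0xffff);
}

/* Cache hierarchy errors look like 000F 0001 RRRR TTLL. Bit 12 (F) is a
 * filtering bit, so it is masked out before comparing. */
int is_cache_error(unsigned int code)
{
	return (code & 0xef00) == 0x0100;
}

unsigned int cache_rrrr(unsigned int code) { return (code >> 4) & 0xf; } /* request type */
unsigned int cache_tt(unsigned int code)   { return (code >> 2) & 0x3; } /* transaction type */
unsigned int cache_ll(unsigned int code)   { return code & 0x3; }        /* cache level */

static const char *const tt_names[] = { "instruction", "data", "generic", "reserved" };
static const char *const rrrr_names[] = {
	"generic", "read", "write", "data read", "data write",
	"instruction fetch", "prefetch", "eviction", "snoop",
};

/* Print a one-line decode of a cache hierarchy MCACOD. */
void print_cache_error(uint64_t status)
{
	unsigned int code = mcacod(status);
	unsigned int rrrr;

	if (!is_cache_error(code)) {
		printf("MCACOD %#06x: not a cache hierarchy error\n", code);
		return;
	}
	rrrr = cache_rrrr(code);
	printf("MCACOD %#06x: %s cache, level %u, %s%s\n", code,
	       tt_names[cache_tt(code)], cache_ll(code),
	       rrrr < sizeof(rrrr_names) / sizeof(rrrr_names[0]) ?
			rrrr_names[rrrr] : "reserved request",
	       (status & MCI_STATUS_UC) ? ", uncorrected" : "");
}
```

For 0xbd80000000100134 the code is 0x0134: RRRR=0011 (data read), TT=01
(data), LL=00, with UC and AR set and PCC clear. Nothing in that decode points
at the instruction cache, which is all that Bank 0 reports on this CPU model.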

The answer was that Linux doesn't initialize "m->bank" when it finds a fatal
error in the mce_no_way_out() pre-scan of banks. If this was a local machine
check, then we pass this partially initialized "struct mce" to mce_panic().
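Assuming the "f" in the console line is the MCG_STATUS value (the format
printed by print_mce(), which adds "Exception" when MCIP is set), it decodes as
RIPV | EIPV | MCIP | LMCES. LMCES is the bit that marks a local machine check,
the case in which the partially initialized struct reaches mce_panic(). A
minimal C sketch of that decode; the macro names follow the kernel's
MCG_STATUS_* definitions, while the helper functions are hypothetical:

```c
#include <stdint.h>

/* IA32_MCG_STATUS bits; names follow the kernel's MCG_STATUS_* macros. */
#define MCG_STATUS_RIPV  (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV  (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP  (1ULL << 2) /* machine check in progress */
#define MCG_STATUS_LMCES (1ULL << 3) /* local machine check signaled */

/* Hypothetical helper: was this #MC signaled to the current CPU only? */
int is_local_mce(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_LMCES) != 0;
}

/* Hypothetical helper: can execution resume at the interrupted IP? */
int restart_ip_valid(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_RIPV) != 0;
}
```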

The fix is simple: initialize m->bank when we find a fatal error.
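The bug pattern can be modeled in a few lines of C. This is a heavily reduced
sketch, not the kernel code: struct sketch_mce, sketch_prescan() and
SKETCH_FATAL are hypothetical, and "fatal" is approximated as "UC bit set"
rather than mce_severity() >= MCE_PANIC_SEVERITY. The fix flag toggles the
added "m->bank = i;" line so both behaviours can be compared.

```c
#include <stdint.h>

/* Hypothetical stand-in for a panic-severity check: treat any uncorrected
 * (UC, bit 61) status as fatal. Real severity grading is far more involved. */
#define SKETCH_FATAL (1ULL << 61)

/* Hypothetical, heavily reduced stand-in for struct mce. */
struct sketch_mce {
	int bank;
	uint64_t status;
};

/*
 * Model of the pre-scan: walk the banks, copy each status into *m, and stop
 * at the first fatal one. With fix == 0, m->bank keeps whatever value it had
 * before the scan; with fix != 0, it records the bank that was actually found.
 */
int sketch_prescan(struct sketch_mce *m, const uint64_t *bank_status,
		   int nbanks, int fix)
{
	for (int i = 0; i < nbanks; i++) {
		m->status = bank_status[i];
		if (m->status & SKETCH_FATAL) {
			if (fix)
				m->bank = i; /* the one-line fix */
			return 1;
		}
	}
	return 0;
}
```

Starting from a zeroed struct with the fatal status placed in bank 1 (an
arbitrary choice for the sketch), the unfixed scan returns 1 with bank 1's
status but m.bank still 0, the same kind of mismatch as the "Bank 0" log line.
With the fix, m.bank is 1.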

Fixes: 40c36e2741d7 ("x86/mce: Fix incorrect "Machine check from unknown source" message")
Cc: [email protected] # v4.18 Note: pre-v5.0, arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c
Signed-off-by: Tony Luck <[email protected]>
---
 arch/x86/kernel/cpu/mce/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 672c7225cb1b..6ce290c506d9 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -784,6 +784,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
                        quirk_no_way_out(i, m, regs);
 
                if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
+                       m->bank = i;
                        mce_read_aux(m, i);
                        *msg = tmp;
                        return 1;
-- 
2.19.1
