A long time ago, Linux cleared IA32_MCG_STATUS at the very end of machine
check processing.

Then we added some fancy recovery and IST manipulation in:

commit d4812e169de4 ("x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks")

and clearing IA32_MCG_STATUS was pulled earlier in the function.

The next change moved the actual recovery out of do_machine_check() and
just used task_work_add() to schedule it later (before returning to user
mode):

commit 5567d11c21a1 ("x86/mce: Send #MC singal from task work")

Most recently the fancy IST footwork was removed as no longer needed:

commit b052df3da821 ("x86/entry: Get rid of ist_begin/end_non_atomic()")

At this point there is no remaining reason to clear IA32_MCG_STATUS early.
It can move back to the very end of the function.

Also move sync_core(). The comments for this function say that it should
only be called when instructions have been changed/re-mapped. Recovery for
an instruction fetch may change the physical address, but that doesn't
happen until the scheduled work runs (which could be on another CPU).

Reported-by: Gabriele Paoloni <gabriele.paol...@intel.com>
Signed-off-by: Tony Luck <tony.l...@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f43a78bde670..0ba24dfffdb2 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1190,6 +1190,7 @@ static void kill_me_maybe(struct callback_head *cb)
 
        if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags)) {
                set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+               sync_core();
                return;
        }
 
@@ -1330,12 +1331,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
        if (worst > 0)
                irq_work_queue(&mce_irq_work);
 
-       mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
-
-       sync_core();
-
        if (worst != MCE_AR_SEVERITY && !kill_it)
-               return;
+               goto out;
 
        /* Fault was in user mode and we need to take some action */
        if ((m.cs & 3) == 3) {
@@ -1364,6 +1361,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
                                 mce_panic("Failed kernel mode recovery", &m, msg);
                }
        }
+out:
+       mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
 
-- 
2.21.1
