Hello John,

I was able to test your fixes and can confirm that BUS_MCEERR_AR now works on AMD:

Before the fix, the VM panics with:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f89573ce000 and GUEST addr 0x10b5ce000 of type BUS_MCEERR_AR injected
[   83.562579] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 1: a000000000000000
[   83.562585] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[   83.562592] mce: [Hardware Error]: TSC 3d39402bdc
[   83.562593] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693515449 SOCKET 0 APIC 0 microcode 800126e
[   83.562596] mce: [Hardware Error]: Machine check: Uncorrected error without MCA Recovery
[   83.562597] Kernel panic - not syncing: Fatal local machine check
[   83.563401] Kernel Offset: disabled

With the fix, the same error injection doesn't kill the VM, but generates the following console messages:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fa430ab9000 and GUEST addr 0x118cb9000 of type BUS_MCEERR_AR injected
[  250.851996] Disabling lock debugging due to kernel taint
[  250.852928] mce: Uncorrected hardware memory error in user-access at 118cb9000
[  250.853261] Memory failure: 0x118cb9: Sending SIGBUS to mce_process_rea:1227 due to hardware memory corruption
[  250.854933] mce: [Hardware Error]: Machine check events logged
[  250.855800] Memory failure: 0x118cb9: recovery action for dirty LRU page: Recovered
[  250.856661] mce: [Hardware Error]: CPU 2: Machine Check Exception: 7 Bank 9: bc00000000000000
[  250.860552] mce: [Hardware Error]: RIP 33:<00007f56b9ecbee5>
[  250.861405] mce: [Hardware Error]: TSC 8c2c664410 ADDR 118cb9000 MISC 8c
[  250.862679] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693508937 SOCKET 0 APIC 2 microcode 800126e


However, a problem remains with BUS_MCEERR_AO, which still kills the VM:

qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr 0x7f1d108e5000 and GUEST addr 0x114ae5000 of type BUS_MCEERR_AO injected
[  157.392905] mce: [Hardware Error]: CPU 0: Machine Check Exception: 7 Bank 9: bc00000000000000
[  157.392912] mce: [Hardware Error]: RIP 10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[  157.392919] mce: [Hardware Error]: TSC 60b92a54d0 ADDR 114ae5000 MISC 8c
[  157.392921] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693500765 SOCKET 0 APIC 0 microcode 800126e
[  157.392924] mce: [Hardware Error]: Machine check: Uncorrected unrecoverable error in kernel context
[  157.392925] Kernel panic - not syncing: Fatal local machine check
[  157.402582] Kernel Offset: disabled

Since AMD guests currently can't handle BUS_MCEERR_AO MCE injection, I don't think the fix is complete: the 'AO' case must be handled too. The simplest way is probably to filter it at the qemu level and inject only the 'AR' case, which also lets qemu report that an 'AO' error was ignored.

I would suggest adding a third patch implementing this AMD-specific filter:


commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06
Author: William Roche <william.ro...@oracle.com>
Date:   Thu Aug 31 18:54:57 2023 +0000

    i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

    AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
    as it panics the VM kernel. We filter this event and provide a
    warning message.

    Signed-off-by: William Roche <william.ro...@oracle.com>

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9ca7187628..bd60d5697b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -606,6 +606,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
             mcg_status |= MCG_STATUS_RIPV;
         }
     } else {
+        if (code == BUS_MCEERR_AO) {
+            /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+            return;
+        }
         mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
     }

@@ -657,7 +661,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
             kvm_hwpoison_page_add(ram_addr);
-            kvm_mce_inject(cpu, paddr, code);
+            if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO)
+                kvm_mce_inject(cpu, paddr, code);

             /*
              * Use different logging severity based on error type.
@@ -670,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
                     addr, paddr, "BUS_MCEERR_AR");
             } else {
                  warn_report("Guest MCE Memory Error at QEMU addr %p and "
-                     "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
-                     addr, paddr, "BUS_MCEERR_AO");
+                     "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+                     addr, paddr, "BUS_MCEERR_AO",
+                     IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
             }

             return;
---
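
For reference, the filtering decision made by the patch above can be sketched in isolation. This is only a standalone illustration, not qemu code: the enums and the names should_inject_mce() and mce_outcome() are hypothetical stand-ins for BUS_MCEERR_AR/BUS_MCEERR_AO and for the IS_AMD_CPU(env) check used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the SIGBUS codes BUS_MCEERR_AR / BUS_MCEERR_AO. */
enum mce_code { MCE_CODE_AR, MCE_CODE_AO };

/* Hypothetical vendor tag; the patch tests IS_AMD_CPU(env) instead. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/* Returns true when the memory error should be forwarded to the guest. */
bool should_inject_mce(enum cpu_vendor vendor, enum mce_code code)
{
    assert(code == MCE_CODE_AR || code == MCE_CODE_AO);

    /* Only the AMD + "action optional" combination is filtered out. */
    return !(vendor == VENDOR_AMD && code == MCE_CODE_AO);
}

/* Suffix for the log message, mirroring the wording used in the patch. */
const char *mce_outcome(enum cpu_vendor vendor, enum mce_code code)
{
    return should_inject_mce(vendor, code) ? "injected"
                                           : "ignored on AMD guest";
}
```

With this logic, 'AR' errors are always injected, and 'AO' errors are injected everywhere except on AMD guests, where they are only reported.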


I hope this can help.

William.


On 7/26/23 22:41, John Allen wrote:
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shut down. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
   - Add "succor" feature word.
   - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

John Allen (2):
   i386: Add support for SUCCOR feature
   i386: Fix MCE support for AMD hosts

  target/i386/cpu.c     | 18 +++++++++++++++++-
  target/i386/cpu.h     |  4 ++++
  target/i386/helper.c  |  4 ++++
  target/i386/kvm/kvm.c | 19 +++++++++++++------
  4 files changed, 38 insertions(+), 7 deletions(-)

