Re: [PULL 39/42] i386: Add support for SUCCOR feature

2024-06-24 Thread John Allen
On Thu, Jun 13, 2024 at 05:50:08PM +0800, Xiaoyao Li wrote:
> On 6/8/2024 4:34 PM, Paolo Bonzini wrote:
> > From: John Allen 
> > 
> > Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required 
> > to
> > be exposed to guests to allow them to handle machine check exceptions on AMD
> > hosts.
> > 
> > 
> > v2:
> >- Add "succor" feature word.
> >- Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
> > 
> > Reported-by: William Roche 
> > Reviewed-by: Joao Martins 
> > Signed-off-by: John Allen 
> > Message-ID: <20240603193622.47156-3-john.al...@amd.com>
> > Signed-off-by: Paolo Bonzini 
> 
> [snip]
> ...
> 
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 55a9e8a70cf..56d8e2a89ec 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -532,6 +532,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, 
> > uint32_t function,
> >*/
> >   cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
> >   ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
> > +} else if (function == 0x8007 && reg == R_EBX) {
> > +ret |= CPUID_8000_0007_EBX_SUCCOR;
> 
> IMO, this is incorrect.
> 
> Why make it supported unconditionally? Shouldn't there be a KVM patch to
> report it in KVM_GET_SUPPORTED_CPUID?
> 
> If make it supported unconditionally, all VMs boot with "-cpu host/max" will
> see the CPUID bits, even if it is Intel VMs.

Paolo,

This change in kvm_arch_get_supported_cpuid was added based on your
suggestion here:
https://lore.kernel.org/all/d4c1bb9b-8438-ed00-c79d-e8ad2a7e4...@redhat.com/

Is there something missing from the patch needed to prevent the bits
from being seen on Intel VMs?

I am planning to send a patch early this week to report the bits for KVM
and a patch that removes the above change for qemu. Is there another way
you would prefer to handle it?

Thanks,
John

> 
> 
> >   } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
> >   /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
> >* be enabled without the in-kernel irqchip
> 



Re: [PATCH v5 0/3] Fix MCE handling on AMD hosts

2024-06-06 Thread John Allen
On Thu, Jun 06, 2024 at 11:09:05AM +0200, Paolo Bonzini wrote:
> Queued, thanks.  I added a note to the commit message in the third patch:

Thanks, Paolo!

> 
> By the time the MCE reaches the guest, the overflow has been handled
> by the host and has not caused a shutdown, so include the bit 
> unconditionally.

I'm not sure I understand this additional comment. Is this talking about
the case where the host gets an overflow? If so, yes, if the host has
overflow recovery supported, it should handle the overflow and won't
require any overflow recovery on the part of the guest. For clarity, it
may be nice to prefix the above statement with something like:
"In the case of a host overflow, ..."

If we're going to bring up the host overflow case, it may be worth
clarifying further that host overflows should not propagate to the guest
and this patch is specifically intended to allow the guest to handle
overflows in the MCEs that are injected from qemu.

> 
> Advertising of SUCCOR and OVERFLOW_RECOV in KVM would still be nice. :)

Sure, I will send a series for this.

Thanks,
John



[PATCH v5 0/3] Fix MCE handling on AMD hosts

2024-06-03 Thread John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR and overflow recovery cpuid bits to guests.
Second, we need to modify the MCE injection code to avoid Intel specific
behavior when we are running on an AMD host.

Version 5 of the series differs from previous versions in that it
handles AO (deferred) errors rather than ignoring them. This is made
possible by in progress kernel patches that utilize recently accepted
address translation capabilities on AMD platforms to translate
UMC relative normalized addresses received with a deferred error to
system physical addresses that can be used for memory error recovery.
While the bulk of the address translation code is upstream, the code
to use the new translation code in the event of a deferred error is
not, but can be seen here:
https://github.com/AMDESE/linux/commits/wip-mca/

This adds a new wrapper struct for MCEs and uses this wrapper to store
the translated physical address in the following commit:
https://github.com/AMDESE/linux/commit/76732c67cbf96c14f55ed1061804db9ff1505ea3

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

v3:
  - Reorder series. Only enable SUCCOR after bugs have been fixed.
  - Introduce new patch ignoring AO errors.

v4:
  - Remove redundant check for AO errors.

v5:
  - Remove patch to ignore AO errors and introduce proper deferred
error support.
  - Introduce new patch to support overflow recovery cpuid bits.

John Allen (3):
  i386: Fix MCE support for AMD hosts
  i386: Add support for SUCCOR feature
  i386: Add support for overflow recovery

 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  7 +++
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 41 +
 4 files changed, 61 insertions(+), 9 deletions(-)

-- 
2.43.0




[PATCH v5 3/3] i386: Add support for overflow recovery

2024-06-03 Thread John Allen
Add cpuid bit definition for overflow recovery. This is needed in the case
where a deferred error has been sent to the guest, a guest process accesses the
poisoned memory, but the machine_check_poll function has not yet handled the
original deferred error. If overflow recovery is not set in this case, when we
handle the uncorrected error from the poisoned memory access, the overflow bit
will be set and will result in the guest being shut down.

Signed-off-by: John Allen 
---
v5:
  - New in v5.
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 1 +
 target/i386/kvm/kvm.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5fa2dde732..5385b26d4a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1035,7 +1035,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 [FEAT_8000_0007_EBX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
-NULL, "succor", NULL, NULL,
+"overflow-recov", "succor", NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5dd41e3d69..d56cf631b5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -955,6 +955,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
 /* RAS Features */
+#define CPUID_8000_0007_EBX_OVERFLOW_RECOV(1U << 0)
 #define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
 
 /* CLZERO instruction */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 384b702fef..796d5e9e38 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -477,7 +477,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
 } else if (function == 0x8007 && reg == R_EBX) {
-ret |= CPUID_8000_0007_EBX_SUCCOR;
+ret |= CPUID_8000_0007_EBX_OVERFLOW_RECOV | CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.43.0




[PATCH v5 2/3] i386: Add support for SUCCOR feature

2024-06-03 Thread John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.


v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

Reported-by: William Roche 
Reviewed-by: Joao Martins 
Signed-off-by: John Allen 
---
 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bca776e1fe..5fa2dde732 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1032,6 +1032,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .tcg_features = TCG_APM_FEATURES,
 .unmigratable_flags = CPUID_APM_INVTSC,
 },
+[FEAT_8000_0007_EBX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, "succor", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = { .eax = 0x8007, .reg = R_EBX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_8000_0008_EBX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6556,7 +6572,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX];
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ddf53937b4..5dd41e3d69 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -607,6 +607,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
@@ -953,6 +954,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 810f586a59..384b702fef 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -476,6 +476,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+} else if (function == 0x8007 && reg == R_EBX) {
+ret |= CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.43.0




[PATCH v5 1/3] i386: Fix MCE support for AMD hosts

2024-06-03 Thread John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel, but
there are instances where the qemu implementation is Intel specific. First, MCE
delivery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
v3:
  - Update to latest qemu code that introduces using MCG_STATUS_RIPV in the
case of a BUS_MCEERR_AR on a non-AMD machine.
v5:
  - This version implements proper support for AO (deferred) errors. For
deferred errors, there are some differences in the flags that need to be
set in order to be handled correctly. First, the UC flag must not be set
to allow deferred errors to be picked up by the machine_check_poll
function in the guest kernel, mirroring baremetal behavior. If the UC
flag is set, the injected MCE will be sent as an interrupt and will be
handled in do_machine_check which currently is not expected to handle
deferred errors and would need significant modification to do so.

  - Both the deferred (AO) and uncorrected (AR) error cases set the POISON
bit. For the uncorrected error case, this is expected and properly
mirrors baremetal behavior. For the deferred error case, this bit would
not be set by hardware, but is being reused as a convenient workaround
to give better behavior in the case that the MCE_KILL_EARLY flag is set
in the guest process. See code comment for more details.
---
 target/i386/cpu.h |  2 ++
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 39 +++
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index dfe43b8204..ddf53937b4 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -365,6 +365,8 @@ typedef enum X86Seg {
 #define MCI_STATUS_PCC   (1ULL<<57)  /* processor context corrupt */
 #define MCI_STATUS_S (1ULL<<56)  /* Signaled machine check */
 #define MCI_STATUS_AR(1ULL<<55)  /* Action required */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* Deferred error */
+#define MCI_STATUS_POISON  (1ULL<<43)  /* Poisoned data consumed */
 
 /* MISC register defines */
 #define MCM_ADDR_SEGOFF  0  /* segment offset */
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 2070dd0dda..6f6b0f02e0 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, , );
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 42970ab046..810f586a59 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -582,17 +582,40 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 {
 CPUState *cs = CPU(cpu);
 CPUX86State *env = >env;
-uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
-uint64_t mcg_status = MCG_STATUS_MCIP;
+uint64_t status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_MISCV |
+  MCI_STATUS_ADDRV;
+uint64_t mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S | MCI_STATUS_UC;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_UC | MCI_STATUS_POISON;
+mcg_status |= MCG_STATUS_EIPV;
+} else {
+/* Setting the POISON bit for deferred errors indicates to the
+ * guest kernel that the address provided by the MCE is valid
+ * and usable which will ensure that the guest kernel will send
+ * a SIGBUS_AO signal to the guest process. This allows for
+ * more desirable behavior in the case that the guest process
+ * with poisoned memory has set the MCE_KILL_EARLY prctl flag
+ * which indicates that the process would prefer to handle or
+ * shutdown due to the poisoned memory condition before the
+ * memory has been accessed.
+ *
+ * While the POISON bit would not be set in a deferred error
+ * sent from hardware, the bit is not meaningful for de

Re: [PATCH v4 0/3] Fix MCE handling on AMD hosts

2024-02-20 Thread John Allen
On Wed, Feb 07, 2024 at 11:21:05AM +, Joao Martins wrote:
> On 12/09/2023 22:18, John Allen wrote:
> > In the event that a guest process attempts to access memory that has
> > been poisoned in response to a deferred uncorrected MCE, an AMD system
> > will currently generate a SIGBUS error which will result in the entire
> > guest being shutdown. Ideally, we only want to kill the guest process
> > that accessed poisoned memory in this case.
> > 
> > This support has been included in qemu for Intel hosts for a long time,
> > but there are a couple of changes needed for AMD hosts. First, we will
> > need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
> > the MCE injection code to avoid Intel specific behavior when we are
> > running on an AMD host.
> > 
> 
> Is there any update with respect to this series?
> 
> John's series should fix MCE injection on AMD; as today it is just crashing 
> the
> guest (sadly) when an MCE happens in the hypervisor.
> 
> William, Paolo, I think the sort-of-dependency(?) of this where we block
> migration if there was a poisoned page on is already in Peter's migration
> tree[1] (CC'ed). So perhaps this series just needs John to resend it given 
> that
> it's been a couple months since v4?

It looks like this series still applies cleanly to latest qemu, but I
can resend if needed.

Thanks,
John

> 
> [1]
> https://lore.kernel.org/qemu-devel/20240130190640.139364-2-william.ro...@oracle.com/
> 
> > v2:
> >   - Add "succor" feature word.
> >   - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
> > 
> > v3:
> >   - Reorder series. Only enable SUCCOR after bugs have been fixed.
> >   - Introduce new patch ignoring AO errors.
> > 
> > v4:
> >   - Remove redundant check for AO errors.
> > 
> > John Allen (2):
> >   i386: Fix MCE support for AMD hosts
> >   i386: Add support for SUCCOR feature
> > 
> > William Roche (1):
> >   i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
> > 
> >  target/i386/cpu.c | 18 +-
> >  target/i386/cpu.h |  4 
> >  target/i386/helper.c  |  4 
> >  target/i386/kvm/kvm.c | 28 
> >  4 files changed, 45 insertions(+), 9 deletions(-)
> > 
> 



[PATCH v4 0/3] Fix MCE handling on AMD hosts

2023-09-12 Thread John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

v3:
  - Reorder series. Only enable SUCCOR after bugs have been fixed.
  - Introduce new patch ignoring AO errors.

v4:
  - Remove redundant check for AO errors.

John Allen (2):
  i386: Fix MCE support for AMD hosts
  i386: Add support for SUCCOR feature

William Roche (1):
  i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 28 
 4 files changed, 45 insertions(+), 9 deletions(-)

-- 
2.39.3




[PATCH v4 1/3] i386: Fix MCE support for AMD hosts

2023-09-12 Thread John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel, but
there are instances where the qemu implementation is Intel specific. First, MCE
delivery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
v3:
  - Update to latest qemu code that introduces using MCG_STATUS_RIPV in the
case of a BUS_MCEERR_AR on a non-AMD machine.
---
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 89aa696c6d..9547e2b09d 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, , );
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 639a242ad8..5fce74aac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -590,16 +590,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 CPUState *cs = CPU(cpu);
 CPUX86State *env = >env;
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
 uint64_t mcg_status = MCG_STATUS_MCIP;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+mcg_status |= MCG_STATUS_RIPV;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
 flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3




[PATCH v4 3/3] i386: Add support for SUCCOR feature

2023-09-12 Thread John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche 
Reviewed-by: Joao Martins 
Signed-off-by: John Allen 

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
---
 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00f913b638..d90d3a9489 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1029,6 +1029,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .tcg_features = TCG_APM_FEATURES,
 .unmigratable_flags = CPUID_APM_INVTSC,
 },
+[FEAT_8000_0007_EBX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, "succor", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = { .eax = 0x8007, .reg = R_EBX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_8000_0008_EBX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6554,7 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX];
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a6000e93bd..f5afc5e4fd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -598,6 +598,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
@@ -942,6 +943,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7e9fc0cac5..15a642a894 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -477,6 +477,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+} else if (function == 0x8007 && reg == R_EBX) {
+ret |= CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.39.3




[PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-12 Thread John Allen
From: William Roche 

AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
as it panics the VM kernel. We filter this event and provide a
warning message.

Signed-off-by: William Roche 
---
v3:
  - New patch
v4:
  - Remove redundant check for AO errors
---
 target/i386/kvm/kvm.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5fce74aac5..7e9fc0cac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 mcg_status |= MCG_STATUS_RIPV;
 }
 } else {
+if (code == BUS_MCEERR_AO) {
+/* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+return;
+}
 mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
@@ -668,8 +672,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 addr, paddr, "BUS_MCEERR_AR");
 } else {
  warn_report("Guest MCE Memory Error at QEMU addr %p and "
- "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
- addr, paddr, "BUS_MCEERR_AO");
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+ addr, paddr, "BUS_MCEERR_AO",
+ IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
 }
 
 return;
-- 
2.39.3




[PATCH v3 0/3] Fix MCE handling on AMD hosts

2023-09-06 Thread John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

v3:
  - Reorder series. Only enable SUCCOR after bugs have been fixed.
  - Introduce new patch ignoring AO errors.

John Allen (2):
  i386: Fix MCE support for AMD hosts
  i386: Add support for SUCCOR feature

William Roche (1):
  i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 34 --
 4 files changed, 49 insertions(+), 11 deletions(-)

-- 
2.39.3




[PATCH v3 1/3] i386: Fix MCE support for AMD hosts

2023-09-06 Thread John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel but,
there are instances where the qemu implementation is Intel specific. First, MCE
deliviery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
v3:
  - Update to latest qemu code that introduces using MCG_STATUS_RIPV in the
case of a BUS_MCEERR_AR on a non-AMD machine.
---
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 89aa696c6d..9547e2b09d 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, , );
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 639a242ad8..5fce74aac5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -590,16 +590,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 CPUState *cs = CPU(cpu);
 CPUX86State *env = >env;
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
 uint64_t mcg_status = MCG_STATUS_MCIP;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+mcg_status |= MCG_STATUS_RIPV;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
 flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3




[PATCH v3 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-06 Thread John Allen
From: William Roche 

AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
as it panics the VM kernel. We filter this event and provide a
warning message.

Signed-off-by: William Roche 
---
v3:
  - New patch
---
 target/i386/kvm/kvm.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5fce74aac5..4d42d3ed4c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 mcg_status |= MCG_STATUS_RIPV;
 }
 } else {
+if (code == BUS_MCEERR_AO) {
+/* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+return;
+}
 mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
@@ -655,7 +659,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 if (ram_addr != RAM_ADDR_INVALID &&
 kvm_physical_memory_addr_from_host(c->kvm_state, addr, )) {
 kvm_hwpoison_page_add(ram_addr);
-kvm_mce_inject(cpu, paddr, code);
+if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO) {
+kvm_mce_inject(cpu, paddr, code);
+}
 
 /*
  * Use different logging severity based on error type.
@@ -668,8 +674,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void 
*addr)
 addr, paddr, "BUS_MCEERR_AR");
 } else {
  warn_report("Guest MCE Memory Error at QEMU addr %p and "
- "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
- addr, paddr, "BUS_MCEERR_AO");
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+ addr, paddr, "BUS_MCEERR_AO",
+ IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected");
 }
 
 return;
-- 
2.39.3




[PATCH v3 3/3] i386: Add support for SUCCOR feature

2023-09-06 Thread John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche 
Reviewed-by: Joao Martins 
Signed-off-by: John Allen 

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
---
 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 00f913b638..d90d3a9489 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1029,6 +1029,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .tcg_features = TCG_APM_FEATURES,
 .unmigratable_flags = CPUID_APM_INVTSC,
 },
+[FEAT_8000_0007_EBX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, "succor", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = { .eax = 0x8007, .reg = R_EBX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_8000_0008_EBX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6554,7 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX];
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a6000e93bd..f5afc5e4fd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -598,6 +598,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
@@ -942,6 +943,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4d42d3ed4c..0255863421 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -477,6 +477,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+} else if (function == 0x8007 && reg == R_EBX) {
+ret |= CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.39.3




Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts

2023-09-05 Thread John Allen via
0,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void
> *addr)
>  addr, paddr, "BUS_MCEERR_AR");
>  } else {
>   warn_report("Guest MCE Memory Error at QEMU addr %p and "
> - "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
> - addr, paddr, "BUS_MCEERR_AO");
> + "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
> + addr, paddr, "BUS_MCEERR_AO",
> + IS_AMD_CPU(env) ? "ignored on AMD guest" :
> "injected");
>  }
> 
>  return;
> ---

Thanks, I think this will be a good solution for now while we can't
fully support AO errors. I will test this and include in the next
version of the series.

Thanks,
John

> 
> 
> I hope this can help.
> 
> William.
> 
> 
> On 7/26/23 22:41, John Allen wrote:
> > In the event that a guest process attempts to access memory that has
> > been poisoned in response to a deferred uncorrected MCE, an AMD system
> > will currently generate a SIGBUS error which will result in the entire
> > guest being shutdown. Ideally, we only want to kill the guest process
> > that accessed poisoned memory in this case.
> > 
> > This support has been included in qemu for Intel hosts for a long time,
> > but there are a couple of changes needed for AMD hosts. First, we will
> > need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
> > the MCE injection code to avoid Intel specific behavior when we are
> > running on an AMD host.
> > 
> > v2:
> >- Add "succor" feature word.
> >- Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
> > 
> > John Allen (2):
> >i386: Add support for SUCCOR feature
> >i386: Fix MCE support for AMD hosts
> > 
> >   target/i386/cpu.c | 18 +-
> >   target/i386/cpu.h |  4 
> >   target/i386/helper.c  |  4 
> >   target/i386/kvm/kvm.c | 19 +--
> >   4 files changed, 38 insertions(+), 7 deletions(-)
> > 



Re: [PATCH v2 1/2] i386: Add support for SUCCOR feature

2023-09-05 Thread John Allen
On Fri, Sep 01, 2023 at 11:30:53AM +0100, Joao Martins wrote:
> On 26/07/2023 21:41, John Allen wrote:
> > Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required 
> > to
> > be exposed to guests to allow them to handle machine check exceptions on AMD
> > hosts.
> > 
> > Reported-by: William Roche 
> > Signed-off-by: John Allen 
> 
> I think this is matching the last discussion:
> 
>   Reviewed-by: Joao Martins 
> 
> The patch ordering doesn't look correct though. Perhaps we should expose 
> succor
> only after MCE is fixed so this patch would be the second, not the first?

Yes, that makes sense. I will address this and send another version of
the series with the correct ordering.

> 
> Also, this should in generally be OK for -cpu host, but might be missing a 
> third
> patch that adds "succor" to the AMD models e.g.

Babu,

I think we previously discussed adding this to the models later in a
separate series. Is this your preferred course of action or can we add
it with this series?

Thanks,
John

> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 97ad229d8ba3..d132cb3e 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -4731,6 +4731,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>  { /* end of list */ }
>  },
>  },
> +{
> +.version = 5,
> +.props = (PropValue[]) {
> +/* Erratum 1386 */
> +{ "model-id",
> +  "AMD EPYC-Rome-v5 Processor" },
> +{ "succor", "on" },
> +{ /* end of list */ }
> +},
> +},
>  { /* end of list */ }
>  }
>  },
> @@ -4806,6 +4816,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>  },
>  .cache_info = _milan_v2_cache_info
>  },
> +{
> +.version = 3,
> +.props = (PropValue[]) {
> +/* Erratum 1386 */
> +{ "model-id",
> +  "AMD EPYC-Milan-v3 Processor" },
> +{ "succor", "on" },
> +{ /* end of list */ }
> +},
> +},
>  { /* end of list */ }
>  }
>  },
> @@ -4880,6 +4900,20 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>  .xlevel = 0x8022,
>  .model_id = "AMD EPYC-Genoa Processor",
>  .cache_info = _genoa_cache_info,
> +.versions = (X86CPUVersionDefinition[]) {
> +{ .version = 1 },
> +{
> +.version = 2,
> +.props = (PropValue[]) {
> +/* Erratum 1386 */
> +{ "model-id",
> +  "AMD EPYC-Genoa-v2 Processor" },
> +{ "succor", "on" },
> +{ /* end of list */ }
> +},
> +},
> +{ /* end of list */ }
> +}
>  },



Re: [PATCH v2 2/2] i386: Fix MCE support for AMD hosts

2023-09-05 Thread John Allen via
On Thu, Aug 31, 2023 at 11:54:56PM +0200, William Roche wrote:
> John,
> 
> I also noticed something important about this specific code:
> 
> Qemu commit cb48748af7dfd7654323eb839d1f853ffa873652 introduced the use of
> the MCG_STATUS_RIPV in the case of a BUS_MCEERR_AR error, but it looks like
> your reference code is not having this flag.
> 
> According to me, we should keep this flag in the case of a non-AMD machine
> with a BUS_MCEERR_AR.
> 
> The patch should look something like that:
> 
> [...]
> -    if (code == BUS_MCEERR_AR) {
> -    status |= MCI_STATUS_AR | 0x134;
> -    mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
> +    if (!IS_AMD_CPU(env)) {
> +    status |= MCI_STATUS_S;
> +    if (code == BUS_MCEERR_AR) {
> +    status |= MCI_STATUS_AR | 0x134;
> +    mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
> [...]

Yes, that looks correct. I will fix this in the next version of the
series.

Thanks,
John

> 
> 
> Cheers,
> William.
> 
> 
> On 7/26/23 22:41, John Allen wrote:
> > For the most part, AMD hosts can use the same MCE injection code as Intel 
> > but,
> > there are instances where the qemu implementation is Intel specific. First, 
> > MCE
> > deliviery works differently on AMD and does not support broadcast. Second,
> > kvm_mce_inject generates MCEs that include a number of Intel specific status
> > bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.
> > 
> > Reported-by: William Roche 
> > Signed-off-by: John Allen 
> > ---
> >   target/i386/helper.c  |  4 
> >   target/i386/kvm/kvm.c | 17 +++--
> >   2 files changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/target/i386/helper.c b/target/i386/helper.c
> > index 533b29cb91..a6523858e0 100644
> > --- a/target/i386/helper.c
> > +++ b/target/i386/helper.c
> > @@ -76,6 +76,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
> >   int family = 0;
> >   int model = 0;
> > +if (IS_AMD_CPU(env)) {
> > +return 0;
> > +}
> > +
> >   cpu_x86_version(env, , );
> >   if ((family == 6 && model >= 14) || family > 6) {
> >   return 1;
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 4b62138459..87a50c8aaf 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -532,16 +532,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, 
> > int code)
> >   CPUState *cs = CPU(cpu);
> >   CPUX86State *env = >env;
> >   uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
> > -  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
> > +  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
> >   uint64_t mcg_status = MCG_STATUS_MCIP;
> >   int flags = 0;
> > -if (code == BUS_MCEERR_AR) {
> > -status |= MCI_STATUS_AR | 0x134;
> > -mcg_status |= MCG_STATUS_EIPV;
> > +if (!IS_AMD_CPU(env)) {
> > +status |= MCI_STATUS_S;
> > +if (code == BUS_MCEERR_AR) {
> > +status |= MCI_STATUS_AR | 0x134;
> > +mcg_status |= MCG_STATUS_EIPV;
> > +} else {
> > +status |= 0xc0;
> > +mcg_status |= MCG_STATUS_RIPV;
> > +}
> >   } else {
> > -status |= 0xc0;
> > -mcg_status |= MCG_STATUS_RIPV;
> > +mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
> >   }
> >   flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;



[PATCH v2 0/2] Fix MCE handling on AMD hosts

2023-07-26 Thread John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

John Allen (2):
  i386: Add support for SUCCOR feature
  i386: Fix MCE support for AMD hosts

 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 19 +--
 4 files changed, 38 insertions(+), 7 deletions(-)

-- 
2.39.3




[PATCH v2 1/2] i386: Add support for SUCCOR feature

2023-07-26 Thread John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
---
 target/i386/cpu.c | 18 +-
 target/i386/cpu.h |  4 
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 06009b80e8..9708785255 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -872,6 +872,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 },
 .tcg_features = TCG_7_1_EAX_FEATURES,
 },
+[FEAT_8000_0007_EBX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, "succor", NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
+.cpuid = { .eax = 0x8007, .reg = R_EBX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_8000_0007_EDX] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -5874,7 +5890,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX];
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3edaad7688..fccb9b5a97 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -555,6 +555,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_C000_0001_EDX, /* CPUID[C000_0001].EDX */
@@ -856,6 +857,9 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f25837f63f..4b62138459 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -417,6 +417,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t 
function,
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+} else if (function == 0x8007 && reg == R_EBX) {
+ret |= CPUID_8000_0007_EBX_SUCCOR;
 } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
 /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
  * be enabled without the in-kernel irqchip
-- 
2.39.3




[PATCH v2 2/2] i386: Fix MCE support for AMD hosts

2023-07-26 Thread John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel but,
there are instances where the qemu implementation is Intel specific. First, MCE
deliviery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 533b29cb91..a6523858e0 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -76,6 +76,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, , );
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4b62138459..87a50c8aaf 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -532,16 +532,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 CPUState *cs = CPU(cpu);
 CPUX86State *env = >env;
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
 uint64_t mcg_status = MCG_STATUS_MCIP;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+mcg_status |= MCG_STATUS_RIPV;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
 flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3




Re: [PATCH 1/2] i386: Add support for SUCCOR feature

2023-07-12 Thread John Allen
On Fri, Jul 07, 2023 at 04:25:22PM +0200, Paolo Bonzini wrote:
> On 7/6/23 21:40, John Allen wrote:
> >   case 0x8007:
> >   *eax = 0;
> > -*ebx = 0;
> > +*ebx = env->features[FEAT_8000_0007_EBX] | 
> > CPUID_8000_0007_EBX_SUCCOR;
> >   *ecx = 0;
> >   *edx = env->features[FEAT_8000_0007_EDX];
> >   break;
> 
> I agree that it needs no hypervisor support, but Babu is right that you
> cannot add it unconditionally (especially not on Intel processors).
> 
> You can special case CPUID_8000_0007_EBX_SUCCOR in
> kvm_arch_get_supported_cpuid() so that it is added even on old kernels.
> There are already several such cases.  Adding it to KVM is nice to have
> anyway, so please send a patch for that.

By adding it to KVM do you mean adding a patch to the kernel to expose
the cpuid bit? Or do you mean just adding the special case to
kvm_arch_get_supported_cpuid?

For the kvm_arch_get_supported_cpuid case, I don't understand how this
would be different from unconditionally exposing the bit as done above.
Can you help me understand what you have in mind for this?

I might add a case like below:
...
} else if (function == 0x8007 && reg == R_EBX) {
ret |= CPUID_8000_0007_EBX_SUCCOR;
...

If we wanted to only expose the bit for AMD cpus, we would then need to
call IS_AMD_CPU with the CPUX86State as a paramter which would mean that
kvm_arch_get_supported_cpuid and all of its callers would need to take
the CPUX86State as a parameter. Is there another way to differentiate
between AMD and Intel cpus in this case?

> 
> Also, the patch does not compile (probably you missed a prerequisite) as it
> lacks all the rigamarole that is needed to add FEAT_8000_0007_EBX.

I'm not encountering any compilation issues. What are the errors that
you are seeing?

Thanks,
John



[PATCH 1/2] i386: Add support for SUCCOR feature

2023-07-06 Thread John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 06009b80e8..09fae9337a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5874,7 +5874,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8007:
 *eax = 0;
-*ebx = 0;
+*ebx = env->features[FEAT_8000_0007_EBX] | CPUID_8000_0007_EBX_SUCCOR;
 *ecx = 0;
 *edx = env->features[FEAT_8000_0007_EDX];
 break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3edaad7688..fccb9b5a97 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -555,6 +555,7 @@ typedef enum FeatureWord {
 FEAT_7_1_EAX,   /* CPUID[EAX=7,ECX=1].EAX */
 FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
 FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_C000_0001_EDX, /* CPUID[C000_0001].EDX */
@@ -856,6 +857,9 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP  (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR  (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
-- 
2.39.3




[PATCH 2/2] i386: Fix MCE support for AMD hosts

2023-07-06 Thread John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel but,
there are instances where the qemu implementation is Intel specific. First, MCE
deliviery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche 
Signed-off-by: John Allen 
---
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 533b29cb91..a6523858e0 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -76,6 +76,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
 int family = 0;
 int model = 0;
 
+if (IS_AMD_CPU(env)) {
+return 0;
+}
+
 cpu_x86_version(env, , );
 if ((family == 6 && model >= 14) || family > 6) {
 return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f25837f63f..63bd7a7d3a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -530,16 +530,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int 
code)
 CPUState *cs = CPU(cpu);
 CPUX86State *env = >env;
 uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-  MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+  MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
 uint64_t mcg_status = MCG_STATUS_MCIP;
 int flags = 0;
 
-if (code == BUS_MCEERR_AR) {
-status |= MCI_STATUS_AR | 0x134;
-mcg_status |= MCG_STATUS_EIPV;
+if (!IS_AMD_CPU(env)) {
+status |= MCI_STATUS_S;
+if (code == BUS_MCEERR_AR) {
+status |= MCI_STATUS_AR | 0x134;
+mcg_status |= MCG_STATUS_EIPV;
+} else {
+status |= 0xc0;
+mcg_status |= MCG_STATUS_RIPV;
+}
 } else {
-status |= 0xc0;
-mcg_status |= MCG_STATUS_RIPV;
+mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
 }
 
 flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3




[PATCH 0/2] Fix MCE handling on AMD hosts

2023-07-06 Thread John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

John Allen (2):
  i386: Add support for SUCCOR feature
  i386: Fix MCE support for AMD hosts

 target/i386/cpu.c |  2 +-
 target/i386/cpu.h |  4 
 target/i386/helper.c  |  4 
 target/i386/kvm/kvm.c | 17 +++--
 4 files changed, 20 insertions(+), 7 deletions(-)

-- 
2.39.3