On Mon, Mar 25, 2019 at 01:31:12PM +0530, Aravinda Prasad wrote:
>
>
> On Monday 25 March 2019 11:52 AM, David Gibson wrote:
> > On Fri, Mar 22, 2019 at 12:03:58PM +0530, Aravinda Prasad wrote:
> >> Memory error such as bit flips that cannot be corrected
> >> by hardware are passed on to the kernel for handling.
> >> If the memory address in error belongs to guest then
> >> the guest kernel is responsible for taking suitable action.
> >> Patch [1] enhances KVM to exit guest with exit reason
> >> set to KVM_EXIT_NMI in such cases. This patch handles
> >> KVM_EXIT_NMI exit.
> >>
> >> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
> >>     (e20bbd3d and related commits)
> >>
> >> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr_events.c  | 22 ++++++++++++++++++++++
> >>  include/hw/ppc/spapr.h |  1 +
> >>  target/ppc/kvm.c       | 16 ++++++++++++++++
> >>  target/ppc/kvm_ppc.h   |  2 ++
> >>  4 files changed, 41 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> >> index ae0f093..e7a24ad 100644
> >> --- a/hw/ppc/spapr_events.c
> >> +++ b/hw/ppc/spapr_events.c
> >> @@ -620,6 +620,28 @@ void
> >> spapr_hotplug_req_remove_by_count_indexed(SpaprDrcType drc_type,
> >>                              RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type,
> >>                              &drc_id);
> >>  }
> >>
> >> +void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
> >> +{
> >> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> >> +
> >> +    while (spapr->mc_status != -1) {
> >> +        /*
> >> +         * Check whether the same CPU got machine check error
> >> +         * while still handling the mc error (i.e., before
> >> +         * that CPU called "ibm,nmi-interlock"
> >> +         */
> >> +        if (spapr->mc_status == cpu->vcpu_id) {
> >> +            qemu_system_guest_panicked(NULL);
> >> +        }
> >> +        qemu_cond_wait_iothread(&spapr->mc_delivery_cond);
> >> +        /* If the system is reset meanwhile, then just return */
> >> +        if (spapr->mc_reset) {
> >
> > I don't really see what this accomplishes.  IIUC mc_reset is true from
> > reset time until nmi-register is called.  Which means you could just
> > check for guest_machine_check_addr being unset - in which case don't
> > you need to fallback to the old machine check behaviour anyway?
>
> We can check for guest_machine_check_addr being unset instead of mc_reset.
>
> Do we need any kind of memory barriers to ensure that this thread reads
> the updated guest_machine_check_addr/mc_reset?

IIUC these variables are all protected by the global qemu mutex, so
that should include the necessary memory barriers already.

> We don't have to fallback to the old machine check behavior, because
> guest_machine_check_addr is unset only during system reset.

Yes... which means that between reset and re-registering, we should be
using the old machine check behaviour, yes?  Which is exactly this
situation.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
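
For illustration only, the delivery decision discussed in this thread can be
modelled as a small pure function. This is a minimal sketch, not code from
the patch: McState, McAction, mc_next_action() and the MC_ADDR_UNSET sentinel
are hypothetical names, and it assumes an unset guest_machine_check_addr is
represented by an all-ones value. Only the "mc_status == -1 means idle"
convention is taken from the quoted hunk. The caller is assumed to hold the
global qemu mutex, which is why no explicit barriers appear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not QEMU definitions. */
#define MC_ADDR_UNSET  UINT64_MAX  /* assumed "not registered" sentinel */
#define MC_STATUS_IDLE (-1)        /* matches the mc_status != -1 test above */

typedef struct {
    uint64_t guest_machine_check_addr; /* set by nmi-register, cleared at reset */
    int mc_status;                     /* vcpu id handling an error, or idle */
} McState;

typedef enum {
    MC_DELIVER_LEGACY, /* old machine check behaviour */
    MC_DELIVER_FWNMI,  /* deliver through the registered handler */
    MC_WAIT,           /* another vcpu is still handling an error */
    MC_PANIC           /* same vcpu hit a second error before interlock */
} McAction;

/* Decide what to do for a machine check on vcpu_id; call with the lock held. */
McAction mc_next_action(const McState *s, int vcpu_id)
{
    assert(s != NULL);

    /* Between reset and re-registration there is no handler: fall back. */
    if (s->guest_machine_check_addr == MC_ADDR_UNSET) {
        return MC_DELIVER_LEGACY;
    }
    /* Nested error on the vcpu that has not yet called nmi-interlock. */
    if (s->mc_status == vcpu_id) {
        return MC_PANIC;
    }
    /* Some other vcpu owns the error log; wait on the condition variable. */
    if (s->mc_status != MC_STATUS_IDLE) {
        return MC_WAIT;
    }
    return MC_DELIVER_FWNMI;
}
```

In this model the check for an unregistered handler comes first, so a reset
that happens while a vcpu is waiting leads to the legacy path on the next
evaluation, with no separate mc_reset flag.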