On Sun, Oct 08, 2017 at 02:29:26PM +0530, Aravinda Prasad wrote:
>
>
> On Wednesday 04 October 2017 06:59 AM, David Gibson wrote:
> > On Thu, Sep 28, 2017 at 04:08:10PM +0530, Aravinda Prasad wrote:
> >> Memory error such as bit flips that cannot be corrected
> >> by hardware are passed on to the kernel for handling.
> >> If the memory address in error belongs to guest then
> >> the guest kernel is responsible for taking suitable action.
> >> Patch [1] enhances KVM to exit guest with exit reason
> >> set to KVM_EXIT_NMI in such cases.
> >>
> >> This patch handles KVM_EXIT_NMI exit. If the guest OS
> >> has registered the machine check handling routine by
> >> calling "ibm,nmi-register", then the handler builds
> >> the error log and invokes the registered handler else
> >> invokes the handler at 0x200.
> >>
> >> Note that FWNMI handles synchronous machine check exceptions
> >> triggered by the hardware and hence we do not extend
> >> such support to the "nmi" command available in the QEMU
> >> monitor. Hence, "nmi" command from the monitor will
> >> always go through 0x200 vector.
> >>
> >> [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html
> >> (e20bbd3d and related commits)
> >
> > What does happen on KVM if an asynchronous machine check exception
> > occurs while in the guest? Or under PowerVM for that matter.
>
> AFAIK asynchronous errors take a different path in KVM as it can happen
> in a different process context.

Well, obviously, I'm wondering what impact it will have on the guest,
one way or another.

[snip]
> >> +ssize_t spapr_get_rtas_size(void)
> >> +{
> >> +    return RTAS_ERRLOG_OFFSET + sizeof(struct rtas_event_log_mce);
> >
> > Erm.. because of the definition of rtas_event_log_mce, this only
> > allows for 1 byte of extended log buffer. That doesn't seem right.
>
> This is directly taken from the kernel's RTAS log (struct rtas_error_log
> in arch/powerpc/include/asm/rtas.h). I am not sure why they use 1 byte
> extended log buffer.

I think you'd better find out, then.

[snip]
> >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >> index 28b6e2e..a75e9cf 100644
> >> --- a/include/hw/ppc/spapr.h
> >> +++ b/include/hw/ppc/spapr.h
> >> @@ -556,6 +556,9 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu,
> >> target_ulong opcode,
> >>  #define DIAGNOSTICS_RUN_MODE_IMMEDIATE 2
> >>  #define DIAGNOSTICS_RUN_MODE_PERIODIC  3
> >>
> >> +/* Offset from rtas-base where error log is placed */
> >> +#define RTAS_ERRLOG_OFFSET       0x200
> >
> > Is there any particular rationale for this offset? Our actual RTAS
> > code is 20 bytes, much smaller than this.
>
> Just to ensure some space if in case RTAS code needs to be extended in
> future.

Hm, but IIUC, we control both sides here. qemu puts the log into the
RTAS buffer at a particular offset, and qemu tells the guest where to
find it at a particular offset within the RTAS buffer. So, if we need
to extend the RTAS code (unlikely) we can increase our offset, and the
guest will be none the wiser.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
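
To illustrate the sizing concern raised above, here is a minimal sketch in C. It assumes (and this is only an assumption, not something confirmed in the thread) that the one-byte buffer is the older C idiom for a variable-length tail, in which case sizeof() on the struct does not account for the real extended log. The struct layout, DEMO_ERRLOG_OFFSET, DEMO_MAX_EXT_LEN and the demo_* helpers are all hypothetical stand-ins, not the definitions from the patch or from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an RTAS-style error log; NOT the real layout. */
struct demo_error_log {
    uint32_t header;              /* fixed-size part of the log */
    uint32_t extended_log_length; /* bytes of extended log that follow */
    uint8_t  buffer[1];           /* old-style variable-length tail */
};

/* Mirrors the 0x200 value of RTAS_ERRLOG_OFFSET quoted above. */
#define DEMO_ERRLOG_OFFSET 0x200

/* Hypothetical upper bound on the extended log, for illustration only. */
#define DEMO_MAX_EXT_LEN 4096

/*
 * Size of one log carrying ext_len bytes of extended data.
 * offsetof() is used instead of sizeof() so that the one-byte
 * placeholder and any trailing padding are not counted.
 */
static size_t demo_log_size(size_t ext_len)
{
    assert(ext_len <= DEMO_MAX_EXT_LEN);
    return offsetof(struct demo_error_log, buffer) + ext_len;
}

/* Total buffer needed: code area up to the offset, then the log. */
static size_t demo_rtas_size(size_t ext_len)
{
    return DEMO_ERRLOG_OFFSET + demo_log_size(ext_len);
}
```

With this layout, demo_rtas_size(64) reserves room for 64 bytes of extended log after the offset, whereas DEMO_ERRLOG_OFFSET + sizeof(struct demo_error_log) would reserve only the placeholder byte plus padding.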