On 2014-06-17 16:23:58 Tue, Paul Mackerras wrote:
> On Wed, Jun 11, 2014 at 02:18:21PM +0530, Mahesh J Salgaonkar wrote:
> > From: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>
> >
> > Currently we forward MCEs to guest which have been recovered by guest.
> > And for unhandled errors we do not deliver the MCE to guest. It looks like
> > with no support of FWNMI in qemu, guest just panics whenever we deliver the
> > recovered MCEs to guest. Also, the existing code used to return to host for
> > unhandled errors which was causing guest to hang with soft lockups inside
> > guest and makes it difficult to recover guest instance.
> >
> > This patch now forwards all fatal MCEs to guest causing guest to
> > crash/panic.
> > And, for recovered errors we just go back to normal functioning of guest
> > instead of returning to host.
>
> ... having corrupted possibly live values that the guest had in SRR0/1.
>
> Ideally the guest should have cleared MSR[RI] before putting values in
> SRR0/1, so perhaps you could check that and return to the guest
> without giving it a machine check if MSR[RI] is set. But if MSR[RI]
> is clear, the guest is unfixably corrupted because the machine check
> overwrote SRR0/1, and the only thing we can do, in the absence of
> FWNMI support, is give the guest a machine check interrupt and let it
> crash.

Yes, agreed. I have a patch (below) ready for this. I will test and verify
it and send it out soon.

Thanks,
-Mahesh.
-------------
Deliver machine check with MSR(RI=0) to guest as MCE

From: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>

---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 868347e..c9c56ee 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2257,7 +2257,6 @@ machine_check_realmode:
 	mr	r3, r9		/* get vcpu pointer */
 	bl	kvmppc_realmode_machine_check
 	nop
-	cmpdi	r3, 0		/* Did we handle MCE ? */
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	/*
@@ -2270,13 +2269,18 @@ machine_check_realmode:
 	 * The old code used to return to host for unhandled errors which
 	 * was causing guest to hang with soft lockups inside guest and
 	 * makes it difficult to recover guest instance.
+	 *
+	 * if we receive machine check with MSR(RI=0) then deliver it to
+	 * guest as machine check causing guest to crash.
 	 */
-	ld	r10, VCPU_PC(r9)
 	ld	r11, VCPU_MSR(r9)
+	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
+	beq	1f			/* Deliver a machine check to guest */
+	ld	r10, VCPU_PC(r9)
+	cmpdi	r3, 0		/* Did we handle MCE ? */
 	bne	2f	/* Continue guest execution. */
 	/* If not, deliver a machine check.  SRR0/1 are already set */
-	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
-	ld	r11, VCPU_MSR(r9)
+1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
 	bl	kvmppc_msr_interrupt
 2:	b	fast_interrupt_c_return
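
For anyone following the control flow, the decision that the patched
machine_check_realmode path makes can be modelled in C roughly as below.
This is only an illustrative sketch, not kernel code: the enum, the
function name mce_decide() and its parameters are hypothetical, and the
"handled" argument stands in for the value kvmppc_realmode_machine_check
leaves in r3 (non-zero taken to mean the error was handled, as the
cmpdi/bne pair in the patch implies). MSR_RI is assumed to be bit value 0x2.

```c
#include <stdint.h>

/* Assumed value of the MSR Recoverable Interrupt bit. */
#define MSR_RI 0x2ULL

/* Hypothetical names, used only for this sketch. */
enum mce_action {
	MCE_RESUME_GUEST,     /* corresponds to "bne 2f" in the patch */
	MCE_DELIVER_TO_GUEST  /* corresponds to label "1:" in the patch */
};

/*
 * guest_msr: the guest MSR at the time of the machine check (VCPU_MSR).
 * handled:   stand-in for r3 after kvmppc_realmode_machine_check;
 *            non-zero is assumed to mean the error was handled.
 */
static enum mce_action mce_decide(uint64_t guest_msr, long handled)
{
	/* RI clear: SRR0/1 may have been live, so the guest cannot resume. */
	if ((guest_msr & MSR_RI) == 0)
		return MCE_DELIVER_TO_GUEST;

	/* RI set and the error was handled: continue guest execution. */
	if (handled != 0)
		return MCE_RESUME_GUEST;

	/* RI set but the error was not handled: deliver the machine check. */
	return MCE_DELIVER_TO_GUEST;
}
```

The RI test comes first because a handled error does not help a guest
whose SRR0/1 contents were overwritten by the machine check itself.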