[PATCH] KVM: PPC: Exit guest upon fatal machine check exception
This patch modifies KVM to cause a guest exit with KVM_EXIT_NMI instead of immediately delivering a 0x200 interrupt to guest upon machine check exception in guest address. Exiting the guest enables QEMU to build error log and deliver machine check exception to guest OS (either via guest OS registered machine check handler or via 0x200 guest OS interrupt vector). This approach simplifies the delivering of machine check exception to guest OS compared to the earlier approach of KVM directly invoking 0x200 guest interrupt vector. In the earlier approach QEMU patched the 0x200 interrupt vector during boot. The patched code at 0x200 issued a private hcall to pass the control to QEMU to build the error log. This design/approach is based on the feedback for the QEMU patches to handle machine check exception. Details of earlier approach of handling machine check exception in QEMU and related discussions can be found at: https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html Signed-off-by: Aravinda Prasad--- arch/powerpc/kvm/book3s_hv.c| 12 +++- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 31 +++ 2 files changed, 14 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2280497..1b1dff0 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_GUEST; break; case BOOK3S_INTERRUPT_MACHINE_CHECK: - /* -* Deliver a machine check interrupt to the guest. -* We have to do this, even if the host has handled the -* machine check, because machine checks use SRR0/1 and -* the interrupt might have trashed guest state in them. -*/ - kvmppc_book3s_queue_irqprio(vcpu, - BOOK3S_INTERRUPT_MACHINE_CHECK); - r = RESUME_GUEST; + /* Exit to guest with KVM_EXIT_NMI as exit reason */ + run->exit_reason = KVM_EXIT_NMI; + r = RESUME_HOST; break; case BOOK3S_INTERRUPT_PROGRAM: { diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index b98889e..672b4f6 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) addir1, r1, 112 ld r7, HSTATE_HOST_MSR(r13) - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL beq 11f cmpwi cr2, r12, BOOK3S_INTERRUPT_HMI @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtmsrd r6, 1 /* Clear RI in MSR */ mtsrr0 r8 mtsrr1 r7 - beq cr1, 13f/* machine check */ RFI /* On POWER7, we have external interrupts set to use HSRR0/1 */ @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) mtspr SPRN_HSRR1, r7 ba 0x500 -13:b machine_check_fwnmi - 14:mtspr SPRN_HSRR0, r8 mtspr SPRN_HSRR1, r7 b hmi_exception_after_realmode @@ -2381,24 +2377,19 @@ machine_check_realmode: ld r9, HSTATE_KVM_VCPU(r13) li r12, BOOK3S_INTERRUPT_MACHINE_CHECK /* -* Deliver unhandled/fatal (e.g. UE) MCE errors to guest through -* machine check interrupt (set HSRR0 to 0x200). And for handled -* errors (no-fatal), just go back to guest execution with current -* HSRR0 instead of exiting guest. This new approach will inject -* machine check to guest for fatal error causing guest to crash. -* -* The old code used to return to host for unhandled errors which -* was causing guest to hang with soft lockups inside guest and -* makes it difficult to recover guest instance. +* Deliver unhandled/fatal (e.g. UE) MCE errors to guest +* by exiting the guest with KVM_EXIT_NMI exit reason (exit +* reason set later based on trap). For handled errors +* (no-fatal), go back to guest execution with current HSRR0 +* instead of exiting the guest. This approach will cause +* the guest to exit in case of fatal machine check error. */ - ld r10, VCPU_PC(r9) + bne 2f /* Continue guest execution? */ + /* If not, exit the guest. SRR0/1 are already set */ + b mc_cont +2: ld r10, VCPU_PC(r9) ld r11, VCPU_MSR(r9) - bne 2f /* Continue guest execution. */ - /* If not, deliver a machine check. SRR0/1 are already set */ - li r10, BOOK3S_INTERRUPT_MACHINE_CHECK -
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
> So, IIUC. Once the qemu pieces are in place as well it shouldn't > change this behaviour: KVM will exit to qemu, qemu will log the error > information (new), then reinject the MC to the guest which can still > handle it as you describe above. Ah, that makes *much* more sense now! Thanks for the explanation: I don't really follow qemu development. > > But, there could be a problem if you have a new kernel with an old > qemu, in that case qemu might not understand the new exit type and > treat it as a fatal error, even though the guest could actually cope > with it. > > Aravinda, do we need to change this so that qemu has to explicitly > enable the new NMI behaviour? Or have I missed something that will > make that case work already. Yeah, it would be good not to break this. Regards, Daniel > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au| minimalist, thank you. NOT _the_ > _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
On Thu, Nov 12, 2015 at 10:02:10AM +0530, Aravinda Prasad wrote: > > > On Thursday 12 November 2015 09:08 AM, David Gibson wrote: > > On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote: > >> Aravinda Prasadwrites: > >> > >>> This patch modifies KVM to cause a guest exit with > >>> KVM_EXIT_NMI instead of immediately delivering a 0x200 > >>> interrupt to guest upon machine check exception in > >>> guest address. Exiting the guest enables QEMU to build > >>> error log and deliver machine check exception to guest > >>> OS (either via guest OS registered machine check > >>> handler or via 0x200 guest OS interrupt vector). > >>> > >>> This approach simplifies the delivering of machine > >>> check exception to guest OS compared to the earlier approach > >>> of KVM directly invoking 0x200 guest interrupt vector. > >>> In the earlier approach QEMU patched the 0x200 interrupt > >>> vector during boot. The patched code at 0x200 issued a > >>> private hcall to pass the control to QEMU to build the > >>> error log. > >>> > >>> This design/approach is based on the feedback for the > >>> QEMU patches to handle machine check exception. Details > >>> of earlier approach of handling machine check exception > >>> in QEMU and related discussions can be found at: > >>> > >>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html > >> > >> I've poked at the MCE code, but not the KVM MCE code, so I may be > >> mistaken here, but I'm not clear on how this handles errors that the > >> guest can recover without terminating. > >> > >> For example, a Linux guest can handle a UE in guest userspace by killing > >> the guest process. A hypthetical non-linux guest with a microkernel > >> could even survive UEs in drivers. > >> > >> It sounds from your patch like you're changing this behaviour. Is this > >> right? > > > > So, IIUC. Once the qemu pieces are in place as well it shouldn't > > change this behaviour: KVM will exit to qemu, qemu will log the error > > information (new), then reinject the MC to the guest which can still > > handle it as you describe above. > > Yes. With KVM and QEMU both in place this will not change the behavior. > QEMU will inject the UE to guest and the guest handles the UE based on > where it occurred. For example if an UE happens in a guest process > address space, that process will be killed. > > > > > But, there could be a problem if you have a new kernel with an old > > qemu, in that case qemu might not understand the new exit type and > > treat it as a fatal error, even though the guest could actually cope > > with it. > > In case of new kernel and old QEMU, the guest terminates as old QEMU > does not understand the NMI exit reason. However, this is the case with > old kernel and old QEMU as they do not handle UE belonging to guest. The > difference is that the guest kernel terminates with different error > code. Ok.. assuming the guest has code to handle the UE in 0x200, why would the guest terminate with old kernel and old qemu? I haven't quite followed the logic. > > old kernel and old QEMU -> guest panics [1] irrespective of where UE >happened in guest address space. > old kernel and new QEMU -> guest panics. same as above. > new kernel and old QEMU -> guest terminates with unhanded NMI error >irrespective of where UE happened in guest > new kernel and new QEMU -> guest handles UEs in process address space >by killing the process. guest terminates >for UEs in guest kernel address space. > > [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118329.html > > > > > Aravinda, do we need to change this so that qemu has to explicitly > > enable the new NMI behaviour? Or have I missed something that will > > make that case work already. > > I think we don't need to explicitly enable the new behavior. With new > kernel and new QEMU this should just work. As mentioned above this is > already broken for old kernel/QEMU. Any thoughts? > > Regards, > Aravinda > > > > > > > > > ___ > > Linuxppc-dev mailing list > > linuxppc-...@lists.ozlabs.org > > https://lists.ozlabs.org/listinfo/linuxppc-dev > > > -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
[PATCH 1/2] KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR
Currently it is possible for userspace (e.g. QEMU) to set a value for the MSR for a guest VCPU which has both of the TS bits set, which is an illegal combination. The result of this is that when we execute a hrfid (hypervisor return from interrupt doubleword) instruction to enter the guest, the CPU will take a TM Bad Thing type of program interrupt (vector 0x700). Now, if PR KVM is configured in the kernel along with HV KVM, we actually handle this without crashing the host or giving hypervisor privilege to the guest; instead what happens is that we deliver a program interrupt to the guest, with SRR0 reflecting the address of the hrfid instruction and SRR1 containing the MSR value at that point. If PR KVM is not configured in the kernel, then we try to run the host's program interrupt handler with the MMU set to the guest context, which almost certainly causes a host crash. This closes the hole by making kvmppc_set_msr_hv() check for the illegal combination and force the TS field to a safe value (00, meaning non-transactional). Signed-off-by: Paul Mackerras--- arch/powerpc/kvm/book3s_hv.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index becad3a..f668712 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -231,6 +231,12 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu) static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) { + /* +* Check for illegal transactional state bit combination +* and if we find it, force the TS field to a safe state. +*/ + if ((msr & MSR_TS_MASK) == MSR_TS_MASK) + msr &= ~MSR_TS_MASK; vcpu->arch.shregs.msr = msr; kvmppc_end_cede(vcpu); } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] powerpc/64: Include KVM guest test in all interrupt vectors
Currently, if HV KVM is configured but PR KVM isn't, we don't include a test to see whether we were interrupted in KVM guest context for the set of interrupts which get delivered directly to the guest by hardware if they occur in the guest. This includes things like program interrupts. However, the recent bug where userspace could set the MSR for a VCPU to have an illegal value in the TS field, and thus cause a TM Bad Thing type of program interrupt on the hrfid that enters the guest, showed that we can never be completely sure that these interrupts can never occur in the guest entry/exit code. If one of these interrupts does happen and we have HV KVM configured but not PR KVM, then we end up trying to run the handler in the host with the MMU set to the guest MMU context, which generally ends badly. Thus, for robustness it is better to have the test in every interrupt vector, so that if some way is found to trigger some interrupt in the guest entry/exit path, we can handle it without immediately crashing the host. This means that the distinction between KVMTEST and KVMTEST_PR goes away. Thus we delete KVMTEST_PR and associated macros and use KVMTEST everywhere that we previously used either KVMTEST_PR or KVMTEST. It also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR, so we deleted SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead. Signed-off-by: Paul Mackerras--- arch/powerpc/include/asm/exception-64s.h | 21 +++- arch/powerpc/kernel/exceptions-64s.S | 34 2 files changed, 20 insertions(+), 35 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 77f52b2..9ee1078 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -263,17 +263,6 @@ do_kvm_##n: \ #define KVM_HANDLER_SKIP(area, h, n) #endif -#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE -#define KVMTEST_PR(n) __KVMTEST(n) -#define KVM_HANDLER_PR(area, h, n) __KVM_HANDLER(area, h, n) -#define KVM_HANDLER_PR_SKIP(area, h, n)__KVM_HANDLER_SKIP(area, h, n) - -#else -#define KVMTEST_PR(n) -#define KVM_HANDLER_PR(area, h, n) -#define KVM_HANDLER_PR_SKIP(area, h, n) -#endif - #define NOTEST(n) /* @@ -360,13 +349,13 @@ label##_pSeries: \ HMT_MEDIUM_PPR_DISCARD; \ SET_SCRATCH0(r13); /* save r13 */ \ EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common,\ -EXC_STD, KVMTEST_PR, vec) +EXC_STD, KVMTEST, vec) /* Version of above for when we have to branch out-of-line */ #define STD_EXCEPTION_PSERIES_OOL(vec, label) \ .globl label##_pSeries; \ label##_pSeries: \ - EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);\ + EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec); \ EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD) #define STD_EXCEPTION_HV(loc, vec, label) \ @@ -436,17 +425,13 @@ label##_relon_hv: \ #define _SOFTEN_TEST(h, vec) __SOFTEN_TEST(h, vec) #define SOFTEN_TEST_PR(vec)\ - KVMTEST_PR(vec);\ + KVMTEST(vec); \ _SOFTEN_TEST(EXC_STD, vec) #define SOFTEN_TEST_HV(vec)\ KVMTEST(vec); \ _SOFTEN_TEST(EXC_HV, vec) -#define SOFTEN_TEST_HV_201(vec) \ - KVMTEST(vec); \ - _SOFTEN_TEST(EXC_STD, vec) - #define SOFTEN_NOTEST_PR(vec) _SOFTEN_TEST(EXC_STD, vec) #define SOFTEN_NOTEST_HV(vec) _SOFTEN_TEST(EXC_HV, vec) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 0a0399c2..1a03142 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -242,7 +242,7 @@ instruction_access_slb_pSeries: HMT_MEDIUM_PPR_DISCARD SET_SCRATCH0(r13) EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480) + EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST, 0x480) std r3,PACA_EXSLB+EX_R3(r13) mfspr r3,SPRN_SRR0/* SRR0 is faulting address */ #ifdef __DISABLED__ @@ -276,18 +276,18 @@ hardware_interrupt_hv: KVM_HANDLER(PACA_EXGEN, EXC_HV, 0x502) FTR_SECTION_ELSE _MASKABLE_EXCEPTION_PSERIES(0x500, hardware_interrupt, -
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
On Wed, Nov 11, 2015 at 10:28:46PM +0530, Aravinda Prasad wrote: > This patch modifies KVM to cause a guest exit with > KVM_EXIT_NMI instead of immediately delivering a 0x200 > interrupt to guest upon machine check exception in > guest address. Exiting the guest enables QEMU to build > error log and deliver machine check exception to guest > OS (either via guest OS registered machine check > handler or via 0x200 guest OS interrupt vector). > > This approach simplifies the delivering of machine > check exception to guest OS compared to the earlier approach > of KVM directly invoking 0x200 guest interrupt vector. > In the earlier approach QEMU patched the 0x200 interrupt > vector during boot. The patched code at 0x200 issued a > private hcall to pass the control to QEMU to build the > error log. > > This design/approach is based on the feedback for the > QEMU patches to handle machine check exception. Details > of earlier approach of handling machine check exception > in QEMU and related discussions can be found at: > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html > > Signed-off-by: Aravinda Prasad> --- > arch/powerpc/kvm/book3s_hv.c| 12 +++- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 31 > +++ > 2 files changed, 14 insertions(+), 29 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 2280497..1b1dff0 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, > struct kvm_vcpu *vcpu, > r = RESUME_GUEST; > break; > case BOOK3S_INTERRUPT_MACHINE_CHECK: > - /* > - * Deliver a machine check interrupt to the guest. > - * We have to do this, even if the host has handled the > - * machine check, because machine checks use SRR0/1 and > - * the interrupt might have trashed guest state in them. > - */ > - kvmppc_book3s_queue_irqprio(vcpu, > - BOOK3S_INTERRUPT_MACHINE_CHECK); > - r = RESUME_GUEST; > + /* Exit to guest with KVM_EXIT_NMI as exit reason */ > + run->exit_reason = KVM_EXIT_NMI; > + r = RESUME_HOST; > break; > case BOOK3S_INTERRUPT_PROGRAM: > { > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index b98889e..672b4f6 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > addir1, r1, 112 > ld r7, HSTATE_HOST_MSR(r13) > > - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > beq 11f > cmpwi cr2, r12, BOOK3S_INTERRUPT_HMI > @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtmsrd r6, 1 /* Clear RI in MSR */ > mtsrr0 r8 > mtsrr1 r7 > - beq cr1, 13f/* machine check */ > RFI > > /* On POWER7, we have external interrupts set to use HSRR0/1 */ > @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtspr SPRN_HSRR1, r7 > ba 0x500 > > -13: b machine_check_fwnmi > - I don't quite understand the 3 hunks above. The rest seems to change the delivery of MCs as I'd expect, but the above seems to change their detection and the reason for that isn't obvious to me. > 14: mtspr SPRN_HSRR0, r8 > mtspr SPRN_HSRR1, r7 > b hmi_exception_after_realmode > @@ -2381,24 +2377,19 @@ machine_check_realmode: > ld r9, HSTATE_KVM_VCPU(r13) > li r12, BOOK3S_INTERRUPT_MACHINE_CHECK > /* > - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through > - * machine check interrupt (set HSRR0 to 0x200). And for handled > - * errors (no-fatal), just go back to guest execution with current > - * HSRR0 instead of exiting guest. This new approach will inject > - * machine check to guest for fatal error causing guest to crash. > - * > - * The old code used to return to host for unhandled errors which > - * was causing guest to hang with soft lockups inside guest and > - * makes it difficult to recover guest instance. > + * Deliver unhandled/fatal (e.g. UE) MCE errors to guest > + * by exiting the guest with KVM_EXIT_NMI exit reason (exit > + * reason set later based on trap). For handled errors > + * (no-fatal), go back to guest execution with current HSRR0 > + * instead of exiting the guest. This approach will cause > + * the guest to exit in case of fatal machine check error. >*/ > - ld r10, VCPU_PC(r9) > + bne 2f /*
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote: > Aravinda Prasadwrites: > > > This patch modifies KVM to cause a guest exit with > > KVM_EXIT_NMI instead of immediately delivering a 0x200 > > interrupt to guest upon machine check exception in > > guest address. Exiting the guest enables QEMU to build > > error log and deliver machine check exception to guest > > OS (either via guest OS registered machine check > > handler or via 0x200 guest OS interrupt vector). > > > > This approach simplifies the delivering of machine > > check exception to guest OS compared to the earlier approach > > of KVM directly invoking 0x200 guest interrupt vector. > > In the earlier approach QEMU patched the 0x200 interrupt > > vector during boot. The patched code at 0x200 issued a > > private hcall to pass the control to QEMU to build the > > error log. > > > > This design/approach is based on the feedback for the > > QEMU patches to handle machine check exception. Details > > of earlier approach of handling machine check exception > > in QEMU and related discussions can be found at: > > > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html > > I've poked at the MCE code, but not the KVM MCE code, so I may be > mistaken here, but I'm not clear on how this handles errors that the > guest can recover without terminating. > > For example, a Linux guest can handle a UE in guest userspace by killing > the guest process. A hypthetical non-linux guest with a microkernel > could even survive UEs in drivers. > > It sounds from your patch like you're changing this behaviour. Is this > right? So, IIUC. Once the qemu pieces are in place as well it shouldn't change this behaviour: KVM will exit to qemu, qemu will log the error information (new), then reinject the MC to the guest which can still handle it as you describe above. But, there could be a problem if you have a new kernel with an old qemu, in that case qemu might not understand the new exit type and treat it as a fatal error, even though the guest could actually cope with it. Aravinda, do we need to change this so that qemu has to explicitly enable the new NMI behaviour? Or have I missed something that will make that case work already. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
[GIT PULL] Please pull my kvm-ppc-fixes branch
Paolo, I have two fixes for HV KVM which I would like to have included in v4.4-rc1. The first one is a fix for a bug identified by Red Hat which causes occasional guest crashes. The second one fixes a bug which causes host stalls and timeouts under certain circumstances when the host is configured for static 2-way micro-threading mode. Thanks, Paul. The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1: KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4: KVM: PPC: Book3S HV: Don't dynamically split core when already split (2015-11-06 16:02:59 +1100) Paul Mackerras (2): KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails KVM: PPC: Book3S HV: Don't dynamically split core when already split arch/powerpc/kvm/book3s_hv.c| 2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 2 files changed, 13 insertions(+), 9 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
Aravinda Prasadwrites: > This patch modifies KVM to cause a guest exit with > KVM_EXIT_NMI instead of immediately delivering a 0x200 > interrupt to guest upon machine check exception in > guest address. Exiting the guest enables QEMU to build > error log and deliver machine check exception to guest > OS (either via guest OS registered machine check > handler or via 0x200 guest OS interrupt vector). > > This approach simplifies the delivering of machine > check exception to guest OS compared to the earlier approach > of KVM directly invoking 0x200 guest interrupt vector. > In the earlier approach QEMU patched the 0x200 interrupt > vector during boot. The patched code at 0x200 issued a > private hcall to pass the control to QEMU to build the > error log. > > This design/approach is based on the feedback for the > QEMU patches to handle machine check exception. Details > of earlier approach of handling machine check exception > in QEMU and related discussions can be found at: > > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html I've poked at the MCE code, but not the KVM MCE code, so I may be mistaken here, but I'm not clear on how this handles errors that the guest can recover without terminating. For example, a Linux guest can handle a UE in guest userspace by killing the guest process. A hypthetical non-linux guest with a microkernel could even survive UEs in drivers. It sounds from your patch like you're changing this behaviour. Is this right? Regards, Daniel > > Signed-off-by: Aravinda Prasad > --- > arch/powerpc/kvm/book3s_hv.c| 12 +++- > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 31 > +++ > 2 files changed, 14 insertions(+), 29 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 2280497..1b1dff0 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, > struct kvm_vcpu *vcpu, > r = RESUME_GUEST; > break; > case BOOK3S_INTERRUPT_MACHINE_CHECK: > - /* > - * Deliver a machine check interrupt to the guest. > - * We have to do this, even if the host has handled the > - * machine check, because machine checks use SRR0/1 and > - * the interrupt might have trashed guest state in them. > - */ > - kvmppc_book3s_queue_irqprio(vcpu, > - BOOK3S_INTERRUPT_MACHINE_CHECK); > - r = RESUME_GUEST; > + /* Exit to guest with KVM_EXIT_NMI as exit reason */ > + run->exit_reason = KVM_EXIT_NMI; > + r = RESUME_HOST; > break; > case BOOK3S_INTERRUPT_PROGRAM: > { > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index b98889e..672b4f6 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > addir1, r1, 112 > ld r7, HSTATE_HOST_MSR(r13) > > - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK > cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL > beq 11f > cmpwi cr2, r12, BOOK3S_INTERRUPT_HMI > @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtmsrd r6, 1 /* Clear RI in MSR */ > mtsrr0 r8 > mtsrr1 r7 > - beq cr1, 13f/* machine check */ > RFI > > /* On POWER7, we have external interrupts set to use HSRR0/1 */ > @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) > mtspr SPRN_HSRR1, r7 > ba 0x500 > > -13: b machine_check_fwnmi > - > 14: mtspr SPRN_HSRR0, r8 > mtspr SPRN_HSRR1, r7 > b hmi_exception_after_realmode > @@ -2381,24 +2377,19 @@ machine_check_realmode: > ld r9, HSTATE_KVM_VCPU(r13) > li r12, BOOK3S_INTERRUPT_MACHINE_CHECK > /* > - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through > - * machine check interrupt (set HSRR0 to 0x200). And for handled > - * errors (no-fatal), just go back to guest execution with current > - * HSRR0 instead of exiting guest. This new approach will inject > - * machine check to guest for fatal error causing guest to crash. > - * > - * The old code used to return to host for unhandled errors which > - * was causing guest to hang with soft lockups inside guest and > - * makes it difficult to recover guest instance. > + * Deliver unhandled/fatal (e.g. UE) MCE errors to guest > + * by exiting the guest with KVM_EXIT_NMI exit reason (exit > + * reason set later based on trap). For handled errors > + * (no-fatal), go back to guest
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
On Thursday 12 November 2015 09:04 AM, David Gibson wrote: > On Wed, Nov 11, 2015 at 10:28:46PM +0530, Aravinda Prasad wrote: >> This patch modifies KVM to cause a guest exit with >> KVM_EXIT_NMI instead of immediately delivering a 0x200 >> interrupt to guest upon machine check exception in >> guest address. Exiting the guest enables QEMU to build >> error log and deliver machine check exception to guest >> OS (either via guest OS registered machine check >> handler or via 0x200 guest OS interrupt vector). >> >> This approach simplifies the delivering of machine >> check exception to guest OS compared to the earlier approach >> of KVM directly invoking 0x200 guest interrupt vector. >> In the earlier approach QEMU patched the 0x200 interrupt >> vector during boot. The patched code at 0x200 issued a >> private hcall to pass the control to QEMU to build the >> error log. >> >> This design/approach is based on the feedback for the >> QEMU patches to handle machine check exception. Details >> of earlier approach of handling machine check exception >> in QEMU and related discussions can be found at: >> >> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html >> >> Signed-off-by: Aravinda Prasad>> --- >> arch/powerpc/kvm/book3s_hv.c| 12 +++- >> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 31 >> +++ >> 2 files changed, 14 insertions(+), 29 deletions(-) >> >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 2280497..1b1dff0 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, >> struct kvm_vcpu *vcpu, >> r = RESUME_GUEST; >> break; >> case BOOK3S_INTERRUPT_MACHINE_CHECK: >> -/* >> - * Deliver a machine check interrupt to the guest. >> - * We have to do this, even if the host has handled the >> - * machine check, because machine checks use SRR0/1 and >> - * the interrupt might have trashed guest state in them. >> - */ >> -kvmppc_book3s_queue_irqprio(vcpu, >> -BOOK3S_INTERRUPT_MACHINE_CHECK); >> -r = RESUME_GUEST; >> +/* Exit to guest with KVM_EXIT_NMI as exit reason */ >> +run->exit_reason = KVM_EXIT_NMI; >> +r = RESUME_HOST; >> break; >> case BOOK3S_INTERRUPT_PROGRAM: >> { >> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> index b98889e..672b4f6 100644 >> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S >> @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) >> addir1, r1, 112 >> ld r7, HSTATE_HOST_MSR(r13) >> >> -cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK >> cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL >> beq 11f >> cmpwi cr2, r12, BOOK3S_INTERRUPT_HMI >> @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) >> mtmsrd r6, 1 /* Clear RI in MSR */ >> mtsrr0 r8 >> mtsrr1 r7 >> -beq cr1, 13f/* machine check */ >> RFI >> >> /* On POWER7, we have external interrupts set to use HSRR0/1 */ >> @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) >> mtspr SPRN_HSRR1, r7 >> ba 0x500 >> >> -13: b machine_check_fwnmi >> - > > I don't quite understand the 3 hunks above. The rest seems to change > the delivery of MCs as I'd expect, but the above seems to change their > detection and the reason for that isn't obvious to me. We need to do guest exit here and hence we continue with RFI or else if we branch to machine_check_fwnmi then the control will flow to opal_recover_mce routine without causing the guest to exit with NMI exit code. And I think, opal_recover_mce() is used to recover from UEs in host user/kernel space and does not have a check for UEs belonging to guest. Hence if we branch to machine_check_fwnmi we end up running into panic() in opal_machine_check() if UE belonged to guest. Regards, Aravinda > > >> 14: mtspr SPRN_HSRR0, r8 >> mtspr SPRN_HSRR1, r7 >> b hmi_exception_after_realmode >> @@ -2381,24 +2377,19 @@ machine_check_realmode: >> ld r9, HSTATE_KVM_VCPU(r13) >> li r12, BOOK3S_INTERRUPT_MACHINE_CHECK >> /* >> - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through >> - * machine check interrupt (set HSRR0 to 0x200). And for handled >> - * errors (no-fatal), just go back to guest execution with current >> - * HSRR0 instead of exiting guest. This new approach will inject >> - * machine check to guest for fatal error causing guest to crash. >> - * >> - * The old code used to return to host for
[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
As we saw with the TM Bad Thing type of program interrupt occurring on the hrfid that enters the guest, it is not completely impossible to have a trap occurring in the guest entry/exit code, despite the fact that the code has been written to avoid taking any traps. This adds a check in the kvmppc_handle_exit_hv() function to detect the case when a trap has occurred in the hypervisor-mode code, and instead of treating it just like a trap in guest code, we now print a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR exit reason. Of the various interrupts that get handled in the assembly code in the guest exit path and that can return directly to the guest, the only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check (other than system call, which we can avoid just by not doing a sc instruction). Therefore this adds code to the machine check path to ensure that if the MCE occurred in hypervisor mode, we exit to the host rather than trying to continue the guest. Signed-off-by: Paul Mackerras--- arch/powerpc/kvm/book3s_hv.c| 18 ++ arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++ 2 files changed, 20 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index f668712..d6baf0a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, vcpu->stat.sum_exits++; + /* +* This can happen if an interrupt occurs in the last stages +* of guest entry or the first stages of guest exit (i.e. after +* setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV +* and before setting it to KVM_GUEST_MODE_HOST_HV). +* That can happen due to a bug, or due to a machine check +* occurring at just the wrong time. +*/ + if (vcpu->arch.shregs.msr & MSR_HV) { + printk(KERN_EMERG "KVM trap in HV mode!\n"); + printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n", + vcpu->arch.trap, kvmppc_get_pc(vcpu), + vcpu->arch.shregs.msr); + kvmppc_dump_regs(vcpu); + run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + run->hw.hardware_exit_reason = vcpu->arch.trap; + return RESUME_HOST; + } run->exit_reason = KVM_EXIT_UNKNOWN; run->ready_for_interrupt_injection = 1; switch (vcpu->arch.trap) { diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 3c6badc..b3ce8ff 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -2404,6 +2404,8 @@ machine_check_realmode: * guest as machine check causing guest to crash. */ ld r11, VCPU_MSR(r9) + rldicl. r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */ + bne mc_cont /* if so, exit to host */ andi. r10, r11, MSR_RI/* check for unrecoverable exception */ beq 1f /* Deliver a machine check to guest */ ld r10, VCPU_PC(r9) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
On Thursday 12 November 2015 09:08 AM, David Gibson wrote: > On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote: >> Aravinda Prasadwrites: >> >>> This patch modifies KVM to cause a guest exit with >>> KVM_EXIT_NMI instead of immediately delivering a 0x200 >>> interrupt to guest upon machine check exception in >>> guest address. Exiting the guest enables QEMU to build >>> error log and deliver machine check exception to guest >>> OS (either via guest OS registered machine check >>> handler or via 0x200 guest OS interrupt vector). >>> >>> This approach simplifies the delivering of machine >>> check exception to guest OS compared to the earlier approach >>> of KVM directly invoking 0x200 guest interrupt vector. >>> In the earlier approach QEMU patched the 0x200 interrupt >>> vector during boot. The patched code at 0x200 issued a >>> private hcall to pass the control to QEMU to build the >>> error log. >>> >>> This design/approach is based on the feedback for the >>> QEMU patches to handle machine check exception. Details >>> of earlier approach of handling machine check exception >>> in QEMU and related discussions can be found at: >>> >>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html >> >> I've poked at the MCE code, but not the KVM MCE code, so I may be >> mistaken here, but I'm not clear on how this handles errors that the >> guest can recover without terminating. >> >> For example, a Linux guest can handle a UE in guest userspace by killing >> the guest process. A hypthetical non-linux guest with a microkernel >> could even survive UEs in drivers. >> >> It sounds from your patch like you're changing this behaviour. Is this >> right? > > So, IIUC. Once the qemu pieces are in place as well it shouldn't > change this behaviour: KVM will exit to qemu, qemu will log the error > information (new), then reinject the MC to the guest which can still > handle it as you describe above. Yes. With KVM and QEMU both in place this will not change the behavior. QEMU will inject the UE to guest and the guest handles the UE based on where it occurred. For example if an UE happens in a guest process address space, that process will be killed. > > But, there could be a problem if you have a new kernel with an old > qemu, in that case qemu might not understand the new exit type and > treat it as a fatal error, even though the guest could actually cope > with it. In case of new kernel and old QEMU, the guest terminates as old QEMU does not understand the NMI exit reason. However, this is the case with old kernel and old QEMU as they do not handle UE belonging to guest. The difference is that the guest kernel terminates with different error code. old kernel and old QEMU -> guest panics [1] irrespective of where UE happened in guest address space. old kernel and new QEMU -> guest panics. same as above. new kernel and old QEMU -> guest terminates with unhanded NMI error irrespective of where UE happened in guest new kernel and new QEMU -> guest handles UEs in process address space by killing the process. guest terminates for UEs in guest kernel address space. [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118329.html > > Aravinda, do we need to change this so that qemu has to explicitly > enable the new NMI behaviour? Or have I missed something that will > make that case work already. I think we don't need to explicitly enable the new behavior. With new kernel and new QEMU this should just work. As mentioned above this is already broken for old kernel/QEMU. Any thoughts? Regards, Aravinda > > > > ___ > Linuxppc-dev mailing list > linuxppc-...@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev > -- Regards, Aravinda -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html