[PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Aravinda Prasad
This patch modifies KVM to cause a guest exit with
KVM_EXIT_NMI instead of immediately delivering a 0x200
interrupt to guest upon machine check exception in
guest address. Exiting the guest enables QEMU to build
error log and deliver machine check exception to guest
OS (either via guest OS registered machine check
handler or via 0x200 guest OS interrupt vector).

This approach simplifies the delivering of machine
check exception to guest OS compared to the earlier approach
of KVM directly invoking 0x200 guest interrupt vector.
In the earlier approach QEMU patched the 0x200 interrupt
vector during boot. The patched code at 0x200 issued a
private hcall to pass the control to QEMU to build the
error log.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Signed-off-by: Aravinda Prasad 
---
 arch/powerpc/kvm/book3s_hv.c|   12 +++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 +++
 2 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..1b1dff0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
-   /*
-* Deliver a machine check interrupt to the guest.
-* We have to do this, even if the host has handled the
-* machine check, because machine checks use SRR0/1 and
-* the interrupt might have trashed guest state in them.
-*/
-   kvmppc_book3s_queue_irqprio(vcpu,
-   BOOK3S_INTERRUPT_MACHINE_CHECK);
-   r = RESUME_GUEST;
+   /* Exit to guest with KVM_EXIT_NMI as exit reason */
+   run->exit_reason = KVM_EXIT_NMI;
+   r = RESUME_HOST;
break;
case BOOK3S_INTERRUPT_PROGRAM:
{
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..672b4f6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
addir1, r1, 112
ld  r7, HSTATE_HOST_MSR(r13)
 
-   cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
beq 11f
cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
@@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtmsrd  r6, 1   /* Clear RI in MSR */
mtsrr0  r8
mtsrr1  r7
-   beq cr1, 13f/* machine check */
RFI
 
/* On POWER7, we have external interrupts set to use HSRR0/1 */
@@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_HSRR1, r7
ba  0x500
 
-13:b   machine_check_fwnmi
-
 14:mtspr   SPRN_HSRR0, r8
mtspr   SPRN_HSRR1, r7
b   hmi_exception_after_realmode
@@ -2381,24 +2377,19 @@ machine_check_realmode:
ld  r9, HSTATE_KVM_VCPU(r13)
li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
/*
-* Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
-* machine check interrupt (set HSRR0 to 0x200). And for handled
-* errors (no-fatal), just go back to guest execution with current
-* HSRR0 instead of exiting guest. This new approach will inject
-* machine check to guest for fatal error causing guest to crash.
-*
-* The old code used to return to host for unhandled errors which
-* was causing guest to hang with soft lockups inside guest and
-* makes it difficult to recover guest instance.
+* Deliver unhandled/fatal (e.g. UE) MCE errors to guest
+* by exiting the guest with KVM_EXIT_NMI exit reason (exit
+* reason set later based on trap). For handled errors
+* (no-fatal), go back to guest execution with current HSRR0
+* instead of exiting the guest. This approach will cause
+* the guest to exit in case of fatal machine check error.
 */
-   ld  r10, VCPU_PC(r9)
+   bne 2f  /* Continue guest execution? */
+   /* If not, exit the guest. SRR0/1 are already set */
+   b   mc_cont
+2: ld  r10, VCPU_PC(r9)
ld  r11, VCPU_MSR(r9)
-   bne 2f  /* Continue guest execution. */
-   /* If not, deliver a machine check.  SRR0/1 are already set */
-   li  r10, BOOK3S_INTERRUPT_MACHINE_CHECK
- 

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Daniel Axtens

> So, IIUC.  Once the qemu pieces are in place as well it shouldn't
> change this behaviour: KVM will exit to qemu, qemu will log the error
> information (new), then reinject the MC to the guest which can still
> handle it as you describe above.

Ah, that makes *much* more sense now! Thanks for the explanation: I
don't really follow qemu development.

>
> But, there could be a problem if you have a new kernel with an old
> qemu, in that case qemu might not understand the new exit type and
> treat it as a fatal error, even though the guest could actually cope
> with it.
>
> Aravinda, do we need to change this so that qemu has to explicitly
> enable the new NMI behaviour?  Or have I missed something that will
> make that case work already.

Yeah, it would be good not to break this.

Regards,
Daniel


> -- 
> David Gibson  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ 
> _other_
>   | _way_ _around_!
> http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread David Gibson
On Thu, Nov 12, 2015 at 10:02:10AM +0530, Aravinda Prasad wrote:
> 
> 
> On Thursday 12 November 2015 09:08 AM, David Gibson wrote:
> > On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote:
> >> Aravinda Prasad  writes:
> >>
> >>> This patch modifies KVM to cause a guest exit with
> >>> KVM_EXIT_NMI instead of immediately delivering a 0x200
> >>> interrupt to guest upon machine check exception in
> >>> guest address. Exiting the guest enables QEMU to build
> >>> error log and deliver machine check exception to guest
> >>> OS (either via guest OS registered machine check
> >>> handler or via 0x200 guest OS interrupt vector).
> >>>
> >>> This approach simplifies the delivering of machine
> >>> check exception to guest OS compared to the earlier approach
> >>> of KVM directly invoking 0x200 guest interrupt vector.
> >>> In the earlier approach QEMU patched the 0x200 interrupt
> >>> vector during boot. The patched code at 0x200 issued a
> >>> private hcall to pass the control to QEMU to build the
> >>> error log.
> >>>
> >>> This design/approach is based on the feedback for the
> >>> QEMU patches to handle machine check exception. Details
> >>> of earlier approach of handling machine check exception
> >>> in QEMU and related discussions can be found at:
> >>>
> >>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> >>
> >> I've poked at the MCE code, but not the KVM MCE code, so I may be
> >> mistaken here, but I'm not clear on how this handles errors that the
> >> guest can recover without terminating.
> >>
> >> For example, a Linux guest can handle a UE in guest userspace by killing
> >> the guest process. A hypthetical non-linux guest with a microkernel
> >> could even survive UEs in drivers.
> >>
> >> It sounds from your patch like you're changing this behaviour. Is this
> >> right?
> > 
> > So, IIUC.  Once the qemu pieces are in place as well it shouldn't
> > change this behaviour: KVM will exit to qemu, qemu will log the error
> > information (new), then reinject the MC to the guest which can still
> > handle it as you describe above.
> 
> Yes. With KVM and QEMU both in place this will not change the behavior.
> QEMU will inject the UE to guest and the guest handles the UE based on
> where it occurred. For example if an UE happens in a guest process
> address space, that process will be killed.
> 
> > 
> > But, there could be a problem if you have a new kernel with an old
> > qemu, in that case qemu might not understand the new exit type and
> > treat it as a fatal error, even though the guest could actually cope
> > with it.
> 
> In case of new kernel and old QEMU, the guest terminates as old QEMU
> does not understand the NMI exit reason. However, this is the case with
> old kernel and old QEMU as they do not handle UE belonging to guest. The
> difference is that the guest kernel terminates with different error
> code.

Ok.. assuming the guest has code to handle the UE in 0x200, why would
the guest terminate with old kernel and old qemu?  I haven't quite
followed the logic.

> 
> old kernel and old QEMU -> guest panics [1] irrespective of where UE
>happened in guest address space.
> old kernel and new QEMU -> guest panics. same as above.
> new kernel and old QEMU -> guest terminates with unhanded NMI error
>irrespective of where UE happened in guest
> new kernel and new QEMU -> guest handles UEs in process address space
>by killing the process. guest terminates
>for UEs in guest kernel address space.
> 
> [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118329.html
> 
> > 
> > Aravinda, do we need to change this so that qemu has to explicitly
> > enable the new NMI behaviour?  Or have I missed something that will
> > make that case work already.
> 
> I think we don't need to explicitly enable the new behavior. With new
> kernel and new QEMU this should just work. As mentioned above this is
> already broken for old kernel/QEMU. Any thoughts?
> 
> Regards,
> Aravinda
> 
> > 
> > 
> > 
> > ___
> > Linuxppc-dev mailing list
> > linuxppc-...@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[PATCH 1/2] KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

2015-11-11 Thread Paul Mackerras
Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination.  The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point.  If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index becad3a..f668712 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -231,6 +231,12 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 
 static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 {
+   /*
+* Check for illegal transactional state bit combination
+* and if we find it, force the TS field to a safe state.
+*/
+   if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
+   msr &= ~MSR_TS_MASK;
vcpu->arch.shregs.msr = msr;
kvmppc_end_cede(vcpu);
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] powerpc/64: Include KVM guest test in all interrupt vectors

2015-11-11 Thread Paul Mackerras
Currently, if HV KVM is configured but PR KVM isn't, we don't include
a test to see whether we were interrupted in KVM guest context for the
set of interrupts which get delivered directly to the guest by hardware
if they occur in the guest.  This includes things like program
interrupts.

However, the recent bug where userspace could set the MSR for a VCPU
to have an illegal value in the TS field, and thus cause a TM Bad Thing
type of program interrupt on the hrfid that enters the guest, showed that
we can never be completely sure that these interrupts can never occur
in the guest entry/exit code.  If one of these interrupts does happen
and we have HV KVM configured but not PR KVM, then we end up trying to
run the handler in the host with the MMU set to the guest MMU context,
which generally ends badly.

Thus, for robustness it is better to have the test in every interrupt
vector, so that if some way is found to trigger some interrupt in the
guest entry/exit path, we can handle it without immediately crashing
the host.

This means that the distinction between KVMTEST and KVMTEST_PR goes
away.  Thus we delete KVMTEST_PR and associated macros and use KVMTEST
everywhere that we previously used either KVMTEST_PR or KVMTEST.  It
also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
so we deleted SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/exception-64s.h | 21 +++-
 arch/powerpc/kernel/exceptions-64s.S | 34 
 2 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 77f52b2..9ee1078 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -263,17 +263,6 @@ do_kvm_##n:
\
 #define KVM_HANDLER_SKIP(area, h, n)
 #endif
 
-#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
-#define KVMTEST_PR(n)  __KVMTEST(n)
-#define KVM_HANDLER_PR(area, h, n) __KVM_HANDLER(area, h, n)
-#define KVM_HANDLER_PR_SKIP(area, h, n)__KVM_HANDLER_SKIP(area, h, n)
-
-#else
-#define KVMTEST_PR(n)
-#define KVM_HANDLER_PR(area, h, n)
-#define KVM_HANDLER_PR_SKIP(area, h, n)
-#endif
-
 #define NOTEST(n)
 
 /*
@@ -360,13 +349,13 @@ label##_pSeries:  \
HMT_MEDIUM_PPR_DISCARD; \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common,\
-EXC_STD, KVMTEST_PR, vec)
+EXC_STD, KVMTEST, vec)
 
 /* Version of above for when we have to branch out-of-line */
 #define STD_EXCEPTION_PSERIES_OOL(vec, label)  \
.globl label##_pSeries; \
 label##_pSeries:   \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);\
+   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);   \
EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD)
 
 #define STD_EXCEPTION_HV(loc, vec, label)  \
@@ -436,17 +425,13 @@ label##_relon_hv: 
\
 #define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
 
 #define SOFTEN_TEST_PR(vec)\
-   KVMTEST_PR(vec);\
+   KVMTEST(vec);   \
_SOFTEN_TEST(EXC_STD, vec)
 
 #define SOFTEN_TEST_HV(vec)\
KVMTEST(vec);   \
_SOFTEN_TEST(EXC_HV, vec)
 
-#define SOFTEN_TEST_HV_201(vec)
\
-   KVMTEST(vec);   \
-   _SOFTEN_TEST(EXC_STD, vec)
-
 #define SOFTEN_NOTEST_PR(vec)  _SOFTEN_TEST(EXC_STD, vec)
 #define SOFTEN_NOTEST_HV(vec)  _SOFTEN_TEST(EXC_HV, vec)
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 0a0399c2..1a03142 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -242,7 +242,7 @@ instruction_access_slb_pSeries:
HMT_MEDIUM_PPR_DISCARD
SET_SCRATCH0(r13)
EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
+   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST, 0x480)
std r3,PACA_EXSLB+EX_R3(r13)
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
 #ifdef __DISABLED__
@@ -276,18 +276,18 @@ hardware_interrupt_hv:
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0x502)
FTR_SECTION_ELSE
_MASKABLE_EXCEPTION_PSERIES(0x500, hardware_interrupt,
- 

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread David Gibson
On Wed, Nov 11, 2015 at 10:28:46PM +0530, Aravinda Prasad wrote:
> This patch modifies KVM to cause a guest exit with
> KVM_EXIT_NMI instead of immediately delivering a 0x200
> interrupt to guest upon machine check exception in
> guest address. Exiting the guest enables QEMU to build
> error log and deliver machine check exception to guest
> OS (either via guest OS registered machine check
> handler or via 0x200 guest OS interrupt vector).
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier approach
> of KVM directly invoking 0x200 guest interrupt vector.
> In the earlier approach QEMU patched the 0x200 interrupt
> vector during boot. The patched code at 0x200 issued a
> private hcall to pass the control to QEMU to build the
> error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> Signed-off-by: Aravinda Prasad 
> ---
>  arch/powerpc/kvm/book3s_hv.c|   12 +++-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 
> +++
>  2 files changed, 14 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 2280497..1b1dff0 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
> struct kvm_vcpu *vcpu,
>   r = RESUME_GUEST;
>   break;
>   case BOOK3S_INTERRUPT_MACHINE_CHECK:
> - /*
> -  * Deliver a machine check interrupt to the guest.
> -  * We have to do this, even if the host has handled the
> -  * machine check, because machine checks use SRR0/1 and
> -  * the interrupt might have trashed guest state in them.
> -  */
> - kvmppc_book3s_queue_irqprio(vcpu,
> - BOOK3S_INTERRUPT_MACHINE_CHECK);
> - r = RESUME_GUEST;
> + /* Exit to guest with KVM_EXIT_NMI as exit reason */
> + run->exit_reason = KVM_EXIT_NMI;
> + r = RESUME_HOST;
>   break;
>   case BOOK3S_INTERRUPT_PROGRAM:
>   {
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..672b4f6 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   addir1, r1, 112
>   ld  r7, HSTATE_HOST_MSR(r13)
>  
> - cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>   beq 11f
>   cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
> @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   mtmsrd  r6, 1   /* Clear RI in MSR */
>   mtsrr0  r8
>   mtsrr1  r7
> - beq cr1, 13f/* machine check */
>   RFI
>  
>   /* On POWER7, we have external interrupts set to use HSRR0/1 */
> @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   mtspr   SPRN_HSRR1, r7
>   ba  0x500
>  
> -13:  b   machine_check_fwnmi
> -

I don't quite understand the 3 hunks above.  The rest seems to change
the delivery of MCs as I'd expect, but the above seems to change their
detection and the reason for that isn't obvious to me.


>  14:  mtspr   SPRN_HSRR0, r8
>   mtspr   SPRN_HSRR1, r7
>   b   hmi_exception_after_realmode
> @@ -2381,24 +2377,19 @@ machine_check_realmode:
>   ld  r9, HSTATE_KVM_VCPU(r13)
>   li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   /*
> -  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -  * machine check interrupt (set HSRR0 to 0x200). And for handled
> -  * errors (no-fatal), just go back to guest execution with current
> -  * HSRR0 instead of exiting guest. This new approach will inject
> -  * machine check to guest for fatal error causing guest to crash.
> -  *
> -  * The old code used to return to host for unhandled errors which
> -  * was causing guest to hang with soft lockups inside guest and
> -  * makes it difficult to recover guest instance.
> +  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest
> +  * by exiting the guest with KVM_EXIT_NMI exit reason (exit
> +  * reason set later based on trap). For handled errors
> +  * (no-fatal), go back to guest execution with current HSRR0
> +  * instead of exiting the guest. This approach will cause
> +  * the guest to exit in case of fatal machine check error.
>*/
> - ld  r10, VCPU_PC(r9)
> + bne 2f  /* 

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread David Gibson
On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote:
> Aravinda Prasad  writes:
> 
> > This patch modifies KVM to cause a guest exit with
> > KVM_EXIT_NMI instead of immediately delivering a 0x200
> > interrupt to guest upon machine check exception in
> > guest address. Exiting the guest enables QEMU to build
> > error log and deliver machine check exception to guest
> > OS (either via guest OS registered machine check
> > handler or via 0x200 guest OS interrupt vector).
> >
> > This approach simplifies the delivering of machine
> > check exception to guest OS compared to the earlier approach
> > of KVM directly invoking 0x200 guest interrupt vector.
> > In the earlier approach QEMU patched the 0x200 interrupt
> > vector during boot. The patched code at 0x200 issued a
> > private hcall to pass the control to QEMU to build the
> > error log.
> >
> > This design/approach is based on the feedback for the
> > QEMU patches to handle machine check exception. Details
> > of earlier approach of handling machine check exception
> > in QEMU and related discussions can be found at:
> >
> > https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> I've poked at the MCE code, but not the KVM MCE code, so I may be
> mistaken here, but I'm not clear on how this handles errors that the
> guest can recover without terminating.
> 
> For example, a Linux guest can handle a UE in guest userspace by killing
> the guest process. A hypthetical non-linux guest with a microkernel
> could even survive UEs in drivers.
> 
> It sounds from your patch like you're changing this behaviour. Is this
> right?

So, IIUC.  Once the qemu pieces are in place as well it shouldn't
change this behaviour: KVM will exit to qemu, qemu will log the error
information (new), then reinject the MC to the guest which can still
handle it as you describe above.

But, there could be a problem if you have a new kernel with an old
qemu, in that case qemu might not understand the new exit type and
treat it as a fatal error, even though the guest could actually cope
with it.

Aravinda, do we need to change this so that qemu has to explicitly
enable the new NMI behaviour?  Or have I missed something that will
make that case work already.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


[GIT PULL] Please pull my kvm-ppc-fixes branch

2015-11-11 Thread Paul Mackerras
Paolo,

I have two fixes for HV KVM which I would like to have included in
v4.4-rc1.  The first one is a fix for a bug identified by Red Hat
which causes occasional guest crashes.  The second one fixes a bug
which causes host stalls and timeouts under certain circumstances when
the host is configured for static 2-way micro-threading mode.

Thanks,
Paul.

The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4:

  KVM: PPC: Book3S HV: Don't dynamically split core when already split 
(2015-11-06 16:02:59 +1100)


Paul Mackerras (2):
  KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails
  KVM: PPC: Book3S HV: Don't dynamically split core when already split

 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
 2 files changed, 13 insertions(+), 9 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Daniel Axtens
Aravinda Prasad  writes:

> This patch modifies KVM to cause a guest exit with
> KVM_EXIT_NMI instead of immediately delivering a 0x200
> interrupt to guest upon machine check exception in
> guest address. Exiting the guest enables QEMU to build
> error log and deliver machine check exception to guest
> OS (either via guest OS registered machine check
> handler or via 0x200 guest OS interrupt vector).
>
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier approach
> of KVM directly invoking 0x200 guest interrupt vector.
> In the earlier approach QEMU patched the 0x200 interrupt
> vector during boot. The patched code at 0x200 issued a
> private hcall to pass the control to QEMU to build the
> error log.
>
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

I've poked at the MCE code, but not the KVM MCE code, so I may be
mistaken here, but I'm not clear on how this handles errors that the
guest can recover without terminating.

For example, a Linux guest can handle a UE in guest userspace by killing
the guest process. A hypthetical non-linux guest with a microkernel
could even survive UEs in drivers.

It sounds from your patch like you're changing this behaviour. Is this
right?

Regards,
Daniel

>
> Signed-off-by: Aravinda Prasad 
> ---
>  arch/powerpc/kvm/book3s_hv.c|   12 +++-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 
> +++
>  2 files changed, 14 insertions(+), 29 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 2280497..1b1dff0 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
> struct kvm_vcpu *vcpu,
>   r = RESUME_GUEST;
>   break;
>   case BOOK3S_INTERRUPT_MACHINE_CHECK:
> - /*
> -  * Deliver a machine check interrupt to the guest.
> -  * We have to do this, even if the host has handled the
> -  * machine check, because machine checks use SRR0/1 and
> -  * the interrupt might have trashed guest state in them.
> -  */
> - kvmppc_book3s_queue_irqprio(vcpu,
> - BOOK3S_INTERRUPT_MACHINE_CHECK);
> - r = RESUME_GUEST;
> + /* Exit to guest with KVM_EXIT_NMI as exit reason */
> + run->exit_reason = KVM_EXIT_NMI;
> + r = RESUME_HOST;
>   break;
>   case BOOK3S_INTERRUPT_PROGRAM:
>   {
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..672b4f6 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   addir1, r1, 112
>   ld  r7, HSTATE_HOST_MSR(r13)
>  
> - cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>   beq 11f
>   cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
> @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   mtmsrd  r6, 1   /* Clear RI in MSR */
>   mtsrr0  r8
>   mtsrr1  r7
> - beq cr1, 13f/* machine check */
>   RFI
>  
>   /* On POWER7, we have external interrupts set to use HSRR0/1 */
> @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>   mtspr   SPRN_HSRR1, r7
>   ba  0x500
>  
> -13:  b   machine_check_fwnmi
> -
>  14:  mtspr   SPRN_HSRR0, r8
>   mtspr   SPRN_HSRR1, r7
>   b   hmi_exception_after_realmode
> @@ -2381,24 +2377,19 @@ machine_check_realmode:
>   ld  r9, HSTATE_KVM_VCPU(r13)
>   li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   /*
> -  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -  * machine check interrupt (set HSRR0 to 0x200). And for handled
> -  * errors (no-fatal), just go back to guest execution with current
> -  * HSRR0 instead of exiting guest. This new approach will inject
> -  * machine check to guest for fatal error causing guest to crash.
> -  *
> -  * The old code used to return to host for unhandled errors which
> -  * was causing guest to hang with soft lockups inside guest and
> -  * makes it difficult to recover guest instance.
> +  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest
> +  * by exiting the guest with KVM_EXIT_NMI exit reason (exit
> +  * reason set later based on trap). For handled errors
> +  * (no-fatal), go back to guest 

Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Aravinda Prasad


On Thursday 12 November 2015 09:04 AM, David Gibson wrote:
> On Wed, Nov 11, 2015 at 10:28:46PM +0530, Aravinda Prasad wrote:
>> This patch modifies KVM to cause a guest exit with
>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>> interrupt to guest upon machine check exception in
>> guest address. Exiting the guest enables QEMU to build
>> error log and deliver machine check exception to guest
>> OS (either via guest OS registered machine check
>> handler or via 0x200 guest OS interrupt vector).
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier approach
>> of KVM directly invoking 0x200 guest interrupt vector.
>> In the earlier approach QEMU patched the 0x200 interrupt
>> vector during boot. The patched code at 0x200 issued a
>> private hcall to pass the control to QEMU to build the
>> error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad 
>> ---
>>  arch/powerpc/kvm/book3s_hv.c|   12 +++-
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 
>> +++
>>  2 files changed, 14 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 2280497..1b1dff0 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
>> struct kvm_vcpu *vcpu,
>>  r = RESUME_GUEST;
>>  break;
>>  case BOOK3S_INTERRUPT_MACHINE_CHECK:
>> -/*
>> - * Deliver a machine check interrupt to the guest.
>> - * We have to do this, even if the host has handled the
>> - * machine check, because machine checks use SRR0/1 and
>> - * the interrupt might have trashed guest state in them.
>> - */
>> -kvmppc_book3s_queue_irqprio(vcpu,
>> -BOOK3S_INTERRUPT_MACHINE_CHECK);
>> -r = RESUME_GUEST;
>> +/* Exit to guest with KVM_EXIT_NMI as exit reason */
>> +run->exit_reason = KVM_EXIT_NMI;
>> +r = RESUME_HOST;
>>  break;
>>  case BOOK3S_INTERRUPT_PROGRAM:
>>  {
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
>> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index b98889e..672b4f6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  addir1, r1, 112
>>  ld  r7, HSTATE_HOST_MSR(r13)
>>  
>> -cmpwi   cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
>>  beq 11f
>>  cmpwi   cr2, r12, BOOK3S_INTERRUPT_HMI
>> @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  mtmsrd  r6, 1   /* Clear RI in MSR */
>>  mtsrr0  r8
>>  mtsrr1  r7
>> -beq cr1, 13f/* machine check */
>>  RFI
>>  
>>  /* On POWER7, we have external interrupts set to use HSRR0/1 */
>> @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  mtspr   SPRN_HSRR1, r7
>>  ba  0x500
>>  
>> -13: b   machine_check_fwnmi
>> -
> 
> I don't quite understand the 3 hunks above.  The rest seems to change
> the delivery of MCs as I'd expect, but the above seems to change their
> detection and the reason for that isn't obvious to me.

We need to do guest exit here and hence we continue with RFI or else if
we branch to machine_check_fwnmi then the control will flow to
opal_recover_mce routine without causing the guest to exit with NMI exit
code. And I think, opal_recover_mce() is used to recover from UEs in
host user/kernel space and does not have a check for UEs belonging to
guest. Hence if we branch to machine_check_fwnmi we end up running into
panic() in opal_machine_check() if UE belonged to guest.

Regards,
Aravinda

> 
> 
>>  14: mtspr   SPRN_HSRR0, r8
>>  mtspr   SPRN_HSRR1, r7
>>  b   hmi_exception_after_realmode
>> @@ -2381,24 +2377,19 @@ machine_check_realmode:
>>  ld  r9, HSTATE_KVM_VCPU(r13)
>>  li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  /*
>> - * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
>> - * machine check interrupt (set HSRR0 to 0x200). And for handled
>> - * errors (no-fatal), just go back to guest execution with current
>> - * HSRR0 instead of exiting guest. This new approach will inject
>> - * machine check to guest for fatal error causing guest to crash.
>> - *
>> - * The old code used to return to host for 

[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better

2015-11-11 Thread Paul Mackerras
As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction).  Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c| 18 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f668712..d6baf0a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 
vcpu->stat.sum_exits++;
 
+   /*
+* This can happen if an interrupt occurs in the last stages
+* of guest entry or the first stages of guest exit (i.e. after
+* setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV
+* and before setting it to KVM_GUEST_MODE_HOST_HV).
+* That can happen due to a bug, or due to a machine check
+* occurring at just the wrong time.
+*/
+   if (vcpu->arch.shregs.msr & MSR_HV) {
+   printk(KERN_EMERG "KVM trap in HV mode!\n");
+   printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
+   vcpu->arch.trap, kvmppc_get_pc(vcpu),
+   vcpu->arch.shregs.msr);
+   kvmppc_dump_regs(vcpu);
+   run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+   run->hw.hardware_exit_reason = vcpu->arch.trap;
+   return RESUME_HOST;
+   }
run->exit_reason = KVM_EXIT_UNKNOWN;
run->ready_for_interrupt_injection = 1;
switch (vcpu->arch.trap) {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..b3ce8ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2404,6 +2404,8 @@ machine_check_realmode:
 * guest as machine check causing guest to crash.
 */
ld  r11, VCPU_MSR(r9)
+   rldicl. r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
+   bne mc_cont /* if so, exit to host */
andi.   r10, r11, MSR_RI/* check for unrecoverable exception */
beq 1f  /* Deliver a machine check to guest */
ld  r10, VCPU_PC(r9)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-11 Thread Aravinda Prasad


On Thursday 12 November 2015 09:08 AM, David Gibson wrote:
> On Thu, Nov 12, 2015 at 01:24:19PM +1100, Daniel Axtens wrote:
>> Aravinda Prasad  writes:
>>
>>> This patch modifies KVM to cause a guest exit with
>>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>>> interrupt to guest upon machine check exception in
>>> guest address. Exiting the guest enables QEMU to build
>>> error log and deliver machine check exception to guest
>>> OS (either via guest OS registered machine check
>>> handler or via 0x200 guest OS interrupt vector).
>>>
>>> This approach simplifies the delivering of machine
>>> check exception to guest OS compared to the earlier approach
>>> of KVM directly invoking 0x200 guest interrupt vector.
>>> In the earlier approach QEMU patched the 0x200 interrupt
>>> vector during boot. The patched code at 0x200 issued a
>>> private hcall to pass the control to QEMU to build the
>>> error log.
>>>
>>> This design/approach is based on the feedback for the
>>> QEMU patches to handle machine check exception. Details
>>> of earlier approach of handling machine check exception
>>> in QEMU and related discussions can be found at:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> I've poked at the MCE code, but not the KVM MCE code, so I may be
>> mistaken here, but I'm not clear on how this handles errors that the
>> guest can recover without terminating.
>>
>> For example, a Linux guest can handle a UE in guest userspace by killing
>> the guest process. A hypthetical non-linux guest with a microkernel
>> could even survive UEs in drivers.
>>
>> It sounds from your patch like you're changing this behaviour. Is this
>> right?
> 
> So, IIUC.  Once the qemu pieces are in place as well it shouldn't
> change this behaviour: KVM will exit to qemu, qemu will log the error
> information (new), then reinject the MC to the guest which can still
> handle it as you describe above.

Yes. With KVM and QEMU both in place this will not change the behavior.
QEMU will inject the UE to guest and the guest handles the UE based on
where it occurred. For example if an UE happens in a guest process
address space, that process will be killed.

> 
> But, there could be a problem if you have a new kernel with an old
> qemu, in that case qemu might not understand the new exit type and
> treat it as a fatal error, even though the guest could actually cope
> with it.

In case of new kernel and old QEMU, the guest terminates as old QEMU
does not understand the NMI exit reason. However, this is the case with
old kernel and old QEMU as they do not handle UE belonging to guest. The
difference is that the guest kernel terminates with different error code.

old kernel and old QEMU -> guest panics [1] irrespective of where UE
   happened in guest address space.
old kernel and new QEMU -> guest panics. same as above.
new kernel and old QEMU -> guest terminates with unhanded NMI error
   irrespective of where UE happened in guest
new kernel and new QEMU -> guest handles UEs in process address space
   by killing the process. guest terminates
   for UEs in guest kernel address space.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-June/118329.html

> 
> Aravinda, do we need to change this so that qemu has to explicitly
> enable the new NMI behaviour?  Or have I missed something that will
> make that case work already.

I think we don't need to explicitly enable the new behavior. With new
kernel and new QEMU this should just work. As mentioned above this is
already broken for old kernel/QEMU. Any thoughts?

Regards,
Aravinda

> 
> 
> 
> ___
> Linuxppc-dev mailing list
> linuxppc-...@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

-- 
Regards,
Aravinda

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html