Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-14 Thread Paolo Bonzini

Il 13/03/2014 18:08, Radim Krčmář ha scritto:

> I agree that old code is wrong and the patch looks correct, but I only
> see how the bug may cause pending IRR to not be delivered in time,
> not how interrupt can disrupt a higher priority task.


Right.  Also, on SMP guests the effect would likely be just a deadlock
if a lower-priority ISR interrupted a higher priority task and accessed 
shared data (since you need anyway a spinlock in addition to raising the 
IRQL).


A more likely explanation is that if the remote processor delays an IPI 
too much, it will have a stable TLB entry.  The resulting random 
corruption of paged memory is compatible with the BAD_POOL_HEADER error 
codes that Radim observed.



Paolo, can you change the last sentence to ", which means we don't
inject pending IRR immediately."?  (or do we just forget it?)


It's already in Linus's tree.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-13 Thread Radim Krčmář
2014-03-13 15:52+0200, Gleb Natapov:
> On Wed, Mar 12, 2014 at 06:20:01PM +0100, Paolo Bonzini wrote:
> > Il 12/03/2014 11:40, Radim Krčmář ha scritto:
> > >2014-03-11 22:05-0300, Marcelo Tosatti:
> > >>On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote:
> > >>>We always disable cr8 intercept in its handler, but only re-enable it
> > >>>if handling KVM_REQ_EVENT, so there can be a window where we do not
> > >>>intercept cr8 writes, which allows an interrupt to disrupt a higher
> > >>>priority task.
> > >>>
> > >>>Fix this by disabling intercepts in the same function that re-enables
> > >>>them when needed. This fixes BSOD in Windows 2008.
> > >>>
> > >>>Cc: 
> > >>>Signed-off-by: Radim Krčmář 
> > >>>---
> > >>> arch/x86/kvm/svm.c | 6 +++---
> > >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> > >>>
> > >>>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > >>>index 64d9bb9..f676c18 100644
> > >>>--- a/arch/x86/kvm/svm.c
> > >>>+++ b/arch/x86/kvm/svm.c
> > >>>@@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm 
> > >>>*svm)
> > >>> u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
> > >>> /* instruction emulation calls kvm_set_cr8() */
> > >>> r = cr_interception(svm);
> > >>>-if (irqchip_in_kernel(svm->vcpu.kvm)) {
> > >>>-clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> > >>>+if (irqchip_in_kernel(svm->vcpu.kvm))
> > >>> return r;
> > >>>-}
> > 
> > I think that the old code here makes little sense, and for two reasons:
> > 
> I agree that old code is wrong and the patch looks correct, but I only
> see how the bug may cause pending IRR to not be delivered in time,
> not how interrupt can disrupt a higher priority task.

True, the commit message is bad.
We BSOD only because IRR is not injected right after lowering the TPR
and code depends on values that had to be computed in it.

Paolo, can you change the last sentence to ", which means we don't
inject pending IRR immediately."?  (or do we just forget it?)

Thanks.

---
 My model of what is happening had misconceptions about Windows, KVM and
 SVM ... explained BSOD in IRQL 0xc more directly though, so it lasted
 till Paolo's review -- the list of things to (re)read is long.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-13 Thread Gleb Natapov
On Wed, Mar 12, 2014 at 06:20:01PM +0100, Paolo Bonzini wrote:
> Il 12/03/2014 11:40, Radim Krčmář ha scritto:
> >2014-03-11 22:05-0300, Marcelo Tosatti:
> >>On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote:
> >>>We always disable cr8 intercept in its handler, but only re-enable it
> >>>if handling KVM_REQ_EVENT, so there can be a window where we do not
> >>>intercept cr8 writes, which allows an interrupt to disrupt a higher
> >>>priority task.
> >>>
> >>>Fix this by disabling intercepts in the same function that re-enables
> >>>them when needed. This fixes BSOD in Windows 2008.
> >>>
> >>>Cc: 
> >>>Signed-off-by: Radim Krčmář 
> >>>---
> >>> arch/x86/kvm/svm.c | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>>index 64d9bb9..f676c18 100644
> >>>--- a/arch/x86/kvm/svm.c
> >>>+++ b/arch/x86/kvm/svm.c
> >>>@@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm 
> >>>*svm)
> >>>   u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
> >>>   /* instruction emulation calls kvm_set_cr8() */
> >>>   r = cr_interception(svm);
> >>>-  if (irqchip_in_kernel(svm->vcpu.kvm)) {
> >>>-  clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> >>>+  if (irqchip_in_kernel(svm->vcpu.kvm))
> >>>   return r;
> >>>-  }
> 
> I think that the old code here makes little sense, and for two reasons:
> 
I agree that old code is wrong and the patch looks correct, but I only
see how the bug may cause pending IRR to not be delivered in time,
not how interrupt can disrupt a higher priority task.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-12 Thread Paolo Bonzini

Il 12/03/2014 11:40, Radim Krčmář ha scritto:

2014-03-11 22:05-0300, Marcelo Tosatti:

On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote:

We always disable cr8 intercept in its handler, but only re-enable it
if handling KVM_REQ_EVENT, so there can be a window where we do not
intercept cr8 writes, which allows an interrupt to disrupt a higher
priority task.

Fix this by disabling intercepts in the same function that re-enables
them when needed. This fixes BSOD in Windows 2008.

Cc: 
Signed-off-by: Radim Krčmář 
---
 arch/x86/kvm/svm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 64d9bb9..f676c18 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm *svm)
u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
/* instruction emulation calls kvm_set_cr8() */
r = cr_interception(svm);
-   if (irqchip_in_kernel(svm->vcpu.kvm)) {
-   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   if (irqchip_in_kernel(svm->vcpu.kvm))
return r;
-   }


I think that the old code here makes little sense, and for two reasons:

1) There are other ways to change the TPR, and the condition for 
setting/clearing the CR8 intercept should be the same for all of them. 
Now CR8 is the really optimized one, but there is no reason to treat 
them differently and it just complicates understanding the code.


So it is a good thing that your patch moves the clearing of the CR8 
write intercept in a generic place (the setting of the intercept is 
already generic).  Doesn't say much about the correctness of the patch; 
it would be just an optimization.  But at least the old code is the 
"smelly" one.



2) Unconditionally disabling the CR8 intercept is definitely wrong. 
What matters is the change in the PPR; if the processor priority is the 
same as before, or higher, absolutely nothing has changed from the point 
of view of interrupt delivery; if we had an interrupt in the IRR, 
waiting to be delivered, we still have it, and we should keep the CR8 
intercept enabled.


Your patch does the right thing by virtue of apic_update_ppr setting 
KVM_REQ_EVENT (which ultimately calls update_cr8_intercept) exactly if 
the PPR has been lowered.  The call chain is 
kvm_set_cr8->kvm_lapic_set_tpr->apic_set_tpr->apic_update_ppr.


At the end of this email I'll show an example of why this actually is 
relatively common on Windows guests.



So, IMO there is no doubt that the change is semantically correct.  The 
next question then is whether it undoes the V_TPR optimization.  You can 
prove it by a sort of induction; consider a sequences of events that 
start and end with the same IRR, assume CR8 is not intercepted at the 
beginning, and prove that CR8 is still not intercepted afterwards.


We can assume that all changes to the TPR are balanced and properly 
nested (except you can go low->med->high->low).



The simple sequences are:

1) changes in TPR with no interrupts in the middle; remember that 
Windows doesn't really ever disable/enable preemption or interrupt flags 
like Linux does.  It only modifies the TPR ("raise/lower the IRQL", they 
call it).  We're assuming that the CR8 intercept is initially disabled, 
so a raised-IRQL section of the code that doesn't cause other vmexits 
will obviously run at full speed.  Not much to see here.


2) delivery of an unmasked interrupt (with priority P) and subsequent 
EOI.  Changes to TPR don't really matter until EOI, because they are 
always to priority >= P and they are balanced.  So we ignore them.


To summarize: an interrupt with priority P is going to be delivered, the 
VCPU is running at TPR <= P, interrupts are allowed, and the CR8 
intercept is disabled.


The interrupt is injected via apic_set_irr/kvm_make_request, and this 
causes a call to update_cr8_intercept.  If TPR < P, the intercept will 
remain disabled.  When the EOI is sent, we get another event and another 
call to update_cr8_intercept; again, the intercept stays cleared because 
IRR == -1.


If TPR == P, the intercept is set while the interrupt handler runs, but 
it is still disabled at the end of the interrupt.  Looks like another 
bugfix; before your patch, it would remain enabled, which is useless. 
The TPR == P case is actually interesting for Windows, more below.



The complicated sequences are:

3) a change in the TPR, where an interrupt is masked while the 
high-priority task runs.  The interrupt is what will cause the intercept 
to be set.  As soon as the TPR is restored, the interrupt will be 
delivered and the intercept cleared (TPR < IRR).  We fall back to case 2 
above.


4) interrupts that are received while interrupts are not allowed.  Here 
KVM does not inject the interrupt; it requests the interrupt window and 
clears the intercept.  Clearing the intercept is okay, because no 
interrupt can be delivered anyway and TP

Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-12 Thread Marcelo Tosatti
On Wed, Mar 12, 2014 at 11:40:48AM +0100, Radim Krčmář wrote:
> 2014-03-11 22:05-0300, Marcelo Tosatti:
> > On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote:
> > > We always disable cr8 intercept in its handler, but only re-enable it
> > > if handling KVM_REQ_EVENT, so there can be a window where we do not
> > > intercept cr8 writes, which allows an interrupt to disrupt a higher
> > > priority task.
> > > 
> > > Fix this by disabling intercepts in the same function that re-enables
> > > them when needed. This fixes BSOD in Windows 2008.
> > > 
> > > Cc: 
> > > Signed-off-by: Radim Krčmář 
> > > ---
> > >  arch/x86/kvm/svm.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > > index 64d9bb9..f676c18 100644
> > > --- a/arch/x86/kvm/svm.c
> > > +++ b/arch/x86/kvm/svm.c
> > > @@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm 
> > > *svm)
> > >   u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
> > >   /* instruction emulation calls kvm_set_cr8() */
> > >   r = cr_interception(svm);
> > > - if (irqchip_in_kernel(svm->vcpu.kvm)) {
> > > - clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> > > + if (irqchip_in_kernel(svm->vcpu.kvm))
> > >   return r;
> > > - }
> > >   if (cr8_prev <= kvm_get_cr8(&svm->vcpu))
> > >   return r;
> > >   kvm_run->exit_reason = KVM_EXIT_SET_TPR;
> > > @@ -3568,6 +3566,8 @@ static void update_cr8_intercept(struct kvm_vcpu 
> > > *vcpu, int tpr, int irr)
> > >   if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> > >   return;
> > >  
> > > + clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> > > +
> > >   if (irr == -1)
> > >   return;
> > 
> > Shouldnt IRR be injected if TPR < IRR ? (via KVM_REQ_EVENT).
> > 
> > 1) IRR has interrupt 10.
> > 2) TPR now 9 due to CR8 write.
> > 3) 10 should be injected.
> 
> Definitely should, and we will set KVM_REQ_EVENT through kvm_set_cr8()
> if we lower the TPR.
> (I checked that the bug isn't in apic_update_ppr().)

Yep.

> > Also not clearing the intercept can cause continuous CR8 writes to 
> > exit until KVM_REQ_EVENT ? 
> 
> It is intended, I suppose this is because we run with V_INTR_MASKING, so
> writes to CR8 only affect V_TPR register; guest then raises it once more
> and APIC incorrectly gives us low priority interrupt.

Reviewed-by: Marcelo Tosatti 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-12 Thread Radim Krčmář
2014-03-11 22:05-0300, Marcelo Tosatti:
> On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote:
> > We always disable cr8 intercept in its handler, but only re-enable it
> > if handling KVM_REQ_EVENT, so there can be a window where we do not
> > intercept cr8 writes, which allows an interrupt to disrupt a higher
> > priority task.
> > 
> > Fix this by disabling intercepts in the same function that re-enables
> > them when needed. This fixes BSOD in Windows 2008.
> > 
> > Cc: 
> > Signed-off-by: Radim Krčmář 
> > ---
> >  arch/x86/kvm/svm.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index 64d9bb9..f676c18 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm 
> > *svm)
> > u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
> > /* instruction emulation calls kvm_set_cr8() */
> > r = cr_interception(svm);
> > -   if (irqchip_in_kernel(svm->vcpu.kvm)) {
> > -   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> > +   if (irqchip_in_kernel(svm->vcpu.kvm))
> > return r;
> > -   }
> > if (cr8_prev <= kvm_get_cr8(&svm->vcpu))
> > return r;
> > kvm_run->exit_reason = KVM_EXIT_SET_TPR;
> > @@ -3568,6 +3566,8 @@ static void update_cr8_intercept(struct kvm_vcpu 
> > *vcpu, int tpr, int irr)
> > if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
> > return;
> >  
> > +   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> > +
> > if (irr == -1)
> > return;
> 
> Shouldnt IRR be injected if TPR < IRR ? (via KVM_REQ_EVENT).
> 
> 1) IRR has interrupt 10.
> 2) TPR now 9 due to CR8 write.
> 3) 10 should be injected.

Definitely should, and we will set KVM_REQ_EVENT through kvm_set_cr8()
if we lower the TPR.
(I checked that the bug isn't in apic_update_ppr().)

> Also not clearing the intercept can cause continuous CR8 writes to 
> exit until KVM_REQ_EVENT ? 

It is intended, I suppose this is because we run with V_INTR_MASKING, so
writes to CR8 only affect V_TPR register; guest then raises it once more
and APIC incorrectly gives us low priority interrupt.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-11 Thread Marcelo Tosatti
On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote:
> We always disable cr8 intercept in its handler, but only re-enable it
> if handling KVM_REQ_EVENT, so there can be a window where we do not
> intercept cr8 writes, which allows an interrupt to disrupt a higher
> priority task.
> 
> Fix this by disabling intercepts in the same function that re-enables
> them when needed. This fixes BSOD in Windows 2008.
> 
> Cc: 
> Signed-off-by: Radim Krčmář 
> ---
>  arch/x86/kvm/svm.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 64d9bb9..f676c18 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm *svm)
>   u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
>   /* instruction emulation calls kvm_set_cr8() */
>   r = cr_interception(svm);
> - if (irqchip_in_kernel(svm->vcpu.kvm)) {
> - clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> + if (irqchip_in_kernel(svm->vcpu.kvm))
>   return r;
> - }
>   if (cr8_prev <= kvm_get_cr8(&svm->vcpu))
>   return r;
>   kvm_run->exit_reason = KVM_EXIT_SET_TPR;
> @@ -3568,6 +3566,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, 
> int tpr, int irr)
>   if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
>   return;
>  
> + clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
> +
>   if (irr == -1)
>   return;

Shouldnt IRR be injected if TPR < IRR ? (via KVM_REQ_EVENT).

1) IRR has interrupt 10.
2) TPR now 9 due to CR8 write.
3) 10 should be injected.

Also not clearing the intercept can cause continuous CR8 writes to 
exit until KVM_REQ_EVENT ? 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: SVM: fix cr8 intercept window

2014-03-11 Thread Radim Krčmář
We always disable cr8 intercept in its handler, but only re-enable it
if handling KVM_REQ_EVENT, so there can be a window where we do not
intercept cr8 writes, which allows an interrupt to disrupt a higher
priority task.

Fix this by disabling intercepts in the same function that re-enables
them when needed. This fixes BSOD in Windows 2008.

Cc: 
Signed-off-by: Radim Krčmář 
---
 arch/x86/kvm/svm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 64d9bb9..f676c18 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3003,10 +3003,8 @@ static int cr8_write_interception(struct vcpu_svm *svm)
u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
/* instruction emulation calls kvm_set_cr8() */
r = cr_interception(svm);
-   if (irqchip_in_kernel(svm->vcpu.kvm)) {
-   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+   if (irqchip_in_kernel(svm->vcpu.kvm))
return r;
-   }
if (cr8_prev <= kvm_get_cr8(&svm->vcpu))
return r;
kvm_run->exit_reason = KVM_EXIT_SET_TPR;
@@ -3568,6 +3566,8 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, 
int tpr, int irr)
if (is_guest_mode(vcpu) && (vcpu->arch.hflags & HF_VINTR_MASK))
return;
 
+   clr_cr_intercept(svm, INTERCEPT_CR8_WRITE);
+
if (irr == -1)
return;
 
-- 
1.8.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html