Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-25 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>>>>> But I think I see a bigger issue - if we inject a regular interrupt
>>>>> while another is pending, then we will encounter this problem.  Looks
>>>>> like we have to enable the interrupt window after injecting an
>>>>> interrupt if there are still pending interrupts.
>>>>
>>>> Yeah, probably. I'm just wondering now if we can set
>>>> exit-on-interrupt-window while the vcpu state is interruptible (i.e.
>>>> _before_ the injection). There is some entry check like this for NMIs,
>>>> but maybe not for interrupts. Need to check.
>>>
>>> Turns out it's not necessary, since the guest eoi will cause an exit and
>>> allow the code to request an interrupt window.
>>
>> But you added explicit handling now nevertheless?
>
> Yes, in case some eoi-less mode is introduced either by hardware or
> paravirt.  I regard the fact that it works as accidental (though this
> applies to x86 virtualization in general).
>
>>> I've added an apic test program so we can track these issues
>>> (user/test/x86/apic.c).
>>
>> That's good. BTW, your NMI race fix is still lacking support for the
>> -no-kvm-irqchip case. Will post a corresponding patch later today.
>
> Actually, I couldn't get the 5.2 guest to boot with -no-kvm-irqchip: it
> hangs and needs some help by running 'info registers'.

Oh, yes, someone borked something here. That must have happened just
recently, I think. And it must be a userspace bug, as some older kvm that
happened to hang around here (-74) works fine against the latest kernel.
No time to investigate further right now, sorry.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-25 Thread Avi Kivity

Jan Kiszka wrote:
>>>> But I think I see a bigger issue - if we inject a regular interrupt
>>>> while another is pending, then we will encounter this problem.  Looks
>>>> like we have to enable the interrupt window after injecting an interrupt
>>>> if there are still pending interrupts.
>>>
>>> Yeah, probably. I'm just wondering now if we can set
>>> exit-on-interrupt-window while the vcpu state is interruptible (i.e.
>>> _before_ the injection). There is some entry check like this for NMIs,
>>> but maybe not for interrupts. Need to check.
>>
>> Turns out it's not necessary, since the guest eoi will cause an exit and
>> allow the code to request an interrupt window.
>
> But you added explicit handling now nevertheless?

Yes, in case some eoi-less mode is introduced either by hardware or
paravirt.  I regard the fact that it works as accidental (though this
applies to x86 virtualization in general).

>> I've added an apic test program so we can track these issues
>> (user/test/x86/apic.c).
>
> That's good. BTW, your NMI race fix is still lacking support for the
> -no-kvm-irqchip case. Will post a corresponding patch later today.

Actually, I couldn't get the 5.2 guest to boot with -no-kvm-irqchip: it
hangs and needs some help by running 'info registers'.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-24 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>>> But I think I see a bigger issue - if we inject a regular interrupt
>>> while another is pending, then we will encounter this problem.  Looks
>>> like we have to enable the interrupt window after injecting an interrupt
>>> if there are still pending interrupts.
>>> 
>>
>> Yeah, probably. I'm just wondering now if we can set
>> exit-on-interrupt-window while the vcpu state is interruptible (i.e.
>> _before_ the injection). There is some entry check like this for NMIs,
>> but maybe not for interrupts. Need to check.
>>   
> 
> Turns out it's not necessary, since the guest eoi will cause an exit and
> allow the code to request an interrupt window.

But you added explicit handling now nevertheless?

> 
> I've added an apic test program so we can track these issues
> (user/test/x86/apic.c).
> 

That's good. BTW, your NMI race fix is still lacking support for the
-no-kvm-irqchip case. Will post a corresponding patch later today.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-22 Thread Avi Kivity

Jan Kiszka wrote:
>> But I think I see a bigger issue - if we inject a regular interrupt
>> while another is pending, then we will encounter this problem.  Looks
>> like we have to enable the interrupt window after injecting an interrupt
>> if there are still pending interrupts.
>
> Yeah, probably. I'm just wondering now if we can set
> exit-on-interrupt-window while the vcpu state is interruptible (i.e.
> _before_ the injection). There is some entry check like this for NMIs,
> but maybe not for interrupts. Need to check.

Turns out it's not necessary, since the guest eoi will cause an exit and
allow the code to request an interrupt window.

I've added an apic test program so we can track these issues
(user/test/x86/apic.c).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-21 Thread Avi Kivity

Jan Kiszka wrote:
>> enable_nmi_window() should cause an exit once the interrupt has been
>> injected (likely before the first interrupt handler instruction was
>> executed, but after the stack frame was created).  So the nmi will not
>> be delayed.
>
> Right now, you only call enable_nmi_window() if that window is currently
> closed - and that's not the common case I'm worried about.

That's a thinko.  I'll check if requesting it unconditionally is allowed
and make it unconditional.

>> But I think I see a bigger issue - if we inject a regular interrupt
>> while another is pending, then we will encounter this problem.  Looks
>> like we have to enable the interrupt window after injecting an interrupt
>> if there are still pending interrupts.
>
> Yeah, probably. I'm just wondering now if we can set
> exit-on-interrupt-window while the vcpu state is interruptible (i.e.
> _before_ the injection). There is some entry check like this for NMIs,
> but maybe not for interrupts. Need to check.

I think that it is allowed.

The manual says "These VM exits follow event injection if such injection
is specified for VM entry." for both interrupt windows and nmi windows
(22.6.5 and 22.6.6).
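
As an (untested) sketch, the injection path could then simply do:

	vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
	/* per 22.6.5/22.6.6, the window exit follows the injection */
	if (vcpu->arch.nmi_pending)
		enable_nmi_window(vcpu);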


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-21 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>>> The nmi handler could change the tpr to mask the preempted interrupt;
>>> but the code would not notice that.  Once the interrupt was injected the
>>> guest would see an interrupt at a higher priority than it has programmed
>>> the hardware to allow.
>>> 
>>
>> I consider this a bit far-fetched. What sane NMI handler would fiddle with
>> the APIC? It would be fairly tricky to properly synchronize this with
>> the rest of the OS.
>>   
> 
> Sure, this is not a realistic guest.
> 
>>> Basically, once we commit to an interrupt via kvm_cpu_get_interrupt(),
>>> we must inject it before any instruction gets executed.
>>>
>>> I don't think any real guest would notice, though.
>>>
>>> 
>>
>> Well, I have no problems with your approach (when also applied on the
>> user space irqchip path) of keeping the order *if* we can ensure that
>> only the first instruction of the IRQ handler is executed and we will
>> then inject the NMI. Otherwise this opens a prio inversion between IRQs
>> and NMIs. The point is that, unless I'm overlooking some detail right
>> now, your approach will inject the pending NMI only once the guest
>> /happens/ to exit the VM, right? If yes, then it's a no-go IMHO, also
>> for keeping this property with the queue approach.
>>   
> 
> enable_nmi_window() should cause an exit once the interrupt has been
> injected (likely before the first interrupt handler instruction was
> executed, but after the stack frame was created).  So the nmi will not
> be delayed.

Right now, you only call enable_nmi_window() if that window is currently
closed - and that's not the common case I'm worried about.

> 
> But I think I see a bigger issue - if we inject a regular interrupt
> while another is pending, then we will encounter this problem.  Looks
> like we have to enable the interrupt window after injecting an interrupt
> if there are still pending interrupts.

Yeah, probably. I'm just wondering now if we can set
exit-on-interrupt-window while the vcpu state is interruptible (i.e.
_before_ the injection). There is some entry check like this for NMIs,
but maybe not for interrupts. Need to check.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-20 Thread Avi Kivity

Jan Kiszka wrote:
>> The nmi handler could change the tpr to mask the preempted interrupt;
>> but the code would not notice that.  Once the interrupt was injected the
>> guest would see an interrupt at a higher priority than it has programmed
>> the hardware to allow.
>
> I consider this a bit far-fetched. What sane NMI handler would fiddle with
> the APIC? It would be fairly tricky to properly synchronize this with
> the rest of the OS.

Sure, this is not a realistic guest.

>> Basically, once we commit to an interrupt via kvm_cpu_get_interrupt(),
>> we must inject it before any instruction gets executed.
>>
>> I don't think any real guest would notice, though.
>
> Well, I have no problems with your approach (when also applied on the
> user space irqchip path) of keeping the order *if* we can ensure that
> only the first instruction of the IRQ handler is executed and we will
> then inject the NMI. Otherwise this opens a prio inversion between IRQs
> and NMIs. The point is that, unless I'm overlooking some detail right
> now, your approach will inject the pending NMI only once the guest
> /happens/ to exit the VM, right? If yes, then it's a no-go IMHO, also
> for keeping this property with the queue approach.

enable_nmi_window() should cause an exit once the interrupt has been
injected (likely before the first interrupt handler instruction was
executed, but after the stack frame was created).  So the nmi will not
be delayed.

But I think I see a bigger issue - if we inject a regular interrupt
while another is pending, then we will encounter this problem.  Looks
like we have to enable the interrupt window after injecting an interrupt
if there are still pending interrupts.
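
Something like this, I think (sketch only; assuming kvm_cpu_has_interrupt()
as the "still pending" check):

	vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
	/* more interrupts queued behind the one just injected? */
	if (kvm_cpu_has_interrupt(vcpu))
		enable_irq_window(vcpu);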


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-20 Thread Jan Kiszka
Avi Kivity wrote:
> Avi Kivity wrote:
>> Jan Kiszka wrote:
>>> Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
>>> I ran for a while (or booted a few times) to trigger the hang. Basically
>>> you need high IRQ load (preferably via LAPIC, to exploit that un-acked
>>> IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
>>> watchdog).
>>>   
>>
>> I was able to reproduce it easily by zapping the mmu every second.
>>
>> Attached is a patch that fixes it for me.  Basically it avoids the nmi
>> path if an interrupt is being injected.  This is closer to my event
>> queue plan, and also is similar to what the code does today with
>> exceptions (avoid ->inject_pending_irq() if an exception is pending).
>>
> 
> Oh, and I think this is more correct than the previous approach of
> letting the nmi preempt the interrupt.
> 
> The nmi handler could change the tpr to mask the preempted interrupt;
> but the code would not notice that.  Once the interrupt was injected the
> guest would see an interrupt at a higher priority than it has programmed
> the hardware to allow.

I consider this a bit far-fetched. What sane NMI handler would fiddle with
the APIC? It would be fairly tricky to properly synchronize this with
the rest of the OS.

> 
> Basically, once we commit to an interrupt via kvm_cpu_get_interrupt(),
> we must inject it before any instruction gets executed.
> 
> I don't think any real guest would notice, though.
> 

Well, I have no problems with your approach (when also applied on the
user space irqchip path) of keeping the order *if* we can ensure that
only the first instruction of the IRQ handler is executed and we will
then inject the NMI. Otherwise this opens a prio inversion between IRQs
and NMIs. The point is that, unless I'm overlooking some detail right
now, your approach will inject the pending NMI only once the guest
/happens/ to exit the VM, right? If yes, then it's a no-go IMHO, also
for keeping this property with the queue approach.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2 ES-OS
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-19 Thread Avi Kivity

Avi Kivity wrote:
> Jan Kiszka wrote:
>> Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
>> I ran for a while (or booted a few times) to trigger the hang. Basically
>> you need high IRQ load (preferably via LAPIC, to exploit that un-acked
>> IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
>> watchdog).
>
> I was able to reproduce it easily by zapping the mmu every second.
>
> Attached is a patch that fixes it for me.  Basically it avoids the nmi
> path if an interrupt is being injected.  This is closer to my event
> queue plan, and also is similar to what the code does today with
> exceptions (avoid ->inject_pending_irq() if an exception is pending).

Oh, and I think this is more correct than the previous approach of
letting the nmi preempt the interrupt.

The nmi handler could change the tpr to mask the preempted interrupt;
but the code would not notice that.  Once the interrupt was injected the
guest would see an interrupt at a higher priority than it has programmed
the hardware to allow.

Basically, once we commit to an interrupt via kvm_cpu_get_interrupt(),
we must inject it before any instruction gets executed.

I don't think any real guest would notice, though.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-19 Thread Avi Kivity

Jan Kiszka wrote:
> Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
> I ran for a while (or booted a few times) to trigger the hang. Basically
> you need high IRQ load (preferably via LAPIC, to exploit that un-acked
> IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
> watchdog).

I was able to reproduce it easily by zapping the mmu every second.

Attached is a patch that fixes it for me.  Basically it avoids the nmi
path if an interrupt is being injected.  This is closer to my event
queue plan, and also is similar to what the code does today with
exceptions (avoid ->inject_pending_irq() if an exception is pending).

Please review (and test if possible).

--
error compiling committee.c: too many arguments to function

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ebf5406..93f9010 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3268,7 +3268,10 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
vmx_update_window_states(vcpu);
 
if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
-   if (vcpu->arch.nmi_window_open) {
+   if (vcpu->arch.interrupt.pending) {
+   if (!vcpu->arch.nmi_window_open)
+   enable_nmi_window(vcpu);
+   } else if (vcpu->arch.nmi_window_open) {
vcpu->arch.nmi_pending = false;
vcpu->arch.nmi_injected = true;
} else {


Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-16 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>>> The complexity, as shown by the decision tree, is exploding, however.
>>> 
>>
>> Yes, I had the same impression while gluing the fix together. And I also
>> wondered how things may look for SVM once we support NMIs there as
>> well. However, I didn't want to add a new regression source by
>> refactoring this sensitive code at this point.
>>   
> 
> Right.  We want a short-term fix (your patch, or perhaps some simplified
> version thereof) and a long-term fix which will be simpler to understand.
> 
> 
>>> I wonder whether this patch can be simplified, and what to do in the
>>> long term.
>>>
>>> First, this clearly has to be in subarch independent code, it's too
>>> complex to live in vmx.c.  Second, how about this:
>>>
>>> - create an 'event' object, that can be an exception, interrupt, or
>>> nmi.  An event can be injected or pending; its other attributes are
>>> vector number, has_error, and exception code.
>>> - an event has 'inject' and 'check_injected' methods
>>> - events live in a queue, either a real array or simply three entries
>>> for the three types which are checked in order of precedence, like now
>>> - the pre-entry logic looks like
>>>
>>> if (!injected_event && queue_len > 0)
>>>     injected_event = queue_head()
>>> if (injected_event)
>>>     injected_event.inject()
>>>
>>> - the post-exit logic looks like
>>>
>>> if (injected_event)
>>>     if (injected_event.check_injected())
>>>         injected_event = null
>>>         queue_pop()
>>>
>>> The queue is resorted only when injected_event = null.
>>> 
>>
>> Not sure if I got the last sentence correctly,
> 
> So long as an event is being injected, we don't want to mess with the
> priority queue.
> 
>> but generally
>> this sounds nice. Let's try to express my understanding:
>>
>> We have an event class of which at most one object is instantiated per
>> type (which are: exception, NMI, IRQ), i.e. we will never have more than
>> 3 instances at the same time. These objects are kept in a queue, _always_
>> sorted according to their priority (in descending order: exception, NMI,
>> IRQ). On pre-entry, the topmost object is taken and injected to the
>> guest. The queue may only change after the post-exit check (and before
>> the pre-entry injection, of course) so that we can track if some object
>> finally has been successfully injected and can be deleted from the queue
>> (or replaced with a new one, provided by its backing source). Did I get
>> your idea correctly?
>>   
> 
> Yes.
> 
> Reflecting some more, there is some talk in SDM 3A 5.9 about handling
> simultaneous exceptions, and we may want to use the queue to resolve this
> according to the rules provided there.  On the other hand, we may want
> to tear 5.9 out of the manual and live with the existing complexity.

Mmmh, not every exception is equal - that would add complexity, indeed.

> 
> Even more, we want to be able to remove an event that is in the queue
> but has not been injected yet.  This is useful for svm, where you can
> queue an interrupt when it is masked by eflags.if or apic.tpr, and svm
> will inject it when it becomes unmasked.  In other words, svm implements
> part of our queue in hardware.

That's a smart feature: one reason less for vmexits, plus reduced VMM
complexity.

> 
> So when an irq line is asserted, the ioapic locates the victim lapic,
> sets the IRR bit, and IPIs it.  This forces an exit; the uninjected
> interrupt is placed back in the queue.  The lapic code sees a higher
> priority interrupt, yanks the lower interrupt from the queue, replaces
> it with the new interrupt, and we go back to the guest.
> 
>>> Thoughts?  I think this will close the race as well as simplify
>>> things. It's similar to the current logic, except that the if ()s are
>>> taken out
>>> of the context of individual events and placed into a generic context,
>>> once.
>>>
>>> I also want to think about the short term solution.  If you can think of
>>> a simpler way to close the race, I'd like to hear of it.
>>> 
>>
>> Frankly, not at the moment. This patch tries to extend the existing
>> logic only as far as required to fix the issue, while keeping the rest
>> as is to avoid breaking some other corner case. In short: conservative
>> fixing. Unless someone (/me is busy the next weeks) pulls a patch to
>> implement your refactoring idea out of some hat, I would suggest
>> applying my patch for now, but keeping the rework on the todo list.
>>   
> 
> As mentioned above, that's my plan.  I'll see if I can simplify it a
> bit.  Is there an easy way to test it?

Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which
I ran for a while (or booted a few times) to trigger the hang. Basically
you need high IRQ load (preferably via LAPIC, to exploit that un-acked
IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI
watchdog).

Jan





Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-16 Thread Avi Kivity

Jan Kiszka wrote:
>> The complexity, as shown by the decision tree, is exploding, however.
>
> Yes, I had the same impression while gluing the fix together. And I also
> wondered how things may look for SVM once we support NMIs there as
> well. However, I didn't want to add a new regression source by
> refactoring this sensitive code at this point.

Right.  We want a short-term fix (your patch, or perhaps some simplified
version thereof) and a long-term fix which will be simpler to understand.

>> I wonder whether this patch can be simplified, and what to do in the
>> long term.
>>
>> First, this clearly has to be in subarch independent code, it's too
>> complex to live in vmx.c.  Second, how about this:
>>
>> - create an 'event' object, that can be an exception, interrupt, or
>> nmi.  An event can be injected or pending; its other attributes are
>> vector number, has_error, and exception code.
>> - an event has 'inject' and 'check_injected' methods
>> - events live in a queue, either a real array or simply three entries
>> for the three types which are checked in order of precedence, like now
>> - the pre-entry logic looks like
>>
>> if (!injected_event && queue_len > 0)
>>     injected_event = queue_head()
>> if (injected_event)
>>     injected_event.inject()
>>
>> - the post-exit logic looks like
>>
>> if (injected_event)
>>     if (injected_event.check_injected())
>>         injected_event = null
>>         queue_pop()
>>
>> The queue is resorted only when injected_event = null.
>
> Not sure if I got the last sentence correctly,

So long as an event is being injected, we don't want to mess with the
priority queue.

> but generally
> this sounds nice. Let's try to express my understanding:
>
> We have an event class of which at most one object is instantiated per
> type (which are: exception, NMI, IRQ), i.e. we will never have more than
> 3 instances at the same time. These objects are kept in a queue, _always_
> sorted according to their priority (in descending order: exception, NMI,
> IRQ). On pre-entry, the topmost object is taken and injected to the
> guest. The queue may only change after the post-exit check (and before
> the pre-entry injection, of course) so that we can track if some object
> finally has been successfully injected and can be deleted from the queue
> (or replaced with a new one, provided by its backing source). Did I get
> your idea correctly?

Yes.

Reflecting some more, there is some talk in SDM 3A 5.9 about handling
simultaneous exceptions, and we may want to use the queue to resolve this
according to the rules provided there.  On the other hand, we may want
to tear 5.9 out of the manual and live with the existing complexity.

Even more, we want to be able to remove an event that is in the queue
but has not been injected yet.  This is useful for svm, where you can
queue an interrupt when it is masked by eflags.if or apic.tpr, and svm
will inject it when it becomes unmasked.  In other words, svm implements
part of our queue in hardware.

So when an irq line is asserted, the ioapic locates the victim lapic,
sets the IRR bit, and IPIs it.  This forces an exit; the uninjected
interrupt is placed back in the queue.  The lapic code sees a higher
priority interrupt, yanks the lower interrupt from the queue, replaces
it with the new interrupt, and we go back to the guest.
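
In (made-up) code, roughly:

/* hypothetical sketch - none of these names or fields exist today */
static void lapic_accept_irq(struct kvm_vcpu *vcpu, u8 vector)
{
	struct kvm_event *irq = &vcpu->arch.event_queue[EV_IRQ];

	apic_set_irr_bit(vcpu, vector);		/* note it in the IRR */
	if (irq->pending && irq != vcpu->arch.injected_event &&
	    vector > irq->vector)
		irq->vector = vector;		/* yank the lower-prio irq */
	kvm_vcpu_kick(vcpu);			/* the IPI forces an exit */
}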


>> Thoughts?  I think this will close the race as well as simplify things.
>> It's similar to the current logic, except that the if ()s are taken out
>> of the context of individual events and placed into a generic context,
>> once.
>>
>> I also want to think about the short term solution.  If you can think of
>> a simpler way to close the race, I'd like to hear of it.
>
> Frankly, not at the moment. This patch tries to extend the existing
> logic only as far as required to fix the issue, while keeping the rest
> as is to avoid breaking some other corner case. In short: conservative
> fixing. Unless someone (/me is busy the next weeks) pulls a patch to
> implement your refactoring idea out of some hat, I would suggest
> applying my patch for now, but keeping the rework on the todo list.

As mentioned above, that's my plan.  I'll see if I can simplify it a
bit.  Is there an easy way to test it?


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-16 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>> This patch addresses item #2215532 in the kvm bug tracker, but was
>> finally also visible with other Linux guests that use the NMI watchdog:
>>
>> There is a subtle race in kvm-intel between a pending IRQ and a briefly
>> later arriving NMI (e.g. from the watchdog). If the IRQ was injected but
>> the guest exited again on injection due to some page fault, the flag
>> interrupt.pending remained true. If now some NMI just happened to be
>> pending as well, that one overruled the IRQ and was re-injected instead
>> (which is OK!). But during the next run of vmx_complete_interrupts the
>> originally pending IRQ fell on the floor and was forgotten. That means
>> we dequeued some IRQ from the [A]PIC, but never delivered it,
>> effectively causing a stall of IRQ deliveries. You may guess that it
>> took me a while to understand this...
>>
>> The patch below addresses the issue by turning interrupt.pending into a
>> three-state variable: NONE, QUEUED (but not currently injected), and
>> INJECTED. If we overwrite some IRQ injection with an NMI, the state gets
>> properly updated. Moreover, we only transition from INJECTED to NONE to
>> avoid losing IRQs.
>>
>> To simplify review and maintenance, the patch aligns the decision
>> pattern in vmx_intr_assist with do_interrupt_requests.
>>
>>   
> 
> Good catch.
> 
> The complexity, as shown by the decision tree, is exploding, however.

Yes, I had the same impression while gluing the fix together. And I also
wondered how things may look for SVM once we support NMIs there as
well. However, I didn't want to add a new regression source by
refactoring this sensitive code at this point.

> I wonder whether this patch can be simplified, and what to do in the
> long term.
> 
> First, this clearly has to be in subarch independent code, it's too
> complex to live in vmx.c.  Second, how about this:
> 
> - create an 'event' object, that can be an exception, interrupt, or
> nmi.  An event can be injected or pending; its other attributes are
> vector number, has_error, and exception code.
> - an event has 'inject' and 'check_injected' methods
> - events live in a queue, either a real array or simply three entries
> for the three types which are checked in order of precedence, like now
> - the pre-entry logic looks like
> 
> if (!injected_event && queue_len > 0)
>     injected_event = queue_head()
> if (injected_event)
>     injected_event.inject()
>
> - the post-exit logic looks like
>
> if (injected_event)
>     if (injected_event.check_injected())
>         injected_event = null
>         queue_pop()
> 
> The queue is resorted only when injected_event = null.

Not sure if I got the last sentence correctly, but generally
this sounds nice. Let's try to express my understanding:

We have an event class of which at most one object is instantiated per
type (which are: exception, NMI, IRQ), i.e. we will never have more than
3 instances at the same time. These objects are kept in a queue, _always_
sorted according to their priority (in descending order: exception, NMI,
IRQ). On pre-entry, the topmost object is taken and injected to the
guest. The queue may only change after the post-exit check (and before
the pre-entry injection, of course) so that we can track if some object
finally has been successfully injected and can be deleted from the queue
(or replaced with a new one, provided by its backing source). Did I get
your idea correctly?

> 
> Thoughts?  I think this will close the race as well as simplify things. 
> It's similar to the current logic, except that the if ()s are taken out
> of the context of individual events and placed into a generic context,
> once.
> 
> I also want to think about the short term solution.  If you can think of
> a simpler way to close the race, I'd like to hear of it.

Frankly, not at the moment. This patch tries to extend the existing
logic only as far as required to fix the issue, while keeping the rest
as is to avoid breaking some other corner case. In short: conservative
fixing. Unless someone (/me is busy the next weeks) pulls a patch to
implement your refactoring idea out of some hat, I would suggest
applying my patch for now, but keeping the rework on the todo list.

Jan





Re: [PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-16 Thread Avi Kivity

Jan Kiszka wrote:
> This patch addresses item #2215532 in the kvm bug tracker, but was
> finally also visible with other Linux guests that use the NMI watchdog:
>
> There is a subtle race in kvm-intel between a pending IRQ and a briefly
> later arriving NMI (e.g. from the watchdog). If the IRQ was injected but
> the guest exited again on injection due to some page fault, the flag
> interrupt.pending remained true. If now some NMI just happened to be
> pending as well, that one overruled the IRQ and was re-injected instead
> (which is OK!). But during the next run of vmx_complete_interrupts the
> originally pending IRQ fell on the floor and was forgotten. That means
> we dequeued some IRQ from the [A]PIC, but never delivered it,
> effectively causing a stall of IRQ deliveries. You may guess that it
> took me a while to understand this...
>
> The patch below addresses the issue by turning interrupt.pending into a
> three-state variable: NONE, QUEUED (but not currently injected), and
> INJECTED. If we overwrite some IRQ injection with an NMI, the state gets
> properly updated. Moreover, we only transition from INJECTED to NONE to
> avoid losing IRQs.
>
> To simplify review and maintenance, the patch aligns the decision
> pattern in vmx_intr_assist with do_interrupt_requests.

Good catch.

The complexity, as shown by the decision tree, is exploding, however.  I
wonder whether this patch can be simplified, and what to do in the long
term.

First, this clearly has to be in subarch independent code, it's too
complex to live in vmx.c.  Second, how about this:

- create an 'event' object, that can be an exception, interrupt, or
nmi.  An event can be injected or pending; its other attributes are
vector number, has_error, and exception code.
- an event has 'inject' and 'check_injected' methods
- events live in a queue, either a real array or simply three entries
for the three types which are checked in order of precedence, like now
- the pre-entry logic looks like

if (!injected_event && queue_len > 0)
    injected_event = queue_head()
if (injected_event)
    injected_event.inject()

- the post-exit logic looks like

if (injected_event)
    if (injected_event.check_injected())
        injected_event = null
        queue_pop()

The queue is resorted only when injected_event = null.
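
As a C sketch (hypothetical names; none of this exists in the tree today):

struct kvm_event {
	bool pending;			/* queued, awaiting injection */
	u8 vector;
	bool has_error;
	u32 error_code;
	void (*inject)(struct kvm_vcpu *vcpu);
	bool (*check_injected)(struct kvm_vcpu *vcpu);
};

/* one slot per type, fixed priority order */
enum { EV_EXCEPTION, EV_NMI, EV_IRQ, EV_NR };

/* pre-entry: pick the highest-priority pending event and (re)inject it */
static void inject_pending_event(struct kvm_vcpu *vcpu)
{
	struct kvm_event *q = vcpu->arch.event_queue;
	int i;

	if (!vcpu->arch.injected_event)
		for (i = 0; i < EV_NR; i++)
			if (q[i].pending) {
				vcpu->arch.injected_event = &q[i];
				break;
			}
	if (vcpu->arch.injected_event)
		vcpu->arch.injected_event->inject(vcpu);
}

/* post-exit: retire the event only once injection really completed */
static void complete_event_injection(struct kvm_vcpu *vcpu)
{
	struct kvm_event *ev = vcpu->arch.injected_event;

	if (ev && ev->check_injected(vcpu)) {
		ev->pending = false;		/* the "queue_pop" */
		vcpu->arch.injected_event = NULL;
	}
}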

Thoughts?  I think this will close the race as well as simplify things.  
It's similar to the current logic, except that the if ()s are taken out 
of the context of individual events and placed into a generic context, once.


I also want to think about the short term solution.  If you can think of 
a simpler way to close the race, I'd like to hear of it.


--
error compiling committee.c: too many arguments to function



[PATCH] KVM: VMX: Fix race between pending IRQ and NMI

2008-11-10 Thread Jan Kiszka
This patch addresses item #2215532 in the kvm bug tracker, but was
finally also visible with other Linux guests that use the NMI watchdog:

There is a subtle race in kvm-intel between a pending IRQ and a briefly
later arriving NMI (e.g. from the watchdog). If the IRQ was injected but
the guest exited again on injection due to some page fault, the flag
interrupt.pending remained true. If now some NMI just happened to be
pending as well, that one overruled the IRQ and was re-injected instead
(which is OK!). But during the next run of vmx_complete_interrupts the
originally pending IRQ fell on the floor and was forgotten. That means
we dequeued some IRQ from the [A]PIC, but never delivered it,
effectively causing a stall of IRQ deliveries. You may guess that it
took me a while to understand this...
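
Spelled out, the failing sequence looks like this (simplified):

  vmx_intr_assist:          IRQ dequeued from the [A]PIC and injected,
                            interrupt.pending = true
  exit during injection:    page fault, the IRQ must be re-injected,
                            interrupt.pending stays true
  NMI becomes pending:      vmx_intr_assist injects the NMI instead
                            (the override itself is fine)
  vmx_complete_interrupts:  only the NMI is accounted for, the pending
                            IRQ is dropped and never delivered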

The patch below addresses the issue by turning interrupt.pending into a
three-state variable: NONE, QUEUED (but not currently injected), and
INJECTED. If we overwrite some IRQ injection with an NMI, the state gets
properly updated. Moreover, we only transition from INJECTED to NONE to
avoid losing IRQs.

To simplify review and maintenance, the patch aligns the decision
pattern in vmx_intr_assist with do_interrupt_requests.

Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]>
---
 arch/x86/include/asm/kvm_host.h |6 +++
 arch/x86/kvm/vmx.c  |   61 +++-
 arch/x86/kvm/x86.h  |4 +-
 3 files changed, 49 insertions(+), 22 deletions(-)

Index: b/arch/x86/include/asm/kvm_host.h
===
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -301,7 +301,11 @@ struct kvm_vcpu_arch {
} exception;
 
struct kvm_queued_interrupt {
-   bool pending;
+   enum {
+   KVMIRQ_NONE,
+   KVMIRQ_QUEUED,
+   KVMIRQ_INJECTED
+   } pending;
u8 nr;
} interrupt;
 
Index: b/arch/x86/kvm/vmx.c
===
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1037,7 +1037,7 @@ static int set_guest_debug(struct kvm_vc
 
 static int vmx_get_irq(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->arch.interrupt.pending)
+   if (vcpu->arch.interrupt.pending == KVMIRQ_NONE)
return -1;
return vcpu->arch.interrupt.nr;
 }
@@ -2487,9 +2487,16 @@ static void do_interrupt_requests(struct
}
if (vcpu->arch.nmi_injected) {
vmx_inject_nmi(vcpu);
+   if (vcpu->arch.interrupt.pending == KVMIRQ_INJECTED)
+   /*
+* Degrade pending state, we will properly reinject
+* after the NMI.
+*/
+   vcpu->arch.interrupt.pending = KVMIRQ_QUEUED;
if (vcpu->arch.nmi_pending || kvm_run->request_nmi_window)
enable_nmi_window(vcpu);
-   else if (vcpu->arch.irq_summary
+   else if (vcpu->arch.interrupt.pending != KVMIRQ_NONE
+|| vcpu->arch.irq_summary
 || kvm_run->request_interrupt_window)
enable_irq_window(vcpu);
return;
@@ -2498,14 +2505,18 @@ static void do_interrupt_requests(struct
enable_nmi_window(vcpu);
 
if (vcpu->arch.interrupt_window_open) {
-   if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
+   if (vcpu->arch.irq_summary &&
+   vcpu->arch.interrupt.pending == KVMIRQ_NONE)
kvm_do_inject_irq(vcpu);
 
-   if (vcpu->arch.interrupt.pending)
+   if (vcpu->arch.interrupt.pending != KVMIRQ_NONE) {
+   vcpu->arch.interrupt.pending = KVMIRQ_INJECTED;
vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+   }
}
if (!vcpu->arch.interrupt_window_open &&
-   (vcpu->arch.irq_summary || kvm_run->request_interrupt_window))
+   (vcpu->arch.irq_summary || kvm_run->request_interrupt_window
+|| vcpu->arch.interrupt.pending != KVMIRQ_NONE))
enable_irq_window(vcpu);
 }
 
@@ -2624,7 +2635,8 @@ static int handle_exception(struct kvm_v
cr2 = vmcs_readl(EXIT_QUALIFICATION);
KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
(u32)((u64)cr2 >> 32), handler);
-   if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending)
+   if (vcpu->arch.interrupt.pending != KVMIRQ_NONE
+   || vcpu->arch.exception.pending)
kvm_mmu_unprotect_page_virt(vcpu, cr2);
return kvm_mmu_page_fault(vcpu, cr2, error_code);
}
@@ -3244,7 +3256,8 @@ static void vmx_complete_interrupts(stru