Avi Kivity wrote:
> Jan Kiszka wrote:
>> This patch addresses item #2215532 in the kvm bug tracker, but was
>> finally also visible with other Linux guests that use the NMI watchdog:
>>
>> There is a subtle race in kvm-intel between a pending IRQ and a briefly
>> later arriving NMI (e.g. from the watchdog). If the IRQ was injected but
>> the guest exited again on ejection due to some page fault, the flag
>> interrupt.pending remained true. If now some NMI just happened to be
>> pended as well, that one overruled the IRQ and was re-injected instead
>> (what is OK!). But during the next run of vmx_complete_interrupts the
>> originally pending IRQ fell on the floor and was forgotten. That means
>> we dequeued some IRQ from the [A]PIC, but never delivered it,
>> effectively causing a stall of IRQ deliveries. You may guess that it
>> took me a while to understand this...
>>
>> The patch below addresses the issue by turning interrupt.pending into a
>> three-state variable: NONE, QUEUED (but not currently injected), and
>> INJECTED. If we overwrite some IRQ injection with an NMI, the state gets
>> properly updated. Moreover, we only transit from INJECTED to NONE to
>> avoid loosing IRQs.
>>
>> To simplify review and maintenance, the patch aligns the decision
>> pattern in vmx_intr_assist with do_interrupt_requests.
>>
>>   
> 
> Good catch.
> 
> The complexity, as shown by the decision tree, is exploding, however.

Yes, I had the same impression while gluing the fix together. And I also
wondered how things may look like for SVM once we support NMIs there as
well. However, I didn't want to add a new regression source by
refactoring this sensitive code at this point.

> I wonder whether this patch can be simplified, and what to do in the
> long term.
> 
> First, this clearly has to be in subarch independent code, it's too
> complex to live in vmx.c.  Second, how about this:
> 
> - create an 'event' object, that can be an exception, interrupt, or
> nmi.  An event can be injected or pending; its other attributes are
> vector number, has_error, and exception code.
> - an event has 'inject' and 'check_injected' methods
> - events live in a queue, either a real array or simply three entries
> for the three types which are checked in order of precedence, like now
> - the pre-entry logic looks like
> 
> if (!injected_event && queue_len > 0)
>    injected_event = queue_head()
> if (injected_event)
>    injected_event.inject()
> 
> - the post-exit logic looks like
> 
> if (injected_event)
>    if (injected_event.check_injected())
>        injected_event = null
>        queue_pop()
> 
> The queue is resorted only when injected_event = null.

Don't understand if I got the last sentence correctly, but generally
this sounds nice. Let's try to express my understanding:

We have an event class of which at most one object is instantiated per
type (which are: exception, NMI, IRQ), ie. we will never have more than
3 instances at the same time. These objects are kept in queue, _always_
sorted according to their priority (in descending order: exception, NMI,
IRQ). On pre-entry, the topmost object is taken and injected to the
guest. The queue may only change after the post-exit check (and before
the pre-entry injection, of course) so that we can track if some object
finally has been successfully injected and can be deleted from the queue
(or replaced with a new one, provided by its backing source). Did I get
your idea correctly?

> 
> Thoughts?  I think this will close the race as well as simplify things. 
> It's similar to the current logic, except that the if ()s are taken out
> of the context of individual events and placed into a generic context,
> once.
> 
> I also want to think about the short term solution.  If you can think of
> a simpler way to close the race, I'd like to hear of it.

Frankly, not at the moment. This patch tries to extend the existing
logic only as far as required to fix the issue, while keeping the rest
as is to avoid breaking some other corner case. In short: conservative
fixing. Unless someone (/me is busy the next weeks) pulls a patch to
implement your refactoring idea out of some hat, I would suggest to
apply my patch for now, but keep the rework on the todo list.

Jan

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to