Jan Kiszka wrote:
This patch addresses item #2215532 in the kvm bug tracker, but was
finally also visible with other Linux guests that use the NMI watchdog:

There is a subtle race in kvm-intel between a pending IRQ and a briefly
later arriving NMI (e.g. from the watchdog). If the IRQ was injected but
the guest exited again on ejection due to some page fault, the flag
interrupt.pending remained true. If now some NMI just happened to be
pended as well, that one overruled the IRQ and was re-injected instead
(what is OK!). But during the next run of vmx_complete_interrupts the
originally pending IRQ fell on the floor and was forgotten. That means
we dequeued some IRQ from the [A]PIC, but never delivered it,
effectively causing a stall of IRQ deliveries. You may guess that it
took me a while to understand this...

The patch below addresses the issue by turning interrupt.pending into a
three-state variable: NONE, QUEUED (but not currently injected), and
INJECTED. If we overwrite some IRQ injection with an NMI, the state gets
properly updated. Moreover, we only transit from INJECTED to NONE to
avoid loosing IRQs.

To simplify review and maintenance, the patch aligns the decision
pattern in vmx_intr_assist with do_interrupt_requests.


Good catch.

The complexity, as shown by the decision tree, is exploding, however. I wonder whether this patch can be simplified, and what to do in the long term.

First, this clearly has to be in subarch independent code, it's too complex to live in vmx.c. Second, how about this:

- create an 'event' object, that can be an exception, interrupt, or nmi. An event can be injected or pending; its other attributes are vector number, has_error, and exception code.
- an event has 'inject' and 'check_injected' methods
- events live in a queue, either a real array or simply three entries for the three types which are checked in order of precedence, like now
- the pre-entry logic looks like

if (!injected_event && queue_len > 0)
   injected_event = queue_head()
if (injected_event)
   injected_event.inject()

- the post-exit logic looks like

if (injected_event)
   if (injected_event.check_injected())
       injected_event = null
       queue_pop()

The queue is resorted only when injected_event = null.

Thoughts? I think this will close the race as well as simplify things. It's similar to the current logic, except that the if ()s are taken out of the context of individual events and placed into a generic context, once.

I also want to think about the short term solution. If you can think of a simpler way to close the race, I'd like to hear of it.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to