Jan Kiszka wrote:
The complexity, as shown by the decision tree, is exploding, however.

Yes, I had the same impression while gluing the fix together. And I also
wondered how things may look like for SVM once we support NMIs there as
well. However, I didn't want to add a new regression source by
refactoring this sensitive code at this point.

Right. We want a short-term fix (your patch, or perhaps some simplified version thereof) and a long-term fix which will be simpler to understand.


I wonder whether this patch can be simplified, and what to do in the
long term.

First, this clearly has to be in subarch independent code, it's too
complex to live in vmx.c.  Second, how about this:

- create an 'event' object, that can be an exception, interrupt, or
nmi.  An event can be injected or pending; its other attributes are
vector number, has_error, and exception code.
- an event has 'inject' and 'check_injected' methods
- events live in a queue, either a real array or simply three entries
for the three types which are checked in order of precedence, like now
- the pre-entry logic looks like

if (!injected_event && queue_len > 0)
   injected_event = queue_head()
if (injected_event)
   injected_event.inject()

- the post-exit logic looks like

if (injected_event)
   if (injected_event.check_injected())
       injected_event = null
       queue_pop()

The queue is resorted only when injected_event = null.

Don't understand if I got the last sentence correctly,

So long as an event is being injected, we don't want to mess with the priority queue.

but generally
this sounds nice. Let's try to express my understanding:

We have an event class of which at most one object is instantiated per
type (which are: exception, NMI, IRQ), ie. we will never have more than
3 instances at the same time. These objects are kept in queue, _always_
sorted according to their priority (in descending order: exception, NMI,
IRQ). On pre-entry, the topmost object is taken and injected to the
guest. The queue may only change after the post-exit check (and before
the pre-entry injection, of course) so that we can track if some object
finally has been successfully injected and can be deleted from the queue
(or replaced with a new one, provided by its backing source). Did I get
your idea correctly?

Yes.

Reflecting some more, there is some talk in SDM 3A 5.9 about handling simultaneous exceptions, and MAY want to use the queue to resolve this according to the rules provided there. On the other hand, we MAY want to tear 5.9 our of the manual and live with the existing complexity.

Even more, we want to be able to remove an event that is in the queue but has not been injected yet. This is useful for svm, where you can queue an interrupt when it is masked by eflags.if or apic.tpr, and svm will inject it when it becomes unmasked. In other words, svm implements part of our queue in hardware.

So when an irq line is asserted, the ioapic locates the victim lapic, sets the IRR bit, and IPIs it. This forces an exit; the uninjected interrupt is placed back in the queue. The lapic code sees a higher priority interrupt, yanks the lower interrupt from the queue, replaces it with the new interrupt, and we go back to the guest.

Thoughts? I think this will close the race as well as simplify things. It's similar to the current logic, except that the if ()s are taken out
of the context of individual events and placed into a generic context,
once.

I also want to think about the short term solution.  If you can think of
a simpler way to close the race, I'd like to hear of it.

Frankly, not at the moment. This patch tries to extend the existing
logic only as far as required to fix the issue, while keeping the rest
as is to avoid breaking some other corner case. In short: conservative
fixing. Unless someone (/me is busy the next weeks) pulls a patch to
implement your refactoring idea out of some hat, I would suggest to
apply my patch for now, but keep the rework on the todo list.

As mentioned above, that's my plan. I'll see if I can simplify it a bit. Is there an easy way to test it?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to