Avi Kivity wrote: > Jan Kiszka wrote: >>> The complexity, as shown by the decision tree, is exploding, however. >>> >> >> Yes, I had the same impression while gluing the fix together. And I also >> wondered how things may look like for SVM once we support NMIs there as >> well. However, I didn't want to add a new regression source by >> refactoring this sensitive code at this point. >> > > Right. We want a short-term fix (your patch, or perhaps some simplified > version thereof) and a long-term fix which will be simpler to understand. > > >>> I wonder whether this patch can be simplified, and what to do in the >>> long term. >>> >>> First, this clearly has to be in subarch independent code, it's too >>> complex to live in vmx.c. Second, how about this: >>> >>> - create an 'event' object, that can be an exception, interrupt, or >>> nmi. An event can be injected or pending; its other attributes are >>> vector number, has_error, and exception code. >>> - an event has 'inject' and 'check_injected' methods >>> - events live in a queue, either a real array or simply three entries >>> for the three types which are checked in order of precedence, like now >>> - the pre-entry logic looks like >>> >>> if (!injected_event && queue_len > 0) >>> injected_event = queue_head() >>> if (injected_event) >>> injected_event.inject() >>> >>> - the post-exit logic looks like >>> >>> if (injected_event) >>> if (injected_event.check_injected()) >>> injected_event = null >>> queue_pop() >>> >>> The queue is resorted only when injected_event = null. >>> >> >> Don't understand if I got the last sentence correctly, > > So long as an event is being injected, we don't want to mess with the > priority queue. > >> but generally >> this sounds nice. Let's try to express my understanding: >> >> We have an event class of which at most one object is instantiated per >> type (which are: exception, NMI, IRQ), ie. we will never have more than >> 3 instances at the same time. These objects are kept in queue, _always_ >> sorted according to their priority (in descending order: exception, NMI, >> IRQ). On pre-entry, the topmost object is taken and injected to the >> guest. The queue may only change after the post-exit check (and before >> the pre-entry injection, of course) so that we can track if some object >> finally has been successfully injected and can be deleted from the queue >> (or replaced with a new one, provided by its backing source). Did I get >> your idea correctly? >> > > Yes. > > Reflecting some more, there is some talk in SDM 3A 5.9 about handling > simultaneous exceptions, and MAY want to use the queue to resolve this > according to the rules provided there. On the other hand, we MAY want > to tear 5.9 our of the manual and live with the existing complexity.
Mmmh, not ever exception is equal - that would add complexity, indeed. > > Even more, we want to be able to remove an event that is in the queue > but has not been injected yet. This is useful for svm, where you can > queue an interrupt when it is masked by eflags.if or apic.tpr, and svm > will inject it when it becomes unmasked. In other words, svm implements > part of our queue in hardware. That's a smart feature: one reason for vmexits less plus reduced VMM complexity. > > So when an irq line is asserted, the ioapic locates the victim lapic, > sets the IRR bit, and IPIs it. This forces an exit; the uninjected > interrupt is placed back in the queue. The lapic code sees a higher > priority interrupt, yanks the lower interrupt from the queue, replaces > it with the new interrupt, and we go back to the guest. > >>> Thoughts? I think this will close the race as well as simplify >>> things. It's similar to the current logic, except that the if ()s are >>> taken out >>> of the context of individual events and placed into a generic context, >>> once. >>> >>> I also want to think about the short term solution. If you can think of >>> a simpler way to close the race, I'd like to hear of it. >>> >> >> Frankly, not at the moment. This patch tries to extend the existing >> logic only as far as required to fix the issue, while keeping the rest >> as is to avoid breaking some other corner case. In short: conservative >> fixing. Unless someone (/me is busy the next weeks) pulls a patch to >> implement your refactoring idea out of some hat, I would suggest to >> apply my patch for now, but keep the rework on the todo list. >> > > As mentioned above, that's my plan. I'll see if I can simplify it a > bit. Is there an easy way to test it? Jiajun kindly provided me a RHEL kernel and initrd (2.6.18-53-el5) which I ran for a while (or booted a few times) to trigger the hang. Basically you need high IRQ load (preferably via LAPIC, to exploit that un-acked IRQs will block low-prio IRQs as well) + high NMI load (e.g. via NMI watchdog). Jan
signature.asc
Description: OpenPGP digital signature