On Thu, May 14, 2009 at 10:34:11PM +0800, Dong, Eddie wrote:
> Gleb Natapov wrote:
> > On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
> >> Avi Kivity wrote:
> >>> Dong, Eddie wrote:
> >>>> OK.
> >>>> Also back to Gleb's question, the reason I want to do that is to
> >>>> simplify event generation mechanism in current KVM.
> >>>> 
> >>>> Today KVM use additional layer of exception/nmi/interrupt such as
> >>>> vcpu.arch.exception.pending, vcpu->arch.interrupt.pending &
> >>>> vcpu->arch.nmi_injected. All those additional layer is due to
> >>>> compete of VM_ENTRY_INTR_INFO_FIELD
> >>>> write to inject the event. Both SVM & VMX has only one resource to
> >>>> inject the virtual event but KVM generates 3 catagory of events in
> >>>> parallel which further requires additional
> >>>> logic to dictate among them.
> >>> 
> >>> I thought of using a queue to hold all pending events (in a common
> >>> format), sort it by priority, and inject the head.
> >> 
> >> The SDM Table 5-4 requires to merge 2 events together, i.e. convert
> >> to #DF/ 
> >> Triple fault or inject serially when 2 events happens no matter NMI,
> >> IRQ or exception. 
> >> 
> >> As if considering above events merging activity, that is a single
> >> element queue. 
> > I don't know how you got to this conclusion from you previous
> > statement. 
> > See explanation to table 5-2 for instate where it is stated that
> > interrupt should be held pending if there is exception with higher
> > priority. Should be held pending where? In the queue, like we do. Note
> > that low prio exceptions are just dropped since they will be
> > regenerated. 
> 
> I have different understanding here.
> My understanding is that "held" means NO INTA in HW, i.e. LAPIC still hold 
> this IRQ.
> 
And what if INTA already happened and CPU is ready to fetch IDT for
interrupt vector and at this very moment CPU faults?

> > 
> >>  We could have either:  1) A pure SW "queue" that will be flush to HW
> >> register later (VM_ENTRY_INTR_INFO_FIELD), 2) Direct use HW register.
> >> 
> > We have three event sources 1) exceptions 2) IRQ 3) NMI. We should
> > have 
> > queue of three elements sorted by priority. On each entry we should
> 
> Table 5-4 alreadys says NMI/IRQ is BENIGN.
Table 5-2 applies here not table 5-4 I think.

> 
> > inject an event with highest priority. And remove it from queue on
> > exit. 
> 
> The problem is that we have to decide to inject only one of above 3, and 
> discard the rest.
> Whether priority them or merge (to one event as Table 5-4) is another story.
Only a small number of event are merged into #DF. Most handled serially
(SDM does not define what serially means unfortunately), so I don't
understand where "discard the rest" is come from. We can discard
exception since it will be regenerated anyway, but IRQ and NMI is
another story. SDM says that IRQ should be held pending (once again not
much explanation here), nothing about NMI.

> > 
> >> 
> >> A potential benefit is that it can avoid duplicated code and
> >> potential bugs 
> >> in current code as following patch shows if I understand correctly:
> >> 
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu
> >>                 *vcpu, struct kvm_run *kvm_run) cr2 =
> >>                 vmcs_readl(EXIT_QUALIFICATION);
> >>                             KVMTRACE_3D(PAGE_FAULT, vcpu,
> >> error_code, (u32)cr2, (u32)((u64)cr2 >> 32), handler); -            
> >> if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending ) +
> >>                         if (vcpu->arch.interrupt.pending ||
> >>                 vcpu->arch.exception.pending  ||
> >>         vcpu->arch.nmi_injected) kvm_mmu_unprotect_page_virt(vcpu,
> >> cr2); return kvm_mmu_page_fault(vcpu, cr2, error_code); } 
> > This fix is already in Avi's tree (not yet pushed).
> > 
> >> Either way are OK and up to you. BTW Xen uses HW register directly
> >> to representing 
> >> an pending event.
> >> 
> > In this particular case I don't mind to use HW register either, but I
> > don't see any advantage.
> > 
> >>> 
> >>>> One example is that exception has higher priority
> >>>> than NMI/IRQ injection in current code which is not true in
> >>>> reality. 
> >>>> 
> >>> 
> >>> I don't think it matters in practice, since the guest will see it
> >>> as a timing issue.  NMIs and IRQs are asynchronous (even those
> >>> generated by the guest through the local APIC).
> >> 
> >> Yes. But also cause IRQ injection be delayed which may have side
> >> effect. 
> >> For example if guest exception handler is very longer or if guest
> >> VCPU fall into recursive #GP. Within current logic, a guest IRQ
> >> event from KDB (IPI) running 
> >> on VCPU0, as an example, can't force the dead loop VCPU1 into KDB
> >> since it 
> >> is recursively #GP.
> > If one #GP causes another #GP this is a #DF. If CPU has a chance to
> 
> Means another #GP in next instruction i.e. Beginning of #GP handler in guest.
> No #DF here.
> 
In this case we will enter guest with "NMI windows open" request and
should exit immediately before first instruction of #GP handler. At this
moment KVM will be able to inject NMI.

> > executes 
> > something in between KVM will have a chance to inject NMI.
> 
> Could have no chance in some cases though not very common.
> 
> > 
> >> 
> >>> 
> >>>> Another issue is that an failed event from previous injection say
> >>>> IRQ or NMI may be discarded if an virtual exception happens in the
> >>>> EXIT handling now. With the patch of generic double fault handling,
> >>>> this case should be handled as normally.
> >>>> 
> >>> 
> >>> Discarding an exception is usually okay as it will be regenerated. 
> >>> I don't think we discard interrupts or NMIs.
> >> In reality (Running OS in guest), it doesn't happen so far. But
> >> architecturally, 
> >> it could. For example KVM injects an IRQ, but VM Resume get #PF and
> >> back to KVM with IDT_VECTORING valid. Then KVM will put back the
> >> failed 
> >> IRQ to interrupt queue. But if #PF handling generates another
> >> exception, 
> >> then the interrupt queue won't be able to be injected, since KVM
> >> inject 
> >> exception first. And the interrupt queue is discarded at next VM
> >> Exit. 
> >> 
> > I acknowledge the presence of the bug although I was not able to
> > write a test case 
> > to cause it yet, but it is easy to fix this without changing code too
> > much. Unified event queue and clearing of only injected event on exit
> > should do the trick. 
> 
> Yes, minor.
> 
> Eddie

--
                        Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to