Hi Eddie,
  Back from vacation. Just catching up on email now....

On Thu, 2007-06-07 at 16:20 +0800, Dong, Eddie wrote:
>
> Greg:
>       Here are some detail comments towarding the LAPIC device model.
>       1: irqabstraction layer
>       vcpu->irq.pending holds the abstract processor interrupt request
> (called localint, extint, nmi etc in V09).  But API kvm_vcpu_intr only
> set the interrupt request, no clear.

By design

>   So far I only noticed when an
> interrupt request is pop or injected,  vcpu->irq.pending bit may be
> cleared. But there is case when guest raise TPR bar, an localint could
> be cleared.

Yep, by design.  And it will be set again when the TPR bar is lowered.

>  Same for extint when PIC IMR is masked.

Yeah, we have a hole here.  If IMR is changed to mask a pending vector,
it is conceivable that it could remain pending in the kernel.  Note that
this hole was always in KVM, even before my patch.  Because of this I am
of the opinion that it doesn't need to be fixed as part of the LAPIC
work per se before merging.  However, that being said, we can lay the
groundwork to support this now by adding an "int level" to the structure
that is passed in the KVM_ISA_INTERRUPT ioctl.  Going forward, someone
can patch the QEMU::i8259 to send clear events when IMR changes.  

>  
>       In Xen, since there is no notion of abstract processor interrupt
> request, VMM check interrupt directly from PIC and APIC, so Xen doesn't
> have problem. I guess Windows will fail here.

No, this is incorrect (and Windows boots fine even w/ACPI enabled, BTW).
For the LAPIC, we do check the LAPIC model directly (and therefore
consider TPR, etc).  The abstract interrupt mechanism simply tells the
vcpu that it needs to check the LAPIC.  It is not authoritative in
deciding if something actually gets injected.

For the PIC, we don't check directly (the PICs vectors are cached in the
kernel), but again note that this is how KVM has always worked.  I think
fixing the KVM_ISA_INTERRUPT::level mechanism + adding IMR-clear support
shores up any issue there, however. 

> 
>       2: APIC timer
>       a: V09 uses hrtimer for LAPIC timer, apic->timer.last_update is
> updated every time when __apic_timer_fn is invoked at time of the APIC
> timer fired. This impose an accumulated difference since the fire time
> is already some ns later after expected time.
>       Xen solve this issue by increase apic->timer.last_update with
> the PERIOD, i.e. APIC_BUS_CYCLE_NS * apic->timer.divide_count *
> APIC_TMICT.
>       b: Seems current approach starts hrtimer whenever APIC_TMICT is
> updated. Should we check APIC_LVT to see if it is masked here? (instead
> of doing in its callback function:__apic_timer_fn). Also why APIC_TMCCT
> is updated here? I think TMCCT is reloaded only when it reaches 0 and
> LVTT works in periodic mode.
>       c: I didn't see LVTT mask status refelect the hrtimer
> cancel/start, do I miss something?

I inherited most of lapic.c from Dor, and I believe he inherited most of
it from an older version of Xen.  While I have come to understand much
of the inner workings of the LAPIC during the course of developing this
patch, the timer is still a relative enigma to me.  Therefore, I do not
have any comment as to the reasons why something was done here the way
it was, nor to the validity of the problems you are highlighting.
Perhaps Dor will know.

But that being said, patches against v09 to fix problems you see are
always welcome.


> 
>       3:  Assume a senario there is an valid IDT_VECTORING_INFO_FIELD,
> following code(after patch) in handle_exception push back the failed
> interrupt vector, i.e. vcpu->irq.deferred.
> 
> ........
>         if (is_external_interrupt(vect_info)) {
>                 /*
>                  * An exception was taken while we were trying to inject
> an
>                  * IRQ.  We must defer the injection of the vector until
>                  * the next window.
>                  */
>                 int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
>                 kvm_vcpu_irq_push(vcpu, irq);
>         }
> 
> 
>       a:  Now, the abstracted processor interrupt, i.e.
> vcpu->irq.pending, could be 0, __kvm_vcpu_irq_pending invoked at
> beginning of do_interrupt_requests will set kvm_irqpin_localint in the
> abstracted processor interrupt (vcpu->irq.pending). But if at same time,
> we get an external IRQ, i.e. vcpu->irq.pending is set with both localint
> & extint.
>       From following code , kvm_irqpin_extint has higher priority than
> kvm_irqpin_localint.

Yes, I understand this scenario.  It is the same problem I was trying to
describe earlier when I said extint/nmi can inadvertently get
prioritized over deferred.  I will fix this.

> 
>         while (pending) {
>                 kvm_irqpin_t pin = __fls(pending);
> 
>                 switch (pin) {
>                 case kvm_irqpin_localint:
>                 case kvm_irqpin_extint:
>                 case kvm_irqpin_nmi:
>                         do_intr_requests(vcpu, kvm_run, pin);
>                         break;
>               ..............
> 
>       Now in do_intr_requests, we get an extirq vector instead of
> vcpu->irq.deferred from following code since pin=kvm_irqpin_extint. That
> means we inject an new irq instead of original failed irq in
> IDT_VECTORING_INFO_FIELD.
> 
>               ........
>                 switch (pin) {
>                 case kvm_irqpin_localint:
>                         r = kvm_vcpu_irq_pop(vcpu, &ack);
>                         break;
>                 case kvm_irqpin_extint:
>                         r = kvm_irqdevice_ack(&vcpu->kvm->isa_irq, 0,
> &ack);
>                         if (!(ack.flags &
> KVM_IRQACKDATA_VECTOR_PENDING))
>                                 __clear_bit(pin, &vcpu->irq.pending);
>                         break;
>                 case kvm_irqpin_nmi:
> 
> 
>       Anyway due to SMP & in-kernel APIC, I'd like to suggest we move
> IDT_VECTORING_INFO_FIELD to do_interrupt_requests like vmx_intr_assist
> in Xen where physical IRQ is disabled.

I haven't looked at the new Xen code, but I will try to take a peek.
Its probably moot since I am confident that the fix I am suggesting
allows the deferred mechanism to work the way I intended, but I will
keep an open mind to alternative solutions.

> 
> 
> 
> thx, eddie


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to