On Thu, 2007-06-21 at 13:31 +0000, Gregory Haskins wrote:
> On Thu, 2007-06-21 at 21:02 +0800, Dong, Eddie wrote:
> > 
> > Achitectually not only that. A premature IRR->ISR will cause guest OS
> > confuse in many place. A guest (say BIOS) may turn from interrupt
> > enabled mode to polling mode which polls IRR to detect if 
> > there is pending IRQ to handle. In this case we have trouble.
> 
> I completely agree.  But what I am saying is that I can eliminate the
> premature IRR->ISR with the change I am proposing.

After thinking about this some more, I think I finally understand your
point.  I was thinking that you were objecting to the multiple unacked
vectors in ISR.  But really this could be a problem for *any* unacked
vectors?  Hmm..if this is the case, you were right and I was wrong.  My
bad :)

So assuming this newly enlightened position is true, I think this means
we have a choice:

1) Drop support for mixed "level-1" mode and move the PIC to the kernel
now as Eddie is doing

2) Keep the level-1/2 distinction, and add support for making sure that
once a vector is acked in the PIC, it truely gets put into service
immediately.

I can think of a really simple interface for (2).  All we really need to
do is 

a) go back to synchronous injection (as previously suggested)
b) promote kvm_cpuirq_extint to be higher than nmis and localints.

This will ensure the vector is immediately placed into service even over
in-kernel sources.  Deferred interrupts would still take a higher
priority, but we would never intack the pic if deferred interrupts were
pending since irq.pending would be true.

Thoughts?
-Greg

> 
> > 
> > 
> > > However, as I mentioned we can fix that issue with just a few new
> > > lines of 
> > > code and by
> > > reverting the userspace injection model to the one used in prior
> > > releases without having to implement an entire in-kernel PIC to do it.
> > 
> > In-kernel PIC gives us a full in kernel irqchip solution and performance
> > gain. Some OSes may use PIC only.
> 
> Agreed.  I will answer this down below with the level-0/1/2 question
> since they are related.
> 
> 
> > 
> > > 
> > > I think moving the PIC into the kernel has the advantage of allowing
> > > us to move the PIT into the kernel too (which is huge, IMHO), but
> > 
> > Not very big, I just want to do one by one. we have done the code
> > in Xen already.
> > 
> > > that is its sole advantage.  Don't get me wrong: I am in favor of
> > > doing it. However, I wanted to point out that this particular
> > > solution to the problem you found essentially is invalidating
> > > "level-1" mode by only supporting level-0 or level-2, not fixing it. 
> > 
> > Not exactly understand level-0/1/2 meaning? Do u mean we mixed 
> > irqchip mode is a feature requirement? I didn't see the necessity, 
> > maybe I am wrong. But isn't both PIC and APIC in kernel much
> > nature and simple for us?
> 
> At one point there was a debate between moving just the LAPIC, or moving
> everything (LAPIC/PIC/IOAPIC/PIT).  Avi suggested that we start with
> just the LAPIC and see where we were at.  Because we need to remain
> forward and backwards ABI compatible, I added an ioctl for setting the
> in-kernel-pic level.  "0" is backwards compatible mode (everything in
> userspace).  "1" is LAPIC only. "2" is presumable everything in the
> kernel, but today only 0 and 1 are the only two defined so there could
> be more than 3 levels someday if we wanted.
> 
> > 
> > If you really think supporting mixed irqchip mode is a must
> 
> I will leave it to Avi to decide since he implicitly suggested it.  But
> suffice to say, if we *dont* want to support it we will need to get the
> other in-kernel stuff into the lapic branch in its entirety before it
> can be merged so we dont create an ABI issue.
> 
> 
> > , we 
> > need to introduce an intack I/F to notify user level irqchip if the irq 
> > is really ACKed or not. We can do that but I doubt the necessity.
> 
> This is where we disagree, as I know you have mentioned this before.
> The solution I am proposing (which reverts the userspace injection
> method) would not require this ACK that you mention.  The reason is that
> the synchronous nature of the injection takes care of things naturally.
> Here is how:
> 
> With sync-injection, we can only move a vector from the IRR->ISR if
> RFLAGS.IF and !irq.pending.  Once we inject, the vector will move to ISR
> and get queued in kernel (thereby setting an irq.pending bit).  It may
> not inject to the guest immediately, but thats ok because no new vectors
> will be acked in the PIC until that previous vector is processed
> (because irq.pending will remain set).  Therefore, we will never
> prematurely move IRR->ISR until the previous vector really is
> "in-service".  Does this make sense?
> 
> My suggestion assumes that a single vector can stay in ISR without
> actually being in service without ill effect (Todays v9 code could
> presumably have multiple vectors in ISR without being in service, which
> I admit is bogus).  If this assumption is false, then I agree with you
> that we need extra intack I/F.  What are you thoughts here?
> 
> 
> > 
> > > Perhaps we are "ok" with that, but I was under the distinct
> > > impression from Avi et. al. that these variable levels of support
> > > were a design goal for debugging purposes. 
> > > 
> > > I would prefer that we just fix level-1 mode with the changes I
> > > mentioned instead of just disabling it (in addition to adding level-2
> > > mode as Eddie is working on). 
> > > 
> > >> 
> > >> This patch fixes this by introducing new VM level IOCTL
> > >> KVM_SET_ISA_IRQ_LEVEL & KVM_CREATE_PIC to avoid the ABI hole.  The
> > >> original KVM_INTERRUPT is still there to be backward compatible.
> > >> 
> > >> WIth this patch, old Qemu, new qemu (after this patch), old kvm, new
> > >> kvm can work in any combination. Both user level code and kernel
> > >> code will automatically decide the irq source base on irqchip
> > >> location (user or kernel). 
> > >> 
> > >> Some known issues (TODO):
> > >>  1: SVM support is on going.
> > >>  The code for VMX to inject irq is same with Xen now since the
> > >> situation is same now (irqchip in hyprevisor), SVM code should be
> > >> able to directly reuse from Xen too.
> > >>  2: live migration break.
> > >>  3: Temply APIC support is removed in CPUID to wait for the merge
> > >> of in kernel APIC patch
> > > 
> > > Note that you will need more than just the APIC patch.  My patch only
> > > moves the LAPIC down.  The IOAPIC and 8259s are still in userspace.
> > > Your patch only moves the 8259s down.  This means there is a gap where
> > > the IOAPIC used to be that still needs to be addressed.
> > 
> > Why do u think I/O APIC must be moved down too? A new IOCTL like
> > KVM_IOAPIC_MESSAGE can solve the interface issue IMO and quit 
> > natual.
> 
> I suppose, but it somewhat defeats the purpose IMO.  Every pin in the
> 8259 that gets tickled implicitly means an IOAPIC pin was tickled also.
> Do we really want to go to userspace for that?  Essentially what happens
> then is the PIT ends up having to go through userspace for every tick at
> that point.  On the flip side, the IOAPIC model is very simple so I
> think it makes more sense to move it one to one with the PIC.
> 
> Regards,
> -Greg
> 
> 


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to