On Thu, 2007-06-21 at 13:31 +0000, Gregory Haskins wrote: > On Thu, 2007-06-21 at 21:02 +0800, Dong, Eddie wrote: > > > > Achitectually not only that. A premature IRR->ISR will cause guest OS > > confuse in many place. A guest (say BIOS) may turn from interrupt > > enabled mode to polling mode which polls IRR to detect if > > there is pending IRQ to handle. In this case we have trouble. > > I completely agree. But what I am saying is that I can eliminate the > premature IRR->ISR with the change I am proposing.
After thinking about this some more, I think I finally understand your point. I was thinking that you were objecting to the multiple unacked vectors in ISR. But really this could be a problem for *any* unacked vectors? Hmm..if this is the case, you were right and I was wrong. My bad :) So assuming this newly enlightened position is true, I think this means we have a choice: 1) Drop support for mixed "level-1" mode and move the PIC to the kernel now as Eddie is doing 2) Keep the level-1/2 distinction, and add support for making sure that once a vector is acked in the PIC, it truely gets put into service immediately. I can think of a really simple interface for (2). All we really need to do is a) go back to synchronous injection (as previously suggested) b) promote kvm_cpuirq_extint to be higher than nmis and localints. This will ensure the vector is immediately placed into service even over in-kernel sources. Deferred interrupts would still take a higher priority, but we would never intack the pic if deferred interrupts were pending since irq.pending would be true. Thoughts? -Greg > > > > > > > > However, as I mentioned we can fix that issue with just a few new > > > lines of > > > code and by > > > reverting the userspace injection model to the one used in prior > > > releases without having to implement an entire in-kernel PIC to do it. > > > > In-kernel PIC gives us a full in kernel irqchip solution and performance > > gain. Some OSes may use PIC only. > > Agreed. I will answer this down below with the level-0/1/2 question > since they are related. > > > > > > > > > > I think moving the PIC into the kernel has the advantage of allowing > > > us to move the PIT into the kernel too (which is huge, IMHO), but > > > > Not very big, I just want to do one by one. we have done the code > > in Xen already. > > > > > that is its sole advantage. Don't get me wrong: I am in favor of > > > doing it. However, I wanted to point out that this particular > > > solution to the problem you found essentially is invalidating > > > "level-1" mode by only supporting level-0 or level-2, not fixing it. > > > > Not exactly understand level-0/1/2 meaning? Do u mean we mixed > > irqchip mode is a feature requirement? I didn't see the necessity, > > maybe I am wrong. But isn't both PIC and APIC in kernel much > > nature and simple for us? > > At one point there was a debate between moving just the LAPIC, or moving > everything (LAPIC/PIC/IOAPIC/PIT). Avi suggested that we start with > just the LAPIC and see where we were at. Because we need to remain > forward and backwards ABI compatible, I added an ioctl for setting the > in-kernel-pic level. "0" is backwards compatible mode (everything in > userspace). "1" is LAPIC only. "2" is presumable everything in the > kernel, but today only 0 and 1 are the only two defined so there could > be more than 3 levels someday if we wanted. > > > > > If you really think supporting mixed irqchip mode is a must > > I will leave it to Avi to decide since he implicitly suggested it. But > suffice to say, if we *dont* want to support it we will need to get the > other in-kernel stuff into the lapic branch in its entirety before it > can be merged so we dont create an ABI issue. > > > > , we > > need to introduce an intack I/F to notify user level irqchip if the irq > > is really ACKed or not. We can do that but I doubt the necessity. > > This is where we disagree, as I know you have mentioned this before. > The solution I am proposing (which reverts the userspace injection > method) would not require this ACK that you mention. The reason is that > the synchronous nature of the injection takes care of things naturally. > Here is how: > > With sync-injection, we can only move a vector from the IRR->ISR if > RFLAGS.IF and !irq.pending. Once we inject, the vector will move to ISR > and get queued in kernel (thereby setting an irq.pending bit). It may > not inject to the guest immediately, but thats ok because no new vectors > will be acked in the PIC until that previous vector is processed > (because irq.pending will remain set). Therefore, we will never > prematurely move IRR->ISR until the previous vector really is > "in-service". Does this make sense? > > My suggestion assumes that a single vector can stay in ISR without > actually being in service without ill effect (Todays v9 code could > presumably have multiple vectors in ISR without being in service, which > I admit is bogus). If this assumption is false, then I agree with you > that we need extra intack I/F. What are you thoughts here? > > > > > > > Perhaps we are "ok" with that, but I was under the distinct > > > impression from Avi et. al. that these variable levels of support > > > were a design goal for debugging purposes. > > > > > > I would prefer that we just fix level-1 mode with the changes I > > > mentioned instead of just disabling it (in addition to adding level-2 > > > mode as Eddie is working on). > > > > > >> > > >> This patch fixes this by introducing new VM level IOCTL > > >> KVM_SET_ISA_IRQ_LEVEL & KVM_CREATE_PIC to avoid the ABI hole. The > > >> original KVM_INTERRUPT is still there to be backward compatible. > > >> > > >> WIth this patch, old Qemu, new qemu (after this patch), old kvm, new > > >> kvm can work in any combination. Both user level code and kernel > > >> code will automatically decide the irq source base on irqchip > > >> location (user or kernel). > > >> > > >> Some known issues (TODO): > > >> 1: SVM support is on going. > > >> The code for VMX to inject irq is same with Xen now since the > > >> situation is same now (irqchip in hyprevisor), SVM code should be > > >> able to directly reuse from Xen too. > > >> 2: live migration break. > > >> 3: Temply APIC support is removed in CPUID to wait for the merge > > >> of in kernel APIC patch > > > > > > Note that you will need more than just the APIC patch. My patch only > > > moves the LAPIC down. The IOAPIC and 8259s are still in userspace. > > > Your patch only moves the 8259s down. This means there is a gap where > > > the IOAPIC used to be that still needs to be addressed. > > > > Why do u think I/O APIC must be moved down too? A new IOCTL like > > KVM_IOAPIC_MESSAGE can solve the interface issue IMO and quit > > natual. > > I suppose, but it somewhat defeats the purpose IMO. Every pin in the > 8259 that gets tickled implicitly means an IOAPIC pin was tickled also. > Do we really want to go to userspace for that? Essentially what happens > then is the PIT ends up having to go through userspace for every tick at > that point. On the flip side, the IOAPIC model is very simple so I > think it makes more sense to move it one to one with the PIC. > > Regards, > -Greg > > ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel