Hi Jan,

On Fri, Nov 21, 2008 at 08:54:56AM +0100, Jan Kiszka wrote:
> Eduardo Habkost wrote:
> > On Thu, Nov 20, 2008 at 12:22:53PM -0200, Eduardo Habkost wrote:
> >> Hi,
> >>
> >> When using a kvm.git kernel as host, I am getting guest boot failures
> >> when booting Fedora Rawhide kernel (2.6.27.5-117.fc10.x86_64). Guest
> >> stops booting at:
> >>
> >> ENABLING IO-APIC IRQs
> >> ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
> >> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> >> ...trying to set up timer (IRQ0) through the 8259A ...
> >> ..... (found apic 0 pin 0) ...
> >> ....... failed.
> >> ...trying to set up timer as Virtual Wire IRQ...
> >> ..... failed.
> >> ...trying to set up timer as ExtINT IRQ...
> >
> > I've just found out this problem happens because the guest has HZ=1000
> > and the host had HZ=250 and no CONFIG_HIGH_RES_TIMERS.
> >
> > With this setup, the host is not managing to inject enough timer
> > interrupts during the mdelay() loop on timer_irq_works().
> >
>
> Interesting, and plausible.
>
> My observation so far is a sporadic test failure, often correlating with
> some raised host OS load. I'm running a high-res kernel, but that cannot
> prevent that this only 10 ticks long loop of the guest may obtain too
> few CPU cycles to handle enough of them once in a while (IIRC, it needs
> 4 out of the 10 ticks to declare the timer routing functional).

Using in-kernel PIT?

This is a potential problem which can be worked around by disabling the
whole check, either via no_timer_check or a paravirt equivalent (Glauber?),
but for the non-paravirt case it seems it's not the culprit.

Possible failure scenarios:

1) lpj miscalibration (SMP guests), which kvm-clock deals with.

2) Proper lpj calibration, so m/udelay behave as expected, but not
enough interrupts can be injected due to CPU starvation, as you mention.

On my testbox, with each pCPU running a cycle hog on nice -10, the first
timer_irq_works call (via IOAPIC) won't fail (guest is truly starved).
Host with both CONFIG_PREEMPT/CONFIG_PREEMPT_VOLUNTARY.

Moreover, the code attempts to deliver first via IOAPIC, then 8259A,
then virtual wire. Reports show all three failing.

3) Failure to inject the interrupt will break the in-kernel PIT ack
logic. The VMX NMI/IRQ race you fixed can certainly cause this. Can you
reproduce it with the fix (and CONFIG_KVM_CLOCK=y)?

Any other possibilities?

> Maybe Gleb's anti-coalesce patches for the PIC can also deal with your
> timer resolution conflict. At least worth a try...
>
> Jan
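
For what it's worth, the HZ=1000 guest on HZ=250 host case Eduardo
describes can be put into rough numbers. The sketch below is a toy model,
not KVM's injection code. It assumes that without high-res timers the host
delivers at most one guest timer interrupt per host tick, that nothing is
reinjected later, and that the check needs 4 of the 10 ticks as recalled in
the thread. The function names are made up for illustration.

```python
def ticks_delivered(host_hz, guest_hz, window_guest_ticks=10,
                    host_high_res=False):
    """Guest timer interrupts delivered during the check window.

    Assumption: a host without high-res timers can only fire the emulated
    timer on its own tick boundaries, so the count is capped by the number
    of whole host ticks that fit in the window.
    """
    if host_high_res:
        # Assumed ideal case: every guest tick is delivered on time.
        return window_guest_ticks
    # Window length is window_guest_ticks / guest_hz seconds; count the
    # whole host ticks inside it using integer arithmetic.
    host_ticks_in_window = window_guest_ticks * host_hz // guest_hz
    return min(window_guest_ticks, host_ticks_in_window)


def timer_check_passes(host_hz, guest_hz, required=4, **kwargs):
    """True if enough ticks arrive; 'required' is the value recalled in
    the thread, not checked against the kernel source."""
    return ticks_delivered(host_hz, guest_hz, **kwargs) >= required


if __name__ == "__main__":
    for host_hz, high_res in [(250, False), (250, True), (1000, False)]:
        n = ticks_delivered(host_hz, 1000, host_high_res=high_res)
        ok = timer_check_passes(host_hz, 1000, host_high_res=high_res)
        print(f"host HZ={host_hz} high_res={high_res}: {n} ticks, pass={ok}")
```

Under these assumptions a HZ=250 host gets only 2 ticks into the 10 ms
window, which is below the threshold, while a high-res or HZ=1000 host
delivers all 10. It says nothing about the starvation or ack-logic
scenarios above.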