Hello,

After upgrading to QEMU 1.7.0, CentOS 5.x guests often fail to boot with the following kernel apic=debug output:

> ACPI: Core revision 20060707
> enabled ExtINT on CPU#0
> ENABLING IO-APIC IRQs
> ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> ...trying to set up timer (IRQ0) through the 8259A ... failed.
> ...trying to set up timer as Virtual Wire IRQ... failed.
> ...trying to set up timer as ExtINT IRQ... failed .
> Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the 'noapic' kernel parameter

This happens more than 50% of the time in my configuration. Adding the noapic or no_timer_check parameter lets the guest boot properly. I'd like to find a way to restore the previous behavior, which didn't require these guest kernel parameters.

Host is a fully updated Fedora 20, kernel 3.12.10-300.fc20.x86_64, with an Intel Core i5-2500 CPU.

Guest is a fully updated base install of CentOS 5.10, kernel 2.6.18-371.4.1.el5.x86_64 (installed with "noapic", but booted with default parameters).

QEMU invocation:

  ./x86_64-softmmu/qemu-system-x86_64 -m 4096 -cpu host -enable-kvm -drive file=~/ddn-001.img,cache=off -serial telnet:0.0.0.0:4444,server,nowait

A git bisect points to this commit as the culprit:

  b1bbfe7 aio / timers: On timer modification, qemu_notify or aio_notify

which was part of the Aug 2013 timer rewrite. Reverting this hunk in particular makes the issue go away:

> @@ -522,9 +531,7 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
>          }
>          /* Interrupt execution to force deadline recalculation. */
>          qemu_clock_warp(ts->timer_list->clock);
> -        if (use_icount) {
> -            timerlist_notify(ts->timer_list);
> -        }
> +        timerlist_notify(ts->timer_list);
>      }
>  }

(Note that this code was later refactored into timerlist_rearm() in 1.7.0, so what I actually did was modify timerlist_rearm() in 1.7.0 to read as that hunk did before the b1bbfe7 commit.)

This doesn't appear to be a proper solution, though. With the timer rewrite, the periodic (1 ms) qemu_notify_event() call that breaks out of the main event loop moved from a SIGALRM handler to the rearm of a QEMU timer. Presumably QEMU is counting on these generic callbacks.

It appears that in QEMU 1.7.0, either QEMU/KVM doesn't inject timer interrupts quickly enough, or the guest doesn't handle them quickly enough, to pass the timer check in the guest kernel reliably.

I've found that if I suppress the calls to timerlist_notify() in timerlist_rearm() made by timers on QEMU_CLOCK_VIRTUAL during its first 20 ms, the system boots successfully and remains stable. Not calling qemu_notify_event() on the first 20 ticks of QEMU_CLOCK_VIRTUAL seems to alter the timings enough to produce a reliable result.

I tried this after realizing that the guest kernel enables the HPET, which enables the QEMU virtual clock, immediately before the guest timer check occurs. I also observed that the kernel boots fine with the "nohpet" parameter, and I suspected that this could be a source of resource contention.

Finally, the QEMU options to disable KVM PIT IRQ reinjection and to disable the KVM kernel irqchip altogether result in less frequent panics, but the guest still panics within 100 boots.

Thanks for any assistance you can provide.

Matt
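
For readers who want a concrete picture of the 20 ms suppression experiment described above, here is a minimal standalone sketch in C. It is not the actual change made to timerlist_rearm(): the names ModelClock, should_notify() and SUPPRESS_WINDOW_NS are hypothetical, and measuring the 20 ms window from the first rearm seen on the virtual clock is an assumption. It only models the decision "should this rearm call timerlist_notify() or not".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's clock types; not QEMU's real enum. */
typedef enum { MODEL_CLOCK_HOST, MODEL_CLOCK_VIRTUAL } ModelClock;

/* Length of the suppression window: 20 ms, expressed in nanoseconds. */
#define SUPPRESS_WINDOW_NS (20LL * 1000 * 1000)

/* Time of the first rearm seen on the virtual clock; -1 means "none yet". */
int64_t virtual_clock_start_ns = -1;

/*
 * Decide whether a timer rearm at now_ns should notify the main loop.
 * Rearms on other clocks always notify. Rearms on the virtual clock are
 * suppressed until SUPPRESS_WINDOW_NS has elapsed since the first one.
 */
bool should_notify(ModelClock clock, int64_t now_ns)
{
    assert(now_ns >= 0);

    if (clock != MODEL_CLOCK_VIRTUAL) {
        return true;
    }
    if (virtual_clock_start_ns < 0) {
        /* Assumption: the window starts at the first virtual-clock rearm. */
        virtual_clock_start_ns = now_ns;
    }
    return now_ns - virtual_clock_start_ns >= SUPPRESS_WINDOW_NS;
}
```

In a real patch, the equivalent check would presumably guard the timerlist_notify() call inside timerlist_rearm(), with the clock type taken from the timer list being rearmed.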