On 02/21/2014 06:27 AM, Alex Bligh wrote:
> 
> On 21 Feb 2014, at 04:34, Matt Lupfer wrote:
>
>>
>> This doesn't appear to be a solution, because with the timer rewrite, QEMU
>> moves its periodic (1 ms) qemu_notify_event() call to break out of
>> the main event loop from a SIGALRM handler to the rearm of a QEMU timer.
>> Presumably QEMU is counting on these generic callbacks.
> 
> This is somewhat bizarre as the code you are reverting causes the main loop
> to be broken out of *more*.
> 
> It's also happening only when someone calls qemu_mod_timer_ns. I'm
> not sure what precisely the kernel is doing there, but perhaps it
> is modifying a timer repeatedly and checking it fires within a given
> time?
> 

Thanks for the response.  The hpet_timer() callback calls timer_mod()
every 1 ms.  That timerlist has no notify callback so it in turn calls
qemu_notify_event().
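
The fallback described above can be modelled in a few lines of C.  This
is only a sketch: every model_* name is hypothetical, none of it is
QEMU's actual code, and it assumes every timer modification triggers a
notification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not QEMU's real types. */
typedef void (*model_notify_fn)(void *opaque);

struct model_timer_list {
    model_notify_fn notify_cb;  /* NULL when no callback is registered */
    void *notify_opaque;
};

struct model_timer {
    struct model_timer_list *list;
    int64_t expire_ns;
};

static int model_global_kicks;  /* counts fallback main-loop wakeups */

/* Stand-in for qemu_notify_event(): only records that it was called. */
static void model_notify_event(void)
{
    model_global_kicks++;
}

/* Example per-list callback: bumps the int that opaque points to. */
static void model_count_cb(void *opaque)
{
    (*(int *)opaque)++;
}

/* Use the list's own callback if it has one, else the global fallback. */
static void model_list_notify(struct model_timer_list *list)
{
    if (list->notify_cb) {
        list->notify_cb(list->notify_opaque);
    } else {
        model_notify_event();
    }
}

/* Stand-in for timer_mod(): update the deadline, then notify. */
static void model_timer_mod(struct model_timer *t, int64_t expire_ns)
{
    assert(t != NULL && t->list != NULL);
    t->expire_ns = expire_ns;
    model_list_notify(t->list);
}
```

With notify_cb left NULL, each model_timer_mod() call ends in
model_notify_event(), which mirrors the once-per-millisecond wakeup
described above.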

The guest kernel is only enabling the HPET timer and looking for
timer interrupts.

>> It appears that in QEMU 1.7.0, QEMU/KVM doesn't inject timer interrupts, or
>> alternatively the guest doesn't handle them, quickly enough to pass
>> the timer check in the guest kernel reliably.
> 
> Yes that would suggest a latency type thing. The other thing that may
> have happened is that the work done is being reprioritised, so rather
> than respond to timer events immediately it's off doing some disk I/O
> or similar, though frankly that's hard to understand when the kernel
> is booting.
> 

I did some more debugging and found the problem was elsewhere.  This
different timer behavior is exposing a bug in the HPET implementation.

It's possible for the QEMU timer underlying the HPET to call the hpet_timer()
callback between when the timer is created and when the HPET device is
enabled (both actions initiated by the guest writing to HPET registers).
When this happens, the QEMU timer is rearmed to an expiration
time based on uninitialized values.  That's preventing the system timer
interrupt from ticking in the guest during the timer check at boot.
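
A toy model of that window, for illustration only: all model_* names
are hypothetical, this is not QEMU's HPET code, and the guard shown is
just one possible way to close the window, not necessarily what an
eventual fix would do.  A bogus period_ns value stands in for the
uninitialized state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a periodic device timer; not QEMU's HPET code. */
struct model_hpet {
    bool enabled;        /* set once the guest enables the device */
    uint64_t period_ns;  /* only meaningful after the guest programs it */
    int64_t expire_ns;   /* deadline of the underlying timer */
    bool armed;          /* whether the underlying timer is pending */
    unsigned irq_count;  /* interrupts delivered to the guest */
};

/* Unguarded callback: always rearms, even from unprogrammed state. */
static void model_cb_unguarded(struct model_hpet *h, int64_t now_ns)
{
    assert(h != NULL);
    h->irq_count++;
    h->expire_ns = now_ns + (int64_t)h->period_ns;
    h->armed = true;
}

/* Guarded callback: ignores a firing that arrives before enable. */
static void model_cb_guarded(struct model_hpet *h, int64_t now_ns)
{
    assert(h != NULL);
    if (!h->enabled) {
        h->armed = false;  /* do not rearm from unprogrammed state */
        return;
    }
    h->irq_count++;
    h->expire_ns = now_ns + (int64_t)h->period_ns;
    h->armed = true;
}
```

In the unguarded version an early firing leaves the timer armed with a
deadline derived from whatever period_ns happened to hold, so the next
tick may be arbitrarily far away.  The guarded version leaves the timer
idle until the device is enabled and programmed.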

The changes to the timer implementation just make this a lot more likely
to happen on CentOS 5.x kernels.

The fix looks straightforward.  I'll send a patch to the list.

Matt
