On Monday 29 December 2008 02:38:07 Marcelo Tosatti wrote:
> On Tue, Nov 25, 2008 at 01:52:59PM +0100, Andi Kleen wrote:
> > > But yeah - the remapping of HPET timers to virtual HPET timers sounds
> > > pretty tough. I wonder if one could overcome that with a little
> > > hardware support though ...
> >
> > For gettimeofday better make TSC work. Even in the best case (no
> > virtualization) it is much faster than HPET because it sits in the CPU,
> > while HPET is far away on the external south bridge.
>
> The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
> The algorithm uses the PIT count (latched) to measure the delay between
> interrupt generation and handling, and sums that value, on the next
> interrupt, to the TSC delta.
>
> Sheng investigated this problem in the discussions before in-kernel PIT
> was merged:
>
> http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13873.html
>
> The algorithm overcompensates for lost ticks and the guest time runs
> faster than the hosts.
>
> There are two issues:
>
> 1) A bug in the in-kernel PIT which miscalculates the count value.
>
> 2) For the case where more than one interrupt is lost, and later
> reinjected, the value read from PIT count is meaningless for the purpose
> of the tsc algorithm. The count is interpreted as the delay until the
> next interrupt, which is not the case with reinjection.
>
> As Sheng mentioned in the thread above, Xen pulls back the TSC value
> when reinjecting interrupts. VMWare ESX has a notion of "virtual TSC",
> which I believe is similar in this context.
>
> For KVM I believe the best immediate solution (for now) is to provide an
> option to disable reinjection, behaving similarly to real hardware. The
> advantage is simplicity compared to virtualizing the time sources.
>
> The QEMU PIT emulation has a limit on the rate of interrupt reinjection,
> perhaps something similar should be investigated in the future.
>
> The following patch (which contains the bugfix for 1) and disabled
> reinjection) fixes the severe time drift on RHEL4 with "clock=tsc".
> What I'm proposing is to condition reinjection with an option
> (-kvm-pit-no-reinject or something).

I agree that this should go in with a user-space option to disable
reinjection, as it is hard to overcome the problem of delayed interrupt
injection...

--
regards
Yang, Sheng

> Comments or better ideas?
>
>
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index e665d1c..608af7b 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -201,13 +201,16 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps)
>          if (!atomic_inc_and_test(&pt->pending))
>                  set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
>
> +        if (atomic_read(&pt->pending) > 1)
> +                atomic_set(&pt->pending, 1);
> +
>          if (vcpu0 && waitqueue_active(&vcpu0->wq))
>                  wake_up_interruptible(&vcpu0->wq);
>
>          hrtimer_add_expires_ns(&pt->timer, pt->period);
>          pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
>          if (pt->period)
> -                ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer);
> +                ps->channels[0].count_load_time = ktime_get();
>
>          return (pt->period == 0 ? 0 : 1);
>  }
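
To make the effect of the pending clamp in the patch easier to see, here is a rough user-space sketch of the idea. It is not kernel code: struct pit_model, its functions, and the reinject flag are hypothetical names made up for illustration, with the flag standing in for the proposed option. The sketch only models the pending counter, not the PIT count or the guest's tsc algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the pending-interrupt counter (cf. pt->pending).
 * All names here are assumptions for illustration, not kernel APIs. */
struct pit_model {
    int pending;   /* timer expiries raised but not yet injected */
    bool reinject; /* true: queue every missed tick; false: coalesce */
};

/* Called once per emulated timer expiry. */
static void pit_model_tick(struct pit_model *p)
{
    assert(p != NULL);
    p->pending++;
    /* With reinjection disabled, collapse any backlog to one tick,
     * mirroring the atomic_read()/atomic_set() clamp in the patch. */
    if (!p->reinject && p->pending > 1)
        p->pending = 1;
}

/* Called when the guest can accept an interrupt.
 * Returns true if one interrupt was injected. */
static bool pit_model_inject(struct pit_model *p)
{
    assert(p != NULL);
    if (p->pending == 0)
        return false;
    p->pending--;
    return true;
}

/* Let `ticks` timer expiries pass while the guest cannot take
 * interrupts, then drain the backlog and count what the guest sees. */
static int pit_model_missed_then_drain(bool reinject, int ticks)
{
    struct pit_model p = { .pending = 0, .reinject = reinject };
    int delivered = 0;

    for (int i = 0; i < ticks; i++)
        pit_model_tick(&p);
    while (pit_model_inject(&p))
        delivered++;
    return delivered;
}
```

With five missed expiries, pit_model_missed_then_drain(true, 5) reports five back-to-back interrupts, while pit_model_missed_then_drain(false, 5) reports one. The first case is the burst for which the latched PIT count no longer describes the delay the guest's tsc algorithm expects.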