Re: Clock jumps

2010-06-02 Thread Orion Poplawski

On 05/26/2010 11:31 AM, Alexander Graf wrote:


On 26.05.2010, at 19:10, Orion Poplawski wrote:


On 05/25/2010 12:21 AM, Gleb Natapov wrote:

Adding kvm to CC.

On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote:

I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
host.  Is there any debugging I can enable to see what is jumping the clock?

kvm-clock: cpu 0, msr 0:1ba4741, boot clock
kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
Switching to clocksource kvm-clock
rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)



Thanks, though I don't think it made it there.  I'm also not sure it's 
completely limited to KVM, though that is the only running system I am 
currently seeing the problem on.  I also see clock jumps during anaconda 
installs on physical hardware and apparently they have been present since at 
least F11.  Might be unrelated though.

I'm really at a loss of how to debug this though.


Do you have ntpd running inside the guest? I have a bug report lying around 
about 2.6.33 with kvm-clock jumping in time when ntpd is used: 
https://bugzilla.novell.com/show_bug.cgi?id=582260

Alex



Turning off ntpd and chronyd did not help for me.


--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Clock jumps

2010-05-27 Thread john stultz
On Fri, 2010-05-28 at 02:33 +0200, Bernhard Schmidt wrote:
> On 28.05.2010 02:00, john stultz wrote:
> > Looking at the diff:
> > --- dmesg-lenny 2010-05-27 16:45:33.0 -0700
> > +++ dmesg-squeeze   2010-05-27 16:46:14.0 -0700
> > @@ -132,8 +132,8 @@
> >   console [ttyS1] enabled
> >   hpet clockevent registered
> >   Fast TSC calibration using PIT
> > -Detected 2660.398 MHz processor.
> > -Calibrating delay loop (skipped), value calculated using timer frequency.. 
> > 5320.79 BogoMIPS (lpj=10641592)
> > +Detected 2613.324 MHz processor.
> > +Calibrating delay loop (skipped), value calculated using timer frequency.. 
> > 5226.64 BogoMIPS (lpj=10453296)
> >   Security Framework initialized
> >   Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> >   Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> > @@ -160,7 +160,7 @@
> >   CPU0: Intel(R) Xeon(R) CPU3075  @ 2.66GHz stepping 0b
> >   Booting Node   0, Processors  #1
> >   Brought up 2 CPUs
> > -Total of 2 processors activated (10640.79 BogoMIPS).
> > +Total of 2 processors activated (10546.63 BogoMIPS).
> >   NET: Registered protocol family 16
> >   ACPI: bus type pci registered
> >   PCI: MMCONFIG for domain  [bus 00-ff] at [mem 0xe000-0xefff] 
> > (base 0xe000)
> >
> > So you can see in the above that during the second boot the TSC
> > calibration was badly miscalculated. This was the cause of the skew.
> >
> > Not sure how that might be linked to the distro upgrade. It could have
> > been something like SMI damage during the calibration time, but I
> > thought the calibration loop watched for that.
> >
> > Bernhard: I expect with all those VMs, this machine isn't rebooted
> > frequently. So could you look at the logs to see how much the "Detected
> > .yyy MHz processor." line varies across a few other boots (if
> > they still exist)?
> 
> Correct, the box isn't rebooted often, but I do have a few dmesg outputs 
> lying around. lpj was almost always the same until the very last boot, 
> which screwed up the clock.
> 
> dmesg:[0.00] Linux version 2.6.33 (r...@svr02) (gcc version 
> 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Sun Mar 7 23:01:45 CET 2010
> dmesg:[0.008005] Calibrating delay loop (skipped), value calculated 
> using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)
> dmesg:[0.288002] Total of 2 processors activated (10546.63 BogoMIPS).

Yea. The bogomips/loops per jiffies are actually calculated with a
different chunk of code (although it's interesting it miscalculated in
both cases).

Could you send the "Detected .yyy MHz processor." lines as well?

thanks
-john




Re: Clock jumps

2010-05-27 Thread Bernhard Schmidt

On 28.05.2010 02:00, john stultz wrote:

Hi John,


Looking at the diff:
--- dmesg-lenny 2010-05-27 16:45:33.0 -0700
+++ dmesg-squeeze   2010-05-27 16:46:14.0 -0700
@@ -132,8 +132,8 @@
  console [ttyS1] enabled
  hpet clockevent registered
  Fast TSC calibration using PIT
-Detected 2660.398 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 
5320.79 BogoMIPS (lpj=10641592)
+Detected 2613.324 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 
5226.64 BogoMIPS (lpj=10453296)
  Security Framework initialized
  Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
  Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
@@ -160,7 +160,7 @@
  CPU0: Intel(R) Xeon(R) CPU3075  @ 2.66GHz stepping 0b
  Booting Node   0, Processors  #1
  Brought up 2 CPUs
-Total of 2 processors activated (10640.79 BogoMIPS).
+Total of 2 processors activated (10546.63 BogoMIPS).
  NET: Registered protocol family 16
  ACPI: bus type pci registered
  PCI: MMCONFIG for domain  [bus 00-ff] at [mem 0xe000-0xefff] 
(base 0xe000)

So you can see in the above that during the second boot the TSC
calibration was badly miscalculated. This was the cause of the skew.

Not sure how that might be linked to the distro upgrade. It could have
been something like SMI damage during the calibration time, but I
thought the calibration loop watched for that.

Bernhard: I expect with all those VMs, this machine isn't rebooted
frequently. So could you look at the logs to see how much the "Detected
.yyy MHz processor." line varies across a few other boots (if
they still exist)?


Correct, the box isn't rebooted often, but I do have a few dmesg outputs 
lying around. lpj was almost always the same until the very last boot, 
which screwed up the clock.


dmesg:[0.00] Linux version 2.6.33 (r...@svr02) (gcc version 
4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Sun Mar 7 23:01:45 CET 2010
dmesg:[0.008005] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 5226.64 BogoMIPS (lpj=10453296)

dmesg:[0.288002] Total of 2 processors activated (10546.63 BogoMIPS).
dmesg.0:[0.00] Linux version 2.6.33 (r...@svr02) (gcc version 
4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Sun Mar 7 23:01:45 CET 2010
dmesg.0:[0.008005] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 5320.79 BogoMIPS (lpj=10641592)

dmesg.0:[0.274022] Total of 2 processors activated (10640.79 BogoMIPS).
dmesg.1.gz:[0.00] Linux version 2.6.32-rc7 (r...@svr02) (gcc 
version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.1.gz:[0.012004] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 5319.06 BogoMIPS (lpj=10638120)
dmesg.1.gz:[0.016000] Calibrating delay using timer specific 
routine.. 5319.99 BogoMIPS (lpj=10639980)
dmesg.1.gz:[0.260003] Total of 2 processors activated (10639.05 
BogoMIPS).
dmesg.2.gz:[0.00] Linux version 2.6.32-rc7 (r...@svr02) (gcc 
version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.2.gz:[0.012005] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 5319.35 BogoMIPS (lpj=10638712)
dmesg.2.gz:[0.016000] Calibrating delay using timer specific 
routine.. 5319.99 BogoMIPS (lpj=10639990)
dmesg.2.gz:[0.261567] Total of 2 processors activated (10639.35 
BogoMIPS).
dmesg.3.gz:[0.00] Linux version 2.6.32-rc7 (r...@svr02) (gcc 
version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.3.gz:[0.012005] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 5319.97 BogoMIPS (lpj=10639956)
dmesg.3.gz:[0.016000] Calibrating delay using timer specific 
routine.. 5319.99 BogoMIPS (lpj=10639987)
dmesg.3.gz:[0.257152] Total of 2 processors activated (10639.97 
BogoMIPS).
dmesg.4.gz:[0.00] Linux version 2.6.32-rc7 (r...@svr02) (gcc 
version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Thu Nov 19 14:36:03 CET 2009
dmesg.4.gz:[0.012005] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 5319.84 BogoMIPS (lpj=10639688)
dmesg.4.gz:[0.016000] Calibrating delay using timer specific 
routine.. 5319.99 BogoMIPS (lpj=10639993)
dmesg.4.gz:[0.253571] Total of 2 processors activated (10639.84 
BogoMIPS).
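
One way to read the lpj values above: in the one boot pair where both numbers are visible (the dmesg-lenny/dmesg-squeeze diff), lpj times 250 equals the "Detected ... MHz" value exactly (10641592 * 250 = 2660398000, i.e. 2660.398 MHz). Assuming HZ=250 and that the same relation holds for the older boots (both are inferences from that one pair, not something the logs state), the implied frequencies can be sketched in Python. The helper names are made up for this illustration.

```python
# Assumption: CONFIG_HZ=250 and lpj = timer frequency / HZ, inferred from the
# one boot where both "Detected 2660.398 MHz" and lpj=10641592 are visible.
HZ = 250

# lpj values copied from the "Calibrating delay loop (skipped), value
# calculated using timer frequency" lines of each saved dmesg file.
LPJ_BY_LOG = {
    "dmesg": 10453296,
    "dmesg.0": 10641592,
    "dmesg.1.gz": 10638120,
    "dmesg.2.gz": 10638712,
    "dmesg.3.gz": 10639956,
    "dmesg.4.gz": 10639688,
}


def implied_mhz(lpj, hz=HZ):
    """Return the timer frequency in MHz implied by a loops-per-jiffy value."""
    return lpj * hz / 1_000_000


def deviation_percent(lpj, reference_lpj):
    """Return how far lpj is from reference_lpj, in percent."""
    return (lpj - reference_lpj) / reference_lpj * 100.0


reference = LPJ_BY_LOG["dmesg.0"]  # the previous (apparently good) boot
for name, lpj in LPJ_BY_LOG.items():
    print(f"{name:12s} {implied_mhz(lpj):9.3f} MHz "
          f"({deviation_percent(lpj, reference):+.3f}% vs dmesg.0)")
```

Under those assumptions the five older boots all land within a fraction of a MHz of each other, and only the latest boot (2613.324 MHz) stands out.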


If necessary I can reboot once more, but I'd like to avoid it.

Bernhard


Re: Clock jumps

2010-05-27 Thread john stultz
On Thu, 2010-05-27 at 23:48 +0200, Bernhard Schmidt wrote:
> On 27.05.2010 21:08, john stultz wrote:
> > I'd be very interested in hearing more about the host side issue. So
> > this happened with the same kernel that you were using before, with no
> > trouble?
> 
> Correct.
> 
> > Could you also send dmesg output from this boot? And if you can find
> > any older dmesg logs to compare with, send those too?
> 
> See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
> http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
> on the same kernel binary the initrd changed greatly when upgrading, so 
> ordering/timing between those two is off.
> 
> Note that the dmesg output is captured right after boot. I think I 
> remember seeing a "TSC unstable" message pretty soon after boot, but I 
> might be mixing it up with my other AMD-based KVM server. I don't hold 
> normal (non-boot) logs that long, so I can't tell for sure.

Looking at the diff:
--- dmesg-lenny 2010-05-27 16:45:33.0 -0700
+++ dmesg-squeeze   2010-05-27 16:46:14.0 -0700
@@ -132,8 +132,8 @@
 console [ttyS1] enabled
 hpet clockevent registered
 Fast TSC calibration using PIT
-Detected 2660.398 MHz processor.
-Calibrating delay loop (skipped), value calculated using timer frequency.. 
5320.79 BogoMIPS (lpj=10641592)
+Detected 2613.324 MHz processor.
+Calibrating delay loop (skipped), value calculated using timer frequency.. 
5226.64 BogoMIPS (lpj=10453296)
 Security Framework initialized
 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
 Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
@@ -160,7 +160,7 @@
 CPU0: Intel(R) Xeon(R) CPU3075  @ 2.66GHz stepping 0b
 Booting Node   0, Processors  #1
 Brought up 2 CPUs
-Total of 2 processors activated (10640.79 BogoMIPS).
+Total of 2 processors activated (10546.63 BogoMIPS).
 NET: Registered protocol family 16
 ACPI: bus type pci registered
 PCI: MMCONFIG for domain  [bus 00-ff] at [mem 0xe000-0xefff] (base 
0xe000)

So you can see in the above that during the second boot the TSC
calibration was badly miscalculated. This was the cause of the skew.
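
For a sense of scale, the skew implied by the two calibration values can be worked out directly. This is only a sketch: it assumes the earlier value (2660.398 MHz) is the true TSC rate and that the kernel converts cycles to time using the later value (2613.324 MHz). The function name is made up for the illustration.

```python
def skew_seconds_per_minute(true_mhz, assumed_mhz):
    """Seconds gained (+) or lost (-) per real minute when the kernel converts
    TSC cycles to time using assumed_mhz while the counter ticks at true_mhz."""
    return (true_mhz / assumed_mhz - 1.0) * 60.0


# Assumption: the earlier calibration (2660.398 MHz) is the true TSC rate.
skew = skew_seconds_per_minute(true_mhz=2660.398, assumed_mhz=2613.324)
print(f"{skew:.2f} s per minute")
```

That is a calibration error of about 1.8%, or roughly 1.08 seconds gained per minute under these assumptions.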

Not sure how that might be linked to the distro upgrade. It could have
been something like SMI damage during the calibration time, but I
thought the calibration loop watched for that.

Bernhard: I expect with all those VMs, this machine isn't rebooted
frequently. So could you look at the logs to see how much the "Detected
.yyy MHz processor." line varies across a few other boots (if
they still exist)?

Ingo/Thomas: Any thoughts, should we be considering dropping the
quick_pit_calibrate() code and always do the slower route?

thanks
-john



Re: Clock jumps

2010-05-27 Thread Zachary Amsden

On 05/27/2010 12:12 PM, Bernhard Schmidt wrote:

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,


I have server side fixes for this kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest side changes to kvmclock as well. It is impossible to
guarantee strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).


Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using 
kvm-clock.


It doesn't matter what the host uses (although the host on TSC with 
unstable TSC can make things worse), tsc and kvmclock sources in the 
guest will be unstable regardless.



Re: Clock jumps

2010-05-27 Thread Zachary Amsden

On 05/27/2010 12:12 PM, Bernhard Schmidt wrote:

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,


I have server side fixes for this kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest side changes to kvmclock as well. It is impossible to
guarantee strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).


Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using 
kvm-clock.


Anyway, can you send me both patches? I'd like to try it, but I have 
completely lost track where the up-to-date patches are.


There are more than two; there are quite a few. I'll send them to the list soon.

Zach


Re: Clock jumps

2010-05-27 Thread Bernhard Schmidt

On 27.05.2010 23:53, Zachary Amsden wrote:

Hello Zachary,


I have server side fixes for this kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest side changes to kvmclock as well. It is impossible to
guarantee strictly monotonic clocksource across multiple CPUs when
frequency is dynamically changing (and also because of the C1E idle
problems).


Is all this relevant only when the host is on TSC? Because I have seen 
these jumps when the host was on HPET and the guests were using kvm-clock.


Anyway, can you send me both patches? I'd like to try it, but I have 
completely lost track where the up-to-date patches are.


Bernhard


Re: Clock jumps

2010-05-27 Thread Bernhard Schmidt

On 27.05.2010 21:08, john stultz wrote:

Hi John,


I'd be very interested in hearing more about the host side issue. So
this happened with the same kernel that you were using before, with no
trouble?


Correct.


Could you also send dmesg output from this boot? And if you can find
any older dmesg logs to compare with, send those too?


See http://users.birkenwald.de/~berni/temp/dmesg-lenny and 
http://users.birkenwald.de/~berni/temp/dmesg-squeeze . Although running 
on the same kernel binary the initrd changed greatly when upgrading, so 
ordering/timing between those two is off.


Note that the dmesg output is captured right after boot. I think I 
remember seeing a "TSC unstable" message pretty soon after boot, but I 
might be mixing it up with my other AMD-based KVM server. I don't hold 
normal (non-boot) logs that long, so I can't tell for sure.


If you need any more info feel free to contact me.

Bernhard


Re: Clock jumps

2010-05-27 Thread Zachary Amsden

On 05/27/2010 08:32 AM, Bernhard Schmidt wrote:

Alexander Graf  wrote:

Hi,

   

Do you have ntpd running inside the guest? I have a bug report lying
around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
https://bugzilla.novell.com/show_bug.cgi?id=582260
 

I want to chime in here, I have a very similar problem, but not with
ntpd in the guest.

The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
was a Debian Lenny with a custom 2.6.33 host kernel and a custom
qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.

The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
(2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
kernels, but one system has i386 userland.

With this setup once in a while (maybe every other week) one VM would
have a sudden clock jump, 6-12 hours into the future. No kernel messages
or other log entries than applications complaining about the clock jump
after the fact. Other VMs were unaffected.

Yesterday I did an upgrade to Debian Squeeze. This involved a new
qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
kernels from 2.6.33 to 2.6.33.4.

First of all, after the reboot the host clock was totally unreliable. I
had a constant skew of up to five seconds per minute in the host clock,
which of course affected the VMs as well.  This problem went away when I
switched from tsc to hpet on the host. The host does CPU frequency
scaling, which is, as far as I know, known to affect TSC stability. I
think I remember messages about tsc being unstable in earlier boots;
maybe the detection just did not work this time.

Worse, the clock jump issues in the guest appeared much more often than
before. The more heavily loaded VMs did not survive ten minutes without
jumping several hours ahead.

The situation has stabilized since I set the clocksource to hpet in the
guest immediately after boot. So it seems kvm-clock has some issues here.

I've seen a preliminary patch floating around on the ML by Zachary
Amsden, but I haven't tried it yet. It talks of backward warps, but so
far I've only seen forward warps (the VM time suddenly jumps into the
future), so it might be unrelated.
   


I have an AMD Turion TL-52 machine with unreliable TSC.  It varies with 
CPU frequency, which is okay, we can compensate for that, but worse, it 
has broken clocking when in C1E idle.  Apparently, it divides down the 
clock input to an idle core, so it only runs at 1/16 or whatever of the 
rate, and adds a multiplier to the TSC increment, so it scales by 16 
instead of by 1 (whatever the actual numbers are I don't know, but this 
illustrates the point).  When it wakes up to service a cache probe from 
another core, it now runs with full clock rate ... and still uses the 
multiplier for the TSC increment.


The effect is that idle CPUs have TSC which may increase faster than 
running CPUs.  Given time, this delta can add to a very large number (in 
theory, it's a random walk, but it can go very far off).  If a VM is 
running on this CPU and happens to match the idle pattern without 
switching CPUs, time can effectively run accelerated on that VM, and 
very rapidly things are going to get confused.
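
The effect described above can be put into numbers with a toy model. Everything here is an illustration under stated assumptions: the factor of 16 is the placeholder figure from the description, the 0.1% wake fraction is invented, and the function name is hypothetical. While idle, the clock is divided by the factor and the TSC increment is multiplied by it, so the TSC advances at the normal rate; during brief wakeups the clock runs at full speed but the increment is still multiplied.

```python
def tsc_seconds(wall_seconds, wake_fraction, multiplier=16):
    """TSC-derived elapsed time on a mostly idle core under the toy model.

    Idle time counts at 1x (divided clock, multiplied increment); wake time
    counts at `multiplier`x (full clock, increment still multiplied).
    """
    wake = wall_seconds * wake_fraction
    idle = wall_seconds - wake
    return idle + wake * multiplier


wall = 3600.0  # one hour of real time
# Assumption: the idle core spends 0.1% of the hour servicing cache probes.
drift = tsc_seconds(wall, wake_fraction=0.001) - wall
print(f"idle core runs ahead by {drift:.1f} s after one hour")
```

With these made-up inputs the idle core's TSC ends up about 54 seconds ahead after an hour; the real numbers depend on the hardware and the wakeup pattern.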


Newer kernels should detect the host clock being unreliable quite 
quickly; my F13 machine detects it right away at boot.


I have server side fixes for this kvm-clock which seem to give me a 
stable clock on this machine, but for true SMP stability, you will need 
Glauber's guest side changes to kvmclock as well.  It is impossible to 
guarantee strictly monotonic clocksource across multiple CPUs when 
frequency is dynamically changing (and also because of the C1E idle 
problems).


There is one remaining problem to fix: the reset of the TSC on reboot in 
SMP will destabilize the TSCs again. Now that I've actually got VMs running 
again (different bug), that shouldn't take long.


Zach


Re: Clock jumps

2010-05-27 Thread john stultz
On Thu, May 27, 2010 at 11:32 AM, Bernhard Schmidt  wrote:
> Alexander Graf  wrote:
>> Do you have ntpd running inside the guest? I have a bug report lying
>> around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
>> https://bugzilla.novell.com/show_bug.cgi?id=582260
>
> I want to chime in here, I have a very similar problem, but not with
> ntpd in the guest.
>
> The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
> was a Debian Lenny with a custom 2.6.33 host kernel and a custom
> qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.
>
> The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
> (2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
> kernels, but one system has i386 userland.
>
> With this setup once in a while (maybe every other week) one VM would
> have a sudden clock jump, 6-12 hours into the future. No kernel messages
> or other log entries than applications complaining about the clock jump
> after the fact. Other VMs were unaffected.
>
> Yesterday I did an upgrade to Debian Squeeze. This involved a new
> qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
> kernels from 2.6.33 to 2.6.33.4.
>
> First of all, after the reboot the host clock was totally unreliable. I
> had a constant skew of up to five seconds per minute in the host clock,
> which of course affected the VMs as well.  This problem went away when I
> switched from tsc to hpet on the host. The host does CPU frequency
> scaling, which is, as far as I know, known to affect TSC stability. I
> think I remember messages about tsc being unstable in earlier boots;
> maybe the detection just did not work this time.

I'd be very interested in hearing more about the host side issue. So
this happened with the same kernel that you were using before, with no
trouble?

Could you also send dmesg output from this boot? And if you can find
any older dmesg logs to compare with, send those too?

thanks
-john


Re: Clock jumps

2010-05-27 Thread Bernhard Schmidt
Alexander Graf  wrote:

Hi,

> Do you have ntpd running inside the guest? I have a bug report lying
> around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
> https://bugzilla.novell.com/show_bug.cgi?id=582260

I want to chime in here, I have a very similar problem, but not with
ntpd in the guest.

The host was a HP ProLiant DL320 G5p with a Dualcore Xeon3075. System
was a Debian Lenny with a custom 2.6.33 host kernel and a custom
qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.

The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
(2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
kernels, but one system has i386 userland.

With this setup once in a while (maybe every other week) one VM would
have a sudden clock jump, 6-12 hours into the future. No kernel messages
or other log entries than applications complaining about the clock jump
after the fact. Other VMs were unaffected.

Yesterday I did an upgrade to Debian Squeeze. This involved a new
qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
kernels from 2.6.33 to 2.6.33.4.

First of all, after the reboot the host clock was totally unreliable. I
had a constant skew of up to five seconds per minute in the host clock,
which of course affected the VMs as well.  This problem went away when I
switched from tsc to hpet on the host. The host does CPU frequency
scaling, which is, as far as I know, known to affect TSC stability. I
think I remember messages about tsc being unstable in earlier boots;
maybe the detection just did not work this time.

Worse, the clock jump issues in the guest appeared much more often than
before. The more heavily loaded VMs did not survive ten minutes without
jumping several hours ahead.

The situation has stabilized since I set the clocksource to hpet in the
guest immediately after boot. So it seems kvm-clock has some issues here.
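
For anyone wanting to try the same workaround, the clocksource can be inspected and switched at runtime through sysfs. The sketch below uses the standard Linux sysfs files current_clocksource and available_clocksource; the Python helper names are made up for this illustration, and writing the file requires root.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def available_clocksources(base=CLOCKSOURCE_DIR):
    """Return the list of clocksources the kernel offers, e.g. kvm-clock, hpet."""
    return (base / "available_clocksource").read_text().split()


def current_clocksource(base=CLOCKSOURCE_DIR):
    """Return the name of the clocksource currently in use."""
    return (base / "current_clocksource").read_text().strip()


def set_clocksource(name, base=CLOCKSOURCE_DIR):
    """Switch to the named clocksource (needs root on a real system).

    Raises ValueError if the kernel does not list `name` as available.
    """
    offered = available_clocksources(base)
    if name not in offered:
        raise ValueError(f"{name!r} not in available clocksources: {offered}")
    (base / "current_clocksource").write_text(name)
```

Calling set_clocksource("hpet") as root in the guest would correspond to the step described above. The change does not persist across reboots; the clocksource= kernel boot parameter is the usual way to make it permanent.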

I've seen a preliminary patch floating around on the ML by Zachary
Amsden, but I haven't tried it yet. It talks of backward warps, but so
far I've only seen forward warps (the VM time suddenly jumps into the 
future), so it might be unrelated.

Bernhard



Re: Clock jumps

2010-05-26 Thread Orion Poplawski

On 05/26/2010 11:31 AM, Alexander Graf wrote:


On 26.05.2010, at 19:10, Orion Poplawski wrote:


On 05/25/2010 12:21 AM, Gleb Natapov wrote:

Adding kvm to CC.

On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote:

I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
host.  Is there any debugging I can enable to see what is jumping the clock?

kvm-clock: cpu 0, msr 0:1ba4741, boot clock
kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
Switching to clocksource kvm-clock
rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)



Thanks, though I don't think it made it there.  I'm also not sure it's 
completely limited to KVM, though that is the only running system I am 
currently seeing the problem on.  I also see clock jumps during anaconda 
installs on physical hardware and apparently they have been present since at 
least F11.  Might be unrelated though.

I'm really at a loss of how to debug this though.


Do you have ntpd running inside the guest? I have a bug report lying around 
about 2.6.33 with kvm-clock jumping in time when ntpd is used: 
https://bugzilla.novell.com/show_bug.cgi?id=582260

Alex



That bug looks just like what I'm seeing.  I even see the soft lockup messages 
sometimes as well.  May actually be seeing it with a Fedora 12 guest as well - 
but it results in a hard hang.


--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


Re: Clock jumps

2010-05-26 Thread Orion Poplawski

On 05/26/2010 11:31 AM, Alexander Graf wrote:


Do you have ntpd running inside the guest? I have a bug report lying around 
about 2.6.33 with kvm-clock jumping in time when ntpd is used: 
https://bugzilla.novell.com/show_bug.cgi?id=582260

Alex



I've used ntpd and chronyd.  I haven't tried running without either.  I'll do 
that now...



--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


Re: Clock jumps

2010-05-26 Thread Alexander Graf

On 26.05.2010, at 19:10, Orion Poplawski wrote:

> On 05/25/2010 12:21 AM, Gleb Natapov wrote:
>> Adding kvm to CC.
>> 
>> On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote:
>>> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
>>> host whose clock jumps about 8-12 hours a couple times a day.  I have no 
>>> idea
>>> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the 
>>> same
>>> host.  Is there any debugging I can enable to see what is jumping the clock?
>>> 
>>> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
>>> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
>>> Switching to clocksource kvm-clock
>>> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
> 
> 
> Thanks, though I don't think it made it there.  I'm also not sure it's 
> completely limited to KVM, though that is the only running system I am 
> currently seeing the problem on.  I also see clock jumps during anaconda 
> installs on physical hardware and apparently they have been present since at 
> least F11.  Might be unrelated though.
> 
> I'm really at a loss of how to debug this though.

Do you have ntpd running inside the guest? I have a bug report lying around 
about 2.6.33 with kvm-clock jumping in time when ntpd is used: 
https://bugzilla.novell.com/show_bug.cgi?id=582260

Alex



Re: Clock jumps

2010-05-26 Thread Orion Poplawski

On 05/25/2010 12:21 AM, Gleb Natapov wrote:

Adding kvm to CC.

On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote:

I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the same
host.  Is there any debugging I can enable to see what is jumping the clock?

kvm-clock: cpu 0, msr 0:1ba4741, boot clock
kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
Switching to clocksource kvm-clock
rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)



Thanks, though I don't think it made it there.  I'm also not sure it's 
completely limited to KVM, though that is the only running system I am 
currently seeing the problem on.  I also see clock jumps during anaconda 
installs on physical hardware and apparently they have been present since at 
least F11.  Might be unrelated though.


I'm really at a loss as to how to debug this, though.
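
One low-tech way to at least pin down when the jumps happen is to sample the wall clock at a fixed interval inside the guest and flag any step that is far from the expected interval. The sketch below is only an illustration: the function names, the one-second interval, and the two-second tolerance are all arbitrary choices. It records when a jump occurred, not what caused it, and if the guest's clocksource itself warps, the sleep may be affected as well.

```python
import time


def find_jumps(samples, interval, tolerance=2.0):
    """Return (index, delta) for each step between consecutive wall-clock
    samples that deviates from `interval` by more than `tolerance` seconds,
    forward or backward."""
    jumps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if abs(delta - interval) > tolerance:
            jumps.append((i, delta))
    return jumps


def watch(interval=1.0, count=60):
    """Sample time.time() `count` times, `interval` seconds apart, and return
    the detected jumps together with the raw samples."""
    samples = []
    for _ in range(count):
        samples.append(time.time())
        time.sleep(interval)
    return find_jumps(samples, interval), samples
```

Running watch() in a loop and logging its results next to the kernel log timestamps would show whether the jumps line up with load, vCPU migration, or anything else visible in the logs.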

--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


Re: Clock jumps

2010-05-24 Thread Gleb Natapov
Adding kvm to CC.

On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote:
> I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
> host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
> what is causing it.  Fedora 12 and Centos 5.5 KVM machines run fine on the 
> same
> host.  Is there any debugging I can enable to see what is jumping the clock?
> 
> kvm-clock: cpu 0, msr 0:1ba4741, boot clock
> kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
> Switching to clocksource kvm-clock
> rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
> 
> Thanks,
> 
>  Orion
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
Gleb.