Hi,
I've seen some strange time behavior in some of our VMs, usually
triggered by live migration. In some VMs we have seen significant time
drift after a live migration, which NTP was not able to correct.
So far I have not been able to reproduce that case. However, I did
notice that live migration introduces an increase in clock jitter
values, and I am not sure whether that has anything to do with the
significant time drift.
Here is an example of a CentOS 6 guest running under qemu 1.2 before
doing a live migration:
[root@centos ~]# ntpq -pcrv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+helium.constant 18.26.4.105      2 u   65   64  377   60.539   -0.011   0.554
-209.118.204.201 128.9.176.30     2 u   47   64  377   15.750   -1.835   0.388
*time3.chpc.utah 198.60.22.240    2 u   46   64  377   30.585    3.934   0.253
+dns2.untangle.c 216.218.254.202  2 u   21   64  377   22.196    2.345   0.740
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Sat Dec 20 02:53:39 UTC 2014 (1)",
processor="x86_64", system="Linux/2.6.32-504.3.3.el6.x86_64", leap=00,
stratum=3, precision=-21, rootdelay=32.355, rootdisp=53.173,
refid=155.101.3.115,
reftime=d86264f3.444c75e7 Thu, Jan 15 2015 16:10:27.266,
clock=d86265ed.10a34c1c Thu, Jan 15 2015 16:14:37.064, peer=3418, tc=6,
mintc=3, offset=0.000, frequency=2.863, sys_jitter=2.024,
clk_jitter=2.283, clk_wander=0.000
[root@centos ~]# ntpdc -c kerninfo
pll offset:           0 s
pll frequency:        2.863 ppm
maximum error:        0.19838 s
estimated error:      0.002282 s
status:               2001  pll nano
pll time constant:    6
precision:            1e-09 s
frequency tolerance:  500 ppm
Immediately after live migration, you can see that there is an increase in
jitter values:
[root@centos ~]# ntpq -pcrv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-helium.constant 18.26.4.105      2 u   59   64  377   60.556   -0.916  31.921
+209.118.204.201 128.9.176.30     2 u   38   64  377   15.717   28.879  12.220
+time3.chpc.utah 132.163.4.103    2 u   45   64  353   30.639    3.240  26.975
*dns2.untangle.c 216.218.254.202  2 u   17   64  377   22.248   33.039  11.791
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Sat Dec 20 02:53:39 UTC 2014 (1)",
processor="x86_64", system="Linux/2.6.32-504.3.3.el6.x86_64", leap=00,
stratum=3, precision=-21, rootdelay=25.086, rootdisp=83.736,
refid=74.123.29.4,
reftime=d8626838.47529689 Thu, Jan 15 2015 16:24:24.278,
clock=d8626849.4920018a Thu, Jan 15 2015 16:24:41.285, peer=3419, tc=6,
mintc=3, offset=24.118, frequency=11.560, sys_jitter=15.145,
clk_jitter=8.056, clk_wander=2.757
[root@centos ~]# ntpdc -c kerninfo
pll offset:           0.0211957 s
pll frequency:        11.560 ppm
maximum error:        0.112523 s
estimated error:      0.008055 s
status:               2001  pll nano
pll time constant:    6
precision:            1e-09 s
frequency tolerance:  500 ppm
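To make the before/after comparison easier to repeat, here is a minimal
Python sketch that pulls the numeric system variables out of pasted
"ntpq -c rv" text and prints how much each one changed. The helper names
(parse_rv, deltas) are my own and purely illustrative; the script works
on pasted text only and does not call ntpq itself. The units in the
comments are my understanding of the ntpq output, not something taken
from the dumps.

```python
import re

# Loop-discipline lines copied from the two "ntpq -pcrv" dumps above.
BEFORE = """mintc=3, offset=0.000, frequency=2.863, sys_jitter=2.024,
clk_jitter=2.283, clk_wander=0.000"""
AFTER = """mintc=3, offset=24.118, frequency=11.560, sys_jitter=15.145,
clk_jitter=8.056, clk_wander=2.757"""

# Variables of interest; offset and jitter values are in milliseconds,
# frequency and clk_wander in ppm (assumed units).
KEYS = ("offset", "frequency", "sys_jitter", "clk_jitter", "clk_wander")


def parse_rv(text):
    """Return the numeric name=value pairs from 'rv' output as floats.

    Values that are not plain numbers (version string, reftime, clock)
    are skipped.
    """
    values = {}
    for name, raw in re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text):
        try:
            values[name] = float(raw)
        except ValueError:
            continue
    return values


def deltas(before, after, keys=KEYS):
    """Return after - before for each key present in both dicts."""
    return {
        k: round(after[k] - before[k], 3)
        for k in keys
        if k in before and k in after
    }


for name, change in deltas(parse_rv(BEFORE), parse_rv(AFTER)).items():
    print(f"{name:12s} {change:+.3f}")
```

Running the same comparison across several migrations should make it
easier to tell the usual small bump apart from the rare large jump.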
The increase in jitter and offset here is small, and the frequency
correction stays well within the 500 ppm frequency tolerance limit, so it
is easily corrected by subsequent NTP clock sync events. However, some
live migrations cause much larger jitter and offset jumps, which NTP
cannot correct and which leave the time far off. Any idea why this is the
case?
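To put the numbers above in perspective, here is a rough Python sketch of
how long it takes, at best, to slew out a given offset at the 500 ppm
limit. It assumes ntpd's default step threshold of 128 ms; the function
names are hypothetical, and the real clock discipline loop corrects more
gradually than this lower bound suggests.

```python
MAX_SLEW_PPM = 500.0      # "frequency tolerance" reported by kerninfo
STEP_THRESHOLD_S = 0.128  # ntpd default step threshold (assumed default config)


def min_slew_seconds(offset_s, max_slew_ppm=MAX_SLEW_PPM):
    """Lower bound on the time needed to slew out offset_s.

    At max_slew_ppm the clock gains or loses max_slew_ppm microseconds
    per second, so the offset cannot be removed any faster than this.
    """
    return abs(offset_s) / (max_slew_ppm * 1e-6)


def would_step(offset_s, threshold_s=STEP_THRESHOLD_S):
    """True if the offset exceeds the threshold above which ntpd steps
    the clock instead of slewing it."""
    return abs(offset_s) > threshold_s


offset_s = 24.118e-3  # offset=24.118 ms from the post-migration dump
print(f"minimum slew time: {min_slew_seconds(offset_s):.1f} s")
print(f"would step: {would_step(offset_s)}")
```

By this estimate the 24 ms offset seen here needs under a minute of
slewing at the limit, while a jump of several seconds would be far
outside what slewing alone can absorb quickly.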
I've tried backporting the patches
(9a48bcd1b82494671c09b0eefdb882581499 and
317b0a6d8ba44e9bf8f9c3dbd776c4536843d82c) on top of upstream qemu 1.2,
but that actually caused even higher jitter, on the order of 100+ ppm.
Any idea what I might be missing?
** Patch added: "backport.patch"
https://bugs.launchpad.net/qemu/+bug/1297218/+attachment/4301780/+files/backport.patch
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to qemu in Ubuntu.
https://bugs.launchpad.net/bugs/1297218
Title:
guest hangs after live migration due to tsc jump