I can reasonably assume that this solved my problem: I've live-migrated 41 VMs 5 times between 2 hypervisors without the 100% CPU problem appearing.
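For anyone who wants to repeat this kind of test, here is a minimal sketch of a ping-pong migration loop. It is not the script used for the test above. It assumes libvirt's `virsh` is used for migration and that both hypervisors are reachable over `qemu+ssh://` URIs; the host and domain names are hypothetical placeholders.

```python
import shlex
import subprocess


def build_migration_plan(domains, host_a, host_b, rounds):
    """Return a list of virsh commands (as argument lists) that bounce every
    domain between the two hosts, moving each domain once per round."""
    hosts = [host_a, host_b]
    plan = []
    for r in range(rounds):
        src = hosts[r % 2]        # even rounds: A -> B, odd rounds: B -> A
        dst = hosts[(r + 1) % 2]
        for dom in domains:
            plan.append([
                "virsh", "-c", f"qemu+ssh://{src}/system",
                "migrate", "--live", dom, f"qemu+ssh://{dst}/system",
            ])
    return plan


def run_plan(plan, dry_run=True):
    """Print the commands, or execute them one by one when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failed migration
```

Calling `run_plan(build_migration_plan(["vm01", "vm02"], "hv-a", "hv-b", rounds=5))` only prints the commands; pass `dry_run=False` to actually migrate. After each round, the guests can be checked for the 100% CPU symptom.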
My production servers run 2.0.0+dfsg-2ubuntu1.22 and still show the same problem. Attached is the patch that I created with quilt in debian/patches; it mirrors the 4 patches listed in the Debian bug report [1]. The patch should apply cleanly to qemu 2.0.0+dfsg-2ubuntu1.24 (from trusty-security).

Let's hope others can benefit from an Ubuntu update :)

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786789

** Patch added: "backport-fixtime.patch"
   https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1297218/+attachment/4693333/+files/backport-fixtime.patch

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1297218

Title: guest hangs after live migration due to tsc jump

Status in QEMU: New
Status in glusterfs package in Ubuntu: Invalid
Status in qemu package in Ubuntu: Fix Released
Status in glusterfs source package in Trusty: Confirmed
Status in qemu source package in Trusty: Confirmed

Bug description:

We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a Gluster filesystem. Guests can be live migrated between them. However, live migration often leads to the guest being stuck at 100% CPU for a while. In that case, the dmesg output for such a guest will show (once it recovers):

Clocksource tsc unstable (delta = 662463064082 ns)

In this particular example, a guest was migrated and only after 11 minutes (662 seconds) did it become responsive again.

It seems that newly booted guests do not suffer from this problem; these can be migrated back and forth at will. After a day or so, the problem becomes apparent. It also seems that migrating from server A to server B causes many more problems than going from B back to A. If necessary, I can do more measurements to qualify these observations.

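The conversion from the reported delta to the 662 seconds (about 11 minutes) quoted above can be checked with a short sketch. It assumes the dmesg line has exactly the form shown in this report; the function name is made up for illustration.

```python
import re

# Matches the message format quoted in the bug description.
UNSTABLE_RE = re.compile(r"Clocksource tsc unstable \(delta = (-?\d+) ns\)")


def tsc_delta_seconds(dmesg_line):
    """Return the reported delta in seconds, or None if the line does not match."""
    match = UNSTABLE_RE.search(dmesg_line)
    if match is None:
        return None
    return int(match.group(1)) / 1e9  # nanoseconds -> seconds


line = "Clocksource tsc unstable (delta = 662463064082 ns)"
secs = tsc_delta_seconds(line)
print(f"{secs:.1f} s = {secs / 60:.1f} min")
```

For the line from this report it prints `662.5 s = 11.0 min`, which matches the observed hang time.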
The VM servers run Ubuntu 13.04 with these packages:

Kernel: 3.8.0-35-generic x86_64
Libvirt: 1.0.2
Qemu: 1.4.0
Gluster-fs: 3.4.2 (libvirt accesses the images via the filesystem, not using libgfapi yet, as the Ubuntu libvirt is not linked against libgfapi).

The interconnect between both machines (both for migration and gluster) is 10GbE. Both servers are synced to NTP and well within 1ms of one another.

Guests are either Ubuntu 13.04 or 13.10.

On the guests, the current_clocksource is kvm-clock. The XML definition of the guests only contains:

<clock offset='utc'/>

Now as far as I've read in the documentation of kvm-clock, it specifically supports live migrations, so I'm a bit surprised at these problems. There isn't all that much information to be found on this issue, although I have found postings by others who seem to have run into the same issue, but without a solution.

---
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
DistroRelease: Ubuntu 14.04
Package: libvirt (not installed)
ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Tags: trusty apparmor apparmor apparmor apparmor apparmor
Uname: Linux 3.13.0-24-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
modified.conffile..etc.default.libvirt.bin: [modified]
modified.conffile..etc.libvirt.libvirtd.conf: [modified]
modified.conffile..etc.libvirt.qemu.conf: [modified]
modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted]
mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662
mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837
mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions
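For readers trying to reproduce the report, the clocksource a guest is using can be confirmed with a small sketch like the one below. It assumes a Linux guest that exposes the standard sysfs clocksource directory; the function name is made up for illustration.

```python
from pathlib import Path

# Standard sysfs location on Linux; pass another directory for testing.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if the
    sysfs files cannot be read (for example on a non-Linux system)."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


current, available = read_clocksource()
if current is None:
    print("clocksource information not available")
else:
    print(f"current: {current}; available: {', '.join(available)}")
```

On a guest configured as described in the report, the current clocksource would be expected to read `kvm-clock`.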