TL;DR:
- a KVM guest with the kernel change as identified above
- works on Bionic host (kernel 4.15 / qemu 2.11 / libvirt 4.0)
- migrating on a Xenial host (kernel 4.4 / qemu 2.5 / libvirt 1.3.1) fails:
    VQ 0 size 0x100 Guest index 0x8101 inconsistent with Host index 0x81: delta 0x8080
    error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-net'
- not fixed in latest 4.19 kernel
- only failing on ppc64el (not x86)
- maybe high/low word related
- qemu bisecting found a high/low word related virtio issue and fix in the 2.6 stable series that

Note: generated names are odd (hashes are ok), most 4.13 here are actually 4.14 in development.

GOOD     v4.13         Mon Sep 10 10:03:38
BAD      v4.14         Mon Sep 10 10:51:31
Step-1:  15d8ffc9 #1   Mon Sep 10 12:36:30 bad
Step-2:  bafb0762 #2   Mon Sep 10 13:04:52 good
Step-3:  b63f6044 #3   Mon Sep 10 13:24:27 bad
Step-4:  e08af95d #4   Mon Sep 10 13:44:11 bad
Step-5:  2a493216 #5   Mon Sep 10 14:25:50 bad
Step-6:  a248878d #6   Mon Sep 10 14:50:47 bad
Step-7:  160e22aa #7   Mon Sep 10 15:09:03 good
Step-8:  727f8914 #8   Mon Sep 10 18:30:06 good
Step-9:  4a3c67a6 #9   Mon Sep 10 20:37:37 bad
Step-10: 04584957 #10  Tue Sep 11 04:35:41 bad
Step-11: f7ce9103 #11  Tue Sep 11 05:30:50 bad
Step-12: 192f68cf #12  Tue Sep 11 05:49:50 good
Step-13: 3f93522f #13  Tue Sep 11 06:13:01 bad
Step-14: 4941d472 #14  Tue Sep 11 06:40:05 good

Offending change identified as:

commit 3f93522ffab2d46a36b57adf324a54e674fc9536
Author: Jason Wang <jasow...@redhat.com>
Date:   Wed Jul 19 16:54:49 2017 +0800

    virtio-net: switch off offloads on demand if possible on XDP set

    Current XDP implementation wants guest offloads feature to be
    disabled on device. This is inconvenient and means guest can't
    benefit from offloads if XDP is not used. This patch tries to
    address this limitation by disabling the offloads on demand through
    control guest offloads. Guest offloads will be disabled and enabled
    on demand on XDP set.

    Signed-off-by: Jason Wang <jasow...@redhat.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

To check if any commit in the latest kernel fixed the issue:
4.19-rc3 as of today (11da3a7f): bad
=> Not fixed yet as a guest kernel commit.
=> Also I don't see how we could fix that on the kernel side, despite the issue being introduced there.

Since we had the report that a Bionic host would be ok, I bumped the test env up one by one (in order):
Libvirt 1.3.1 -> 4.0: still bad
kernel 4.4 -> 4.15: still bad
qemu 2.5 -> 2.11: working

So we are actually looking for a qemu fix for a kernel introduced issue it seems.
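
Side note on the error message itself: the failing check runs in qemu on the destination while it loads the virtio-net state, so it helps to see what the logged numbers mean. The following is only a minimal sketch. It assumes the "delta" is (guest index - host index) truncated to 16 bits, and that the load is refused once that delta exceeds the queue size. vq_delta and vq_consistent are made-up helper names, not qemu code.

```shell
#!/bin/sh
# Hypothetical helpers, not part of qemu.

# Assumption: the logged delta is (guest index - host index) modulo 2^16.
vq_delta() {
    printf '0x%x\n' $(( ($1 - $2) & 0xffff ))
}

# Assumption: the state only counts as consistent while the delta does not
# exceed the queue size (third argument).
vq_consistent() {
    [ $(( ($1 - $2) & 0xffff )) -le $(( $3 )) ]
}

vq_delta 0x8101 0x81      # values from the TL;DR
vq_delta 0x38aa 0xa980    # values from the original bug report
```

This prints 0x8080 and 0x8f2a, matching both logged deltas. Both are far above the queue size of 0x100, which is why the load is rejected under the assumption above.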

Via UCA we can access some qemu versions rather easily.
qemu 2.5   (X) bad
qemu 2.6.1 (Y) good
qemu 2.8   (Z) good
qemu 2.10  (A) good
qemu 2.11  (B) good

So a qemu bisect for 2.5->2.6 it shall be :-/

Back then this was still based on full debian versions so no bisect directly in the packaging repo on these old versions.
Using checkinstall and the configure line of the qemu yakkety version (reset machine type to upstream type and linking spapr-rtas.bin [qemu-slof] and others to the expected place).
  ln -s /usr/share/slof/* /usr/share/qemu/slof.bin
  ln -s /usr/share/seabios/* /usr/share/qemu/
  ln -s /usr/lib/ipxe/qemu/* /usr/share/qemu/

But I realized that 2.6.0 was affected as well. Maybe the fix was part of 2.6.1?
I checked the last 2.6.0 publish we had back in Yakkety and it failed as well.
So after all the bisect might be much smaller between 2.6.0 and its upstream stable branch.

Verified start points with builds from git.
qemu-2.6.0-bisect-start: old behavior (bad)
qemu-2.6.2-bisect-start: new behavior (good)

Eventually came down to:
git bisect start
# new: [529d45e151d82a772cd9b9af64bb25f88fba6567] Update version for 2.6.2 release
git bisect new 529d45e151d82a772cd9b9af64bb25f88fba6567
# old: [bfc766d38e1fae5767d43845c15c79ac8fa6d6af] Update version for v2.6.0 release
git bisect old bfc766d38e1fae5767d43845c15c79ac8fa6d6af
# new: [ec211e742683d4bc187839b01a4b0056617681a1] atapi: fix halted DMA reset
git bisect new ec211e742683d4bc187839b01a4b0056617681a1
# old: [71798fda8b6ef8df47c7640ba0bc24d7060ad307] vmsvga: shadow fifo registers
git bisect old 71798fda8b6ef8df47c7640ba0bc24d7060ad307
# old: [909d87d347a7a5e08c32cbdb67bb2927fcefbf34] virtio: set low features early on load
git bisect old 909d87d347a7a5e08c32cbdb67bb2927fcefbf34
# new: [28eae0af65dcae887d3cd32212c702ee708c84be] Fix some typos found by codespell
git bisect new 28eae0af65dcae887d3cd32212c702ee708c84be
# new: [704ab2fce49fa404a61c6dac85003bcc1e3d0192] blockdev: Fix regression with the default naming of throttling groups
git bisect new 704ab2fce49fa404a61c6dac85003bcc1e3d0192
# new: [025c4e39f479eb498ee63b634d961a4cf357773e] s390x/ipl: fix reboots for migration from different bios
git bisect new 025c4e39f479eb498ee63b634d961a4cf357773e
# new: [82c85167791f0057752c2084f8480bf19401f314] Revert "virtio-net: unbreak self announcement and guest offloads after migration"
git bisect new 82c85167791f0057752c2084f8480bf19401f314
# first new commit: [82c85167791f0057752c2084f8480bf19401f314] Revert "virtio-net: unbreak self announcement and guest offloads after migration"

And the fixing qemu change being:

82c85167791f0057752c2084f8480bf19401f314 is the first new commit
commit 82c85167791f0057752c2084f8480bf19401f314
Author: Michael S. Tsirkin <m...@redhat.com>
Date:   Mon Jul 4 14:47:37 2016 +0300

    Revert "virtio-net: unbreak self announcement and guest offloads after migration"

    This reverts commit 1f8828ef573c83365b4a87a776daf8bcef1caa21.

    Cc: qemu-sta...@nongnu.org
    Reported-by: Robin Geuze <rob...@transip.nl>
    Tested-by: Robin Geuze <rob...@transip.nl>
    Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
    (cherry picked from commit 6c6668232e71b7cf7ff39fa1a7abf660c40f9cea)
    Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com>

Its backport needs to be bundled with another fix to actually work (the commit before).
I'll try to backport and prep a PPA with those fixes for Xenial.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1783140

Title:
  KVM live migration fails

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  Triaged
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Environment: 2 POWER8 with Ubuntu 16.04.4 LTS as KVM hypervisor. 1 KVM guest with Ubuntu 18.04 LTS.
  Virtual disk for the guest is a qcow2 file on an NFS share, accessible from both hypervisors, so live migration is possible and works for all other guests (SLES, RHEL, Ubuntu 16.04). Live migration of the Ubuntu 18.04 guest fails on ppc, while the same test on an x86_64 environment succeeds.

  root@pkvm2:~# virsh migrate --persistent --live p8lnxtst4 qemu+ssh://pkvm1/system
  error: internal error: early end of file from monitor, possible problem:
  2018-07-23T11:12:25.586385Z qemu-system-ppc64: VQ 0 size 0x100 Guest index 0x38aa inconsistent with Host index 0xa980: delta 0x8f2a
  2018-07-23T11:12:25.586434Z qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-net'
  2018-07-23T11:12:25.587246Z qemu-system-ppc64: load of migration failed: Operation not permitted

  root@pkvm2:~# uname -a
  Linux pkvm2 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:51:21 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1783140/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp