On Mon, Apr 3, 2017 at 9:11 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 02:12:36PM -0600, Chris Friesen wrote:
>> I'm running into an issue with live-migrating a guest from a host running
>> qemu-kvm-ev 2.3.0-31 to a host running qemu-kvm-ev 2.6.0-27.1. This is a
>> libvirt-tunnelled migration, in the context of upgrading an OpenStack
>> install to newer software. The source host is running CentOS 7.2.1511,
>> while the dest host is running CentOS 7.3.1611.
>>
>> I'll include the qemu commandlines for the source/dest at the bottom.
>>
>> Initially we have a bunch of guests running on compute-2 (which is running
>> qemu-kvm-ev 2.3.0). We then started live-migrating them one at a time to
>> compute-0 (which is running qemu-kvm-ev 2.6.0). Three of them migrated
>> successfully. The fourth (which was essentially identical in configuration
>> to the first three) failed, as per the following logs in
>> /var/log/libvirt/qemu/instance-0000000e.log:
>>
>>
>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not permitted
>> 2017-03-29 06:38:37.896+0000: shutting down
>>
>>
>> Does anyone know of an existing bug report covering this issue? (I took a
>> look and didn't see anything obviously related.)
>
> This is the virtio-balloon device. If you remove the device the live
> migration should work reliably.
>
> Alternatively, you can temporarily rmmod virtio_balloon inside the guest
> for live migration. After migration you can modprobe virtio_balloon
> again.
>
> last_avail_idx 0x47b with used_idx 0x47c is an invalid device state.
> I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
> qemu.git/master and do not see an obvious bug. I also compared
> qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.
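
To make it clearer why the indices in the quoted log message are rejected, here is a minimal sketch in Python. It is not QEMU's code. It assumes the load-time check computes the number of in-flight buffers as last_avail_idx minus used_idx in unsigned 16-bit arithmetic and compares the result with the queue size, which is what the wording of the error message suggests. The function names are made up for illustration.

```python
# Sketch of a virtqueue sanity check (assumed logic, hypothetical names).

def inuse_count(last_avail_idx: int, used_idx: int) -> int:
    """Buffers the device has fetched but not yet returned, modulo 2**16."""
    return (last_avail_idx - used_idx) & 0xFFFF


def is_valid_state(queue_size: int, last_avail_idx: int, used_idx: int) -> bool:
    """A queue cannot have more buffers in flight than it has entries."""
    return inuse_count(last_avail_idx, used_idx) <= queue_size


if __name__ == "__main__":
    # Values taken from the log line for VQ 2 of the balloon device.
    size, last_avail, used = 0x80, 0x47B, 0x47C
    print(hex(inuse_count(last_avail, used)))      # 0xffff
    print(is_valid_state(size, last_avail, used))  # False
```

With used_idx one ahead of last_avail_idx, the subtraction wraps around to 0xffff, far more than the 0x80 entries in the queue. Put differently, the saved state claims the device returned a buffer it never fetched.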

The device likely got into the invalid state as part of a previous
migration to an unfixed QEMU. I second Stefan's suggestion to temporarily
remove the device or unload the driver.

Thanks!
Ladi

>>
>>
>> The qemu commandline on the source compute node is:
>>
>>
>> /usr/libexec/qemu-kvm -c 0x00000000000000000000000000000001 -n 4
>> --proc-type=secondary --file-prefix=vs -- -enable-dpdk -name
>> instance-0000000e -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 512
>> -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
>> memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge-2048kB/libvirt/qemu,share=yes,size=536870912,host-nodes=1,policy=bind
>> -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid
>> 57ae849f-aa66-422a-90a2-62db6c59db29 -smbios type=1,manufacturer=Fedora
>> Project,product=OpenStack
>> Nova,version=13.0.0-0.tis.4,serial=4c8121f1-d927-424e-8712-88b1de45be37,uuid=57ae849f-aa66-422a-90a2-62db6c59db29,family=Virtual
>> Machine -no-user-config -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-0000000e/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
>> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot
>> reboot-timeout=5000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
>> file=/dev/disk/by-path/ip-192.168.205.6:3260-iscsi-iqn.2010-10.org.openstack:volume-ac57fcaa-7ecd-4d3b-8671-3bc740337a42-lun-0,if=none,id=drive-virtio-disk0,format=raw,serial=ac57fcaa-7ecd-4d3b-8671-3bc740337a42,cache=none,aio=native
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -chardev
>> socket,id=charnet0,path=/var/run/vswitch/usvhost-9e574d3c-32dd-4d39-97e6-447b15fb00b4
>> -netdev type=vhost-user,id=hostnet0,chardev=charnet0 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b0:59:a9,bus=pci.0,addr=0x3
>> -chardev
>> socket,id=charnet1,path=/var/run/vswitch/usvhost-7bc48d91-f215-4394-99ff-eb7f20d9ff1e
>> -netdev type=vhost-user,id=hostnet1,chardev=charnet1 -device
>> virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:8b:6f:09,bus=pci.0,addr=0x4
>> -chardev
>> socket,id=charnet2,path=/var/run/vswitch/usvhost-c32e2d0d-9ed4-4f4b-abc9-539a12a86008
>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2 -device
>> virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:07:ca:a0,bus=pci.0,addr=0x5
>> -chardev
>> file,id=charserial0,path=/etc/nova/instances/57ae849f-aa66-422a-90a2-62db6c59db29/console.log
>> -device isa-serial,chardev=charserial0,id=serial0 -chardev
>> pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device
>> usb-tablet,id=input0 -vnc 0.0.0.0:11 -k en-us -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming fd:25 -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>>
>>
>>
>> The complete instance-0000000e.log file on the destination is:
>>
>> 2017-03-29 06:38:35.962+0000: starting up libvirt version: 2.0.0, package:
>> 10.el7_3.2.tis.24 (Unknown, 2017-03-15-14:59:22,
>> yow-dsulliva-lx-vm1.wrs.com), qemu version: 2.6.0
>> (qemu-kvm-ev-2.6.0-27.1.el7.tis.31), hostname: compute-0
>> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
>> QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm '-c
>> 0x00000000000000000000000000000001' '-n 4' --proc-type=secondary
>> --file-prefix=vs -- -enable-dpdk -name
>> guest=instance-0000000e,debug-threads=on -S -object
>> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000000e/master-key.aes
>> -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 512 -realtime mlock=off
>> -smp 1,sockets=1,cores=1,threads=1 -object
>> memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge-2048kB/libvirt/qemu,share=yes,size=536870912,host-nodes=0,policy=bind
>> -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid
>> 57ae849f-aa66-422a-90a2-62db6c59db29 -smbios 'type=1,manufacturer=Fedora
>> Project,product=OpenStack
>> Nova,version=13.0.0-0.tis.4,serial=4c8121f1-d927-424e-8712-88b1de45be37,uuid=57ae849f-aa66-422a-90a2-62db6c59db29,family=Virtual
>> Machine' -no-user-config -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-10-instance-0000000e/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
>> -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot
>> reboot-timeout=5000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
>> file=/dev/disk/by-path/ip-192.168.205.6:3260-iscsi-iqn.2010-10.org.openstack:volume-ac57fcaa-7ecd-4d3b-8671-3bc740337a42-lun-0,format=raw,if=none,id=drive-virtio-disk0,serial=ac57fcaa-7ecd-4d3b-8671-3bc740337a42,cache=none,aio=native
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -chardev
>> socket,id=charnet0,path=/var/run/vswitch/usvhost-9e574d3c-32dd-4d39-97e6-447b15fb00b4
>> -netdev type=vhost-user,id=hostnet0,chardev=charnet0 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b0:59:a9,bus=pci.0,addr=0x3
>> -chardev
>> socket,id=charnet1,path=/var/run/vswitch/usvhost-7bc48d91-f215-4394-99ff-eb7f20d9ff1e
>> -netdev type=vhost-user,id=hostnet1,chardev=charnet1 -device
>> virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:8b:6f:09,bus=pci.0,addr=0x4
>> -chardev
>> socket,id=charnet2,path=/var/run/vswitch/usvhost-c32e2d0d-9ed4-4f4b-abc9-539a12a86008
>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2 -device
>> virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:07:ca:a0,bus=pci.0,addr=0x5
>> -add-fd set=0,fd=51 -chardev file,id=charserial0,path=/dev/fdset/0,append=on
>> -device isa-serial,chardev=charserial0,id=serial0 -chardev
>> pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device
>> usb-tablet,id=input0 -vnc 0.0.0.0:9 -k en-us -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>> Domain id=10 is tainted: high-privileges
>> EAL:eal_memory.c:1591: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.
>> EAL:eal_memory.c:1593: This may cause issues with mapping memory into secondary processes
>> char device redirected to /dev/pts/9 (label charserial1)
>> 2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
>> 2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
>> 2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not permitted
>> 2017-03-29 06:38:37.896+0000: shutting down
>>
>>
>> For what it's worth, the differences between the two qemu command lines are
>> as follows:
>>
>> source:
>> -name instance-0000000e -chardev
>> file,id=charserial0,path=/etc/nova/instances/57ae849f-aa66-422a-90a2-62db6c59db29/console.log
>> -vnc 0.0.0.0:9 -incoming fd:25
>>
>> destination:
>> -name guest=instance-0000000e,debug-threads=on -object
>> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000000e/master-key.aes
>> -add-fd set=0,fd=51 -chardev file,id=charserial0,path=/dev/fdset/0,append=on
>> -vnc 0.0.0.0:11 -incoming defer
>>
>> Thanks,
>> Chris