OK, indeed I can reproduce the problem. It's specific to the filedescriptor migration. An easy way to reproduce it is by doing:
git clone https://github.com/autotest/virt-test.git cd virt-test ./run -t qemu --bootstrap ./run -t qemu --tests type_specific.io-github-autotest-qemu.migrate.default.fd That's it. I will see if I can bisect this quickly to pinpoint the QEMU commit that brought the regression. The qemu master commit I just tested is: commit 449008f86418583a1f0fb946cf91ee7b4797317d Merge: 5c697ae bc5baff Author: Peter Maydell <peter.mayd...@linaro.org> Date: Wed Feb 11 05:14:41 2015 +0000 Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20150210.0' into staging RCU fixes and cleanup (Paolo Bonzini) Switch to v2 IOMMU interface (Alex Williamson) DEBUG build fix (Alexey Kardashevskiy) # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID 3BB08B22 # gpg: Good signature from "Alex Williamson <alex.william...@redhat.com >" # gpg: aka "Alex Williamson <a...@shazbot.org>" # gpg: aka "Alex Williamson <alwil...@redhat.com>" # gpg: aka "Alex Williamson <alex.l.william...@gmail.com >" * remotes/awilliam/tags/vfio-update-20150210.0: vfio: Fix debug message compile error vfio: Use vfio type1 v2 IOMMU interface vfio: unmap and free BAR data in instance_finalize vfio: free dynamically-allocated data in instance_finalize vfio: cleanup vfio_get_device error path, remove vfio_populate_device callback memory: unregister AddressSpace MemoryListener within BQL Signed-off-by: Peter Maydell <peter.mayd...@linaro.org> On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues <look...@gmail.com > wrote: > From what the log says, after a round of migrations 'info migrate' does > not respond after 4 minutes, timing out. Virt Test then shuts down the VM. > When it tries to check the qcow2 image, it is corrupted. I'm checking out > the latest master to see how reproducible this problem is. > > On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela <quint...@redhat.com> > wrote: > >> >> Hi >> >> while testing my changes I noticed that virt-test was failing. I >> check-out master, and failures are there. >> >> This is one extract of the log after the 1st failure. Notice that it >> fails randomly, not every time. >> >> I have to go to bed right now, so if anybody beats me with a fix, I >> would be happy when I wakeup. >> >> Thanks, Juan. >> >> >> 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate' >> 22:54:07 DEBUG| (monitor hmp1) capabilities: xbzrle: off rdma-pin-all: >> off auto-converge: off zero-blocks: off >> 22:54:07 DEBUG| (monitor hmp1) Migration status: active >> 22:54:07 DEBUG| (monitor hmp1) total time: 2003 milliseconds >> 22:54:07 DEBUG| (monitor hmp1) expected downtime: 300 milliseconds >> 22:54:07 DEBUG| (monitor hmp1) setup: 3 milliseconds >> 22:54:07 DEBUG| (monitor hmp1) transferred ram: 67619 kbytes >> 22:54:07 DEBUG| (monitor hmp1) throughput: 268.61 mbps >> 22:54:07 DEBUG| (monitor hmp1) remaining ram: 103056 kbytes >> 22:54:07 DEBUG| (monitor hmp1) total ram: 1065796 kbytes >> 22:54:07 DEBUG| (monitor hmp1) duplicate: 224304 pages >> 22:54:07 DEBUG| (monitor hmp1) skipped: 0 pages >> 22:54:07 DEBUG| (monitor hmp1) normal: 16380 pages >> 22:54:07 DEBUG| (monitor hmp1) normal bytes: 65520 kbytes >> 22:54:07 DEBUG| (monitor hmp1) dirty sync count: 0 >> 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs) >> 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate' >> 22:54:09 DEBUG| Send command: info migrate >> 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate' >> 22:54:09 DEBUG| (monitor hmp1) capabilities: xbzrle: off rdma-pin-all: >> off auto-converge: off zero-blocks: off >> 22:54:09 DEBUG| (monitor hmp1) Migration status: active >> 22:54:09 DEBUG| (monitor hmp1) total time: 4008 milliseconds >> 22:54:09 DEBUG| (monitor hmp1) expected downtime: 300 milliseconds >> 22:54:09 DEBUG| (monitor hmp1) setup: 3 milliseconds >> 22:54:09 DEBUG| (monitor hmp1) transferred ram: 131397 kbytes >> 22:54:09 DEBUG| (monitor hmp1) throughput: 268.57 mbps >> 22:54:09 DEBUG| (monitor hmp1) remaining ram: 31392 kbytes >> 22:54:09 DEBUG| (monitor hmp1) total ram: 1065796 kbytes >> 22:54:09 DEBUG| (monitor hmp1) duplicate: 226311 pages >> 22:54:09 DEBUG| (monitor hmp1) skipped: 0 pages >> 22:54:09 DEBUG| (monitor hmp1) normal: 32289 pages >> 22:54:09 DEBUG| (monitor hmp1) normal bytes: 129156 kbytes >> 22:54:09 DEBUG| (monitor hmp1) dirty sync count: 0 >> 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs) >> 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate' >> 22:54:11 DEBUG| Send command: info migrate >> 22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register >> status >> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880) >> 22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor) >> 22:58:11 INFO | [qemu output] (Process terminated with status 0) >> 22:58:11 DEBUG| VM virt-tests-vm1 down (monitor) >> 22:58:11 DEBUG| Host does not support OpenVSwitch: Missing command: >> ovs-vswitchd >> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10763) >> 22:58:11 DEBUG| Shutting down VM virt-tests-vm1 (shell) >> 22:58:11 DEBUG| Login command: 'ssh -o UserKnownHostsFile=/dev/null -o >> StrictHostKeyChecking=no -o PreferredAuthentications=password -p 5000 >> root@192.168.10.200' >> 22:58:11 DEBUG| virt-tests-vm1 alive now. Used to failed to get register >> info from guest 9 times >> 22:58:13 INFO | [qemu output] (Process terminated with status 0) >> 22:58:13 DEBUG| VM virt-tests-vm1 down (shell) >> 22:58:14 DEBUG| Host does not support OpenVSwitch: Missing command: >> ovs-vswitchd >> 22:58:14 DEBUG| Checking image file >> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2 >> 22:58:14 DEBUG| Running '/bin/qemu-img info >> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2' >> 22:58:14 DEBUG| Running '/bin/qemu-img check >> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2' >> 22:58:14 ERROR| [stdout] >> 22:58:14 ERROR| [stdout] 1 errors were found on the image. >> 22:58:14 ERROR| [stdout] Data may be corrupted, or further writes to the >> image may corrupt it. >> 22:58:14 ERROR| [stdout] 13495/163840 = 8.24% allocated, 0.03% >> fragmented, 0.00% compressed clusters >> 22:58:14 ERROR| [stdout] Image end offset: 885129216 >> 22:58:14 ERROR| [stderr] ERROR cluster 13505 refcount=1 reference=2 >> 22:58:14 ERROR| Errors found on image: >> '/mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2' >> 22:58:14 WARNI| virt-tests-vm1 is not alive. Can not query the register >> status >> 22:58:14 DEBUG| Thread quit. Used to failed to get register info from >> guest 20150212-225320-Mb1E4VV7 for 1 times. >> >> > > > -- > Lucas > -- Lucas