On Thu, Feb 12, 2015 at 8:56 PM, Lucas Meneghel Rodrigues <look...@gmail.com > wrote:
> OK, indeed I can reproduce the problem. It's specific to the > filedescriptor migration. An easy way to reproduce it is by doing: > > git clone https://github.com/autotest/virt-test.git > > cd virt-test > ./run -t qemu --bootstrap > ./run -t qemu > --tests type_specific.io-github-autotest-qemu.migrate.default.fd > A little correction here, it should've been: ./run -t qemu --tests type_specific.io-github-autotest-qemu.migrate.default.fd --qemu-bin /path/to/qemu-built-from-master > > That's it. I will see if I can bisect this quickly to pinpoint the QEMU > commit that brought the regression. > > The qemu master commit I just tested is: > > commit 449008f86418583a1f0fb946cf91ee7b4797317d > Merge: 5c697ae bc5baff > Author: Peter Maydell <peter.mayd...@linaro.org> > Date: Wed Feb 11 05:14:41 2015 +0000 > > Merge remote-tracking branch > 'remotes/awilliam/tags/vfio-update-20150210.0' into staging > > RCU fixes and cleanup (Paolo Bonzini) > Switch to v2 IOMMU interface (Alex Williamson) > DEBUG build fix (Alexey Kardashevskiy) > > # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID > 3BB08B22 > # gpg: Good signature from "Alex Williamson < > alex.william...@redhat.com>" > # gpg: aka "Alex Williamson <a...@shazbot.org>" > # gpg: aka "Alex Williamson <alwil...@redhat.com>" > # gpg: aka "Alex Williamson < > alex.l.william...@gmail.com>" > > * remotes/awilliam/tags/vfio-update-20150210.0: > vfio: Fix debug message compile error > vfio: Use vfio type1 v2 IOMMU interface > vfio: unmap and free BAR data in instance_finalize > vfio: free dynamically-allocated data in instance_finalize > vfio: cleanup vfio_get_device error path, remove > vfio_populate_device callback > memory: unregister AddressSpace MemoryListener within BQL > > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org> > > > On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues < > look...@gmail.com> wrote: > >> From what the log says, after a round of migrations 'info migrate' does >> not respond after 4 minutes, timing out. Virt Test then shuts down the VM. >> When it tries to check the qcow2 image, it is corrupted. I'm checking out >> the latest master to see how reproducible this problem is. >> >> On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela <quint...@redhat.com> >> wrote: >> >>> >>> Hi >>> >>> while testing my changes I noticed that virt-test was failing. I >>> check-out master, and failures are there. >>> >>> This is one extract of the log after the 1st failure. Notice that it >>> fails randomly, not every time. >>> >>> I have to go to bed right now, so if anybody beats me with a fix, I >>> would be happy when I wakeup. >>> >>> Thanks, Juan. >>> >>> >>> 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate' >>> 22:54:07 DEBUG| (monitor hmp1) capabilities: xbzrle: off >>> rdma-pin-all: off auto-converge: off zero-blocks: off >>> 22:54:07 DEBUG| (monitor hmp1) Migration status: active >>> 22:54:07 DEBUG| (monitor hmp1) total time: 2003 milliseconds >>> 22:54:07 DEBUG| (monitor hmp1) expected downtime: 300 milliseconds >>> 22:54:07 DEBUG| (monitor hmp1) setup: 3 milliseconds >>> 22:54:07 DEBUG| (monitor hmp1) transferred ram: 67619 kbytes >>> 22:54:07 DEBUG| (monitor hmp1) throughput: 268.61 mbps >>> 22:54:07 DEBUG| (monitor hmp1) remaining ram: 103056 kbytes >>> 22:54:07 DEBUG| (monitor hmp1) total ram: 1065796 kbytes >>> 22:54:07 DEBUG| (monitor hmp1) duplicate: 224304 pages >>> 22:54:07 DEBUG| (monitor hmp1) skipped: 0 pages >>> 22:54:07 DEBUG| (monitor hmp1) normal: 16380 pages >>> 22:54:07 DEBUG| (monitor hmp1) normal bytes: 65520 kbytes >>> 22:54:07 DEBUG| (monitor hmp1) dirty sync count: 0 >>> 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs) >>> 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate' >>> 22:54:09 DEBUG| Send command: info migrate >>> 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate' >>> 22:54:09 DEBUG| (monitor hmp1) capabilities: xbzrle: off >>> rdma-pin-all: off auto-converge: off zero-blocks: off >>> 22:54:09 DEBUG| (monitor hmp1) Migration status: active >>> 22:54:09 DEBUG| (monitor hmp1) total time: 4008 milliseconds >>> 22:54:09 DEBUG| (monitor hmp1) expected downtime: 300 milliseconds >>> 22:54:09 DEBUG| (monitor hmp1) setup: 3 milliseconds >>> 22:54:09 DEBUG| (monitor hmp1) transferred ram: 131397 kbytes >>> 22:54:09 DEBUG| (monitor hmp1) throughput: 268.57 mbps >>> 22:54:09 DEBUG| (monitor hmp1) remaining ram: 31392 kbytes >>> 22:54:09 DEBUG| (monitor hmp1) total ram: 1065796 kbytes >>> 22:54:09 DEBUG| (monitor hmp1) duplicate: 226311 pages >>> 22:54:09 DEBUG| (monitor hmp1) skipped: 0 pages >>> 22:54:09 DEBUG| (monitor hmp1) normal: 32289 pages >>> 22:54:09 DEBUG| (monitor hmp1) normal bytes: 129156 kbytes >>> 22:54:09 DEBUG| (monitor hmp1) dirty sync count: 0 >>> 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs) >>> 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate' >>> 22:54:11 DEBUG| Send command: info migrate >>> 22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register >>> status >>> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880) >>> 22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor) >>> 22:58:11 INFO | [qemu output] (Process terminated with status 0) >>> 22:58:11 DEBUG| VM virt-tests-vm1 down (monitor) >>> 22:58:11 DEBUG| Host does not support OpenVSwitch: Missing command: >>> ovs-vswitchd >>> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10763) >>> 22:58:11 DEBUG| Shutting down VM virt-tests-vm1 (shell) >>> 22:58:11 DEBUG| Login command: 'ssh -o UserKnownHostsFile=/dev/null -o >>> StrictHostKeyChecking=no -o PreferredAuthentications=password -p 5000 >>> root@192.168.10.200' >>> 22:58:11 DEBUG| virt-tests-vm1 alive now. Used to failed to get register >>> info from guest 9 times >>> 22:58:13 INFO | [qemu output] (Process terminated with status 0) >>> 22:58:13 DEBUG| VM virt-tests-vm1 down (shell) >>> 22:58:14 DEBUG| Host does not support OpenVSwitch: Missing command: >>> ovs-vswitchd >>> 22:58:14 DEBUG| Checking image file >>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2 >>> 22:58:14 DEBUG| Running '/bin/qemu-img info >>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2' >>> 22:58:14 DEBUG| Running '/bin/qemu-img check >>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2' >>> 22:58:14 ERROR| [stdout] >>> 22:58:14 ERROR| [stdout] 1 errors were found on the image. >>> 22:58:14 ERROR| [stdout] Data may be corrupted, or further writes to the >>> image may corrupt it. >>> 22:58:14 ERROR| [stdout] 13495/163840 = 8.24% allocated, 0.03% >>> fragmented, 0.00% compressed clusters >>> 22:58:14 ERROR| [stdout] Image end offset: 885129216 >>> 22:58:14 ERROR| [stderr] ERROR cluster 13505 refcount=1 reference=2 >>> 22:58:14 ERROR| Errors found on image: >>> '/mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2' >>> 22:58:14 WARNI| virt-tests-vm1 is not alive. Can not query the register >>> status >>> 22:58:14 DEBUG| Thread quit. Used to failed to get register info from >>> guest 20150212-225320-Mb1E4VV7 for 1 times. >>> >>> >> >> >> -- >> Lucas >> > > > > -- > Lucas > -- Lucas