OK, indeed I can reproduce the problem. It's specific to the filedescriptor
migration. An easy way to reproduce it is by doing:

git clone https://github.com/autotest/virt-test.git

cd virt-test
./run -t qemu --bootstrap
./run -t qemu
--tests type_specific.io-github-autotest-qemu.migrate.default.fd

That's it. I will see if I can bisect this quickly to pinpoint the QEMU
commit that brought the regression.

The qemu master commit I just tested is:

commit 449008f86418583a1f0fb946cf91ee7b4797317d
Merge: 5c697ae bc5baff
Author: Peter Maydell <peter.mayd...@linaro.org>
Date:   Wed Feb 11 05:14:41 2015 +0000

    Merge remote-tracking branch
'remotes/awilliam/tags/vfio-update-20150210.0' into staging

    RCU fixes and cleanup (Paolo Bonzini)
    Switch to v2 IOMMU interface (Alex Williamson)
    DEBUG build fix (Alexey Kardashevskiy)

    # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID
3BB08B22
    # gpg: Good signature from "Alex Williamson <alex.william...@redhat.com
>"
    # gpg:                 aka "Alex Williamson <a...@shazbot.org>"
    # gpg:                 aka "Alex Williamson <alwil...@redhat.com>"
    # gpg:                 aka "Alex Williamson <alex.l.william...@gmail.com
>"

    * remotes/awilliam/tags/vfio-update-20150210.0:
      vfio: Fix debug message compile error
      vfio: Use vfio type1 v2 IOMMU interface
      vfio: unmap and free BAR data in instance_finalize
      vfio: free dynamically-allocated data in instance_finalize
      vfio: cleanup vfio_get_device error path, remove vfio_populate_device
callback
      memory: unregister AddressSpace MemoryListener within BQL

    Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>


On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues <look...@gmail.com
> wrote:

> From what the log says, after a round of migrations 'info migrate' does
> not respond after 4 minutes, timing out. Virt Test then shuts down the VM.
> When it tries to check the qcow2 image, it is corrupted. I'm checking out
> the latest master to see how reproducible this problem is.
>
> On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela <quint...@redhat.com>
> wrote:
>
>>
>> Hi
>>
>> while testing my changes I noticed that virt-test was failing.  I
>> check-out master, and failures are there.
>>
>> This is one extract of the log after the 1st failure.  Notice that it
>> fails randomly, not every time.
>>
>> I have to go to bed right now, so if anybody beats me with a fix, I
>> would be happy when I wakeup.
>>
>> Thanks, Juan.
>>
>>
>> 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
>> 22:54:07 DEBUG| (monitor hmp1)    capabilities: xbzrle: off rdma-pin-all:
>> off auto-converge: off zero-blocks: off
>> 22:54:07 DEBUG| (monitor hmp1)    Migration status: active
>> 22:54:07 DEBUG| (monitor hmp1)    total time: 2003 milliseconds
>> 22:54:07 DEBUG| (monitor hmp1)    expected downtime: 300 milliseconds
>> 22:54:07 DEBUG| (monitor hmp1)    setup: 3 milliseconds
>> 22:54:07 DEBUG| (monitor hmp1)    transferred ram: 67619 kbytes
>> 22:54:07 DEBUG| (monitor hmp1)    throughput: 268.61 mbps
>> 22:54:07 DEBUG| (monitor hmp1)    remaining ram: 103056 kbytes
>> 22:54:07 DEBUG| (monitor hmp1)    total ram: 1065796 kbytes
>> 22:54:07 DEBUG| (monitor hmp1)    duplicate: 224304 pages
>> 22:54:07 DEBUG| (monitor hmp1)    skipped: 0 pages
>> 22:54:07 DEBUG| (monitor hmp1)    normal: 16380 pages
>> 22:54:07 DEBUG| (monitor hmp1)    normal bytes: 65520 kbytes
>> 22:54:07 DEBUG| (monitor hmp1)    dirty sync count: 0
>> 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs)
>> 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate'
>> 22:54:09 DEBUG| Send command: info migrate
>> 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate'
>> 22:54:09 DEBUG| (monitor hmp1)    capabilities: xbzrle: off rdma-pin-all:
>> off auto-converge: off zero-blocks: off
>> 22:54:09 DEBUG| (monitor hmp1)    Migration status: active
>> 22:54:09 DEBUG| (monitor hmp1)    total time: 4008 milliseconds
>> 22:54:09 DEBUG| (monitor hmp1)    expected downtime: 300 milliseconds
>> 22:54:09 DEBUG| (monitor hmp1)    setup: 3 milliseconds
>> 22:54:09 DEBUG| (monitor hmp1)    transferred ram: 131397 kbytes
>> 22:54:09 DEBUG| (monitor hmp1)    throughput: 268.57 mbps
>> 22:54:09 DEBUG| (monitor hmp1)    remaining ram: 31392 kbytes
>> 22:54:09 DEBUG| (monitor hmp1)    total ram: 1065796 kbytes
>> 22:54:09 DEBUG| (monitor hmp1)    duplicate: 226311 pages
>> 22:54:09 DEBUG| (monitor hmp1)    skipped: 0 pages
>> 22:54:09 DEBUG| (monitor hmp1)    normal: 32289 pages
>> 22:54:09 DEBUG| (monitor hmp1)    normal bytes: 129156 kbytes
>> 22:54:09 DEBUG| (monitor hmp1)    dirty sync count: 0
>> 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs)
>> 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate'
>> 22:54:11 DEBUG| Send command: info migrate
>> 22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register
>> status
>> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880)
>> 22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor)
>> 22:58:11 INFO | [qemu output] (Process terminated with status 0)
>> 22:58:11 DEBUG| VM virt-tests-vm1 down (monitor)
>> 22:58:11 DEBUG| Host does not support OpenVSwitch: Missing command:
>> ovs-vswitchd
>> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10763)
>> 22:58:11 DEBUG| Shutting down VM virt-tests-vm1 (shell)
>> 22:58:11 DEBUG| Login command: 'ssh -o UserKnownHostsFile=/dev/null -o
>> StrictHostKeyChecking=no -o PreferredAuthentications=password -p 5000
>> root@192.168.10.200'
>> 22:58:11 DEBUG| virt-tests-vm1 alive now. Used to failed to get register
>> info from guest 9 times
>> 22:58:13 INFO | [qemu output] (Process terminated with status 0)
>> 22:58:13 DEBUG| VM virt-tests-vm1 down (shell)
>> 22:58:14 DEBUG| Host does not support OpenVSwitch: Missing command:
>> ovs-vswitchd
>> 22:58:14 DEBUG| Checking image file
>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2
>> 22:58:14 DEBUG| Running '/bin/qemu-img info
>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
>> 22:58:14 DEBUG| Running '/bin/qemu-img check
>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
>> 22:58:14 ERROR| [stdout]
>> 22:58:14 ERROR| [stdout] 1 errors were found on the image.
>> 22:58:14 ERROR| [stdout] Data may be corrupted, or further writes to the
>> image may corrupt it.
>> 22:58:14 ERROR| [stdout] 13495/163840 = 8.24% allocated, 0.03%
>> fragmented, 0.00% compressed clusters
>> 22:58:14 ERROR| [stdout] Image end offset: 885129216
>> 22:58:14 ERROR| [stderr] ERROR cluster 13505 refcount=1 reference=2
>> 22:58:14 ERROR| Errors found on image:
>> '/mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
>> 22:58:14 WARNI| virt-tests-vm1 is not alive. Can not query the register
>> status
>> 22:58:14 DEBUG| Thread quit. Used to failed to get register info from
>> guest 20150212-225320-Mb1E4VV7 for 1 times.
>>
>>
>
>
> --
> Lucas
>



-- 
Lucas

Reply via email to