Copying Alex.

OK, after bisecting, this is what I've got:

8118f0950fc77cce7873002a5021172dd6e040b5 is the first bad commit
commit 8118f0950fc77cce7873002a5021172dd6e040b5
Author: Alexander Graf <ag...@suse.de>
Date:   Thu Jan 22 15:01:39 2015 +0100

    migration: Append JSON description of migration stream

    One of the annoyances of the current migration format is the fact that
    it's not self-describing. In fact, it's not properly describing at all.
    Some code randomly scattered throughout QEMU elaborates roughly how to
    read and write a stream of bytes.

    We discussed an idea during KVM Forum 2013 to add a JSON description of
    the migration protocol itself to the migration stream. This patch adds
    a section after the VM_END migration end marker that contains
    description data on what the device sections of the stream are
    composed of.

    This approach is backwards compatible with any QEMU version reading the
    stream, because QEMU just stops reading after the VM_END marker and
    ignores any data following it.

    With an additional external program this allows us to decipher the
    contents of any migration stream and hopefully make migration bugs
    easier to track down.

    Signed-off-by: Alexander Graf <ag...@suse.de>
    Signed-off-by: Amit Shah <amit.s...@redhat.com>
    Signed-off-by: Juan Quintela <quint...@redhat.com>

:040000 040000 e9a8888ac242a61fbd05bbb0daa3e8877970e738 61df81f831bc86b29f65883523ea95abb36f1ec5 M hw
:040000 040000 fe0659bed17d86c43657c26622d64fd44a1af037 7092a6b6515a3d0077f68ff2d80dbd74597a244f M include
:040000 040000 d90d6f1fe839abf21a45eaba5829d5a6a22abeb1 c2b1dcda197d96657458d699c185e39ae45f3c6c M migration
:100644 100644 98895fee81edfbc659fc42d467e930d06b1afa7d 80407662ad3ed860d33a9d35f5c44b1d19c4612b M savevm.c
:040000 040000 cf218bc2b841cd51ebe3972635be2cfbb1de9dfa 7aaf3d10ef7f73413b228e854fe6f04317151e46 M tests

So there you go. I'm going to sleep; if you need any extra help, let me know.

Cheers,

Lucas

On Thu, Feb 12, 2015 at 8:56 PM, Lucas Meneghel Rodrigues <look...@gmail.com> wrote:

> OK, indeed I can reproduce the problem. It's specific to the file
> descriptor migration. An easy way to reproduce it is by doing:
>
> git clone https://github.com/autotest/virt-test.git
>
> cd virt-test
> ./run -t qemu --bootstrap
> ./run -t qemu --tests type_specific.io-github-autotest-qemu.migrate.default.fd
>
> That's it. I will see if I can bisect this quickly to pinpoint the QEMU
> commit that brought the regression.
>
> The qemu master commit I just tested is:
>
> commit 449008f86418583a1f0fb946cf91ee7b4797317d
> Merge: 5c697ae bc5baff
> Author: Peter Maydell <peter.mayd...@linaro.org>
> Date:   Wed Feb 11 05:14:41 2015 +0000
>
>     Merge remote-tracking branch
> 'remotes/awilliam/tags/vfio-update-20150210.0' into staging
>
>     RCU fixes and cleanup (Paolo Bonzini)
>     Switch to v2 IOMMU interface (Alex Williamson)
>     DEBUG build fix (Alexey Kardashevskiy)
>
>     # gpg: Signature made Tue 10 Feb 2015 17:37:06 GMT using RSA key ID
> 3BB08B22
>     # gpg: Good signature from "Alex Williamson <
> alex.william...@redhat.com>"
>     # gpg:                 aka "Alex Williamson <a...@shazbot.org>"
>     # gpg:                 aka "Alex Williamson <alwil...@redhat.com>"
>     # gpg:                 aka "Alex Williamson <
> alex.l.william...@gmail.com>"
>
>     * remotes/awilliam/tags/vfio-update-20150210.0:
>       vfio: Fix debug message compile error
>       vfio: Use vfio type1 v2 IOMMU interface
>       vfio: unmap and free BAR data in instance_finalize
>       vfio: free dynamically-allocated data in instance_finalize
>       vfio: cleanup vfio_get_device error path, remove
> vfio_populate_device callback
>       memory: unregister AddressSpace MemoryListener within BQL
>
>     Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
>
>
> On Thu, Feb 12, 2015 at 8:19 PM, Lucas Meneghel Rodrigues <look...@gmail.com> wrote:
>
>> From what the log says, after a round of migrations 'info migrate' stops
>> responding and times out after 4 minutes. Virt Test then shuts down the
>> VM, and when it subsequently checks the qcow2 image, the image is
>> corrupted. I'm checking out the latest master to see how reproducible
>> this problem is.
>>
>> On Thu, Feb 12, 2015 at 8:12 PM, Juan Quintela <quint...@redhat.com>
>> wrote:
>>
>>>
>>> Hi
>>>
>>> while testing my changes I noticed that virt-test was failing.  I
>>> checked out master, and the failures are there too.
>>>
>>> This is an extract of the log after the 1st failure.  Notice that it
>>> fails randomly, not every time.
>>>
>>> I have to go to bed right now, so if anybody beats me to a fix, I
>>> would be happy when I wake up.
>>>
>>> Thanks, Juan.
>>>
>>>
>>> 22:54:07 DEBUG| (monitor hmp1) Response to 'info migrate'
>>> 22:54:07 DEBUG| (monitor hmp1)    capabilities: xbzrle: off
>>> rdma-pin-all: off auto-converge: off zero-blocks: off
>>> 22:54:07 DEBUG| (monitor hmp1)    Migration status: active
>>> 22:54:07 DEBUG| (monitor hmp1)    total time: 2003 milliseconds
>>> 22:54:07 DEBUG| (monitor hmp1)    expected downtime: 300 milliseconds
>>> 22:54:07 DEBUG| (monitor hmp1)    setup: 3 milliseconds
>>> 22:54:07 DEBUG| (monitor hmp1)    transferred ram: 67619 kbytes
>>> 22:54:07 DEBUG| (monitor hmp1)    throughput: 268.61 mbps
>>> 22:54:07 DEBUG| (monitor hmp1)    remaining ram: 103056 kbytes
>>> 22:54:07 DEBUG| (monitor hmp1)    total ram: 1065796 kbytes
>>> 22:54:07 DEBUG| (monitor hmp1)    duplicate: 224304 pages
>>> 22:54:07 DEBUG| (monitor hmp1)    skipped: 0 pages
>>> 22:54:07 DEBUG| (monitor hmp1)    normal: 16380 pages
>>> 22:54:07 DEBUG| (monitor hmp1)    normal bytes: 65520 kbytes
>>> 22:54:07 DEBUG| (monitor hmp1)    dirty sync count: 0
>>> 22:54:09 DEBUG| Waiting for migration to complete (4.006475 secs)
>>> 22:54:09 DEBUG| (monitor hmp1) Sending command 'info migrate'
>>> 22:54:09 DEBUG| Send command: info migrate
>>> 22:54:09 DEBUG| (monitor hmp1) Response to 'info migrate'
>>> 22:54:09 DEBUG| (monitor hmp1)    capabilities: xbzrle: off
>>> rdma-pin-all: off auto-converge: off zero-blocks: off
>>> 22:54:09 DEBUG| (monitor hmp1)    Migration status: active
>>> 22:54:09 DEBUG| (monitor hmp1)    total time: 4008 milliseconds
>>> 22:54:09 DEBUG| (monitor hmp1)    expected downtime: 300 milliseconds
>>> 22:54:09 DEBUG| (monitor hmp1)    setup: 3 milliseconds
>>> 22:54:09 DEBUG| (monitor hmp1)    transferred ram: 131397 kbytes
>>> 22:54:09 DEBUG| (monitor hmp1)    throughput: 268.57 mbps
>>> 22:54:09 DEBUG| (monitor hmp1)    remaining ram: 31392 kbytes
>>> 22:54:09 DEBUG| (monitor hmp1)    total ram: 1065796 kbytes
>>> 22:54:09 DEBUG| (monitor hmp1)    duplicate: 226311 pages
>>> 22:54:09 DEBUG| (monitor hmp1)    skipped: 0 pages
>>> 22:54:09 DEBUG| (monitor hmp1)    normal: 32289 pages
>>> 22:54:09 DEBUG| (monitor hmp1)    normal bytes: 129156 kbytes
>>> 22:54:09 DEBUG| (monitor hmp1)    dirty sync count: 0
>>> 22:54:11 DEBUG| Waiting for migration to complete (6.011556 secs)
>>> 22:54:11 DEBUG| (monitor hmp1) Sending command 'info migrate'
>>> 22:54:11 DEBUG| Send command: info migrate
>>> 22:54:32 WARNI| virt-tests-vm1 is not alive. Can not query the register
>>> status
>>> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10880)
>>> 22:58:11 DEBUG| Ending VM virt-tests-vm1 process (monitor)
>>> 22:58:11 INFO | [qemu output] (Process terminated with status 0)
>>> 22:58:11 DEBUG| VM virt-tests-vm1 down (monitor)
>>> 22:58:11 DEBUG| Host does not support OpenVSwitch: Missing command:
>>> ovs-vswitchd
>>> 22:58:11 DEBUG| Destroying VM virt-tests-vm1 (PID 10763)
>>> 22:58:11 DEBUG| Shutting down VM virt-tests-vm1 (shell)
>>> 22:58:11 DEBUG| Login command: 'ssh -o UserKnownHostsFile=/dev/null -o
>>> StrictHostKeyChecking=no -o PreferredAuthentications=password -p 5000
>>> root@192.168.10.200'
>>> 22:58:11 DEBUG| virt-tests-vm1 alive now. Used to failed to get register
>>> info from guest 9 times
>>> 22:58:13 INFO | [qemu output] (Process terminated with status 0)
>>> 22:58:13 DEBUG| VM virt-tests-vm1 down (shell)
>>> 22:58:14 DEBUG| Host does not support OpenVSwitch: Missing command:
>>> ovs-vswitchd
>>> 22:58:14 DEBUG| Checking image file
>>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2
>>> 22:58:14 DEBUG| Running '/bin/qemu-img info
>>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
>>> 22:58:14 DEBUG| Running '/bin/qemu-img check
>>> /mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
>>> 22:58:14 ERROR| [stdout]
>>> 22:58:14 ERROR| [stdout] 1 errors were found on the image.
>>> 22:58:14 ERROR| [stdout] Data may be corrupted, or further writes to the
>>> image may corrupt it.
>>> 22:58:14 ERROR| [stdout] 13495/163840 = 8.24% allocated, 0.03%
>>> fragmented, 0.00% compressed clusters
>>> 22:58:14 ERROR| [stdout] Image end offset: 885129216
>>> 22:58:14 ERROR| [stderr] ERROR cluster 13505 refcount=1 reference=2
>>> 22:58:14 ERROR| Errors found on image:
>>> '/mnt/kvm/src/virt-test/shared/data/images/jeos-20-64.qcow2'
>>> 22:58:14 WARNI| virt-tests-vm1 is not alive. Can not query the register
>>> status
>>> 22:58:14 DEBUG| Thread quit. Used to failed to get register info from
>>> guest 20150212-225320-Mb1E4VV7 for 1 times.
>>>
>>>
>>
>>
>>
>
>
>
>



-- 
Lucas
