Re: [Qemu-block] [PATCH for-2.12 v2] qemu-iotests: update 185 output

2018-04-08 Thread QingFeng Hao


On 2018/4/4 23:01, Stefan Hajnoczi wrote:
> Commit 4486e89c219c0d1b9bd8dfa0b1dd5b0d51ff2268 ("vl: introduce
> vm_shutdown()") added a bdrv_drain_all() call.  As a side-effect of the
> drain operation the block job iterates one more time than before.  The
> 185 output no longer matches and the test is failing now.
> 
> It may be possible to avoid the superfluous block job iteration, but
> that type of patch is not suitable late in the QEMU 2.12 release cycle.
> 
> This patch simply updates the 185 output file.  The new behavior is
> correct, just not optimal, so make the test pass again.
> 
> Fixes: 4486e89c219c0d1b9bd8dfa0b1dd5b0d51ff2268 ("vl: introduce vm_shutdown()")
> Cc: Kevin Wolf 
> Cc: QingFeng Hao 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tests/qemu-iotests/185 | 10 ++
>  tests/qemu-iotests/185.out | 12 +++-
>  2 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
> index f5b47e4c1a..298d88d04e 100755
> --- a/tests/qemu-iotests/185
> +++ b/tests/qemu-iotests/185
> @@ -92,9 +92,8 @@ echo === Start commit job and exit qemu ===
>  echo
> 
>  # Note that the reference output intentionally includes the 'offset' field in
> -# BLOCK_JOB_CANCELLED events for all of the following block jobs. They are
> -# predictable and any change in the offsets would hint at a bug in the job
> -# throttling code.
> +# BLOCK_JOB_* events for all of the following block jobs. They are predictable
> +# and any change in the offsets would hint at a bug in the job throttling code.
>  #
>  # In order to achieve these predictable offsets, all of the following tests
>  # use speed=65536. Each job will perform exactly one iteration before it has
> @@ -102,11 +101,14 @@ echo
>  # command to be received (after receiving the command, the rest runs
>  # synchronously, so jobs can arbitrarily continue or complete).
>  #
> +# Jobs present while QEMU is terminating iterate once more due to
> +# bdrv_drain_all().
> +#
>  # The buffer size for commit and streaming is 512k (waiting for 8 seconds after
>  # the first request), for active commit and mirror it's large enough to cover
>  # the full 4M, and for backup it's the qcow2 cluster size, which we know is
>  # 64k. As all of these are at least as large as the speed, we are sure that the
> -# offset doesn't advance after the first iteration before qemu exits.
> +# offset advances exactly twice before qemu exits.
> 
>  _send_qemu_cmd $h \
>  "{ 'execute': 'block-commit',
Reviewed-by: QingFeng Hao 

> diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
> index 57eaf8d699..2c4b04de73 100644

[...]
> 

-- 
Regards
QingFeng Hao
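[Editorial illustration] The offset arithmetic in the quoted test comment can be checked with a short sketch. The speed and buffer-size values come from the comment itself; everything else here is assumed for illustration:

```python
# Sketch of the offset arithmetic from the quoted 185 comment: with
# speed=65536 and a 512k commit/stream buffer, one iteration moves the
# whole buffer and the rate limiter then forces an 8-second wait.
speed = 65536            # bytes per second (job rate limit)
buf_size = 512 * 1024    # commit/stream buffer size from the comment

wait_seconds = buf_size / speed
print(wait_seconds)      # 8.0

# Before commit 4486e89c the job iterated once before qemu exited; the
# extra bdrv_drain_all() iteration at shutdown makes the offset advance
# exactly twice:
print(1 * buf_size)      # 524288
print(2 * buf_size)      # 1048576
```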




Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck

2018-04-08 Thread Benny Zlotnik
$ gdb -p 13024 -batch -ex "thread apply all bt"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x7f98275cfaff in ppoll () from /lib64/libc.so.6

Thread 1 (Thread 0x7f983e30ab00 (LWP 13024)):
#0  0x7f98275cfaff in ppoll () from /lib64/libc.so.6
#1  0x55b55cf59d69 in qemu_poll_ns ()
#2  0x55b55cf5ba45 in aio_poll ()
#3  0x55b55ceedc0f in bdrv_get_block_status_above ()
#4  0x55b55cea3611 in convert_iteration_sectors ()
#5  0x55b55cea4352 in img_convert ()
#6  0x55b55ce9d819 in main ()
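[Editorial illustration] The backtrace above and the strace output quoted below show qemu-img repeatedly calling ppoll with a zero timeout. A zero-timeout poll on a quiet descriptor returns immediately with no events, so a loop built on it spins instead of sleeping, which matches the high CPU usage reported. A minimal sketch of that pattern (plain Python, not QEMU code):

```python
import os
import select
import time

# Hypothetical illustration of the strace pattern: polling a quiet fd
# with a zero timeout returns "no events" immediately every call, so a
# loop doing this spins at full speed rather than blocking.
r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN | select.POLLERR | select.POLLHUP)

iterations = 0
deadline = time.monotonic() + 0.05       # spin for 50 ms
while time.monotonic() < deadline:
    events = poller.poll(0)              # like ppoll(..., {0, 0}) above
    assert events == []                  # nothing readable: always a timeout
    iterations += 1

print(iterations > 100)  # the loop ran many times in just 50 ms
```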


On Sun, Apr 8, 2018 at 10:28 PM, Nir Soffer  wrote:

> On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik  wrote:
>
>> Hi,
>>
>> A copy operation initiated by RHEV got stuck for more than a day
>> and is consuming plenty of CPU:
>> vdsm 13024  3117 99 Apr07 ?1-06:58:43 /usr/bin/qemu-img convert
>> -p -t none -T none -f qcow2
>> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
>> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>>
>> The target image appears to have no data yet:
>> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da
>> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
>> file format: raw
>> virtual size: 120G (128849018880 bytes)
>> disk size: 0
>>
>> strace -p 13024 -tt -T -f shows only:
>> ...
>> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
>> NULL, 8) = 0 (Timeout) <0.10>
>> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
>> NULL, 8) = 0 (Timeout) <0.09>
>> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
>> NULL, 8) = 0 (Timeout) <0.09>
>> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
>> NULL, 8) = 0 (Timeout) <0.10>
>>
>> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>>
>> What could cause this? I'll provide any additional information needed
>>
>
> A backtrace may help, try:
>
> gdb -p 13024 -batch -ex "thread apply all bt"
>
> Also adding Kevin and qemu-block.
>
> Nir
>


Re: [Qemu-block] [Qemu-discuss] qemu-img convert stuck

2018-04-08 Thread Nir Soffer
On Sun, Apr 8, 2018 at 9:27 PM Benny Zlotnik  wrote:

> Hi,
>
> A copy operation initiated by RHEV got stuck for more than a day
> and is consuming plenty of CPU:
> vdsm 13024  3117 99 Apr07 ?1-06:58:43 /usr/bin/qemu-img convert
> -p -t none -T none -f qcow2
>
> /rhev/data-center/bb422fac-81c5-4fea-8782-3498bb5c8a59/26989331-2c39-4b34-a7ed-d7dd7703646c/images/597e12b6-19f5-45bd-868f-767600c7115e/62a5492e-e120-4c25-898e-9f5f5629853e
> -O raw /rhev/data-center/mnt/mantis-nfs-lif1.lab.eng.tlv2.redhat.com:_vol__service/26989331-2c39-4b34-a7ed-d7dd7703646c/images/9ece9408-9ca6-48cd-992a-6f590c710672/06d6d3c0-beb8-4b6b-ab00-56523df185da
>
> The target image appears to have no data yet:
> qemu-img info 06d6d3c0-beb8-4b6b-ab00-56523df185da
> image: 06d6d3c0-beb8-4b6b-ab00-56523df185da
> file format: raw
> virtual size: 120G (128849018880 bytes)
> disk size: 0
>
> strace -p 13024 -tt -T -f shows only:
> ...
> 21:13:01.309382 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.10>
> 21:13:01.309411 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.09>
> 21:13:01.309440 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.09>
> 21:13:01.309468 ppoll([{fd=12, events=POLLIN|POLLERR|POLLHUP}], 1, {0, 0},
> NULL, 8) = 0 (Timeout) <0.10>
>
> version: qemu-img-rhev-2.9.0-16.el7_4.13.x86_64
>
> What could cause this? I'll provide any additional information needed
>

A backtrace may help, try:

gdb -p 13024 -batch -ex "thread apply all bt"

Also adding Kevin and qemu-block.

Nir


Re: [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading

2018-04-08 Thread Fam Zheng
On Fri, 04/06 13:41, Paolo Bonzini wrote:
> On 05/04/2018 14:55, Stefan Hajnoczi wrote:
> > bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
> > src[qcow2].  The qcow2 block driver will invoke
> > bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
> > invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
> > invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
> > src_bds (src[file]) is also file-posix and then goes ahead with
> > copy_file_range(2).
> > 
> > In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
> > the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
> > the block layer can fall back to a traditional copy operation.
> > 
> > With this approach src[qcow2] could take a lock or keep track of a
> > serializing request struct so that other requests cannot interfere with
> > the operation, and it's done in a natural way since we remain in the
> > qcow2 function until the entire operation completes.  There's no need
> > for bookkeeping structs or callbacks.
> 
> Could there be AB-BA deadlock if the guest attempts a concurrent copy
> from A to B and from B to A?

I don't think bs_src needs to hold its locks when calling into bs_dst to map
write ranges, so it should be safe.
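
[Editorial illustration] For reference, the AB-BA scenario Paolo raises, and the standard fix of acquiring locks in a fixed global order, can be sketched as follows (plain Python with invented names, not QEMU's actual locking):

```python
import threading

# Two concurrent copies, A->B and B->A. If each copy took its source
# lock and then its destination lock, the two could deadlock (AB-BA).
# Taking the locks in a fixed global order (here: sorted by name)
# guarantees progress.
locks = {"A": threading.Lock(), "B": threading.Lock()}
completed = []

def copy_range(src, dst):
    first, second = sorted((src, dst))   # fixed global lock order
    with locks[first]:
        with locks[second]:
            completed.append((src, dst))

t1 = threading.Thread(target=copy_range, args=("A", "B"))
t2 = threading.Thread(target=copy_range, args=("B", "A"))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(completed))  # both copies finish; no deadlock
```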

Fam
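
[Editorial illustration] The delegation chain Stefan describes above (src[qcow2] -> src[file] -> dst[raw] -> dst[file], with a fallback when the destination leaf cannot offload) can be modelled in miniature. All class and method names here are invented for illustration and are not QEMU's actual API:

```python
# Miniature model of the delegation chain described above (invented
# names, not QEMU's API): the request walks down the source chain, then
# down the destination chain; only the leaf pair decides on offloading.
class Driver:
    def __init__(self, name, child=None, is_posix=False):
        self.name, self.child, self.is_posix = name, child, is_posix

    def copy_range_from(self, dst, trace):
        trace.append("src:" + self.name)
        if self.child:                         # delegate down the src chain
            return self.child.copy_range_from(dst, trace)
        return dst.copy_range_to(self, trace)  # reached the src leaf

    def copy_range_to(self, src_leaf, trace):
        trace.append("dst:" + self.name)
        if self.child:                         # delegate down the dst chain
            return self.child.copy_range_to(src_leaf, trace)
        if self.is_posix and src_leaf.is_posix:
            return "copy_file_range"           # both leaves are file-posix
        return "fallback"                      # e.g. dst on iSCSI: -ENOTSUP

src = Driver("qcow2", Driver("file", is_posix=True))
dst = Driver("raw", Driver("file", is_posix=True))
trace = []
result = src.copy_range_from(dst, trace)
print(result)  # copy_file_range
print(trace)   # ['src:qcow2', 'src:file', 'dst:raw', 'dst:file']
```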