Re: [Qemu-devel] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-24 Thread Anton Nefedov

On 04/21/2017 03:37 PM, Peter Lieven wrote:

Am 21.04.2017 um 14:19 schrieb Anton Nefedov:

On 04/21/2017 01:44 PM, Peter Lieven wrote:

Am 21.04.2017 um 12:04 schrieb Anton Nefedov:

On error path (like i/o error in one of the coroutines), it's required to
  - wait for coroutines completion before cleaning the common structures
  - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc65.

Signed-off-by: Anton Nefedov 
---
[..]




And what if we error out in the read path? Wouldn't be something like this 
easier?


diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
 main_loop_wait(false);
 }

+/* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+if (s->ret) {
+for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+}
+}
+
 if (s->compressed && !s->ret) {
 /* signal EOF to align */
 ret = blk_pwrite_compressed(s->target, 0, NULL, 0);


Peter



seemed a bit too daring to me to re-enter every coroutine potentially including 
the ones that yielded waiting for I/O completion.
If that's ok - that is for sure easier :)


I think we should enter every coroutine that is still running and have it 
terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes your test 
cases that triggered the bug?



hi, sorry I'm late with the answer

this segfaults in bdrv_close(). Looks like it tries to finish some i/o 
which coroutine we have already entered and terminated?


(gdb) run
Starting program: /vz/anefedov/qemu-build/us/./qemu-img convert -O qcow2 
./harddisk.hdd.c ./harddisk.hdd

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeac2d700 (LWP 436020)]
[New Thread 0x7fffe4ed6700 (LWP 436021)]
qemu-img: error while reading sector 20480: Input/output error
qemu-img: error while writing sector 19712: Operation now in progress

Program received signal SIGSEGV, Segmentation fault.
aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
454 ctx = atomic_read(>ctx);
(gdb) bt
#0  aio_co_wake (co=0x0) at /mnt/code/us-qemu/util/async.c:454
/* [Anton]: thread_pool_co_cb () here */
#1  0x55634629 in thread_pool_completion_bh 
(opaque=0x55cfe020) at /mnt/code/us-qemu/util/thread-pool.c:189
#2  0x55633b31 in aio_bh_call (bh=0x55cfe0f0) at 
/mnt/code/us-qemu/util/async.c:90
#3  aio_bh_poll (ctx=ctx@entry=0x55cee6d0) at 
/mnt/code/us-qemu/util/async.c:118
#4  0x55636f14 in aio_poll (ctx=ctx@entry=0x55cee6d0, 
blocking=) at /mnt/code/us-qemu/util/aio-posix.c:682
#5  0x555c52d4 in bdrv_drain_recurse 
(bs=bs@entry=0x55d22560) at /mnt/code/us-qemu/block/io.c:164
#6  0x555c5aed in bdrv_drained_begin 
(bs=bs@entry=0x55d22560) at /mnt/code/us-qemu/block/io.c:248
#7  0x55581443 in bdrv_close (bs=0x55d22560) at 
/mnt/code/us-qemu/block.c:2909

#8  bdrv_delete (bs=0x55d22560) at /mnt/code/us-qemu/block.c:3100
#9  bdrv_unref (bs=0x55d22560) at /mnt/code/us-qemu/block.c:4087
#10 0x555baf44 in blk_remove_bs (blk=blk@entry=0x55d22380) 
at /mnt/code/us-qemu/block/block-backend.c:552
#11 0x555bb173 in blk_delete (blk=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:238
#12 blk_unref (blk=blk@entry=0x55d22380) at 
/mnt/code/us-qemu/block/block-backend.c:282
#13 0x5557a22c in img_convert (argc=, 
argv=) at /mnt/code/us-qemu/qemu-img.c:2359
#14 0x55574189 in main (argc=5, argv=0x7fffe4a0) at 
/mnt/code/us-qemu/qemu-img.c:4464




Peter



/Anton



Re: [Qemu-devel] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-21 Thread Anton Nefedov

On 04/21/2017 01:44 PM, Peter Lieven wrote:

Am 21.04.2017 um 12:04 schrieb Anton Nefedov:

On error path (like i/o error in one of the coroutines), it's required to
  - wait for coroutines completion before cleaning the common structures
  - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc65.

Signed-off-by: Anton Nefedov 
---
[..]




And what if we error out in the read path? Wouldn't be something like this 
easier?


diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
 main_loop_wait(false);
 }

+/* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+if (s->ret) {
+for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+}
+}
+
 if (s->compressed && !s->ret) {
 /* signal EOF to align */
 ret = blk_pwrite_compressed(s->target, 0, NULL, 0);


Peter



seemed a bit too daring to me to re-enter every coroutine potentially 
including the ones that yielded waiting for I/O completion.

If that's ok - that is for sure easier :)

Or maybe some intermediate variant, as you suggest but only enter the 
"wait_sector" coroutines (and main_loop_wait for the rest)?

Or do not even re-enter them, like just

-while (s->ret == -EINPROGRESS) {
+while (s->running_coroutines != s->waiting_coroutines) {
 main_loop_wait(false);
 }


/Anton



Re: [Qemu-devel] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-21 Thread Peter Lieven
Am 21.04.2017 um 14:19 schrieb Anton Nefedov:
> On 04/21/2017 01:44 PM, Peter Lieven wrote:
>> Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
>>> On error path (like i/o error in one of the coroutines), it's required to
>>>   - wait for coroutines completion before cleaning the common structures
>>>   - reenter dependent coroutines so they ever finish
>>>
>>> Introduced in 2d9187bc65.
>>>
>>> Signed-off-by: Anton Nefedov 
>>> ---
>>> [..]
>>>
>>
>>
>> And what if we error out in the read path? Wouldn't be something like this 
>> easier?
>>
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index 22f559a..4ff1085 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
>>  main_loop_wait(false);
>>  }
>>
>> +/* on error path we need to enter all coroutines that are still
>> + * running before cleaning up common structures */
>> +if (s->ret) {
>> +for (i = 0; i < s->num_coroutines; i++) {
>> + if (s->co[i]) {
>> + qemu_coroutine_enter(s->co[i]);
>> + }
>> +}
>> +}
>> +
>>  if (s->compressed && !s->ret) {
>>  /* signal EOF to align */
>>  ret = blk_pwrite_compressed(s->target, 0, NULL, 0);
>>
>>
>> Peter
>>
>
> seemed a bit too daring to me to re-enter every coroutine potentially 
> including the ones that yielded waiting for I/O completion.
> If that's ok - that is for sure easier :)

I think we should enter every coroutine that is still running and have it 
terminate correctly. It was my mistake that I have not
done this in the original patch. Can you check if the above fixes your test 
cases that triggered the bug?

Peter



Re: [Qemu-devel] [PATCH v2 1/1] qemu-img: wait for convert coroutines to complete

2017-04-21 Thread Peter Lieven
Am 21.04.2017 um 12:04 schrieb Anton Nefedov:
> On error path (like i/o error in one of the coroutines), it's required to
>   - wait for coroutines completion before cleaning the common structures
>   - reenter dependent coroutines so they ever finish
>
> Introduced in 2d9187bc65.
>
> Signed-off-by: Anton Nefedov 
> ---
>  qemu-img.c | 44 +++-
>  1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index b220cf7..24950c1 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1735,6 +1735,27 @@ static int coroutine_fn 
> convert_co_write(ImgConvertState *s, int64_t sector_num,
>  return 0;
>  }
>  
> +static void coroutine_fn convert_reenter_waiting(ImgConvertState *s,
> + uint64_t wr_offs)
> +{
> +int i;
> +if (s->wr_in_order) {
> +/* reenter the coroutine that might have waited
> + * for the write to complete */
> +for (i = 0; i < s->num_coroutines; i++) {
> +if (s->co[i] && s->wait_sector_num[i] == wr_offs) {
> +/*
> + * A -> B -> A cannot occur because A has
> + * s->wait_sector_num[i] == -1 during A -> B.  Therefore
> + * B will never enter A during this time window.
> + */
> +qemu_coroutine_enter(s->co[i]);
> +break;
> +}
> +}
> +}
> +}
> +
>  static void coroutine_fn convert_co_do_copy(void *opaque)
>  {
>  ImgConvertState *s = opaque;
> @@ -1792,6 +1813,7 @@ static void coroutine_fn convert_co_do_copy(void 
> *opaque)
>  error_report("error while reading sector %" PRId64
>   ": %s", sector_num, strerror(-ret));
>  s->ret = ret;
> +convert_reenter_waiting(s, sector_num + n);
>  goto out;
>  }
>  } else if (!s->min_sparse && status == BLK_ZERO) {
> @@ -1803,6 +1825,7 @@ static void coroutine_fn convert_co_do_copy(void 
> *opaque)
>  /* keep writes in order */
>  while (s->wr_offs != sector_num) {
>  if (s->ret != -EINPROGRESS) {
> +convert_reenter_waiting(s, sector_num + n);
>  goto out;
>  }
>  s->wait_sector_num[index] = sector_num;
> @@ -1816,25 +1839,12 @@ static void coroutine_fn convert_co_do_copy(void 
> *opaque)
>  error_report("error while writing sector %" PRId64
>   ": %s", sector_num, strerror(-ret));
>  s->ret = ret;
> +convert_reenter_waiting(s, sector_num + n);
>  goto out;
>  }
>  
> -if (s->wr_in_order) {
> -/* reenter the coroutine that might have waited
> - * for this write to complete */
> -s->wr_offs = sector_num + n;
> -for (i = 0; i < s->num_coroutines; i++) {
> -if (s->co[i] && s->wait_sector_num[i] == s->wr_offs) {
> -/*
> - * A -> B -> A cannot occur because A has
> - * s->wait_sector_num[i] == -1 during A -> B.  Therefore
> - * B will never enter A during this time window.
> - */
> -qemu_coroutine_enter(s->co[i]);
> -break;
> -}
> -}
> -}
> +s->wr_offs = sector_num + n;
> +convert_reenter_waiting(s, s->wr_offs);
>  }
>  
>  out:
> @@ -1899,7 +1909,7 @@ static int convert_do_copy(ImgConvertState *s)
>  qemu_coroutine_enter(s->co[i]);
>  }
>  
> -while (s->ret == -EINPROGRESS) {
> +while (s->running_coroutines) {
>  main_loop_wait(false);
>  }
>  


And what if we error out in the read path? Wouldn't be something like this 
easier?


diff --git a/qemu-img.c b/qemu-img.c
index 22f559a..4ff1085 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1903,6 +1903,16 @@ static int convert_do_copy(ImgConvertState *s)
 main_loop_wait(false);
 }
 
+/* on error path we need to enter all coroutines that are still
+ * running before cleaning up common structures */
+if (s->ret) {
+for (i = 0; i < s->num_coroutines; i++) {
+ if (s->co[i]) {
+ qemu_coroutine_enter(s->co[i]);
+ }
+}
+}
+
 if (s->compressed && !s->ret) {
 /* signal EOF to align */
 ret = blk_pwrite_compressed(s->target, 0, NULL, 0);


Peter