On Tue, 09/27 16:18, Stefan Hajnoczi wrote:
> Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> completions from ioq_submit()") added an optimization that processes
> completions each time ioq_submit() returns with requests in flight.
> This commit introduces a "Co-routine re-entered recursively" error which
> can be triggered with -drive format=qcow2,aio=native.
> 
> Fam Zheng <f...@redhat.com>, Kevin Wolf <kw...@redhat.com>, and I
> debugged the following backtrace:
> 
>   (gdb) bt
>   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
>   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
>   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at 
> util/qemu-coroutine.c:113
>   #3  0x0000555555a4b663 in qemu_laio_process_completions 
> (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
>   #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at 
> block/linux-aio.c:331
>   #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, 
> laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, 
> type=type@entry=1) at block/linux-aio.c:383
>   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, 
> s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at 
> block/linux-aio.c:402
>   #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, 
> offset=offset@entry=2932727808, bytes=bytes@entry=8192, 
> qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
>   #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, 
> req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, 
> bytes=bytes@entry=8192, align=align@entry=512, 
> qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
>   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, 
> flags=flags@entry=0) at block/io.c:1133
>   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, 
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at 
> block/qcow2.c:1509
>   #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, 
> offset=offset@entry=6178725888, bytes=bytes@entry=8192, 
> qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
>   #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, 
> req=req@entry=0x555559d39000, offset=offset@entry=6178725888, 
> bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, 
> flags=0) at block/io.c:1041
>   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> offset=offset@entry=6178725888, bytes=bytes@entry=8192, 
> qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
>   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, 
> offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at 
> block/block-backend.c:783
>   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at 
> block/block-backend.c:991
>   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at util/coroutine-ucontext.c:78
> 
> It turned out that re-entrant ioq_submit() and completion processing
> between three requests caused this error.  The following check is not
> sufficient to prevent recursively entering coroutines:
> 
>   if (laiocb->co != qemu_coroutine_self()) {
>       qemu_coroutine_enter(laiocb->co);
>   }
> 
> As the following coroutine backtrace shows, not just the current
> coroutine (self) can be entered.  There might also be other coroutines
> that are currently entered and transferred control due to the qcow2 lock
> (CoMutex):
> 
>   (gdb) qemu coroutine 0x5555583464d0
>   #0  0x0000555555ac0c90 in qemu_coroutine_switch 
> (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, 
> action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175
>   #1  0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at 
> util/qemu-coroutine.c:117
>   #2  0x0000555555ac031c in qemu_co_queue_run_restart 
> (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60
>   #3  0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at 
> util/qemu-coroutine.c:119
>   #4  0x0000555555a4b663 in qemu_laio_process_completions 
> (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
>   #5  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at 
> block/linux-aio.c:331
>   #6  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, 
> laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, 
> type=type@entry=1) at block/linux-aio.c:383
>   #7  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, 
> s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at 
> block/linux-aio.c:402
>   #8  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, 
> offset=offset@entry=2911477760, bytes=bytes@entry=8192, 
> qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804
>   #9  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, 
> req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, 
> bytes=bytes@entry=8192, align=align@entry=512, 
> qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041
>   #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, 
> flags=flags@entry=0) at block/io.c:1133
>   #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, 
> offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at 
> block/qcow2.c:1509
>   #12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, 
> offset=offset@entry=6157475840, bytes=bytes@entry=8192, 
> qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804
>   #13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, 
> req=req@entry=0x55555a339060, offset=offset@entry=6157475840, 
> bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, 
> flags=0) at block/io.c:1041
>   #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> offset=offset@entry=6157475840, bytes=bytes@entry=8192, 
> qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133
>   #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, 
> offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at 
> block/block-backend.c:783
>   #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at 
> block/block-backend.c:991
>   #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at util/coroutine-ucontext.c:78
> 
> Use the new qemu_coroutine_entered() function instead of comparing
> against qemu_coroutine_self().  This is correct because:
> 
> 1. If a coroutine is not entered then it must have yielded to wait for
>    I/O completion.  It is therefore safe to enter.
> 
> 2. If a coroutine is entered then it must be in
>    ioq_submit()/qemu_laio_process_completions() because otherwise it
>    would be yielded while waiting for I/O completion.  Therefore it will
>    check laio->ret and return from ioq_submit() instead of yielding,
>    i.e. it's guaranteed not to hang.
> 
> Reported-by: Fam Zheng <f...@redhat.com>
> Tested-by: Fam Zheng <f...@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> ---
>  block/linux-aio.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index d4e19d4..1685ec2 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -94,9 +94,12 @@ static void qemu_laio_process_completion(struct 
> qemu_laiocb *laiocb)
>  
>      laiocb->ret = ret;
>      if (laiocb->co) {
> -        /* Jump and continue completion for foreign requests, don't do
> -         * anything for current request, it will be completed shortly. */
> -        if (laiocb->co != qemu_coroutine_self()) {
> +        /* If the coroutine is already entered it must be in ioq_submit() and
> +         * will notice laio->ret has been filled in when it eventually runs
> +         * later.  Coroutines cannot be entered recursively so avoid doing
> +         * that!
> +         */
> +        if (!qemu_coroutine_entered(laiocb->co)) {
>              qemu_coroutine_enter(laiocb->co);
>          }
>      } else {
> -- 
> 2.7.4
> 

Reviewed-by: Fam Zheng <f...@redhat.com>

Reply via email to