Hey Stefan, On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote: > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process > completions from ioq_submit()") added an optimization that processes > completions each time ioq_submit() returns with requests in flight. > This commit introduces a "Co-routine re-entered recursively" error which > can be triggered with -drive format=qcow2,aio=native. > > Fam Zheng <f...@redhat.com>, Kevin Wolf <kw...@redhat.com>, and I > debugged the following backtrace: > > (gdb) bt > #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6 > #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6 > #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at > util/qemu-coroutine.c:113 > #3 0x0000555555a4b663 in qemu_laio_process_completions > (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218 > #4 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at > block/linux-aio.c:331 > #5 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, > laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, > type=type@entry=1) at block/linux-aio.c:383 > #6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, > s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at > block/linux-aio.c:402 > #7 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, > offset=offset@entry=2932727808, bytes=bytes@entry=8192, > qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804 > #8 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, > req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, > bytes=bytes@entry=8192, align=align@entry=512, > qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041 > #9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, > offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, > flags=flags@entry=0) at block/io.c:1133 > #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at > block/qcow2.c:1509 > #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, > offset=offset@entry=6178725888, bytes=bytes@entry=8192, > qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804 > #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, > req=req@entry=0x555559d39000, offset=offset@entry=6178725888, > bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, > flags=0) at block/io.c:1041 > #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, > offset=offset@entry=6178725888, bytes=bytes@entry=8192, > qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133 > #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at > block/block-backend.c:783 > #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at > block/block-backend.c:991 > #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, > i1=<optimized out>) at util/coroutine-ucontext.c:78 > > It turned out that re-entrant ioq_submit() and completion processing > between three requests caused this error. The following check is not > sufficient to prevent recursively entering coroutines: > > if (laiocb->co != qemu_coroutine_self()) { > qemu_coroutine_enter(laiocb->co); > } > > As the following coroutine backtrace shows, not just the current > coroutine (self) can be entered. There might also be other coroutines > that are currently entered and transferred control due to the qcow2 lock > (CoMutex):
I doubt that that was introduced by the commit you've specified: 0ed93d84edab. Before my patch coroutine was unconditionally entered. The following is what was changed by 0ed93d84edab: if (laiocb->co) { - qemu_coroutine_enter(laiocb->co); + /* Jump and continue completion for foreign requests, don't do + * anything for current request, it will be completed shortly. */ + if (laiocb->co != qemu_coroutine_self()) { + qemu_coroutine_enter(laiocb->co); + } If you have a strong reproduction, could you please verify that. What worries me is the following: 1. Issue was introduced before and was unnoticed (ok). 2. Issue - is something else and/or was really introduced by commit 0ed93d84edab (not ok). Of course the 2. is not nice. Thanks. -- Roman