On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote: > On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote: > > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process > > completions from ioq_submit()") added an optimization that processes > > completions each time ioq_submit() returns with requests in flight. > > This commit introduces a "Co-routine re-entered recursively" error which > > can be triggered with -drive format=qcow2,aio=native. > > > > Fam Zheng <f...@redhat.com>, Kevin Wolf <kw...@redhat.com>, and I > > debugged the following backtrace: > > > > (gdb) bt > > #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6 > > #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6 > > #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at > > util/qemu-coroutine.c:113 > > #3 0x0000555555a4b663 in qemu_laio_process_completions > > (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218 > > #4 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at > > block/linux-aio.c:331 > > #5 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, > > laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, > > type=type@entry=1) at block/linux-aio.c:383 > > #6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, > > s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at > > block/linux-aio.c:402 > > #7 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, > > offset=offset@entry=2932727808, bytes=bytes@entry=8192, > > qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804 > > #8 0x0000555555a52b34 in bdrv_aligned_preadv > > (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, > > offset=offset@entry=2932727808, bytes=bytes@entry=8192, > > align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at > > block/io.c:1041 > > #9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, > > offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, > > flags=flags@entry=0) at block/io.c:1133 > > #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, > > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) > > at block/qcow2.c:1509 > > #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, > > offset=offset@entry=6178725888, bytes=bytes@entry=8192, > > qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804 > > #12 0x0000555555a52b34 in bdrv_aligned_preadv > > (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, > > offset=offset@entry=6178725888, bytes=bytes@entry=8192, > > align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at > > block/io.c:1041 > > #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, > > offset=offset@entry=6178725888, bytes=bytes@entry=8192, > > qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133 > > #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, > > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at > > block/block-backend.c:783 > > #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at > > block/block-backend.c:991 > > #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, > > i1=<optimized out>) at util/coroutine-ucontext.c:78 > > > > It turned out that re-entrant ioq_submit() and completion processing > > between three requests caused this error. The following check is not > > sufficient to prevent recursively entering coroutines: > > > > if (laiocb->co != qemu_coroutine_self()) { > > qemu_coroutine_enter(laiocb->co); > > } > > > > As the following coroutine backtrace shows, not just the current > > coroutine (self) can be entered. There might also be other coroutines > > that are currently entered and transferred control due to the qcow2 lock > > (CoMutex): > > I doubt that that was introduced by the commit you've specified: > 0ed93d84edab. > > Before my patch coroutine was unconditionally entered. The following > is what was changed by 0ed93d84edab: > > if (laiocb->co) { > - qemu_coroutine_enter(laiocb->co); > + /* Jump and continue completion for foreign requests, don't do > + * anything for current request, it will be completed shortly. */ > + if (laiocb->co != qemu_coroutine_self()) { > + qemu_coroutine_enter(laiocb->co); > + }
Unconditionally entering was safe prior to 0ed93d84edab since all coroutines yielded and qemu_coroutine_entered() would be false all the time. Therefore it wasn't necessary to protect against re-entering a coroutine. > If you have a strong reproduction, could you please verify that. The bug is 100% deterministic. Just boot up a guest with -drive format=qcow2,aio=native. I noticed that I forgot to include a second backtrace in the commit description. I am resending the patch series with the missing backtrace. Hopefully that will make the bug clearer. Stefan
signature.asc
Description: PGP signature