On 05/09/2018 11:33, Sergio Lopez wrote:
> AIO Coroutines shouldn't be managed by an AioContext different than the
> one assigned when they are created. aio_co_enter avoids entering a
> coroutine from a different AioContext, calling aio_co_schedule instead.
>
> Scheduled coroutines are then entered by co_schedule_bh_cb using
> qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
> current AioContext obtained with qemu_get_current_aio_context.
> Eventually, co->ctx will be set to the AioContext passed as an argument
> to qemu_aio_coroutine_enter.
>
> This means that, if an IO Thread's AioContext is being processed by the
> Main Thread (due to aio_poll being called with a BDS AioContext, as it
> happens in AIO_WAIT_WHILE among other places), the AioContext from some
> coroutines may be wrongly replaced with the one from the Main Thread.
>
> This is the root cause behind some crashes, mainly triggered by the
> drain code at block/io.c. The most common are these abort and failed
> assertion:
>
> util/async.c:aio_co_schedule
> 456     if (scheduled) {
> 457         fprintf(stderr,
> 458                 "%s: Co-routine was already scheduled in '%s'\n",
> 459                 __func__, scheduled);
> 460         abort();
> 461     }
>
> util/qemu-coroutine-lock.c:
> 286     assert(mutex->holder == self);
>
> But it's also known to cause random errors at different locations, and
> even SIGSEGV with broken coroutine backtraces.
>
> By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
> pass the correct AioContext as an argument, making sure co->ctx is not
> wrongly altered.
>
> Signed-off-by: Sergio Lopez <s...@redhat.com>
> ---
>  util/async.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/util/async.c b/util/async.c
> index 05979f8014..c10642a385 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
>
>          /* Protected by write barrier in qemu_aio_coroutine_enter */
>          atomic_set(&co->scheduled, NULL);
> -        qemu_coroutine_enter(co);
> +        qemu_aio_coroutine_enter(ctx, co);
>          aio_context_release(ctx);
>      }
>  }
>

Ouch.

Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>

Paolo