On 23/07/19 21:06, Stefan Hajnoczi wrote:
> The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
> following:
>
> 1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
> 2. The one-shot BH executes in another AioContext. All it does is call
>    aio_co_wakeup(preadv_co).
> 3. The preadv coroutine is re-entered and returns.
>
> There is a race condition in aio_co_wake() where the preadv coroutine
> returns and the test case destroys the preadv IOThread. aio_co_wake()
> can still be running in the other AioContext and it performs an access
> to the freed IOThread AioContext.
>
> Here is the race in aio_co_schedule():
>
>   QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                             co, co_scheduled_next);
>   <-- race: co may execute before we invoke qemu_bh_schedule()!
>   qemu_bh_schedule(ctx->co_schedule_bh);
>
> So if co causes ctx to be freed then we're in trouble. Fix this problem
> by holding a reference to ctx.
>
> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> ---
>  util/async.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/util/async.c b/util/async.c
> index 8d2105729c..4e4c7af51e 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -459,9 +459,17 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
>          abort();
>      }
>
> +    /* The coroutine might run and release the last ctx reference before we
> +     * invoke qemu_bh_schedule(). Take a reference to keep ctx alive until
> +     * we're done.
> +     */
> +    aio_context_ref(ctx);
> +
>      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                                co, co_scheduled_next);
>      qemu_bh_schedule(ctx->co_schedule_bh);
> +
> +    aio_context_unref(ctx);
>  }
>
>  void aio_co_wake(struct Coroutine *co)
>
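For anyone following along, the pattern in the patch (take a temporary
reference before publishing work, drop it only after the last access) can
be sketched outside QEMU roughly as below. This is a minimal
single-threaded illustration, not QEMU code: Ctx, ctx_ref(), ctx_unref(),
work_runs_and_owner_drops_ref() and schedule_work() are hypothetical
stand-ins, and the worst-case interleaving is simulated by running the
"work" immediately after it is published instead of in another thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for AioContext: a reference-counted object. */
typedef struct Ctx {
    atomic_int refcnt;
    int kicks;              /* how often the context was "kicked" */
} Ctx;

static int ctx_frees;       /* counts frees so the demo can observe them */

static Ctx *ctx_new(void)
{
    Ctx *ctx = calloc(1, sizeof(*ctx));
    assert(ctx != NULL);
    atomic_init(&ctx->refcnt, 1);   /* the owner's reference */
    return ctx;
}

static void ctx_ref(Ctx *ctx)
{
    atomic_fetch_add(&ctx->refcnt, 1);
}

static void ctx_unref(Ctx *ctx)
{
    /* atomic_fetch_sub() returns the old value; 1 means we were last. */
    if (atomic_fetch_sub(&ctx->refcnt, 1) == 1) {
        ctx_frees++;
        free(ctx);
    }
}

/* Stand-in for the woken coroutine finishing and its owner tearing the
 * context down: the owner's reference is released. */
static void work_runs_and_owner_drops_ref(Ctx *ctx)
{
    ctx_unref(ctx);
}

/* Returns the number of frees observed at the moment of the "kick". */
static int schedule_work(Ctx *ctx)
{
    int frees_at_kick;

    ctx_ref(ctx);                         /* keep ctx alive until we're done */

    /* Publishing the work: from here on it may run at any time. The demo
     * assumes the worst case and runs it right away. */
    work_runs_and_owner_drops_ref(ctx);

    ctx->kicks++;                         /* stand-in for qemu_bh_schedule();
                                           * safe only because of our ref */
    frees_at_kick = ctx_frees;

    ctx_unref(ctx);                       /* may free ctx; no access after */
    return frees_at_kick;
}
```

Calling schedule_work(ctx_new()) in this sketch touches the context while
it is still alive, and the free happens only at the final ctx_unref().
Without the ctx_ref()/ctx_unref() pair the "kick" would access freed
memory, which is the use-after-free described in the commit message.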
This must have been painful to debug.

Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>

Paolo