On 23/07/19 21:06, Stefan Hajnoczi wrote:
> The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
> following:
> 
> 1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
> 2. The one-shot BH executes in another AioContext.  All it does is call
>    aio_co_wake(preadv_co).
> 3. The preadv coroutine is re-entered and returns.
> 
> There is a race condition in aio_co_wake() where the preadv coroutine
> returns and the test case destroys the preadv IOThread.  aio_co_wake()
> can still be running in the other AioContext and it performs an access
> to the freed IOThread AioContext.
> 
> Here is the race in aio_co_schedule():
> 
>   QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                             co, co_scheduled_next);
>   <-- race: co may execute before we invoke qemu_bh_schedule()!
>   qemu_bh_schedule(ctx->co_schedule_bh);
> 
> So if co causes ctx to be freed then we're in trouble.  Fix this problem
> by holding a reference to ctx.
> 
> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> ---
>  util/async.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/util/async.c b/util/async.c
> index 8d2105729c..4e4c7af51e 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -459,9 +459,17 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
>          abort();
>      }
>  
> +    /* The coroutine might run and release the last ctx reference before we
> +     * invoke qemu_bh_schedule().  Take a reference to keep ctx alive until
> +     * we're done.
> +     */
> +    aio_context_ref(ctx);
> +
>      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                                co, co_scheduled_next);
>      qemu_bh_schedule(ctx->co_schedule_bh);
> +
> +    aio_context_unref(ctx);
>  }
>  
>  void aio_co_wake(struct Coroutine *co)
> 

This must have been painful to debug.
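
For anyone reading the archive later, the pattern the patch applies can be
shown in isolation. What follows is a minimal single-threaded sketch, not
QEMU code: ToyCtx and every toy_* name are hypothetical stand-ins, and the
"coroutine" is simulated by a plain function call placed exactly in the race
window. It only illustrates why taking a reference before the list insert
keeps ctx valid for the later qemu_bh_schedule()-style access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for AioContext: a plain refcounted object. */
typedef struct ToyCtx {
    int refcnt;
    int scheduled;   /* stands in for ctx->scheduled_coroutines */
    int kicks;       /* counts simulated qemu_bh_schedule() calls */
} ToyCtx;

static int toy_ctx_freed;   /* number of contexts destroyed so far */

static ToyCtx *toy_ctx_new(void)
{
    ToyCtx *ctx = calloc(1, sizeof(*ctx));
    assert(ctx);
    ctx->refcnt = 1;        /* the owner's reference */
    return ctx;
}

static void toy_ctx_ref(ToyCtx *ctx)
{
    ctx->refcnt++;
}

static void toy_ctx_unref(ToyCtx *ctx)
{
    if (--ctx->refcnt == 0) {
        toy_ctx_freed++;
        free(ctx);
    }
}

/* Simulates the coroutine running inside the race window and causing
 * the owner to drop what would otherwise be the last reference. */
static void toy_coroutine_runs(ToyCtx *ctx)
{
    toy_ctx_unref(ctx);
}

/* Mirrors the shape of the patched function: ref, insert, kick, unref.
 * Returns true if ctx was still alive when the kick touched it. */
static bool toy_schedule(ToyCtx *ctx)
{
    int freed_before = toy_ctx_freed;
    bool alive_at_kick;

    toy_ctx_ref(ctx);                 /* keep ctx alive until we're done */
    ctx->scheduled++;                 /* the list insert */
    toy_coroutine_runs(ctx);          /* worst-case interleaving */
    alive_at_kick = (toy_ctx_freed == freed_before);
    ctx->kicks++;                     /* safe: our reference pins ctx */
    toy_ctx_unref(ctx);               /* may free ctx; do not touch after */
    return alive_at_kick;
}

/* Returns 1 if ctx survived the window and was freed exactly once. */
static int toy_demo(void)
{
    int freed_before = toy_ctx_freed;
    ToyCtx *ctx = toy_ctx_new();
    bool alive = toy_schedule(ctx);

    return alive && toy_ctx_freed == freed_before + 1;
}
```

Calling toy_demo() from a test harness should return 1. Without the
toy_ctx_ref()/toy_ctx_unref() pair in toy_schedule(), the simulated coroutine
would free ctx before the ctx->kicks++ access, which is the use-after-free
the patch closes.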

Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>

Paolo
