Am 01.03.2023 um 17:16 hat Stefan Hajnoczi geschrieben:
> On Fri, Feb 03, 2023 at 08:17:28AM -0500, Emanuele Giuseppe Esposito wrote:
> > Remove usage of aio_context_acquire by always submitting asynchronous
> > AIO to the current thread's LinuxAioState.
> > 
> > In order to prevent mistakes from the caller side, avoid passing 
> > LinuxAioState
> > in laio_io_{plug/unplug} and laio_co_submit, and document the functions
> > to make clear that they work in the current thread's AioContext.
> > 
> > Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
> > ---
> >  include/block/aio.h               |  4 ----
> >  include/block/raw-aio.h           | 18 ++++++++++++------
> >  include/sysemu/block-backend-io.h |  6 ++++++
> >  block/file-posix.c                | 10 +++-------
> >  block/linux-aio.c                 | 29 +++++++++++++++++------------
> >  5 files changed, 38 insertions(+), 29 deletions(-)
> > 
> > diff --git a/include/block/aio.h b/include/block/aio.h
> > index 8fba6a3584..b6b396cfcb 100644
> > --- a/include/block/aio.h
> > +++ b/include/block/aio.h
> > @@ -208,10 +208,6 @@ struct AioContext {
> >      struct ThreadPool *thread_pool;
> >  
> >  #ifdef CONFIG_LINUX_AIO
> > -    /*
> > -     * State for native Linux AIO.  Uses aio_context_acquire/release for
> > -     * locking.
> > -     */
> >      struct LinuxAioState *linux_aio;
> >  #endif
> >  #ifdef CONFIG_LINUX_IO_URING
> > diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
> > index f8cda9df91..db614472e6 100644
> > --- a/include/block/raw-aio.h
> > +++ b/include/block/raw-aio.h
> > @@ -49,14 +49,20 @@
> >  typedef struct LinuxAioState LinuxAioState;
> >  LinuxAioState *laio_init(Error **errp);
> >  void laio_cleanup(LinuxAioState *s);
> > -int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, 
> > int fd,
> > -                                uint64_t offset, QEMUIOVector *qiov, int 
> > type,
> > -                                uint64_t dev_max_batch);
> > +
> > +/* laio_co_submit: submit I/O requests in the thread's current AioContext. 
> > */
> > +int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector 
> > *qiov,
> > +                                int type, uint64_t dev_max_batch);
> > +
> >  void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
> >  void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
> > -void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
> > -void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s,
> > -                    uint64_t dev_max_batch);
> > +
> > +/*
> > + * laio_io_plug/unplug work in the thread's current AioContext, therefore 
> > the
> > + * caller must ensure that they are paired in the same IOThread.
> > + */
> > +void laio_io_plug(void);
> > +void laio_io_unplug(uint64_t dev_max_batch);
> >  #endif
> >  /* io_uring.c - Linux io_uring implementation */
> >  #ifdef CONFIG_LINUX_IO_URING
> > diff --git a/include/sysemu/block-backend-io.h 
> > b/include/sysemu/block-backend-io.h
> > index 031a27ba10..d41698ccc5 100644
> > --- a/include/sysemu/block-backend-io.h
> > +++ b/include/sysemu/block-backend-io.h
> > @@ -74,8 +74,14 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
> >  int blk_get_max_iov(BlockBackend *blk);
> >  int blk_get_max_hw_iov(BlockBackend *blk);
> >  
> > +/*
> > + * blk_io_plug/unplug are thread-local operations. This means that multiple
> > + * IOThreads can simultaneously call plug/unplug, but the caller must 
> > ensure
> > + * that each unplug() is called in the same IOThread of the matching 
> > plug().
> > + */
> >  void blk_io_plug(BlockBackend *blk);
> >  void blk_io_unplug(BlockBackend *blk);
> > +
> >  AioContext *blk_get_aio_context(BlockBackend *blk);
> >  BlockAcctStats *blk_get_stats(BlockBackend *blk);
> >  void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index fa227d9d14..fa99d1c25a 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2095,10 +2095,8 @@ static int coroutine_fn raw_co_prw(BlockDriverState 
> > *bs, uint64_t offset,
> >  #endif
> >  #ifdef CONFIG_LINUX_AIO
> >      } else if (s->use_linux_aio) {
> > -        LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
> >          assert(qiov->size == bytes);
> > -        return laio_co_submit(bs, aio, s->fd, offset, qiov, type,
> > -                              s->aio_max_batch);
> > +        return laio_co_submit(s->fd, offset, qiov, type, s->aio_max_batch);
> 
> I'm having second thoughts here. This is correct in an IOThread today,
> but the main loop thread case concerns me:
> 
> This patch changes behavior when the main loop or vCPU thread submits
> I/O. Before, the IOThread's LinuxAioState would be used. Now the main
> loop's LinuxAioState will be used instead and aio callbacks will be
> invoked in the main loop thread instead of the IOThread.

You mean we have a device that has a separate iothread, but a request is
submitted from the main thread? This isn't even allowed today; if a node
is in an iothread, all I/O must be submitted from that iothread. Do you
know any code that does submit I/O from the main thread instead?

> This change will be fine when QEMU block layer support is complete, but
> will does it already work today?
> 
> When blk_preadv() is called from a non-coroutine in the main loop thread
> then the coroutine is spawned in the IOThread today. So we avoid the
> issue.
> 
> But when blk_preadv() is called from a coroutine in the main loop thread
> we'll have multi-queue activity (I/O being processed in both the main
> loop thread and IOThread).

That's a bug then. But calling blk_*() from coroutine context should be
quite rare anyway in the current code. I can think of .run in the block
jobs and possible some exports.

Actually, we may have a bug in the export code. blk_exp_add() enables
support for changing iothreads only depending on whether the user
requested it, but doesn't check if the export driver actually supports
it. Most do, but FUSE just ignores AioContext changes (it does use the
initial iothread of the node, though, not always the main thread).

> I like this patch series and think it's the right thing to do, but I'm
> not sure if it's safe to do this yet. We first need to be sure all aio
> callbacks are thread-safe (may are already, but there are probably
> still some that are not).

I would argue that if we do have buggy code like this, the new code is
probably better than the old one because getting callbacks scheduled in
a different thread is the more surprising behaviour. It's probably done
by code that doesn't expect to ever run in iothreads, so staying in the
main loop certainly feels safer.

Kevin

Attachment: signature.asc
Description: PGP signature

Reply via email to