On Tue, Apr 04, 2023 at 03:43:20PM +0200, Paolo Bonzini wrote:
> On 4/3/23 20:29, Stefan Hajnoczi wrote:
> > The aio_disable_external() API temporarily suspends file descriptor
> > monitoring
> > in the event loop. The block layer uses this to prevent new I/O requests
> > being
> > submitted from the guest and elsewhere between bdrv_drained_begin() and
> > bdrv_drained_end().
> >
> > While the block layer still needs to prevent new I/O requests in drained
> > sections, the aio_disable_external() API can be replaced with
> > .drained_begin/end/poll() callbacks that have been added to BdrvChildClass
> > and
> > BlockDevOps.
> >
> > This newer .bdrained_begin/end/poll() approach is attractive because it
> > works
> > without specifying a specific AioContext. The block layer is moving towards
> > multi-queue and that means multiple AioContexts may be processing I/O
> > simultaneously.
> >
> > The aio_disable_external() was always somewhat hacky. It suspends all file
> > descriptors that were registered with is_external=true, even if they have
> > nothing to do with the BlockDriverState graph nodes that are being drained.
> > It's better to solve a block layer problem in the block layer than to have
> > an
> > odd event loop API solution.
> >
> > That covers the motivation for this change, now on to the specifics of this
> > series:
> >
> > While it would be nice if a single conceptual approach could be applied to
> > all
> > is_external=true file descriptors, I ended up looking at callers on a
> > case-by-case basis. There are two general ways I migrated code away from
> > is_external=true:
> >
> > 1. Block exports are typically best off unregistering fds in
> > .drained_begin()
> > and registering them again in .drained_end(). The .drained_poll()
> > function
> > waits for in-flight requests to finish using a reference counter.
> >
> > 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
> > simpler. They can rely on BlockBackend's request queuing during drain
> > feature. Guest I/O request coroutines are suspended in a drained
> > section and
> > resume upon the end of the drained section.
>
> Sorry, I disagree with this.
>
> Request queuing was shown to cause deadlocks; Hanna's latest patch is piling
> another hack upon it, instead in my opinion we should go in the direction of
> relying _less_ (or not at all) on request queuing.
>
> I am strongly convinced that request queuing must apply only after
> bdrv_drained_begin has returned, which would also fix the IDE TRIM bug
> reported by Fiona Ebner. The possible livelock scenario is generally not a
> problem because 1) outside an iothread you have anyway the BQL that prevents
> a vCPU from issuing more I/O operations during bdrv_drained_begin 2) in
> iothreads you have aio_disable_external() instead of .drained_begin().
>
> It is also less tidy to start a request during the drained_begin phase,
> because a request that has been submitted has to be completed (cancel
> doesn't really work).
>
> So in an ideal world, request queuing would not only apply only after
> bdrv_drained_begin has returned, it would log a warning and .drained_begin()
> should set up things so that there are no such warnings.
That's fine, I will give .drained_begin/end/poll() a try with virtio-blk
and virtio-scsi in the next revision.
Stefan
signature.asc
Description: PGP signature