On 5/24/22 12:36, Kevin Wolf wrote:
* IO_OR_GS_CODE() functions, when called from coroutine context, expect to
be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)

Am I right to say this is not inherently part of the definition of
IO_OR_GS_CODE(), but just a property that these functions have in
practice? In practice, the functions that are IO_OR_GS_CODE()
are those that call AIO_WAIT_WHILE().

Yes.
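
For concreteness, the pattern in question looks like this (bdrv_do_something() and the async operation are made up for illustration; IO_OR_GS_CODE() and AIO_WAIT_WHILE() are the real macros):

    /* Illustrative sketch only.  The function is IO_OR_GS_CODE() precisely
     * because it contains an AIO_WAIT_WHILE(), which may only run in the
     * main thread or in the BDS's home iothread.  The caller holds bs's
     * AioContext lock exactly once. */
    static void bdrv_do_something(BlockDriverState *bs)
    {
        bool done = false;

        IO_OR_GS_CODE();
        start_something_async(bs, &done);   /* hypothetical async op that
                                             * sets done on completion */
        AIO_WAIT_WHILE(bdrv_get_aio_context(bs), !done);
    }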

Using a different code path means that the restrictions from
AIO_WAIT_WHILE() don't really apply any more and these functions become
effectively IO_CODE() when called in a coroutine. (At least I'm not
aware of any other piece of code apart from AIO_WAIT_WHILE() that makes
a function IO_OR_GS_CODE().)

The second point is correct.

The first point depends on the definition of the coroutine path. For example, bdrv_co_yield_to_drain() expects to run with bdrv_get_aio_context(bs)'s lock taken. To avoid AB-BA deadlocks, an iothread must not take another iothread's AioContext lock; therefore, bdrv_co_yield_to_drain() can only run in the main thread or in bs's home iothread.
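
To illustrate the AB-BA hazard (both contexts and the interleaving are hypothetical):

        iothread 1 (home of ctx_a)            iothread 2 (home of ctx_b)
        aio_context_acquire(ctx_a)            aio_context_acquire(ctx_b)
        aio_context_acquire(ctx_b)            aio_context_acquire(ctx_a)
          -> blocks waiting for iothread 2      -> blocks waiting for iothread 1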

* to call these functions with the lock taken, the code has to run in the
BDS's home iothread.  Attempts to do otherwise result in deadlocks (the
main loop's AIO_WAIT_WHILEs expect progress from the iothread, which cannot
happen without releasing the AioContext lock)

This problem can't happen in the main thread itself: AIO_WAIT_WHILE() is
safe both in the home thread and the main thread (at least as long as
you lock only once) because it temporarily drops the lock. It has also
become the definition of IO_OR_GS_CODE(): this code has to run in the
home thread or the main thread.
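
For reference, the main-thread path of AIO_WAIT_WHILE(ctx, cond) boils down to roughly this (heavily simplified from include/block/aio-wait.h; the real macro also handles the in-home-thread case and the aio_wait_kick() bookkeeping):

    while (cond) {
        aio_context_release(ctx);               /* let the iothread run */
        aio_poll(qemu_get_aio_context(), true); /* wait for progress */
        aio_context_acquire(ctx);
    }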

It doesn't work to run bdrv_open_driver() in the home iothread, because bdrv_open_driver() can change the AioContext of a BDS (causing extreme confusion about what runs where and which AioContext locks have to be taken and released).

So, in order to have bdrv_open_driver() run in the main thread, Emanuele added a variant of generated_co_wrapper that waits on the main thread. But that didn't work either, because then AIO_WAIT_WHILE() does not release the BDS's AioContext lock.

When ->bdrv_open() calls bdrv_replace_child_noperm() and it tries to drain, bdrv_co_yield_to_drain() schedules a bottom half on the iothread and yields; the bottom half can never run, because the AioContext lock is already taken elsewhere.

        main thread                             ctx home thread
        aio_context_acquire(ctx)
        bdrv_open()
          drv->bdrv_co_open()
            bdrv_replace_child_noperm()
              bdrv_co_yield_to_drain()
                aio_context_release(ctx)
                aio_bh_schedule_oneshot()

So, bdrv_open_driver() and bdrv_close() are un-coroutin-izable. I guess we could split the callback in two parts, one doing I/O and one doing graph manipulation (it may even be a good idea; the ->bdrv_inactivate() callback already exists for the bdrv_close() side, but I'm not sure the split is always applicable).
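
Purely as a sketch of that idea (neither callback exists today; both names are invented, mirroring how ->bdrv_inactivate() already covers the bdrv_close() side):

    /* Hypothetical split of ->bdrv_open() into an I/O part and a graph part */
    int coroutine_fn (*bdrv_co_open_io)(BlockDriverState *bs, QDict *options,
                                        int flags, Error **errp);
                                /* coroutine context, home iothread:
                                 * I/O only, no graph changes */
    int (*bdrv_open_graph)(BlockDriverState *bs, Error **errp);
                                /* main thread, outside coroutine context:
                                 * graph manipulation only */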

Come to think of it, I believe that many of the functions we declared
IO_CODE() are actually just IO_OR_GS_CODE() (at best; for iothreads,
they certainly require running in the home thread, but the main thread
allowed by IO_OR_GS_CODE() might not work). We have all the coroutine
machinery so that the AioContext lock of the current thread is
automatically released and reacquired across yield. However, this is the
wrong AioContext when called from a different thread, so we need an
additional lock - which should be dropped during yield, too, but that
doesn't happen.

There's no need for additional locks. Drivers have to be protected by individual locks, which are either a QemuMutex that is dropped around yield, or a CoMutex. A QemuMutex used to protect a CoQueue is also dropped safely across the yield.
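
As a minimal sketch of the CoQueue case (the driver state and field names are made up; qemu_co_queue_wait() really does drop the given lock across the yield and reacquire it before returning):

    typedef struct FooState {
        QemuMutex lock;     /* qemu_mutex_init() in the driver's open */
        CoQueue waiters;    /* qemu_co_queue_init() likewise */
        bool busy;
    } FooState;

    static void coroutine_fn foo_co_do_request(FooState *s)
    {
        qemu_mutex_lock(&s->lock);
        while (s->busy) {
            /* drops s->lock while this coroutine sleeps,
             * reacquires it before returning */
            qemu_co_queue_wait(&s->waiters, &s->lock);
        }
        s->busy = true;
        qemu_mutex_unlock(&s->lock);

        /* ... do the actual I/O without holding the lock ... */

        qemu_mutex_lock(&s->lock);
        s->busy = false;
        qemu_co_queue_next(&s->waiters);
        qemu_mutex_unlock(&s->lock);
    }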

The IO_CODE() distinction is indeed mostly theoretical as long as the file I/O BlockDriver protocols are protected by the AioContext lock and are therefore actually IO_OR_GS_CODE(). But that's the least of our worries; if file-posix AIO is done on qemu_get_current_aio_context() instead of bdrv_get_aio_context(bs), the AioContext lock is not needed anymore for I/O.
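
In sketch form, that change would amount to something like this at the file-posix submission site (condensed; aio_get_linux_aio() is the real helper):

    /* before: AIO state tied to the BDS's home AioContext */
    LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
    /* after: AIO state tied to whichever thread submits the request,
     * so no AioContext lock is needed for I/O */
    LinuxAioState *aio = aio_get_linux_aio(qemu_get_current_aio_context());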

Apart from file I/O, all drivers were thread-safe at some point (now it seems blklogwrites.c is not, but that code also seems unsafe against power loss, and I wonder if it should just be deleted).

Switching to drain for locking doesn't solve the problem, but only
possibly defers it. In order to complete the multiqueue work, we need
to make IO_CODE() functions actually conform to the definition of
IO_CODE(). Do we have a plan for what this should look like in the final
state when all the multiqueue work is completed? Or will we only have
coroutine locks, which don't have this problem?

The latter, apart from the occasional QemuMutex, which is used only where it does not cause the problem.

* running the code in the BDS's home iothread is not possible for
GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
thread, but that cannot be guaranteed in general)

There is nothing that stops GLOBAL_STATE_CODE() from scheduling work in
the home iothread of a BDS and then waiting for it - or if it is a
coroutine, even to reschedule itself into the BDS home thread
temporarily.

There are still things that an iothread cannot do, for example bdrv_set_aio_context(). Even if it worked, I don't think it would be a good idea: it would basically mean a "movable" main thread, which is even harder to reason about.
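
For reference, the two mechanisms Kevin mentions look roughly like this (the operation itself is made up; aio_wait_bh_oneshot() and aio_co_reschedule_self() are real helpers):

    static void do_work_in_home_thread(void *opaque)
    {
        BlockDriverState *bs = opaque;
        /* runs in bdrv_get_aio_context(bs)'s thread */
        (void)bs;
    }

    /* (1) non-coroutine main-thread code: schedule a BH in the home
     * iothread and wait for it (caller holds bs's AioContext lock once) */
    static void gs_do_work(BlockDriverState *bs)
    {
        GLOBAL_STATE_CODE();
        aio_wait_bh_oneshot(bdrv_get_aio_context(bs),
                            do_work_in_home_thread, bs);
    }

    /* (2) coroutine: temporarily reschedule itself into the home thread */
    static void coroutine_fn gs_co_do_work(BlockDriverState *bs)
    {
        aio_co_reschedule_self(bdrv_get_aio_context(bs));
        /* ... now running in bs's home thread ... */
        aio_co_reschedule_self(qemu_get_aio_context());
    }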

So that's how I got to the point where I don't think it's possible to proceed with the idea of coroutin-izing more code (a prerequisite for a coroutine-based graph rwlock) without first removing the AioContext lock. But removing the AioContext lock requires a way to make the graph operations thread-safe, and then we go back to draining.

Paolo
