On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefa...@redhat.com> wrote:
> Alternatives welcome! (A cleaner version of this approach might be to forbid
> cross-thread aio_set_fd_handler() calls and to refactor all
> aio_set_fd_handler() callers so they come from the AioContext's home thread.
> I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs
> should be thread-safe.)

I think that's pretty hard, because aio_set_fd_handler() is an
important part both of the handoff from one AioContext to another and
of drained_begin()/end(), and both of those run in the main thread.

Regarding how to solve this issue: the locking policy in aio-posix.c
is largely underdocumented, and that does make running
aio_set_fd_handler() in the target AioContext tempting; but it is also
scary to rely on the iothread being able to react quickly. I'm also
worried that we would be changing the logic just because we don't
understand the old one, and adding technical debt in the process.

So, as a first step, I would take inspiration from the block layer
locking work, and add assertions to functions like poll_set_started()
or find_aio_handler(). Is the list_lock elevated (nonzero)? Or locked?
Are we in the iothread? And likewise, for each list, does insertion
happen from the iothread or with the list_lock taken (and possibly
elevated)? Does removal happen from the iothread or with list_lock
zero+taken?
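For example (just a sketch; which of these conditions actually holds
in each function is exactly what the audit has to establish, so treat
the specific assertions as placeholders rather than final invariants):

    /* candidate assertions for the top of functions such as
     * find_aio_handler() or poll_set_started() */
    assert(in_aio_context_home_thread(ctx));          /* runs in the iothread? */
    assert(qemu_lockcnt_count(&ctx->list_lock) > 0);  /* list_lock elevated? */
    /* ...or, on the removal/cleanup paths: */
    assert(qemu_lockcnt_count(&ctx->list_lock) == 0); /* list_lock zero+taken? */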

After this step, we should have a clearer idea of the possible states
of a node (based on the lists it is on, the state is a subset of
{poll_started, deleted, ready}) and we can draw a nice graph of the
transitions. We should also check whether any of the calls to
QLIST_IS_INSERTED() have correctness issues.
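Purely as an illustration of what I mean by "based on the lists" (the
names below just mirror the node_poll/node_deleted/node_ready link
fields of AioHandler; it is not a proposal to add an enum):

    /* illustrative only: the node's state is which lists it is on,
     * i.e. what QLIST_IS_INSERTED() reports for each link field */
    enum {
        NODE_POLL_STARTED = 1 << 0,  /* linked via node_poll */
        NODE_DELETED      = 1 << 1,  /* linked via node_deleted */
        NODE_READY        = 1 << 2,  /* linked via node_ready */
    };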

Good news: I don't think any memory barriers are needed here. One
thing that we already do correctly is that, once a node is deleted, we
try to skip work on it; see for example poll_set_started(). This also
gives us a good place to do cleanup work for deleted nodes, including
calling poll_end(): aio_free_deleted_handlers(), because it runs with
list_lock zero and taken, just like the tail of
aio_remove_fd_handler(). It's the safest possible place to do cleanup
and to take a lock. Therefore we have:

- a fast path in the iothread that runs without any concurrent
activity from the main thread

- a slow path in the iothread that runs with list_lock zero and taken.
The slow path shares logic with the main thread, meaning that
aio_free_deleted_handlers() and aio_remove_fd_handler() should call
some common helper functions (see the sketch below).
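Roughly like this (a sketch only: the helper name is made up, and
whether ctx->poll_started is the right condition for the poll_end()
call is exactly what the audit should tell us):

    /* Hypothetical common cleanup, called with list_lock zero and taken,
     * both from aio_free_deleted_handlers() and from the tail of
     * aio_remove_fd_handler(). */
    static void aio_node_cleanup(AioContext *ctx, AioHandler *node)
    {
        /* New bit: don't lose the io_poll_end() call when a node goes
         * away while polling is in progress. */
        if (ctx->poll_started && node->io_poll_end) {
            node->io_poll_end(node->opaque);
        }
        QLIST_SAFE_REMOVE(node, node_poll);
        QLIST_REMOVE(node, node);
    }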

If the code is organized this way, any wrong bits should jump out more
easily. For example, these two lines in aio_remove_fd_handler() are
clearly misplaced

    node->pfd.revents = 0;
    node->poll_ready = false;

because they run in the main thread but touch data owned by the
iothread! They should be moved below the point where
qemu_lockcnt_count() has been checked to be zero.
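Roughly like this (sketch from memory of aio_remove_fd_handler(), not
a tested patch):

    static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
    {
        /* ... g_source_remove_poll() handling elided ... */

        /* If a read is in progress, just mark the node as deleted */
        if (qemu_lockcnt_count(&ctx->list_lock)) {
            QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
            return false;
        }

        /* list_lock is zero and we hold it, so the iothread cannot be
         * looking at the node: only now is it safe to touch its fields. */
        node->pfd.revents = 0;
        node->poll_ready = false;

        QLIST_SAFE_REMOVE(node, node_poll);
        QLIST_REMOVE(node, node);
        return true;
    }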

Regarding the call to io_poll_ready(), I would hope that it is
unnecessary; in other words, that after drained_end() the virtqueue
notification would be raised anyway. Granted, virtio_queue_set_notification()
is edge triggered rather than level triggered, so it would be necessary
to add a check with virtio_queue_host_notifier_aio_poll() and
virtio_queue_host_notifier_aio_poll_ready() in
virtio_queue_aio_attach_host_notifier(); but that does not seem too
bad, because virtio is the only user of the io_poll_begin and
io_poll_end callbacks. It would have to be documented, though.
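Something along these lines in hw/virtio/virtio.c (untested sketch;
the two helpers are the existing static poll/poll_ready functions
there, and whether the other attach variant needs the same treatment
would need a look):

    void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
    {
        aio_set_event_notifier(ctx, &vq->host_notifier,
                               virtio_queue_host_notifier_read,
                               virtio_queue_host_notifier_aio_poll,
                               virtio_queue_host_notifier_aio_poll_ready);
        aio_set_event_notifier_poll(ctx, &vq->host_notifier,
                                    virtio_queue_host_notifier_aio_poll_begin,
                                    virtio_queue_host_notifier_aio_poll_end);

        /* Notifications are edge triggered, so catch up with anything
         * that became ready while the notifier was detached/drained and
         * the io_poll_ready() call was skipped. */
        if (virtio_queue_host_notifier_aio_poll(&vq->host_notifier)) {
            virtio_queue_host_notifier_aio_poll_ready(&vq->host_notifier);
        }
    }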

Paolo

>
> Stefan Hajnoczi (3):
>   aio-posix: run aio_set_fd_handler() in target AioContext
>   aio: use counter instead of ctx->list_lock
>   aio-posix: call ->poll_end() when removing AioHandler
>
>  include/block/aio.h |  22 ++---
>  util/aio-posix.c    | 197 ++++++++++++++++++++++++++++++++------------
>  util/async.c        |   2 -
>  util/fdmon-epoll.c  |   6 +-
>  4 files changed, 152 insertions(+), 75 deletions(-)
>
> --
> 2.43.0
>

