Hanna and Fiona encountered a bug in aio_set_fd_handler(): there is no matching io_poll_end() call upon removing an AioHandler when io_poll_begin() was previously called. The missing io_poll_end() call leaves virtqueue notifications disabled, so the virtqueue's ioeventfd never becomes readable again.
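
To make the failure mode concrete, here is a toy model in C. It is only a sketch and not QEMU code: ToyQueue, ToyHandler, and every toy_*() function are hypothetical stand-ins, and the only assumption taken from the description above is that io_poll_begin() disables notifications and io_poll_end() re-enables them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: only tracks notification state. */
typedef struct {
    bool notifications_enabled;
} ToyQueue;

/* Hypothetical, heavily simplified handler; not QEMU's AioHandler. */
typedef struct {
    ToyQueue *vq;
    bool poll_started; /* true between poll_begin and poll_end */
} ToyHandler;

/* Entering polling mode: notifications are switched off. */
static void toy_io_poll_begin(ToyHandler *h)
{
    h->vq->notifications_enabled = false;
    h->poll_started = true;
}

/* Leaving polling mode: notifications are switched back on. */
static void toy_io_poll_end(ToyHandler *h)
{
    h->vq->notifications_enabled = true;
    h->poll_started = false;
}

/*
 * Remove the handler. With call_poll_end == false this mimics the bug:
 * the handler goes away while the queue is still in polling mode.
 */
static void toy_remove_handler(ToyHandler *h, bool call_poll_end)
{
    if (call_poll_end && h->poll_started) {
        toy_io_poll_end(h);
    }
    h->vq = NULL;
}

/* Returns the queue's notification state after the handler is removed. */
static bool toy_notifications_after_removal(bool call_poll_end)
{
    ToyQueue q = { .notifications_enabled = true };
    ToyHandler h = { .vq = &q, .poll_started = false };

    toy_io_poll_begin(&h);
    assert(!q.notifications_enabled);
    toy_remove_handler(&h, call_poll_end);
    return q.notifications_enabled;
}
```

In this model, toy_notifications_after_removal(false) leaves notifications disabled for good, while toy_notifications_after_removal(true) restores them before the handler disappears.
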

The details of how virtio-scsi devices using IOThreads can hang after hotplug/unplug are covered here:
https://issues.redhat.com/browse/RHEL-3934

Hanna is currently away over the December holidays. I'm sending these RFC patches in the meantime. They demonstrate running aio_set_fd_handler() in the AioContext home thread and adding the missing io_poll_end() call.

The downside to my approach is that aio_set_fd_handler() becomes a synchronization point that waits for the remote AioContext thread to finish running a BH. Synchronization points are prone to deadlocks if the caller invokes them while holding a lock that the remote AioContext needs to make progress or if the remote AioContext cannot make progress before we make progress in our own event loop. To minimize these concerns I have based this patch series on my AioContext lock removal series and only allow the main loop thread to call aio_set_fd_handler() on other threads (which I think is already the convention today).

Another concern is that aio_set_fd_handler() now invokes user-provided io_poll_end(), io_poll(), and io_poll_ready() functions. The io_poll_ready() callback might contain a nested aio_poll() call, so there is a new place where nested event loops can occur and hence a new re-entrant code path that I haven't thought about yet.

But there you have it. Please let me know what you think and try your reproducers to see if this fixes the missing io_poll_end() issue. Thanks!

Alternatives welcome! (A cleaner version of this approach might be to forbid cross-thread aio_set_fd_handler() calls and to refactor all aio_set_fd_handler() callers so they come from the AioContext's home thread. I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs should be thread-safe.)
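
For readers unfamiliar with the "synchronization point" pattern described above, the following is a rough sketch using plain pthreads. It is not how the patches are implemented: ToyCtx and the toy_*() helpers are hypothetical, a single pending-work slot stands in for a BH, and only one caller at a time is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void ToyFn(void *opaque);

/* Hypothetical stand-in for an AioContext with one pending "BH" slot. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pthread_t home;   /* the context's home thread */
    ToyFn *fn;        /* pending work, NULL when idle */
    void *opaque;
    bool done;        /* set by the home thread after fn returns */
    bool stop;
} ToyCtx;

/* Home thread loop: run pending work, then wake up the waiting caller. */
static void *toy_home_thread(void *arg)
{
    ToyCtx *ctx = arg;

    pthread_mutex_lock(&ctx->lock);
    while (!ctx->stop) {
        if (ctx->fn) {
            ToyFn *fn = ctx->fn;
            void *opaque = ctx->opaque;

            ctx->fn = NULL;
            pthread_mutex_unlock(&ctx->lock);
            fn(opaque); /* executes in the home thread */
            pthread_mutex_lock(&ctx->lock);
            ctx->done = true;
            pthread_cond_broadcast(&ctx->cond);
        } else {
            pthread_cond_wait(&ctx->cond, &ctx->lock);
        }
    }
    pthread_mutex_unlock(&ctx->lock);
    return NULL;
}

static void toy_ctx_start(ToyCtx *ctx)
{
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->cond, NULL);
    ctx->fn = NULL;
    ctx->opaque = NULL;
    ctx->done = false;
    ctx->stop = false;
    pthread_create(&ctx->home, NULL, toy_home_thread, ctx);
}

static void toy_ctx_stop(ToyCtx *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->stop = true;
    pthread_cond_broadcast(&ctx->cond);
    pthread_mutex_unlock(&ctx->lock);
    pthread_join(ctx->home, NULL);
    pthread_cond_destroy(&ctx->cond);
    pthread_mutex_destroy(&ctx->lock);
}

/* Synchronization point: blocks until the home thread has run fn. */
static void toy_run_in_home_thread(ToyCtx *ctx, ToyFn *fn, void *opaque)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->fn = fn;
    ctx->opaque = opaque;
    ctx->done = false;
    pthread_cond_broadcast(&ctx->cond);
    while (!ctx->done) {
        pthread_cond_wait(&ctx->cond, &ctx->lock);
    }
    pthread_mutex_unlock(&ctx->lock);
}

/* Example callback: records which thread executed it. */
static void toy_record_thread(void *opaque)
{
    *(pthread_t *)opaque = pthread_self();
}

/* Returns true if the callback ran in the home thread, not the caller. */
static bool toy_demo_ran_in_home_thread(void)
{
    ToyCtx ctx;
    pthread_t ran_in;
    bool ok;

    toy_ctx_start(&ctx);
    toy_run_in_home_thread(&ctx, toy_record_thread, &ran_in);
    ok = pthread_equal(ran_in, ctx.home) &&
         !pthread_equal(ran_in, pthread_self());
    toy_ctx_stop(&ctx);
    return ok;
}
```

The deadlock hazard is visible in toy_run_in_home_thread(): the caller sleeps until the home thread finishes fn, so if fn needs a lock the caller is holding, or the home thread is itself waiting on the caller, the wait never returns.
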

Stefan Hajnoczi (3):
  aio-posix: run aio_set_fd_handler() in target AioContext
  aio: use counter instead of ctx->list_lock
  aio-posix: call ->poll_end() when removing AioHandler

 include/block/aio.h |  22 ++---
 util/aio-posix.c    | 197 ++++++++++++++++++++++++++++++++------------
 util/async.c        |   2 -
 util/fdmon-epoll.c  |   6 +-
 4 files changed, 152 insertions(+), 75 deletions(-)

-- 
2.43.0