On Mon, 10/17 15:54, Paolo Bonzini wrote:
> This patch reorganizes aio_poll callers to establish new rules for
> dataplane locking. The idea is that I/O operations on a dataplane
> BDS (i.e. one where the AioContext is not the main one) do not call
> aio_poll anymore. Instead, they wait for the operation to end in the
> other I/O thread, at which point the other I/O thread calls bdrv_wakeup
> to wake up the main thread.
>
> With this change, only one thread runs aio_poll for an AioContext.
> While aio_context_acquire/release is still needed to protect the BDSes,
> it need not interrupt the other thread's event loop anymore, and therefore
> it does not need contention callbacks anymore. Thus the patch can remove
> RFifoLock. This fixes possible hangs in bdrv_drain_all, reproducible (for
> example) by unplugging a virtio-scsi-dataplane device while there is I/O
> going on for a virtio-blk-dataplane on the same I/O thread.
>
> Patch 1 is a bugfix that I already posted.
>
> Patch 2 makes blockjobs independent of aio_poll, the reason for which
> should be apparent from the explanation above.
>
> Patch 3 is an independent mirror bugfix, that I wanted to submit separately
> but happens to fix a hang in COLO replication. Like patch 1 I believe
> it's pre-existing and merely exposed by these patches.
>
> Patches 4 to 10 introduce the infrastructure to wake up the main thread
> while bdrv_drain or other synchronous operations are running. Patches 11
> to 16 do other changes to prepare for this. Notably bdrv_drain_all
> needs to be called without holding any AioContext lock, so bdrv_reopen
> releases the lock temporarily (and callers of bdrv_reopen needs fixing).
>
> Patch 17 then does the big change, after which there are just some
> cleanups left to do.
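
For anyone following along, the waiting scheme described in the cover
letter can be modelled roughly as below. This is a toy sketch with plain
pthreads, not QEMU code: ToyBDS, toy_iothread and toy_drain are made-up
names, and a condition variable stands in for bdrv_wakeup. The series
itself introduces BDRV_POLL_WHILE for this purpose, which the sketch does
not attempt to reproduce.

```c
/* Toy model only. Every name here is hypothetical, none is a QEMU API. */
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;      /* protects the fields below */
    pthread_cond_t  wakeup;    /* stands in for bdrv_wakeup */
    int             in_flight; /* requests still owned by the I/O thread */
    int             completed; /* requests the I/O thread has finished */
} ToyBDS;

/* The I/O thread: the only thread that processes this context's requests. */
static void *toy_iothread(void *opaque)
{
    ToyBDS *bs = opaque;

    for (;;) {
        pthread_mutex_lock(&bs->lock);
        if (bs->in_flight == 0) {
            pthread_mutex_unlock(&bs->lock);
            break;
        }
        /* "Complete" one request, then wake up whoever is waiting. */
        bs->in_flight--;
        bs->completed++;
        pthread_cond_broadcast(&bs->wakeup);
        pthread_mutex_unlock(&bs->lock);
    }
    return NULL;
}

/*
 * The main thread: it never runs the I/O thread's loop itself. It sleeps
 * until the I/O thread reports that nothing is in flight anymore.
 * Returns the number of completed requests, or -1 if no thread was created.
 */
static int toy_drain(int requests)
{
    ToyBDS bs;
    pthread_t tid;
    int completed;

    pthread_mutex_init(&bs.lock, NULL);
    pthread_cond_init(&bs.wakeup, NULL);
    bs.in_flight = requests;
    bs.completed = 0;

    if (pthread_create(&tid, NULL, toy_iothread, &bs) != 0) {
        pthread_cond_destroy(&bs.wakeup);
        pthread_mutex_destroy(&bs.lock);
        return -1;
    }

    pthread_mutex_lock(&bs.lock);
    while (bs.in_flight > 0) {
        /* Re-check the condition after every wakeup. */
        pthread_cond_wait(&bs.wakeup, &bs.lock);
    }
    assert(bs.in_flight == 0);
    pthread_mutex_unlock(&bs.lock);

    pthread_join(tid, NULL);
    completed = bs.completed;
    pthread_cond_destroy(&bs.wakeup);
    pthread_mutex_destroy(&bs.lock);
    return completed;
}
```

Calling toy_drain(5) from a main() built with -pthread should return 5
once the worker has finished all five requests. The point of the model
is only that the waiter checks its condition under the lock and is woken
explicitly, so it has no reason to enter the other thread's event loop.
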
>
> Paolo
>
> Fam Zheng (1):
>   qed: Implement .bdrv_drain
>
> Paolo Bonzini (19):
>   replication: interrupt failover if the main device is closed
>   blockjob: introduce .drain callback for jobs
>   mirror: use bdrv_drained_begin/bdrv_drained_end
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   block: introduce BDRV_POLL_WHILE
>   nfs: move nfs_set_events out of the while loops
>   nfs: use BDRV_POLL_WHILE
>   sheepdog: use BDRV_POLL_WHILE
>   aio: introduce qemu_get_current_aio_context
>   iothread: detach all block devices before stopping them
>   replication: pass BlockDriverState to reopen_backing_file
>   block: prepare bdrv_reopen_multiple to release AioContext
>   qemu-io: acquire AioContext
>   qemu-img: call aio_context_acquire/release around block job
>   block: only call aio_poll on the current thread's AioContext
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex

Modulo the one harmless question, series:

Reviewed-by: Fam Zheng <f...@redhat.com>