Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU, a
serious slowdown may be observed on setups with a large enough number of
vCPUs.
Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):

  1    0m20.922s   0m21.346s
  2    0m21.230s   0m20.350s
  4    0m21.761s   0m20.997s
  8    0m22.770s   0m20.051s
 16    0m22.038s   0m19.994s
 32    0m22.928s   0m20.803s
 64    0m26.583s   0m22.953s
128    0m41.273s   0m32.333s
256    2m4.727s    1m16.924s
384    6m5.563s    3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base, and it
internally loops on all ioeventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
    [...]
    FOR_EACH_FLAT_RANGE(fr, view) {
        for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has quadratic
time complexity.

This series introduces generic APIs to allow batch creation and deletion
of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
greatly improves the numbers:

  1    0m21.271s   0m22.076s
  2    0m20.912s   0m19.716s
  4    0m20.508s   0m19.310s
  8    0m21.374s   0m20.273s
 16    0m21.559s   0m21.374s
 32    0m22.532s   0m21.271s
 64    0m26.550s   0m22.007s
128    0m29.115s   0m27.446s
256    0m44.752s   0m41.004s
384    1m2.884s    0m58.023s

The series deliberately spans over multiple subsystems for easier review
and experimenting. It also does some preliminary fixes along the way. It
is thus posted as an RFC for now, but if the general idea is acceptable,
I guess a non-RFC could be posted and maybe extend the feature to some
other devices that might suffer from similar scaling issues, e.g.
vhost-scsi-pci, vhost-user-scsi-pci and vhost-user-blk-pci, even if I
haven't checked.

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Greg Kurz (8):
  memory: Allow eventfd add/del without starting a transaction
  virtio: Introduce virtio_bus_set_host_notifiers()
  virtio: Add API to batch set host notifiers
  virtio-pci: Batch add/del ioeventfds in a single MR transaction
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Use virtio_bus_set_host_notifiers()
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Use virtio_bus_set_host_notifiers()

 hw/virtio/virtio-pci.h          |  1 +
 include/exec/memory.h           | 48 ++++++++++++++++------
 include/hw/virtio/virtio-bus.h  |  7 ++++
 hw/block/dataplane/virtio-blk.c | 26 +++++-------
 hw/scsi/virtio-scsi-dataplane.c | 68 ++++++++++++++++++--------------
 hw/virtio/virtio-bus.c          | 70 +++++++++++++++++++++++++++++++++
 hw/virtio/virtio-pci.c          | 53 +++++++++++++++++--------
 softmmu/memory.c                | 42 ++++++++++++--------
 8 files changed, 225 insertions(+), 90 deletions(-)

-- 
2.26.3