v6:
 * Rebased onto QEMU 5.1 and added the now-necessary machine compat
   options.
v4:
 * Sorry for the long delay. I considered replacing this series with a
   simpler approach. Real hardware ships with a fixed number of queues
   (e.g. 128). The equivalent can be done in QEMU too. That way we don't
   need to magically size num_queues. In the end I decided against this
   approach because the Linux virtio_blk.ko and virtio_scsi.ko guest
   drivers unconditionally initialized all available queues until
   recently (they were written with num_queues=num_vcpus in mind). It
   doesn't make sense for a 1 CPU guest to bring up 128 virtqueues (a
   waste of resources and possibly weird performance effects with
   blk-mq).
 * Honor the maximum number of MSI-X vectors and virtqueues [Daniel
   Berrange]
 * Update commit descriptions to mention the maximum MSI-X vector and
   virtqueue caps [Raphael]

v3:
 * Introduce a virtio_pci_optimal_num_queues() helper to enforce
   VIRTIO_QUEUE_MAX in one place
 * Use the VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
 * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
 * Add new performance results that demonstrate the scalability
 * Mention that this is PCI-specific [Cornelia]

v2:
 * Let the virtio-DEVICE-pci device select num-queues because the
   optimal multi-queue configuration may differ between virtio-pci,
   virtio-mmio, and virtio-ccw [Cornelia]

Enabling multi-queue on virtio-pci storage devices improves performance
on SMP guests because the completion interrupt is handled on the vCPU
that submitted the I/O request. This avoids IPIs inside the guest.

Note that performance is unchanged in these cases:
1. Uniprocessor guests. They don't have IPIs.
2. Application threads that happen to be scheduled on the sole vCPU that
   handles completion interrupts. (This is one reason why benchmark
   results can vary noticeably between runs.)
3. Users who bind the application to the vCPU that handles completion
   interrupts.

Set the number of queues to the number of vCPUs by default on virtio-blk
and virtio-scsi PCI devices. Older machine types continue to default to
1 queue for live migration compatibility.

Random read performance:
       IOPS
q=1     78k
q=32   104k  +33%

Boot time:
       Duration
q=1    51s
q=32   1m41s  +98%

Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks

Previously measured results on a 4 vCPU guest were also positive but
showed a smaller 1-4% performance improvement. They are no longer valid
because significant event loop optimizations have been merged.
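As background on how the default is capped, here is a minimal standalone
sketch of the clamping that the virtio_pci_optimal_num_queues() helper
performs. Only the helper's name and purpose come from the series; the
constant values and the min_u(), optimal_num_queues(), and main()
functions are illustrative stand-ins, not the patch itself:

  /* Sketch only: QEMU's real limits are VIRTIO_QUEUE_MAX (1024 at the
   * time of writing) and the PCI MSI-X table size cap of 2048 vectors.
   */
  #include <stdio.h>

  #define VIRTIO_QUEUE_MAX  1024 /* per-device virtqueue limit */
  #define MSIX_VECTORS_MAX  2048 /* MSI-X table size limit */

  static unsigned min_u(unsigned a, unsigned b)
  {
      return a < b ? a : b;
  }

  /* fixed_queues counts virtqueues that exist regardless of num-queues,
   * e.g. virtio-scsi's event and control queues
   * (VIRTIO_SCSI_VQ_NUM_FIXED).
   */
  static unsigned optimal_num_queues(unsigned num_vcpus,
                                     unsigned fixed_queues)
  {
      /* Start from a 1:1 virtqueue:vCPU mapping so the vCPU that
       * submits a request also handles its completion interrupt.
       */
      unsigned num = num_vcpus;

      /* Leave one MSI-X vector for config changes and one per fixed
       * virtqueue.
       */
      num = min_u(num, MSIX_VECTORS_MAX - 1 - fixed_queues);

      /* Never exceed the transport's total virtqueue limit. */
      return min_u(num, VIRTIO_QUEUE_MAX - fixed_queues);
  }

  int main(void)
  {
      printf("4 vCPUs, virtio-blk:  %u\n", optimal_num_queues(4, 0));
      printf("4 vCPUs, virtio-scsi: %u\n", optimal_num_queues(4, 2));
      return 0;
  }

With -smp 4 both devices end up with 4 request queues; the MSI-X and
VIRTIO_QUEUE_MAX caps only kick in on guests with very large vCPU
counts.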
Peter Maydell (1):
  Open 5.2 development tree

Stefan Hajnoczi (6):
  hw: add 5.2 machine types and 5.1 compat options
  virtio-pci: add virtio_pci_optimal_num_queues() helper
  virtio-scsi: introduce a constant for fixed virtqueues
  virtio-scsi-pci: default num_queues to -smp N
  virtio-blk-pci: default num_queues to -smp N
  vhost-user-blk-pci: default num_queues to -smp N

 hw/virtio/virtio-pci.h             |  9 +++++++++
 include/hw/boards.h                |  3 +++
 include/hw/i386/pc.h               |  3 +++
 include/hw/virtio/vhost-user-blk.h |  2 ++
 include/hw/virtio/virtio-blk.h     |  2 ++
 include/hw/virtio/virtio-scsi.h    |  5 +++++
 hw/arm/virt.c                      |  9 ++++++++-
 hw/block/vhost-user-blk.c          |  6 +++++-
 hw/block/virtio-blk.c              |  6 +++++-
 hw/core/machine.c                  |  9 +++++++++
 hw/i386/pc.c                       |  4 ++++
 hw/i386/pc_piix.c                  | 14 ++++++++++++-
 hw/i386/pc_q35.c                   | 13 +++++++++++-
 hw/ppc/spapr.c                     | 15 ++++++++++++--
 hw/s390x/s390-virtio-ccw.c         | 14 ++++++++++++-
 hw/scsi/vhost-scsi.c               |  3 ++-
 hw/scsi/vhost-user-scsi.c          |  5 +++--
 hw/scsi/virtio-scsi.c              | 13 ++++++++----
 hw/virtio/vhost-scsi-pci.c         |  9 +++++++--
 hw/virtio/vhost-user-blk-pci.c     |  4 ++++
 hw/virtio/vhost-user-scsi-pci.c    |  9 +++++++--
 hw/virtio/virtio-blk-pci.c         |  7 ++++++-
 hw/virtio/virtio-pci.c             | 32 ++++++++++++++++++++++++++++++
 hw/virtio/virtio-scsi-pci.c        |  9 +++++++--
 VERSION                            |  2 +-
 25 files changed, 184 insertions(+), 23 deletions(-)

-- 
2.26.2