v3: Small tweak on "cmd" in 1/2 and "sreq" in 2/2.

Zeroing is relatively expensive since we have big request structures.
VirtQueueElement (>48k!) and sense_buf (256 bytes) are two points to look at.
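
To illustrate the general idea only (this is not the code from the patches), here is a minimal sketch of skipping the zeroing of large buffers. DemoReq, its fields, and both helper names are hypothetical, and the sketch assumes the large buffers are always fully written before they are read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, loosely modelled on the numbers quoted above. */
#define DEMO_ELEM_SIZE  (48 * 1024)
#define DEMO_SENSE_SIZE 256

/* Hypothetical request layout; not QEMU's real structures. */
typedef struct DemoReq {
    /* Small fields that must start out as zero. */
    int status;
    size_t resp_len;
    void *opaque;
    /* Large buffers, assumed to be filled in before anyone reads them. */
    unsigned char sense_buf[DEMO_SENSE_SIZE];
    unsigned char elem[DEMO_ELEM_SIZE];
} DemoReq;

/* Baseline: clear the whole structure for every request. */
static inline void demo_req_init_full(DemoReq *req)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
}

/* Cheaper: clear only the bytes that precede the large buffers. */
static inline void demo_req_init_partial(DemoReq *req)
{
    assert(req != NULL);
    memset(req, 0, offsetof(DemoReq, sense_buf));
}
```

The partial variant depends on field order: anything that must be zeroed has to sit before sense_buf in the structure, so a layout change needs a matching review of the init helper.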

This series visibly reduces the overhead of request handling when testing with
the unmerged "null" driver and virtio-scsi dataplane. Before, the issue is very
obvious with perf top:

    perf top -G -p `pidof qemu-system-x86_64`
    -----------------------------------------
    +  16.50%  libc-2.17.so        [.] __memset_sse2
    +   2.28%  libc-2.17.so        [.] _int_malloc
    +   2.25%  [vdso]              [.] 0x0000000000000cd1
    +   2.02%  [kernel]            [k] _raw_spin_lock_irqsave
    +   1.97%  libpthread-2.17.so  [.] pthread_mutex_lock
    +   1.87%  libpthread-2.17.so  [.] pthread_mutex_unlock
    +   1.81%  [kernel]            [k] fget_light
    +   1.70%  libc-2.17.so        [.] malloc

After, the high __memset_sse2 and _int_malloc overhead is gone:

    perf top -G -p `pidof qemu-system-x86_64`
    -----------------------------------------
    +   4.20%  [kernel]            [k] vcpu_enter_guest
    +   3.97%  [kernel]            [k] vmx_vcpu_run
    +   2.63%  [kernel]            [k] _raw_spin_lock_irqsave
    +   1.72%  [kernel]            [k] native_read_msr_safe
    +   1.65%  [kernel]            [k] __srcu_read_lock
    +   1.64%  [kernel]            [k] _raw_spin_unlock_irqrestore
    +   1.57%  [vdso]              [.] 0x00000000000008d8
    +   1.49%  libc-2.17.so        [.] _int_malloc
    +   1.29%  libpthread-2.17.so  [.] pthread_mutex_unlock
    +   1.26%  [kernel]            [k] native_write_msr_safe

See the commit message of patch 2 for some fio test data.

Thanks,
Fam

Fam Zheng (2):
  scsi: Optimize scsi_req_alloc
  virtio-scsi: Optimize virtio_scsi_init_req

 hw/scsi/scsi-bus.c     |  8 +++++---
 hw/scsi/virtio-scsi.c  | 24 +++++++++++++++++-------
 include/hw/scsi/scsi.h | 21 ++++++++++++++-------
 3 files changed, 36 insertions(+), 17 deletions(-)

-- 
1.9.3