On 16/09/2014 09:20, Fam Zheng wrote:
> v3: Small tweak on "cmd" in 1/2 and "sreq" in 2/2.
>
> Zeroing is relatively expensive since we have big request structures.
> VirtQueueElement (>48k!) and sense_buf (256 bytes) are two points to look at.
>
> This visibly reduces overhead of request handling when testing with the
> unmerged "null" driver and virtio-scsi dataplane. Before, the issue is very
> obvious with perf top:
>
>   perf top -G -p `pidof qemu-system-x86_64`
>   -----------------------------------------
>   +  16.50%  libc-2.17.so        [.] __memset_sse2
>   +   2.28%  libc-2.17.so        [.] _int_malloc
>   +   2.25%  [vdso]              [.] 0x0000000000000cd1
>   +   2.02%  [kernel]            [k] _raw_spin_lock_irqsave
>   +   1.97%  libpthread-2.17.so  [.] pthread_mutex_lock
>   +   1.87%  libpthread-2.17.so  [.] pthread_mutex_unlock
>   +   1.81%  [kernel]            [k] fget_light
>   +   1.70%  libc-2.17.so        [.] malloc
>
> After, the high __memset_sse2 and _int_malloc is gone:
>
>   perf top -G -p `pidof qemu-system-x86_64`
>   -----------------------------------------
>   +   4.20%  [kernel]            [k] vcpu_enter_guest
>   +   3.97%  [kernel]            [k] vmx_vcpu_run
>   +   2.63%  [kernel]            [k] _raw_spin_lock_irqsave
>   +   1.72%  [kernel]            [k] native_read_msr_safe
>   +   1.65%  [kernel]            [k] __srcu_read_lock
>   +   1.64%  [kernel]            [k] _raw_spin_unlock_irqrestore
>   +   1.57%  [vdso]              [.] 0x00000000000008d8
>   +   1.49%  libc-2.17.so        [.] _int_malloc
>   +   1.29%  libpthread-2.17.so  [.] pthread_mutex_unlock
>   +   1.26%  [kernel]            [k] native_write_msr_safe
>
> See the commit message of patch 2 for some fio test data.

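For readers following along, here is a rough sketch of the kind of change the cover letter describes: clearing only the small bookkeeping fields of a request instead of the whole structure. It is not the code from the series. DemoReq, its fields, and both init functions are hypothetical names made up for illustration, and the buffer sizes only approximate the ">48k" and "256 bytes" figures quoted above. The sketch assumes the large buffers are always fully written before they are read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request layout, NOT QEMU's actual request structure:
 * small bookkeeping fields first, large buffers last. */
#define DEMO_ELEM_BYTES  (48 * 1024)  /* stand-in for the >48k element */
#define DEMO_SENSE_BYTES 256          /* stand-in for the sense buffer */

typedef struct DemoReq {
    void *dev;
    int status;
    size_t sense_len;
    /* Assumption: everything from here on is written before it is read,
     * so it does not have to be cleared for every request. */
    unsigned char sense_buf[DEMO_SENSE_BYTES];
    unsigned char elem[DEMO_ELEM_BYTES];
} DemoReq;

/* Baseline: clears the whole structure on every request. */
void demo_req_init_full(DemoReq *req, void *dev)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->dev = dev;
}

/* Cheaper variant: clears only the bytes that precede sense_buf. */
void demo_req_init_partial(DemoReq *req, void *dev)
{
    assert(req != NULL);
    memset(req, 0, offsetof(DemoReq, sense_buf));
    req->dev = dev;
}
```

The partial variant depends on field order: anything that must start out zeroed has to sit before sense_buf, which is why the large buffers are placed at the end of the struct in this sketch.
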
Thanks, applied to scsi-next.

Paolo