On 16/09/2014 09:20, Fam Zheng wrote:
> v3: Small tweak on "cmd" in 1/2 and "sreq" in 2/2.
> 
> Zeroing is relatively expensive since we have big request structures.
> VirtQueueElement (>48k!) and sense_buf (256 bytes) are two points to look at.
> 
> This visibly reduces the overhead of request handling when testing with the
> unmerged "null" driver and virtio-scsi dataplane. Before the change, the issue
> is very obvious in perf top:
> 
> perf top -G -p `pidof qemu-system-x86_64`
> -----------------------------------------
> +  16.50%  libc-2.17.so             [.] __memset_sse2
> +   2.28%  libc-2.17.so             [.] _int_malloc
> +   2.25%  [vdso]                   [.] 0x0000000000000cd1
> +   2.02%  [kernel]                 [k] _raw_spin_lock_irqsave
> +   1.97%  libpthread-2.17.so       [.] pthread_mutex_lock
> +   1.87%  libpthread-2.17.so       [.] pthread_mutex_unlock
> +   1.81%  [kernel]                 [k] fget_light
> +   1.70%  libc-2.17.so             [.] malloc
> 
> After the change, the high __memset_sse2 and _int_malloc overhead is gone:
> 
> perf top -G -p `pidof qemu-system-x86_64`
> -----------------------------------------
> +   4.20%  [kernel]                 [k] vcpu_enter_guest
> +   3.97%  [kernel]                 [k] vmx_vcpu_run
> +   2.63%  [kernel]                 [k] _raw_spin_lock_irqsave
> +   1.72%  [kernel]                 [k] native_read_msr_safe
> +   1.65%  [kernel]                 [k] __srcu_read_lock
> +   1.64%  [kernel]                 [k] _raw_spin_unlock_irqrestore
> +   1.57%  [vdso]                   [.] 0x00000000000008d8
> +   1.49%  libc-2.17.so             [.] _int_malloc
> +   1.29%  libpthread-2.17.so       [.] pthread_mutex_unlock
> +   1.26%  [kernel]                 [k] native_write_msr_safe
> 
> See the commit message of patch 2 for some fio test data.
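
The zeroing cost described in the quoted cover letter can be illustrated
with a minimal sketch. This is not the code from the series: DemoReq, its
fields, and the demo_* functions are hypothetical stand-ins, and the sizes
only mirror the ">48k" and "256 bytes" figures quoted above. The assumption
is that the large buffers are always written before they are read, so only
the small header needs clearing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes, chosen to mirror the figures in the cover letter. */
#define DEMO_ELEM_SIZE  (48 * 1024)
#define DEMO_SENSE_SIZE 256

/* Hypothetical request layout: small header first, big buffers last. */
typedef struct DemoReq {
    /* Bookkeeping fields that must start out as zero. */
    int    status;
    size_t sense_len;
    void  *next;
    /* Buffers assumed to be filled in before anything reads them. */
    unsigned char sense[DEMO_SENSE_SIZE];
    unsigned char elem[DEMO_ELEM_SIZE];
} DemoReq;

/* Baseline: clear the whole structure on every request. */
DemoReq *demo_req_new_zeroed(void)
{
    return calloc(1, sizeof(DemoReq));
}

/* Cheaper variant: allocate uncleared memory, zero only the header. */
DemoReq *demo_req_new_partial(void)
{
    DemoReq *req = malloc(sizeof(*req));

    if (req) {
        /* Everything before 'sense' is the header, padding included. */
        memset(req, 0, offsetof(DemoReq, sense));
    }
    return req;
}

/* Returns 1 if all header fields are zero, 0 otherwise. */
int demo_header_is_zero(const DemoReq *req)
{
    assert(req != NULL);
    return req->status == 0 && req->sense_len == 0 && req->next == NULL;
}
```

With this layout the partial variant clears a few dozen bytes per request
instead of roughly 48 KiB, which is the kind of saving that would make a
memset entry drop out of a profile like the one above.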

Thanks, applied to scsi-next.

Paolo

