Hi,

> Subject: [Qemu-devel] [PATCH v2 00/10] virtio/vring: optimization patches
>
> This includes two optimization of virtio:
>
> - "slimming down" VirtQueueElements by not including room for
>   1024 buffers.  This makes malloc much faster.
>
> - optimizations to limit the number of address_space_translate
>   calls in virtio.c, from Vincenzo and myself.
>

Very nice! After applying those optimizations (patches 8, 9 and 10), I get
a throughput gain of about 4 MiB/sec. This is measured with my virtio-crypto
device benchmark, using the aes-128-cbc algorithm.

I haven't rebased the other patches of this patch set yet. Can I expect a
further gain from them?

Before applying those three patches:

Testing AES-128-CBC cipher:
        Encrypting in chunks of 256 bytes: done. 246.84 MiB in 5.02 secs: 49.13 MiB/sec (1011061 packets)
        Encrypting in chunks of 256 bytes: done. 247.03 MiB in 5.02 secs: 49.16 MiB/sec (1011840 packets)
        Encrypting in chunks of 256 bytes: done. 246.98 MiB in 5.02 secs: 49.17 MiB/sec (1011636 packets)
        Encrypting in chunks of 256 bytes: done. 247.14 MiB in 5.02 secs: 49.19 MiB/sec (1012270 packets)
        Encrypting in chunks of 256 bytes: done. 246.96 MiB in 5.02 secs: 49.16 MiB/sec (1011565 packets)
        Encrypting in chunks of 256 bytes: done. 246.97 MiB in 5.02 secs: 49.18 MiB/sec (1011594 packets)
        Encrypting in chunks of 256 bytes: done. 246.89 MiB in 5.02 secs: 49.15 MiB/sec (1011259 packets)
        Encrypting in chunks of 256 bytes: done. 246.96 MiB in 5.02 secs: 49.15 MiB/sec (1011561 packets)

'perf top' shows:

  23.61%  qemu-kvm            [.] address_space_translate
  14.49%  qemu-kvm            [.] qemu_get_ram_ptr
   4.65%  qemu-kvm            [.] phys_page_find
   4.31%  qemu-kvm            [.] address_space_translate_internal
   3.18%  libpthread-2.19.so  [.] __pthread_mutex_unlock_usercnt
   2.83%  qemu-kvm            [.] qemu_ram_addr_from_host
   2.40%  qemu-kvm            [.] address_space_map
   2.34%  libc-2.19.so        [.] _int_malloc
   2.22%  libc-2.19.so        [.] _int_free
   1.96%  libc-2.19.so        [.] malloc
   1.71%  libpthread-2.19.so  [.] pthread_mutex_lock
   1.40%  qemu-kvm            [.] find_next_zero_bit
   1.38%  libc-2.19.so        [.] malloc_consolidate
   1.31%  qemu-kvm            [.] lduw_le_phys
   1.27%  libc-2.19.so        [.] __memcpy_sse2_unaligned
   1.05%  qemu-kvm            [.] qemu_get_ram_block
   1.05%  qemu-kvm            [.] object_unref
   1.04%  qemu-kvm            [.] memory_region_get_ram_addr

After applying those optimizations:

        Encrypting in chunks of 256 bytes: done. 267.92 MiB in 5.03 secs: 53.31 MiB/sec (1097399 packets)
        Encrypting in chunks of 256 bytes: done. 268.05 MiB in 5.02 secs: 53.35 MiB/sec (1097935 packets)
        Encrypting in chunks of 256 bytes: done. 265.40 MiB in 5.02 secs: 52.82 MiB/sec (1087091 packets)
        Encrypting in chunks of 256 bytes: done. 263.18 MiB in 5.01 secs: 52.50 MiB/sec (1077999 packets)
        Encrypting in chunks of 256 bytes: done. 266.85 MiB in 5.01 secs: 53.29 MiB/sec (1093010 packets)
        Encrypting in chunks of 256 bytes: done. 267.64 MiB in 5.02 secs: 53.28 MiB/sec (1096251 packets)
        Encrypting in chunks of 256 bytes: done. 267.30 MiB in 5.02 secs: 53.24 MiB/sec (1094861 packets)
        Encrypting in chunks of 256 bytes: done. 267.29 MiB in 5.02 secs: 53.25 MiB/sec (1094833 packets)

'perf top' shows:

  22.56%  qemu-kvm            [.] address_space_translate
  13.29%  qemu-kvm            [.] qemu_get_ram_ptr
   4.71%  qemu-kvm            [.] phys_page_find
   4.43%  qemu-kvm            [.] address_space_translate_internal
   3.47%  libpthread-2.19.so  [.] __pthread_mutex_unlock_usercnt
   3.08%  qemu-kvm            [.] qemu_ram_addr_from_host
   2.62%  qemu-kvm            [.] address_space_map
   2.61%  libc-2.19.so        [.] _int_malloc
   2.58%  libc-2.19.so        [.] _int_free
   2.38%  libc-2.19.so        [.] malloc
   2.06%  libpthread-2.19.so  [.] pthread_mutex_lock
   1.68%  libc-2.19.so        [.] malloc_consolidate
   1.35%  libc-2.19.so        [.] __memcpy_sse2_unaligned
   1.23%  qemu-kvm            [.] lduw_le_phys
   1.18%  qemu-kvm            [.] find_next_zero_bit
   1.02%  qemu-kvm            [.] object_unref

Regards,
-Gonglei
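
P.S. As a rough sanity check on the gain quoted above, the per-run throughput figures from the two benchmark runs can be averaged with a small script. This is only a sketch: the numbers are copied by hand from the output above, and the variable names are illustrative rather than part of any benchmark tool.

```python
from statistics import mean

# Per-run throughput in MiB/sec, copied from the benchmark output above.
before = [49.13, 49.16, 49.17, 49.19, 49.16, 49.18, 49.15, 49.15]
after = [53.31, 53.35, 52.82, 52.50, 53.29, 53.28, 53.24, 53.25]

mean_before = mean(before)
mean_after = mean(after)

# Absolute and relative gain of the patched build over the baseline.
gain = mean_after - mean_before
gain_pct = 100.0 * gain / mean_before

print(f"before: {mean_before:.2f} MiB/sec")
print(f"after:  {mean_after:.2f} MiB/sec")
print(f"gain:   {gain:.2f} MiB/sec ({gain_pct:.1f}%)")
```

The averages come out at roughly 49.16 and 53.13 MiB/sec, so the gain is just under 4 MiB/sec, or about 8%. The patched runs also show more run-to-run spread (52.50 to 53.35) than the baseline, so a longer run would be needed to pin the figure down more tightly.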