On 5/17/19 3:04 PM, David Marchand wrote:
On Fri, May 17, 2019 at 2:23 PM Maxime Coquelin <[email protected]> wrote:

Some OVS-DPDK PVP benchmarks show a performance drop when switching
from DPDK v17.11 to v18.11.

With the addition of packed ring layout support,
rte_vhost_enqueue_burst and rte_vhost_dequeue_burst became very
large, and only part of their instructions is executed (either the
packed or the split ring path is used).

This series aims at reducing I-cache pressure, first by un-inlining
the split and packed ring functions, but also by moving parts
considered cold into dedicated functions (dirty page logging, and the
fragmented descriptor buffer management added for CVE-2018-1059).

With the series applied, the size of the enqueue and dequeue split
paths is reduced significantly:

+---------+--------------------+--------------------+
| Version | Enqueue split path | Dequeue split path |
+---------+--------------------+--------------------+
| v19.05  | 16461B             | 25521B             |
| +series | 7286B              | 11285B             |
+---------+--------------------+--------------------+

Using the perf tool to monitor the iTLB-load-misses event while doing
a PVP benchmark with testpmd as vswitch, we can see the number of iTLB
misses being reduced:

- v19.05:
# perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10

 Performance counter stats for 'CPU(s) 2,3' (10 runs):

             2,438      iTLB-load-miss        ( +- 13.43% )

       10.00058928 +- 0.00000336 seconds time elapsed  ( +- 0.00% )

- +series:
# perf stat --repeat 10 -C 2,3 -e iTLB-load-miss -- sleep 10

 Performance counter stats for 'CPU(s) 2,3' (10 runs):

                55      iTLB-load-miss        ( +- 10.08% )

       10.00059466 +- 0.00000283 seconds time elapsed  ( +- 0.00% )

The series also forces the inlining of some rte_memcpy helpers: after
packed ring support was added, some of them were no longer inlined but
emitted as functions in the virtio_net object file, which was not
expected.

Finally, the series simplifies the prefetching of descriptor buffers
by doing it in the recently introduced descriptor buffer mapping
function.
Maxime Coquelin (4):
  vhost: un-inline dirty pages logging functions
  vhost: do not inline packed and split functions
  vhost: do not inline unlikely fragmented buffers code
  vhost: simplify descriptor's buffer prefetching

root (1):
  eal/x86: force inlining of all memcpy and mov helpers

root ? "oops" :-)
Indeed... Oops!
-- David Marchand
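
For readers following the thread, the hot/cold split described in the cover letter can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the series: every name (sketch_desc, copy_fragmented, copy_desc, sketch_dequeue_burst, cold_calls, the SKETCH_* macros) is hypothetical, and the attributes are the plain GCC/Clang ones, which DPDK wraps in similar macros such as __rte_always_inline and __rte_noinline.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macros built on GCC/Clang attributes. */
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#define SKETCH_NOINLINE __attribute__((noinline))
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical descriptor: a buffer, its length, and a "fragmented" flag. */
struct sketch_desc {
	const uint8_t *addr;
	uint32_t len;
	int fragmented;
};

/* Counts how often the cold path runs (for illustration only). */
static unsigned long cold_calls;

/*
 * Cold path: rarely taken, so it is kept out of line. Its instructions
 * then do not inflate the body of the hot burst function.
 */
static SKETCH_NOINLINE size_t
copy_fragmented(uint8_t *dst, const struct sketch_desc *d)
{
	size_t half = d->len / 2;

	cold_calls++;
	/* Stand-in for chunk-by-chunk handling of a fragmented buffer. */
	memcpy(dst, d->addr, half);
	memcpy(dst + half, d->addr + half, d->len - half);
	return d->len;
}

/*
 * Hot helper: small and always taken, so inlining is forced rather
 * than left to the compiler's size heuristics.
 */
static SKETCH_ALWAYS_INLINE size_t
copy_desc(uint8_t *dst, const struct sketch_desc *d)
{
	if (sketch_unlikely(d->fragmented))
		return copy_fragmented(dst, d);

	memcpy(dst, d->addr, d->len);
	return d->len;
}

/* Burst loop: only the hot helper is expanded inside it. */
size_t
sketch_dequeue_burst(uint8_t *dst, const struct sketch_desc *descs, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += copy_desc(dst + total, &descs[i]);
	return total;
}
```

The caller must provide a dst buffer large enough for all descriptors. Whether such a split actually helps depends on the workload, so measuring it, as done with perf stat above, remains the way to confirm any gain.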

