On Thu, 14 Jan 2016 01:13:18 -0500 Zhihong Wang <zhihong.wang at intel.com> wrote:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
>   1. Read CPUID to check if AVX512 is supported by CPU
>
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
>   3. Implement AVX512 memcpy and choose the right implementation based on
>      predefined macros
>
>   4. Decide alignment unit for memcpy perf test based on predefined macros
>
> Zhihong Wang (4):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>
>  app/test/test_memcpy_perf.c                        |   6 +
>  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
>  .../common/include/arch/x86/rte_memcpy.h           | 247 ++++++++++++++++++++-
>  mk/rte.cpuflags.mk                                 |   4 +
>  4 files changed, 255 insertions(+), 4 deletions(-)
>

This really looks like code that could benefit from GCC function
multiversioning. The current cpuflags model is useless/flawed in real
product deployment.
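
For illustration, a minimal sketch of what multiversioning could look like
in C, assuming GCC 6 or later on x86-64 Linux with glibc (the
target_clones attribute relies on ifunc support). The name copy_bytes_mv
and the COPY_CLONES macro are hypothetical and not part of DPDK, and the
byte loop is only a placeholder for a real copy routine:

```c
#include <stddef.h>
#include <string.h>   /* on glibc systems this also defines __GLIBC__ */

/*
 * Assumption: target_clones is only enabled for GCC >= 6 on x86-64 with
 * glibc (ifunc). Elsewhere the macro expands to nothing and a single
 * baseline version of the function is built.
 */
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__GNUC__) && \
    !defined(__clang__) && (__GNUC__ >= 6)
#define COPY_CLONES \
    __attribute__((target_clones("avx512f", "avx2", "default")))
#else
#define COPY_CLONES
#endif

/*
 * Hypothetical helper, not a DPDK API. When COPY_CLONES is active the
 * compiler emits one clone per listed target plus a resolver that
 * selects a clone for the CPU the binary actually runs on.
 */
COPY_CLONES
void *copy_bytes_mv(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t i;

    /* Placeholder loop; each clone may be vectorized for its target. */
    for (i = 0; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

Under those assumptions a single binary built for a baseline target would
select the widest supported variant when it runs, rather than having the
choice fixed at build time by predefined macros.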