[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, January 27, 2016 11:24 PM
> To: Wang, Zhihong
> Cc: dev at dpdk.org; Ravi Kerur
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
>
> 2016-01-17 22:05, Zhihong Wang:
> > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > utilization of hardware resources and deliver high performance.
>
> On a related note, your expertise would be very valuable to review
> these patches please:
> (memcpy) http://dpdk.org/dev/patchwork/patch/4396/
> (memcmp) http://dpdk.org/dev/patchwork/patch/4788/

Will do, thanks.

>
> Thanks
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
2016-01-27 18:48, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Zhihong Wang (5):
> > >   lib/librte_eal: Identify AVX512 CPU flag
> > >   mk: Predefine AVX512 macro for compiler
> > >   lib/librte_eal: Optimize memcpy for AVX512 platforms
> > >   app/test: Adjust alignment unit for memcpy perf test
> > >   lib/librte_eal: Tune memcpy for prior platforms
> > >
> > >  app/test/test_memcpy_perf.c                 |   6
> > >  .../common/include/arch/x86/rte_cpuflags.h  |   2
> > >  .../common/include/arch/x86/rte_memcpy.h    | 269
> > >  mk/rte.cpuflags.mk                          |   4
> > >  4 files changed, 268 insertions(+), 13 deletions(-)
> >
> > The maintainers of arch/x86 are Bruce and Konstantin.
> > I guess there is no comment and we can apply this cool series?
>
> Yes, looks ok to me.

Applied, thanks

Some benchmark feedbacks would be welcome.
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, January 27, 2016 3:31 PM
> To: Richardson, Bruce; Ananyev, Konstantin
> Cc: dev at dpdk.org; Wang, Zhihong
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
>
> > Zhihong Wang (5):
> >   lib/librte_eal: Identify AVX512 CPU flag
> >   mk: Predefine AVX512 macro for compiler
> >   lib/librte_eal: Optimize memcpy for AVX512 platforms
> >   app/test: Adjust alignment unit for memcpy perf test
> >   lib/librte_eal: Tune memcpy for prior platforms
> >
> >  app/test/test_memcpy_perf.c                 |   6
> >  .../common/include/arch/x86/rte_cpuflags.h  |   2
> >  .../common/include/arch/x86/rte_memcpy.h    | 269
> >  mk/rte.cpuflags.mk                          |   4
> >  4 files changed, 268 insertions(+), 13 deletions(-)
>
> The maintainers of arch/x86 are Bruce and Konstantin.
> I guess there is no comment and we can apply this cool series?

Yes, looks ok to me.
Konstantin
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> Zhihong Wang (5):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>   lib/librte_eal: Tune memcpy for prior platforms
>
>  app/test/test_memcpy_perf.c                 |   6
>  .../common/include/arch/x86/rte_cpuflags.h  |   2
>  .../common/include/arch/x86/rte_memcpy.h    | 269
>  mk/rte.cpuflags.mk                          |   4
>  4 files changed, 268 insertions(+), 13 deletions(-)

The maintainers of arch/x86 are Bruce and Konstantin.
I guess there is no comment and we can apply this cool series?
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
2016-01-17 22:05, Zhihong Wang:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.

On a related note, your expertise would be very valuable to review
these patches please:
(memcpy) http://dpdk.org/dev/patchwork/patch/4396/
(memcmp) http://dpdk.org/dev/patchwork/patch/4788/

Thanks
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, January 19, 2016 4:06 AM
> To: Wang, Zhihong
> Cc: dev at dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Xie, Huawei
> Subject: Re: [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
>
> On Sun, 17 Jan 2016 22:05:09 -0500
> Zhihong Wang wrote:
>
> > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > utilization of hardware resources and deliver high performance.
> >
> > In current DPDK, memcpy holds a large proportion of execution time in
> > libs like Vhost, especially for large packets, and this patch can bring
> > considerable benefits.
> >
> > The implementation is based on the current DPDK memcpy framework, some
> > background introduction can be found in these threads:
> > http://dpdk.org/ml/archives/dev/2014-November/008158.html
> > http://dpdk.org/ml/archives/dev/2015-January/011800.html
> >
> > Code changes are:
> >
> >   1. Read CPUID to check if AVX512 is supported by CPU
> >
> >   2. Predefine AVX512 macro if AVX512 is enabled by compiler
> >
> >   3. Implement AVX512 memcpy and choose the right implementation based on
> >      predefined macros
> >
> >   4. Decide alignment unit for memcpy perf test based on predefined macros
>
> Cool, I like it. How much impact does this have on VHOST?

The impact is significant especially for enqueue (Detailed numbers might
not be appropriate here due to policy :-), only how I test it), because
VHOST actually spends a lot of time doing memcpy.

Simply measure 1024B RX/TX time cost and compare it with 64B's and you'll
get a sense of it, although not precise.

My test cases include NIC2VM2NIC and VM2VM scenarios, which are the main
use cases currently, and use both throughput and RX/TX cycles for
evaluation.
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
On Sun, 17 Jan 2016 22:05:09 -0500
Zhihong Wang wrote:

> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
>   1. Read CPUID to check if AVX512 is supported by CPU
>
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
>   3. Implement AVX512 memcpy and choose the right implementation based on
>      predefined macros
>
>   4. Decide alignment unit for memcpy perf test based on predefined macros

Cool, I like it. How much impact does this have on VHOST?
[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
utilization of hardware resources and deliver high performance.

In current DPDK, memcpy holds a large proportion of execution time in
libs like Vhost, especially for large packets, and this patch can bring
considerable benefits.

The implementation is based on the current DPDK memcpy framework, some
background introduction can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.html
http://dpdk.org/ml/archives/dev/2015-January/011800.html

Code changes are:

  1. Read CPUID to check if AVX512 is supported by CPU

  2. Predefine AVX512 macro if AVX512 is enabled by compiler

  3. Implement AVX512 memcpy and choose the right implementation based on
     predefined macros

  4. Decide alignment unit for memcpy perf test based on predefined macros

--
Changes in v2:

  1. Tune performance for prior platforms

Zhihong Wang (5):
  lib/librte_eal: Identify AVX512 CPU flag
  mk: Predefine AVX512 macro for compiler
  lib/librte_eal: Optimize memcpy for AVX512 platforms
  app/test: Adjust alignment unit for memcpy perf test
  lib/librte_eal: Tune memcpy for prior platforms

 app/test/test_memcpy_perf.c                 |   6
 .../common/include/arch/x86/rte_cpuflags.h  |   2
 .../common/include/arch/x86/rte_memcpy.h    | 269
 mk/rte.cpuflags.mk                          |   4
 4 files changed, 268 insertions(+), 13 deletions(-)

--
2.5.0
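
For readers following the four code changes listed in the cover letter, a
minimal sketch of the general technique may help. It is not the code from
this series. It assumes GCC or Clang, relies on the compiler-predefined
macros __AVX512F__ and __AVX2__ for the build-time choice, and uses the
compiler builtin __builtin_cpu_supports for the run-time check. The names
ALIGNMENT_UNIT, cpu_has_avx512f and sketch_copy are hypothetical and exist
only for illustration, and the plain memcpy calls stand in for the vector
loads and stores a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build-time choice (changes 2 to 4 in the list above): pick the block
 * size from macros the compiler predefines when the matching instruction
 * set is enabled, e.g. through -march or -mavx512f.
 * Assumption: the block size equals the vector register width in bytes.
 */
#if defined(__AVX512F__)
#define ALIGNMENT_UNIT 64 /* 512-bit registers */
#elif defined(__AVX2__)
#define ALIGNMENT_UNIT 32 /* 256-bit registers */
#else
#define ALIGNMENT_UNIT 16 /* 128-bit SSE registers */
#endif

/*
 * Run-time check (change 1 in the list above): ask the CPU whether it
 * reports AVX512F. Returns 1 if so, 0 otherwise, and 0 on targets where
 * the builtin is not available.
 */
static inline int cpu_has_avx512f(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
	return 0;
#endif
}

/*
 * Hypothetical copy routine: move whole ALIGNMENT_UNIT-sized blocks
 * first, then the remaining tail bytes.
 */
static inline void *sketch_copy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	assert(dst != NULL && src != NULL);

	while (n >= ALIGNMENT_UNIT) {
		memcpy(d, s, ALIGNMENT_UNIT);
		d += ALIGNMENT_UNIT;
		s += ALIGNMENT_UNIT;
		n -= ALIGNMENT_UNIT;
	}
	if (n > 0)
		memcpy(d, s, n);
	return dst;
}
```

The two checks answer different questions. The macros describe what the
compiler was allowed to emit, while cpu_has_avx512f() describes the
machine the binary actually runs on, so a binary built with AVX512
enabled would still want the run-time check before it takes that path.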