[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-28 Thread Wang, Zhihong


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, January 27, 2016 11:24 PM
> To: Wang, Zhihong 
> Cc: dev at dpdk.org; Ravi Kerur 
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> 
> 2016-01-17 22:05, Zhihong Wang:
> > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > use of hardware resources and deliver high performance.
> 
> On a related note, your expertise would be very valuable in reviewing
> these patches:
> (memcpy) http://dpdk.org/dev/patchwork/patch/4396/
> (memcmp) http://dpdk.org/dev/patchwork/patch/4788/

Will do, thanks.

> 
> Thanks


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-27 Thread Thomas Monjalon
2016-01-27 18:48, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 
> > > Zhihong Wang (5):
> > >   lib/librte_eal: Identify AVX512 CPU flag
> > >   mk: Predefine AVX512 macro for compiler
> > >   lib/librte_eal: Optimize memcpy for AVX512 platforms
> > >   app/test: Adjust alignment unit for memcpy perf test
> > >   lib/librte_eal: Tune memcpy for prior platforms
> > >
> > >  app/test/test_memcpy_perf.c|   6 +
> > >  .../common/include/arch/x86/rte_cpuflags.h |   2 +
> > >  .../common/include/arch/x86/rte_memcpy.h   | 269 -
> > >  mk/rte.cpuflags.mk |   4 +
> > >  4 files changed, 268 insertions(+), 13 deletions(-)
> > 
> > The maintainers of arch/x86 are Bruce and Konstantin.
> > I guess there are no comments and we can apply this cool series?
> 
> Yes, looks ok to me.

Applied, thanks

Some benchmark feedback would be welcome.


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-27 Thread Ananyev, Konstantin
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, January 27, 2016 3:31 PM
> To: Richardson, Bruce; Ananyev, Konstantin
> Cc: dev at dpdk.org; Wang, Zhihong
> Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> 
> > Zhihong Wang (5):
> >   lib/librte_eal: Identify AVX512 CPU flag
> >   mk: Predefine AVX512 macro for compiler
> >   lib/librte_eal: Optimize memcpy for AVX512 platforms
> >   app/test: Adjust alignment unit for memcpy perf test
> >   lib/librte_eal: Tune memcpy for prior platforms
> >
> >  app/test/test_memcpy_perf.c|   6 +
> >  .../common/include/arch/x86/rte_cpuflags.h |   2 +
> >  .../common/include/arch/x86/rte_memcpy.h   | 269 -
> >  mk/rte.cpuflags.mk |   4 +
> >  4 files changed, 268 insertions(+), 13 deletions(-)
> 
> The maintainers of arch/x86 are Bruce and Konstantin.
> I guess there are no comments and we can apply this cool series?

Yes, looks ok to me.
Konstantin


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-27 Thread Thomas Monjalon
> Zhihong Wang (5):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>   lib/librte_eal: Tune memcpy for prior platforms
> 
>  app/test/test_memcpy_perf.c|   6 +
>  .../common/include/arch/x86/rte_cpuflags.h |   2 +
>  .../common/include/arch/x86/rte_memcpy.h   | 269 -
>  mk/rte.cpuflags.mk |   4 +
>  4 files changed, 268 insertions(+), 13 deletions(-)

The maintainers of arch/x86 are Bruce and Konstantin.
I guess there are no comments and we can apply this cool series?


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-27 Thread Thomas Monjalon
2016-01-17 22:05, Zhihong Wang:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> use of hardware resources and deliver high performance.

On a related note, your expertise would be very valuable in reviewing
these patches:
(memcpy) http://dpdk.org/dev/patchwork/patch/4396/
(memcmp) http://dpdk.org/dev/patchwork/patch/4788/

Thanks


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-19 Thread Wang, Zhihong
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, January 19, 2016 4:06 AM
> To: Wang, Zhihong 
> Cc: dev at dpdk.org; Ananyev, Konstantin ;
> Richardson, Bruce ; Xie, Huawei
> 
> Subject: Re: [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
> 
> On Sun, 17 Jan 2016 22:05:09 -0500
> Zhihong Wang  wrote:
> 
> > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > use of hardware resources and deliver high performance.
> >
> > In current DPDK, memcpy accounts for a large proportion of execution time
> > in libraries like Vhost, especially for large packets, so this patch can
> > bring considerable benefits.
> >
> > The implementation is based on the current DPDK memcpy framework; some
> > background can be found in these threads:
> > http://dpdk.org/ml/archives/dev/2014-November/008158.html
> > http://dpdk.org/ml/archives/dev/2015-January/011800.html
> >
> > Code changes are:
> >
> >   1. Read CPUID to check whether AVX512 is supported by the CPU
> >
> >   2. Predefine an AVX512 macro if the compiler has AVX512 enabled
> >
> >   3. Implement AVX512 memcpy and choose the right implementation based on
> >      predefined macros
> >
> >   4. Decide the alignment unit for the memcpy perf test based on
> >      predefined macros
> 
> Cool, I like it. How much impact does this have on VHOST?

The impact is significant, especially for enqueue, because VHOST spends a lot
of time doing memcpy. (Detailed numbers might not be appropriate here due to
policy :-), so I'll only describe how I test it.) Simply measure the RX/TX time
cost for 1024B packets, compare it with that for 64B packets, and you'll get a
rough, if not precise, sense of it.
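The 1024B vs 64B comparison suggested above can be roughly approximated outside vhost with a plain memcpy microbenchmark. The sketch below is only an illustration under stated assumptions: `avg_copy_ns()` is a hypothetical helper, not a DPDK function, and it times raw memcpy() with POSIX `clock_gettime()` rather than vhost enqueue/dequeue, so the ratio it yields is at best an indication of how copy cost scales with packet size.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not part of DPDK): average wall-clock nanoseconds
 * per memcpy() of `len` bytes over `iters` back-to-back copies.
 * Returns a negative value if the buffers cannot be allocated.
 */
static double avg_copy_ns(size_t len, unsigned iters)
{
    assert(len > 0 && iters > 0);

    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, len);

    volatile unsigned char sink = 0;  /* consumes results so copies are not elided */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++) {
        src[i % len] = (unsigned char)i;  /* change the source so each copy is real work */
        memcpy(dst, src, len);
        sink ^= dst[i % len];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Calling `avg_copy_ns(64, N)` and `avg_copy_ns(1024, N)` and comparing the two results gives the kind of rough comparison described; for real numbers, DPDK's own app/test/test_memcpy_perf.c is the more appropriate tool.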

My test cases include NIC2VM2NIC and VM2VM scenarios, which are currently the
main use cases, and I use both throughput and RX/TX cycles for evaluation.



[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-18 Thread Stephen Hemminger
On Sun, 17 Jan 2016 22:05:09 -0500
Zhihong Wang  wrote:

> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> use of hardware resources and deliver high performance.
> 
> In current DPDK, memcpy accounts for a large proportion of execution time
> in libraries like Vhost, especially for large packets, so this patch can
> bring considerable benefits.
> 
> The implementation is based on the current DPDK memcpy framework; some
> background can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
> 
> Code changes are:
> 
>   1. Read CPUID to check whether AVX512 is supported by the CPU
> 
>   2. Predefine an AVX512 macro if the compiler has AVX512 enabled
> 
>   3. Implement AVX512 memcpy and choose the right implementation based on
>      predefined macros
> 
>   4. Decide the alignment unit for the memcpy perf test based on
>      predefined macros

Cool, I like it. How much impact does this have on VHOST?


[dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2016-01-17 Thread Zhihong Wang
This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
use of hardware resources and deliver high performance.

In current DPDK, memcpy accounts for a large proportion of execution time
in libraries like Vhost, especially for large packets, so this patch can
bring considerable benefits.

The implementation is based on the current DPDK memcpy framework; some
background can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.html
http://dpdk.org/ml/archives/dev/2015-January/011800.html

Code changes are:

  1. Read CPUID to check whether AVX512 is supported by the CPU

  2. Predefine an AVX512 macro if the compiler has AVX512 enabled

  3. Implement AVX512 memcpy and choose the right implementation based on
     predefined macros

  4. Decide the alignment unit for the memcpy perf test based on
     predefined macros
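Items 2 to 4 describe picking the copy implementation, and the perf test's alignment unit, at compile time from predefined macros. A minimal sketch of that idea follows. It keys off the GCC/Clang predefines `__AVX512F__`, `__AVX2__` and `__SSE2__` rather than the DPDK-specific macros generated by mk/rte.cpuflags.mk, and `COPY_BLOCK_SIZE`, `copy_block()` and `copy_bytes()` are hypothetical names, not the patch's code; the real rte_memcpy.h is considerably more elaborate (small-size paths, alignment handling).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__AVX512F__) || defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

/* Widest vector register the compiler targets decides the block size. */
#if defined(__AVX512F__)
#define COPY_BLOCK_SIZE 64   /* one ZMM register */
#elif defined(__AVX2__)
#define COPY_BLOCK_SIZE 32   /* one YMM register */
#else
#define COPY_BLOCK_SIZE 16   /* one XMM register (or portable fallback) */
#endif

/* Copy one COPY_BLOCK_SIZE-byte block with unaligned vector load/store. */
static inline void copy_block(void *dst, const void *src)
{
#if defined(__AVX512F__)
    _mm512_storeu_si512(dst, _mm512_loadu_si512(src));
#elif defined(__AVX2__)
    _mm256_storeu_si256((__m256i *)dst, _mm256_loadu_si256((const __m256i *)src));
#elif defined(__SSE2__)
    _mm_storeu_si128((__m128i *)dst, _mm_loadu_si128((const __m128i *)src));
#else
    memcpy(dst, src, COPY_BLOCK_SIZE);  /* non-x86 fallback */
#endif
}

/* Copy n bytes: whole blocks first, then the remaining tail via memcpy(). */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(d != NULL && s != NULL);
    while (n >= COPY_BLOCK_SIZE) {
        copy_block(d, s);
        d += COPY_BLOCK_SIZE;
        s += COPY_BLOCK_SIZE;
        n -= COPY_BLOCK_SIZE;
    }
    memcpy(d, s, n);
}
```

Because the choice is made by the preprocessor, a perf test could reuse the same `COPY_BLOCK_SIZE` value as its alignment unit, which is roughly what item 4 describes.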

--
Changes in v2:

  1. Tune performance for prior platforms

Zhihong Wang (5):
  lib/librte_eal: Identify AVX512 CPU flag
  mk: Predefine AVX512 macro for compiler
  lib/librte_eal: Optimize memcpy for AVX512 platforms
  app/test: Adjust alignment unit for memcpy perf test
  lib/librte_eal: Tune memcpy for prior platforms
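The first patch, "Identify AVX512 CPU flag", reads CPUID to find out whether the CPU supports AVX512. A hedged sketch of such a check in plain C is below. It uses the GCC/Clang `<cpuid.h>` helpers instead of DPDK's rte_cpuflags API, `cpu_has_avx512f()` is a hypothetical name, and the XCR0 test for OS support is an extra step that the cover letter does not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical runtime check: AVX512 Foundation is reported in CPUID
 * leaf 7, sub-leaf 0, EBX bit 16. The OS must also have enabled saving of
 * the relevant register state, which XCR0 reports in bits 1 and 2 (SSE, AVX)
 * and bits 5, 6 and 7 (opmask, ZMM_Hi256, Hi16_ZMM).
 */
static bool cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 7)
        return false;

    /* CPUID.1:ECX bit 27 (OSXSAVE) means XGETBV can be executed. */
    __cpuid(1, eax, ebx, ecx, edx);
    if (!(ecx & (1u << 27)))
        return false;

    /* Read the low half of XCR0 and require all five state components. */
    uint32_t xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    const uint32_t need = (1u << 1) | (1u << 2) | (1u << 5) | (1u << 6) | (1u << 7);
    if ((xcr0_lo & need) != need)
        return false;

    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & (1u << 16)) != 0;
#else
    return false;  /* not an x86 CPU */
#endif
}
```

The XCR0 step matters because a CPU can advertise AVX512 while the OS has not enabled the ZMM state, in which case executing AVX512 instructions faults.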

 app/test/test_memcpy_perf.c|   6 +
 .../common/include/arch/x86/rte_cpuflags.h |   2 +
 .../common/include/arch/x86/rte_memcpy.h   | 269 -
 mk/rte.cpuflags.mk |   4 +
 4 files changed, 268 insertions(+), 13 deletions(-)

-- 
2.5.0