[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-21 Thread Jan Viktorin
On Mon, 21 Mar 2016 06:24:37 -0700 (PDT)
Thomas Monjalon  wrote:

> 2016-03-21 13:21, Jan Viktorin:
> > On Mon, 21 Mar 2016 13:42:31 +0800
> > Jianbo Liu  wrote:
> >   
> > > On 20 March 2016 at 03:58, Jan Viktorin  
> > > wrote:  
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not 
> > > > always
> > > > the performance benefit, the flag allows to disable it.
> > > >
> > > > Signed-off-by: Jan Viktorin 
> > > > ---
> > > >  config/defconfig_arm-armv7a-linuxapp-gcc   | 1 +
> > > >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++--
> > > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
> > > > b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > index 96c3343..2c60c2c 100644
> > > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> > > >  CONFIG_RTE_ARCH_ARM=y
> > > >  CONFIG_RTE_ARCH_ARMv7=y
> > > >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> > > >
> > > If it's not always benefit, why not disable here since it is common
> > > armv7a config, and enable in your or other user's own config file?  
> > 
> > Jianbo, you are right. In that case, I'd just turn it off by default.
> > And when there is a new platform-specific defconfig, it can enable it.
> > 
> > Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
> > file describing the potential of the NEON code. What about:
> > 
> > /* Enable in your defconfig to accelerate memcpy operations. Consider
> >enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
> >accelerate short data copies (< 64 B). */
> > 
> > Thomas, do you consider this enough?  
> 
> Yes it is perfect.
> Why not put it in defconfig_arm-armv7a-linuxapp-gcc?

So, for now, I leave the patch as is and just add the comment.

Jan


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-21 Thread Jianbo Liu
On 20 March 2016 at 03:58, Jan Viktorin  wrote:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.
>
> Signed-off-by: Jan Viktorin 
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc   | 1 +
>  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
> b/config/defconfig_arm-armv7a-linuxapp-gcc
> index 96c3343..2c60c2c 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
>  CONFIG_RTE_ARCH_ARM=y
>  CONFIG_RTE_ARCH_ARMv7=y
>  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
>
If it's not always benefit, why not disable here since it is common
armv7a config, and enable in your or other user's own config file?

Thanks!
Jianbo


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-21 Thread Jan Viktorin
On Mon, 21 Mar 2016 13:42:31 +0800
Jianbo Liu  wrote:

> On 20 March 2016 at 03:58, Jan Viktorin  wrote:
> > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > the performance benefit, the flag allows to disable it.
> >
> > Signed-off-by: Jan Viktorin 
> > ---
> >  config/defconfig_arm-armv7a-linuxapp-gcc   | 1 +
> >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++--
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
> > b/config/defconfig_arm-armv7a-linuxapp-gcc
> > index 96c3343..2c60c2c 100644
> > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> >  CONFIG_RTE_ARCH_ARM=y
> >  CONFIG_RTE_ARCH_ARMv7=y
> >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> >  
> If it's not always benefit, why not disable here since it is common
> armv7a config, and enable in your or other user's own config file?

Jianbo, you are right. In that case, I'd just turn it off by default.
And when there is a new platform-specific defconfig, it can enable it.

Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
file describing the potential of the NEON code. What about:

/* Enable in your defconfig to accelerate memcpy operations. Consider
   enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
   accelerate short data copies (< 64 B). */

Thomas, do you consider this enough?

Jan

> 
> Thanks!
> Jianbo



-- 
   Jan Viktorin  E-mail: Viktorin at RehiveTech.com
   System Architect  Web:www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-21 Thread Thomas Monjalon
2016-03-21 13:21, Jan Viktorin:
> On Mon, 21 Mar 2016 13:42:31 +0800
> Jianbo Liu  wrote:
> 
> > On 20 March 2016 at 03:58, Jan Viktorin  wrote:
> > > The flag is used to enable memcpy optimizations in EAL. As it is not 
> > > always
> > > the performance benefit, the flag allows to disable it.
> > >
> > > Signed-off-by: Jan Viktorin 
> > > ---
> > >  config/defconfig_arm-armv7a-linuxapp-gcc   | 1 +
> > >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++--
> > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
> > > b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > index 96c3343..2c60c2c 100644
> > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> > >  CONFIG_RTE_ARCH_ARM=y
> > >  CONFIG_RTE_ARCH_ARMv7=y
> > >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> > >  
> > If it's not always benefit, why not disable here since it is common
> > armv7a config, and enable in your or other user's own config file?
> 
> Jianbo, you are right. In that case, I'd just turn it off by default.
> And when there is a new platform-specific defconfig, it can enable it.
> 
> Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
> file describing the potential of the NEON code. What about:
> 
> /* Enable in your defconfig to accelerate memcpy operations. Consider
>enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
>accelerate short data copies (< 64 B). */
> 
> Thomas, do you consider this enough?

Yes it is perfect.
Why not put it in defconfig_arm-armv7a-linuxapp-gcc?


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-20 Thread Thomas Monjalon
2016-03-20 10:46, Jan Viktorin:
> On Sun, 20 Mar 2016 10:41:10 +0100
> Jan Viktorin  wrote:
> 
> > On Sat, 19 Mar 2016 21:14:57 +0100
> > Thomas Monjalon  wrote:
> > 
> > > 2016-03-19 20:58, Jan Viktorin:  
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not 
> > > > always
> > > > the performance benefit, the flag allows to disable it.
> > > 
> > > Ideally the default should be to choose the best optimization.
> > > If it is not possible, it would help to have some comments explaining
> > > how to choose wether enabling NEON memcpy or not.
> 
> The related statistics are mentioned here:
> 
> commit 04a2fde35daf5e9a271e72331a70b48b951d7568
> Author: Vlastimil Kosar 
> Date:   Tue Nov 3 00:47:20 2015 +0100
> 
> eal/arm: add vector memcpy for ARMv7
> 
> It's quite difficult to easily summarize it, especially for so many
> CPUs...

If it is difficult for you, it will be  impossible for the users of
this config option.
When someone will ask what is the best value for his CPU, what will
you answer?
At least, we can add a comment explaining that the performance is not
always better, depending of the buffer size and the CPU.


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-20 Thread Thomas Monjalon
2016-03-20 10:41, Jan Viktorin:
> On Sat, 19 Mar 2016 21:14:57 +0100
> Thomas Monjalon  wrote:
> 
> > 2016-03-19 20:58, Jan Viktorin:
> > > The flag is used to enable memcpy optimizations in EAL. As it is not 
> > > always
> > > the performance benefit, the flag allows to disable it.  
> > 
> > Ideally the default should be to choose the best optimization.
> > If it is not possible, it would help to have some comments explaining
> > how to choose wether enabling NEON memcpy or not.
> 
> Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> delete it from the defconfig and change the test in rte_memcpy_32.h to
> 
> #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> Alternatively, to have a positive test like
> 
> #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> I can create a bigger change that moves the non-neon-memcpy up in the
> file...
> 
> Should I resend the whole series as v3?

No, I don't think changing the name of the config or moving code
will change anything.
We just need to understand when it must be enabled or disabled.


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-20 Thread Jan Viktorin
On Sun, 20 Mar 2016 10:41:10 +0100
Jan Viktorin  wrote:

> On Sat, 19 Mar 2016 21:14:57 +0100
> Thomas Monjalon  wrote:
> 
> > 2016-03-19 20:58, Jan Viktorin:  
> > > The flag is used to enable memcpy optimizations in EAL. As it is not 
> > > always
> > > the performance benefit, the flag allows to disable it.
> > 
> > Ideally the default should be to choose the best optimization.
> > If it is not possible, it would help to have some comments explaining
> > how to choose wether enabling NEON memcpy or not.

The related statistics are mentioned here:

commit 04a2fde35daf5e9a271e72331a70b48b951d7568
Author: Vlastimil Kosar 
Date:   Tue Nov 3 00:47:20 2015 +0100

eal/arm: add vector memcpy for ARMv7

It's quite difficult to easily summarize it, especially for so many
CPUs...

> 
> Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> delete it from the defconfig and change the test in rte_memcpy_32.h to
> 
> #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> Alternatively, to have a positive test like
> 
> #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> I can create a bigger change that moves the non-neon-memcpy up in the
> file...
> 
> Should I resend the whole series as v3?
> 
> Regards
> Jan


-- 
  Jan ViktorinE-mail: Viktorin at RehiveTech.com
  System ArchitectWeb:www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-20 Thread Jan Viktorin
On Sat, 19 Mar 2016 21:14:57 +0100
Thomas Monjalon  wrote:

> 2016-03-19 20:58, Jan Viktorin:
> > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > the performance benefit, the flag allows to disable it.  
> 
> Ideally the default should be to choose the best optimization.
> If it is not possible, it would help to have some comments explaining
> how to choose wether enabling NEON memcpy or not.

Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
delete it from the defconfig and change the test in rte_memcpy_32.h to

#ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY

Alternatively, to have a positive test like

#ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY

I can create a bigger change that moves the non-neon-memcpy up in the
file...

Should I resend the whole series as v3?

Regards
Jan


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-19 Thread Thomas Monjalon
2016-03-19 20:58, Jan Viktorin:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.

Ideally the default should be to choose the best optimization.
If it is not possible, it would help to have some comments explaining
how to choose wether enabling NEON memcpy or not.


[dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

2016-03-19 Thread Jan Viktorin
The flag is used to enable memcpy optimizations in EAL. As it is not always
the performance benefit, the flag allows to disable it.

Signed-off-by: Jan Viktorin 
---
 config/defconfig_arm-armv7a-linuxapp-gcc   | 1 +
 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
b/config/defconfig_arm-armv7a-linuxapp-gcc
index 96c3343..2c60c2c 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
 CONFIG_RTE_ARCH_ARM=y
 CONFIG_RTE_ARCH_ARMv7=y
 CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
+CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y

 CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_ARCH_STRICT_ALIGN=y
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h 
b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
index ad8bc65..988125b 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -42,7 +42,11 @@ extern "C" {

 #include "generic/rte_memcpy.h"

-#ifdef RTE_MACHINE_CPUFLAG_NEON
+#ifdef RTE_ARCH_ARM_NEON_MEMCPY
+
+#ifndef RTE_MACHINE_CPUFLAG_NEON
+#error "Cannot optimize memcpy by NEON as the CPU seems to not support this"
+#endif

 /* ARM NEON Intrinsics are used to copy data */
 #include 
@@ -325,7 +329,7 @@ rte_memcpy_func(void *dst, const void *src, size_t n)
return memcpy(dst, src, n);
 }

-#endif /* RTE_MACHINE_CPUFLAG_NEON */
+#endif /* RTE_ARCH_ARM_NEON_MEMCPY */

 #ifdef __cplusplus
 }
-- 
2.7.0