Re: [ovs-dev] [PATCH] dpif-netdev-perf: aarch64 support for accurate timing update of TSC cycle counter

2019-11-26 Thread Yanqin Wei (Arm Technology China)
Hi Ilya,

No, we didn't test this patch based on OVS-AF_XDP, but made a black build to 
enable this in OVS-DPDK and test it. 
Currently DPDK-AF_XDP has been tested in latest kernel (not released). So I 
think OVS-AF_XDP is close to be supported for aarch64.  

Furthermore, I found a document about userspace-only mode of Open vSwitch 
without DPDK.  
http://docs.openvswitch.org/en/latest/intro/install/userspace/#using-the-userspace-datapath-with-ovs-vswitchd
So it seems userspace datapath should be decoupled with networking IO, users 
can even customize this. Does it means we need implement all used DPDK API 
inside OVS?

Best Regards,
Wei Yanqin 


> -Original Message-
> From: dev  On Behalf Of Ilya Maximets
> Sent: Tuesday, November 26, 2019 11:38 PM
> To: Malvika Gupta ; d...@openvswitch.org
> Cc: nd ; Honnappa Nagarahalli
> 
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev-perf: aarch64 support for accurate
> timing update of TSC cycle counter
> 
> On 13.11.2019 18:01, Malvika Gupta wrote:
> > The accurate timing implementation in this patch gets the wall clock
> > counter via
> > cntvct_el0 register access. This call is portable to all aarch64
> > architectures and has been verified on an 64-bit arm server.
> >
> > Suggested-by: Yanqin Wei 
> > Signed-off-by: Malvika Gupta 
> > ---
> 
> Thanks for the patch!
> 
> Are you trying to use AF_XDP on aarch64?  Asking because it's the only real
> scenario where this patch can be useful.
> 
> For the patch subject, I'd suggest to shorten it a little.
> 'timing', 'TSC' and 'cycle counter' are kind of synonyms here and doesn't make
> the sentence any clear.  Suggesting something like this:
> "dpif-netdev-perf: Accurate cycle counter update on aarch64."
> 
> What do you think?
> 
> One more comment inline.
> 
> >  lib/dpif-netdev-perf.h | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index
> > ce369375b..4ea7cc355 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -220,6 +220,11 @@ cycles_counter_update(struct pmd_perf_stats *s)
> >  asm volatile("rdtsc" : "=a" (l), "=d" (h));
> >
> >  return s->last_tsc = ((uint64_t) h << 32) | l;
> > +#elif !defined(_MSC_VER) && defined(__aarch64__)
> > +uint64_t tsc;
> > +asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
> > +
> > +return s->last_tsc = tsc;
> 
> I think we could drop the 'tsc' local variable here and write directly to s-
> >last_tsc.  Less number of variables and operations.
> 
> Best regards, Ilya Maximets.
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1] netdev: use acquire-release semantics for change_seq in netdev

2019-11-25 Thread Yanqin Wei (Arm Technology China)
Hi Ben,

Thanks for your time to review. I am sorry not to verify this patch by clang 
compiler.  But this patch compiles successfully in gcc 7.4. Maybe in some gcc 
version, the atomic type is necessary for atomic operation.
I will fix it in V2.

Best Regards,
Wei Yanqin

> -Original Message-
> From: Ben Pfaff 
> Sent: Tuesday, November 26, 2019 5:54 AM
> To: Yanqin Wei (Arm Technology China) 
> Cc: d...@openvswitch.org; ovs-dev@openvswitch.org; nd ;
> Gavin Hu (Arm Technology China) 
> Subject: Re: [ovs-dev] [PATCH v1] netdev: use acquire-release semantics for
> change_seq in netdev
> 
> On Mon, Nov 18, 2019 at 10:46:59AM +0800, Yanqin Wei wrote:
> > "rxq_enabled" of netdev is writen in the vhost thread and read by pmd
> > thread once it observes 'change_seq' is updated. This patch is to keep
> > order on aarch64 or other weak memory model CPU to ensure
> > 'rxq_enabled' is observed before 'change_seq'.
> >
> > Reviewed-by: Gavin Hu 
> > Signed-off-by: Yanqin Wei 
> 
> Thanks for the patch.
> 
> This does not compile.  Clang says:
> 
> In file included from ../lib/dpif-netdev.c:54:
> ../lib/netdev-provider.h:97:5: error: address argument to atomic operation
> must be a pointer to _Atomic type ('uint64_t *' (aka 'unsigned long *') 
> invalid)
> ../lib/ovs-atomic-clang.h:82:16: note: expanded from macro
> 'atomic_add_explicit'
> In file included from ../lib/dpif-netdev.c:54:
> ../lib/netdev-provider.h:99:9: error: address argument to atomic operation
> must be a pointer to _Atomic type ('uint64_t *' (aka 'unsigned long *') 
> invalid)
> ../lib/ovs-atomic-clang.h:82:16: note: expanded from macro
> 'atomic_add_explicit'
> ../lib/netdev.c:2044:5: error: address argument to atomic operation must 
> be
> a pointer to _Atomic type ('const uint64_t *' (aka 'const unsigned long *')
> invalid)
> ../lib/ovs-atomic-clang.h:53:15: note: expanded from macro
> 'atomic_read_explicit'
> 
> and many more instances.
> 
> GCC says:
> 
> ../lib/netdev.c:2044:5: error: incorrect type in argument 1 (different
> modifiers)
> ../lib/netdev.c:2044:5:expected void *
> ../lib/netdev.c:2044:5:got unsigned long const *
> ../lib/netdev.c:2044:5: error: incorrect type in argument 1 (different
> modifiers)
> ../lib/netdev.c:2044:5:expected void *
> ../lib/netdev.c:2044:5:got unsigned long const *
> 
> I do tend to think it's correct, otherwise.  I wonder how this has been missed
> for so long.
> 
> Thanks,
> 
> Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/4] travis: Move x86-only addon packages

2019-11-22 Thread Yanqin Wei (Arm Technology China)
Hi Ilya,

Reply inline.

Best Regards,
Wei Yanqin

> -Original Message-
> From: dev  On Behalf Of Ilya Maximets
> Sent: Friday, November 22, 2019 3:30 AM
> To: Lance Yang (Arm Technology China) ;
> d...@openvswitch.org; ovs-dev@openvswitch.org
> Cc: Jieqiang Wang (Arm Technology China) ;
> Ruifeng Wang (Arm Technology China) ; Gavin Hu
> (Arm Technology China) ; Jingzhao Ni (Arm Technology
> China) ; nd 
> Subject: Re: [ovs-dev] [PATCH v1 2/4] travis: Move x86-only addon packages
> 
> On 20.11.2019 9:14, Lance Yang wrote:
> > To enable multiple CPU architectures support, it is necessary to move
> > the x86-only addon packages from .travis.yml file. Otherwise, the
> > x86-only addon packages will break the builds on some other CPU
> architectures.
> >
> > Reviewed-by: Yangqin Wei 
> > Reviewed-by: Malvika Gupta 
> > Reviewed-by: Gavin Hu 
> > Reviewed-by: Ruifeng Wang 
> > Signed-off-by: Lance Yang 
> > ---
> >  .travis.yml  |  2 --
> >  .travis/linux-prepare.sh | 12 
> >  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> Common comment for all the patches in a series:
> * It's better to add a period in the end of a subject line.
[Yanqin] OK.
> 
> >
> > diff --git a/.travis.yml b/.travis.yml index 482efd2..2dc4d43 100644
> > --- a/.travis.yml
> > +++ b/.travis.yml
> > @@ -14,7 +14,6 @@ addons:
> >apt:
> >  packages:
> >- bc
> > -  - gcc-multilib
> >- libssl-dev
> >- llvm-dev
> >- libjemalloc1
> > @@ -26,7 +25,6 @@ addons:
> >- libelf-dev
> >- selinux-policy-dev
> >- libunbound-dev
> > -  - libunbound-dev:i386
> >- libunwind-dev
> >
> >  before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh
> > diff --git a/.travis/linux-prepare.sh b/.travis/linux-prepare.sh index
> > 9e3ac0d..8096abe 100755
> > --- a/.travis/linux-prepare.sh
> > +++ b/.travis/linux-prepare.sh
> > @@ -15,10 +15,14 @@ cd ..
> >  pip install --disable-pip-version-check --user six flake8 hacking
> > pip install --user --upgrade docutils
> >
> > -if [ "$M32" ]; then
> > -# 32-bit and 64-bit libunwind can not be installed at the same time.
> > -# This will remove the 64-bit libunwind and install 32-bit version.
> > -sudo apt-get install -y libunwind-dev:i386
> > +if [[ "$TRAVIS_ARCH" == "amd64" ]] || [[ -z "$TRAVIS_ARCH" ]]; then
> 
> The same comment here as for previous ppc64le patch.
> Are you going to ever build 32bit binary on aarch64 on Travis?
> Is it really possible to build 32bit binary on aarch64 with '-m32' flag?
[Yanqin] Not yet. Gcc for aarch64 does not support -m32 flag. Cross compiler is 
required to build 32 bits binary on aarch64 machine.

> 
> > +if [ "$M32" ]; then
> > +# 32-bit and 64-bit libunwind can not be installed at the same 
> > time.
> > +# This will remove the 64-bit libunwind and install 32-bit version.
> > +sudo apt-get install \
> > +-y libunwind-dev:i386 libunbound-dev:i386 gcc-multilib
> 
> Please, add additional indentation level for above line.
[Yanqin] Thanks, will be updated in V2.
> 
> > +fi
> > +
> >  fi
> >
> >  # IPv6 is supported by kernel but disabled in TravisCI images:
> >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [bug-report] bfd decay unit case failure

2019-11-22 Thread Yanqin Wei (Arm Technology China)
Hi Ben,

This issue is difficult to reproduce in local machine. We will try to do it. 
Could you suggest the log we need  to collect or enable?

Best Regards,
Wei Yanqin

> -Original Message-
> From: dev  On Behalf Of Ben Pfaff
> Sent: Friday, November 22, 2019 1:34 AM
> To: Lance Yang (Arm Technology China) 
> Cc: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [bug-report] bfd decay unit case failure
>
> On Thu, Nov 21, 2019 at 09:25:49AM +, Lance Yang (Arm Technology
> China) wrote:
> > I encountered a unit test failure when I ran the Open vSwitch testsuite on
> Travis CI aarch64 lxd container.
> > The environment can be found at line 7 : build system information section in
> the report on https://travis-ci.org/yzyuestc/ovs/jobs/614941322 . The unit 
> test
> case name is "bfd decay" , you can find the unit test failure details in the
> report after line 6520. The failure is not 100% reproducible on Travis CI.
> > Could anyone give some hint on what is wrong for this unit test case?
>
> We do our best to make the Open vSwitch test cases resist differences in
> timing from one environment to another, but there still may be some that are
> sensitive to it.  This one may be such a test case.  These problems can be
> difficult to debug without being able to trigger them in interactive
> environments.  Have you been able to see it when you build by hand?
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/2] cmap: non-blocking cmap lookup

2019-11-18 Thread Yanqin Wei (Arm Technology China)
Hi Ilya,

New node insertion does not always set bucket slot. It should be only two cases 
would trigger slot updating.
1. node is inserted into an empty slot.
2. rearrange cmap bucket in case of no candidate bucket for new node. see 
"cmap_insert_bfs"

1st case is an empty slot, there is no other node in the list.
2nd case, the slot movement has two step:
a. copy an slot(hash,node list) into another candidate slot.
b. replace the old position with another slot. 
So there is at least one complete slot copy in the cmap. If one slot is 
updating and skipped by bitmap, it will be found in anther candidate bucket.

Best Regards,
Wei Yanqin

> -Original Message-
> From: Ilya Maximets 
> Sent: Tuesday, November 19, 2019 12:06 AM
> To: Ola Liljedahl ; i.maxim...@ovn.org; Yanqin Wei
> (Arm Technology China) ; ovs-dev@openvswitch.org
> Cc: Gavin Hu (Arm Technology China) ; nd
> ; b...@ovn.org
> Subject: Re: [ovs-dev] [PATCH v1 2/2] cmap: non-blocking cmap lookup
> 
> On 18.11.2019 16:55, Ola Liljedahl wrote:
> > On Mon, 2019-11-18 at 16:45 +0100, Ilya Maximets wrote:
> >> On 18.11.2019 3:45, Yanqin Wei wrote:
> >>
> >> Currently cmap_bucket is protected by a counter. Cmap reader will be
> >> blocked until the counter becomes even, writer increments it by 1 to
> >> be odd, and after slot update, increments by 1 again to become even.
> >> If the writer is pending or scheduled out during the writer course,
> >> the reader will be blocked.
> >>
> >> This patch introduces a bitmap as guard variable for (hash,node) pair.
> >> Writer need set slot valid bit to 0 before each change. And after
> >> change, writer sets valid bit to 1 and increments counter. Thus,
> >> reader can ensure slot consistency by only scanning valid slot and
> >> then checking counter+bitmap does not change while examining the bucket.
> >>
> >> The only time a reader has to retry the read operation for single
> >> bucket is when 1)a writer clears a bit in the valid bitmap between a
> >> reader's first and second read of the counter+bitmap.
> >> 2)a writer increments the counter between a reader's first and second
> >> read of counter+bitmap.
> >> I.e. the read operation needs to be retried when some other thread
> >> has made progress (with a write).
> >> There is no spinning/waiting for other threads to complete. This
> >> makes the design non-blocking (for readers).
> >>
> >> And it has almost no additional overhead because counter and bitmap
> >> share one 32 bits variable. No additional load/store for reader and
> >> writer.
> >>
> >>
> >> IIUC, after this change if writer will start inserting the node,
> >> reader will not find any node with the same hash in cmap because it
> >> will check only "valid" slots.  This breaks the cmap API and could lead to
> crash.
> > Why wouldn't readers find other valid slots with the same hash value?
> > It is only the slot that is being updated that is cleared in the valid
> > bitmap duing the update. Other valid slots (irrespective of actual
> > hash values) are unchanged.
> >
> > Am I missing something? Do the users of cmap have some other
> > expectations that are not obvious from looking at the cmap code?
> 
> bucket->nodes[i] is not a single node, but the list of nodes with the
> bucket->same hash
> equal to bucket->hashes[i].
> 
> You may look at the implementation of
> CMAP_NODE_FOR_EACH/CMAP_FOR_EACH_WITH_HASH
> and read comments to 'struct cmap_node' and cmap.{c,h} in general.
> 
> While adding the new node to the list you're restricting access to the whole 
> list
> making it impossible to find any node there.
> 
> Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] travis: support ppc64le builds

2019-11-08 Thread Yanqin Wei (Arm Technology China)
Hi David

> -Original Message-
> From: David Wilder 
> Sent: Thursday, November 7, 2019 3:21 AM
> To: ovs-dev@openvswitch.org
> Cc: i.maxim...@ovn.org; b...@ovn.org; Yanqin Wei (Arm Technology China)
> ; wil...@us.ibm.com
> Subject: [PATCH v3] travis: support ppc64le builds
>
> Add support for travis-ci ppc64le builds.
>
> - Updated matrix in .travis.yml to include an arch: ppc64le build.
> - Move package install needed for 32bit builds to .travis/linux-prepare.sh.
>
> To keep the total build time at an acceptable level only a single build job is
> included in the matrix for ppc64le.
>
> A build report example can be found here [1] [0] http://travis-ci.org/ [1]
> https://travis-ci.org/djlwilder/ovs/builds/607851729
>
> Signed-off-by: David Wilder 
> ---
> Addressed review comments:
> - Cleaned up linux-prepare.sh (v2)
> - Switch from os: linux-ppc64le to arch: ppc64le (v3)
>
>  .travis.yml  | 5 +++--
>  .travis/linux-prepare.sh | 5 -
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index 482efd2d1..308c09635 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -14,7 +14,6 @@ addons:
>apt:
>  packages:
>- bc
> -  - gcc-multilib
>- libssl-dev
>- llvm-dev
>- libjemalloc1
> @@ -26,7 +25,6 @@ addons:
>- libelf-dev
>- selinux-policy-dev
>- libunbound-dev
> -  - libunbound-dev:i386
>- libunwind-dev
>
>  before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh
> @@ -52,6 +50,9 @@ matrix:
>  - os: osx
>compiler: clang
>env: OPTS="--disable-ssl"
> +- arch: ppc64le
> +  compiler: gcc
> +  env: OPTS="--disable-ssl"
>
>  script: ./.travis/${TRAVIS_OS_NAME}-build.sh $OPTS
>
> diff --git a/.travis/linux-prepare.sh b/.travis/linux-prepare.sh index
> 9e3ac0df7..d66f480c6 100755
> --- a/.travis/linux-prepare.sh
> +++ b/.travis/linux-prepare.sh
> @@ -18,7 +18,10 @@ pip install --user --upgrade docutils  if [ "$M32" ]; then
>  # 32-bit and 64-bit libunwind can not be installed at the same time.
>  # This will remove the 64-bit libunwind and install 32-bit version.
> -sudo apt-get install -y libunwind-dev:i386
> +sudo apt-get install -y \
> + gcc-multilib \
> + libunwind-dev:i386 \
> + libunbound-dev:i386
[Yanqin] They are x86 specific dependency. It is better to use  "$TRAVIS_ARCH" 
== "amd64" condition.
[Yanqin]  Is gcc-multilib only required for 32bits build?
>  fi
>
>  # IPv6 is supported by kernel but disabled in TravisCI images:
> --
> 2.23.0.162.gf1d4a28

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ] travis: support ppc64le builds

2019-11-07 Thread Yanqin Wei (Arm Technology China)
Hi David, 

I am sorry to reply this late. Yes, if travis support this configuration, it 
should be a good solution for me.

Best regards,
Wei Yanqin

-Original Message-
From: dwilder  
Sent: Wednesday, November 6, 2019 4:32 AM
To: Yanqin Wei (Arm Technology China) 
Cc: Ilya Maximets ; ovs-dev@openvswitch.org; 
wil...@us.ibm.com; nd 
Subject: Re: RE: RE: [ovs-dev] [PATCH ] travis: support ppc64le builds

Hi Wei

If I change my matrix:include to use "arch: ppc64le" rather than "os: 
linux-ppc64le", will it eliminate your concern?

matrix:
   include:
-   - os: linux-ppc64le
+   - arch: ppc64le
   compiler: gcc
   env: OPTS="--disable-ssl"


Later when we want to enable the full matrix on all arch we would add:

1) arch:
   - amd64
   - ppc64le
   - arm64

2) eliminate the include: -arch: ppc64le

3) Add any exclude: - arch:XXX sections for any tests that dont apply.

For example I would add:

  exclude:
- arch: ppc64le
  env: M32=1 OPTS="--disable-ssl"

Regards,
   David

On 2019-11-02 02:09, Yanqin Wei (Arm Technology China) wrote:
> Hi David,
> 
> Thanks for your reply.  Yes, my concern is how to define arch and os 
> in .travis.yml after we cover all builds and cases for arm and ppc.
> This pattern can enable all builds and testsuits for x86 and arm
> arch:
>   - amd64
>   - arm64
> os:
>   - linux
> 
> This can enable all jobs for x86 and ppc.
> arch:
>   - amd64 //default
> os:
>   - linux
>   - linux-ppc64le
> 
> But it does not work to combine them.  This means four kinds of
> arch+os combinations in all.   Arm64+linux-ppc64le is invalid.
> arch:
>   - amd64
>   - arm64
> os:
>   - linux
>   - linux-ppc64le
> 
> So if we finally cover all the builds and cases for arm/ppc,  we have 
> to duplicate all jobs for different cpu arch in the matrix include.
> matrix:
>   include:
> - os: linux-ppc64le
>   env: job1
> - os: linux-ppc64le
>   env: job2
> ...
> - arch: arm64
>   env: job1
> - arch: arm64
>   env: job2
> ...
> 
> But currently either arm or ppc has not cover all the cases, so they 
> can coexist in build-matrix.  And there is no conflict in 
> build/prepare scripts, because both of them use TRAVIS_ARCH variable 
> to indicate cpu arch.
> The patch series to enable arm CI is under internal review. It will be 
> submitted when ready.
> 
> Best Regards,
> Wei Yanqin
> 
> 
> -Original Message-
> From: dwilder 
> Sent: Saturday, November 2, 2019 1:09 AM
> To: Yanqin Wei (Arm Technology China) 
> Cc: Ilya Maximets ; ovs-dev@openvswitch.org; 
> wil...@us.ibm.com; nd 
> Subject: Re: RE: [ovs-dev] [PATCH ] travis: support ppc64le builds
> 
> On 2019-10-30 19:04, Yanqin Wei (Arm Technology China) wrote:
>> Hi,
>> 
>> We are working to support arm64 build for ovs travis CI. It is indeed 
>> to use arch: arm64 to choose cpu architecture, because travis has 
>> provided native arm64 option now.
>> But in this patch it seems ppc64 builds run on the ppc-VM + x86 
>> native machine.
>> Currently arm only select a part of jobs to run, which is defined in 
>> matrix:include. But the final object is to run all jobs. It means 
>> that
>>  arch: arm64 will be moved out of marix. If ppc plans to do the same 
>> in the future, it will conflict with arm jobs.
>> 
>> Best Regards,
>> Wei Yanqin
> 
> Hi,
> I have added a build only test for ppc64le following the model used 
> for osx. I think this is a good start for getting multi-arch support 
> into Ci.
> 
> I agree that running all jobs on the matrix on every arch is good goal.
> I dont completely understand your issue, is your concern the use of os:
> vs arch: ?
> 
> I am glad to work with you to find a solution. Can you share your
> arm64 changes?  We can discuss off-list if you prefer.
> 
> 
>> 
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org 
>>  On Behalf Of dwilder
>> Sent: Wednesday, October 30, 2019 1:55 AM
>> To: Ilya Maximets 
>> Cc: ovs-dev@openvswitch.org; wil...@us.ibm.com
>> Subject: Re: [ovs-dev] [PATCH ] travis: support ppc64le builds
>> 
>> On 2019-10-29 09:52, Ilya Maximets wrote:
>>> On 28.10.2019 22:22, David Wilder wrote:
>>>> Add support for travis-ci ppc64le builds.
>>>> 
>>>> - Updated matrix in .travis.yml to include a ppc64le build.
>>>> - Added support to install packages needed by specific 
>>>> architectures.
>>>> 
>>>> To keep the total build time at an acceptable level only a single 
>>>> build job is included in

Re: [ovs-dev] [PATCH ] travis: support ppc64le builds

2019-11-02 Thread Yanqin Wei (Arm Technology China)
Hi David,

Thanks for your reply.  Yes, my concern is how to define arch and os in 
.travis.yml after we cover all builds and cases for arm and ppc.  
This pattern can enable all builds and testsuits for x86 and arm
arch:
  - amd64
  - arm64
os:
  - linux

This can enable all jobs for x86 and ppc. 
arch:
  - amd64 //default
os:
  - linux
  - linux-ppc64le

But it does not work to combine them.  This means four kinds of arch+os 
combinations in all.   Arm64+linux-ppc64le is invalid.
arch:
  - amd64
  - arm64
os:
  - linux
  - linux-ppc64le  

So if we finally cover all the builds and cases for arm/ppc,  we have to 
duplicate all jobs for different cpu arch in the matrix include.
matrix:
  include:
- os: linux-ppc64le
  env: job1
- os: linux-ppc64le
  env: job2
...
- arch: arm64
  env: job1
- arch: arm64
  env: job2
...

But currently either arm or ppc has not cover all the cases, so they can 
coexist in build-matrix.  And there is no conflict in build/prepare scripts, 
because both of them use TRAVIS_ARCH variable to indicate cpu arch.
The patch series to enable arm CI is under internal review. It will be 
submitted when ready.

Best Regards,
Wei Yanqin


-Original Message-
From: dwilder  
Sent: Saturday, November 2, 2019 1:09 AM
To: Yanqin Wei (Arm Technology China) 
Cc: Ilya Maximets ; ovs-dev@openvswitch.org; 
wil...@us.ibm.com; nd 
Subject: Re: RE: [ovs-dev] [PATCH ] travis: support ppc64le builds

On 2019-10-30 19:04, Yanqin Wei (Arm Technology China) wrote:
> Hi,
> 
> We are working to support arm64 build for ovs travis CI. It is indeed 
> to use arch: arm64 to choose cpu architecture, because travis has 
> provided native arm64 option now.
> But in this patch it seems ppc64 builds run on the ppc-VM + x86 native 
> machine.
> Currently arm only select a part of jobs to run, which is defined in 
> matrix:include. But the final object is to run all jobs. It means that
>  arch: arm64 will be moved out of marix. If ppc plans to do the same 
> in the future, it will conflict with arm jobs.
> 
> Best Regards,
> Wei Yanqin

Hi,
I have added a build only test for ppc64le following the model used for osx. I 
think this is a good start for getting multi-arch support into Ci.

I agree that running all jobs on the matrix on every arch is good goal.  
I dont completely understand your issue, is your concern the use of os: 
vs arch: ?

I am glad to work with you to find a solution. Can you share your arm64 
changes?  We can discuss off-list if you prefer.


> 
> -Original Message-
> From: ovs-dev-boun...@openvswitch.org
>  On Behalf Of dwilder
> Sent: Wednesday, October 30, 2019 1:55 AM
> To: Ilya Maximets 
> Cc: ovs-dev@openvswitch.org; wil...@us.ibm.com
> Subject: Re: [ovs-dev] [PATCH ] travis: support ppc64le builds
> 
> On 2019-10-29 09:52, Ilya Maximets wrote:
>> On 28.10.2019 22:22, David Wilder wrote:
>>> Add support for travis-ci ppc64le builds.
>>> 
>>> - Updated matrix in .travis.yml to include a ppc64le build.
>>> - Added support to install packages needed by specific architectures.
>>> 
>>> To keep the total build time at an acceptable level only a single
>>> build job is included in the matrix for ppc64le.
>>> 
>>> A build report example can be found here [1] [0]
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__travis-2Dci.org_;
>>> d=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7ndxyKjH9UrBD68Us2WP1wI4BwEBQbeAy
>>> z8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UPeby0Y8ovgzRDIyJZFo=UMYL8rzJNp
>>> h87seC0oJLBiWoe-sUSL80AJy0RMTgBzQ=
>>> [1]
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_
>>> djlwilder_ovs_builds_604098141=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7n
>>> dxyKjH9UrBD68Us2WP1wI4BwEBQbeAyz8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UP
>>> eby0Y8ovgzRDIyJZFo=pyd2yQpQ0snpwGE5El4RYZsatwl74sthM1KLqtIKCnY=
>>> Signed-off-by: David Wilder 
>> 
>> Hi David,
>> Thanks for working on this. I have a couple of question regarding
>> ppc64le support by TravisCI.  It seems that they are not supporting
>> this architecture officially and refusing[1] to solve any issues that
>> appears while using it. There also no official documentation.
>> It's kind of a hidden feature that some projects are using for their
>> own risk [2]. Do you know why this happens or maybe you have some
>> insights about what is going on/how it works?
> 
> Work is going on to increase ppc64le support on Travis by the end of
> the year.  I dont have any details yet. My plan is to keep this to
> build-only ci until then.  Important, ppc64le VM are only available on
> travis-ci.org, they are not available on travis-ci.com.
> 
>> The API is also a bit strange because Travis starte

Re: [ovs-dev] [PATCH ] travis: support ppc64le builds

2019-10-30 Thread Yanqin Wei (Arm Technology China)
Hi,

We are working to support arm64 build for ovs travis CI. It is indeed to use 
arch: arm64 to choose cpu architecture, because travis has provided native 
arm64 option now. 
But in this patch it seems ppc64 builds run on the ppc-VM + x86 native machine. 
 
Currently arm only select a part of jobs to run, which is defined in 
matrix:include. But the final object is to run all jobs. It means that  arch: 
arm64 will be moved out of marix. If ppc plans to do the same in the future, it 
will conflict with arm jobs.

Best Regards,
Wei Yanqin

-Original Message-
From: ovs-dev-boun...@openvswitch.org  On 
Behalf Of dwilder
Sent: Wednesday, October 30, 2019 1:55 AM
To: Ilya Maximets 
Cc: ovs-dev@openvswitch.org; wil...@us.ibm.com
Subject: Re: [ovs-dev] [PATCH ] travis: support ppc64le builds

On 2019-10-29 09:52, Ilya Maximets wrote:
> On 28.10.2019 22:22, David Wilder wrote:
>> Add support for travis-ci ppc64le builds.
>> 
>> - Updated matrix in .travis.yml to include a ppc64le build.
>> - Added support to install packages needed by specific architectures.
>> 
>> To keep the total build time at an acceptable level only a single 
>> build job is included in the matrix for ppc64le.
>> 
>> A build report example can be found here [1] [0] 
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__travis-2Dci.org_;
>> d=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7ndxyKjH9UrBD68Us2WP1wI4BwEBQbeAy
>> z8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UPeby0Y8ovgzRDIyJZFo=UMYL8rzJNp
>> h87seC0oJLBiWoe-sUSL80AJy0RMTgBzQ=
>> [1]
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.org_
>> djlwilder_ovs_builds_604098141=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7n
>> dxyKjH9UrBD68Us2WP1wI4BwEBQbeAyz8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UP
>> eby0Y8ovgzRDIyJZFo=pyd2yQpQ0snpwGE5El4RYZsatwl74sthM1KLqtIKCnY=
>> Signed-off-by: David Wilder 
> 
> Hi David,
> Thanks for working on this. I have a couple of question regarding 
> ppc64le support by TravisCI.  It seems that they are not supporting 
> this architecture officially and refusing[1] to solve any issues that 
> appears while using it. There also no official documentation.
> It's kind of a hidden feature that some projects are using for their 
> own risk [2]. Do you know why this happens or maybe you have some 
> insights about what is going on/how it works?

Work is going on to increase ppc64le support on Travis by the end of the year.  
I dont have any details yet. My plan is to keep this to build-only ci until 
then.  Important, ppc64le VM are only available on travis-ci.org, they are not 
available on travis-ci.com.

> The API is also a bit strange because Travis started to officially 
> support arm builds, but this is done via 'arch' knob, not the 'os'.
> Will it be changed over time for ppc64le?
> 

Sorry, I dont know.

> [1]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__travis-2Dci.commu
> nity_t_ppc64le-2Darch-2Dsupport-2Don-2Dtravis-2Dci-2Dcom-2Dvs-2Dtravis
> -2Dci-2Dorg_2898_2=DwICaQ=jf_iaSHvJObTbx-siA1ZOg=7ndxyKjH9UrBD68
> Us2WP1wI4BwEBQbeAyz8i_vwCCaI=6JANehIfGoxUMtwHhe4yob4UPeby0Y8ovgzRDIy
> JZFo=TrXdSxjvnbbVQz7EzR5r0aE93lZMSdCiIUQT2wt8E3I=
> [2]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openss
> l_openssl_commit_13da3ad00c80e1da816ca27f6c15b0ecee1bb0b8=DwICaQ=j
> f_iaSHvJObTbx-siA1ZOg=7ndxyKjH9UrBD68Us2WP1wI4BwEBQbeAyz8i_vwCCaI=
> 6JANehIfGoxUMtwHhe4yob4UPeby0Y8ovgzRDIyJZFo=RWVuli-BT8E2IsW3rAA9MtqC
> VPZahNk8k7yqxEbgTT4=
> 
> Few code comments inline.
> 
>> ---
>>   .travis.yml  |  5 +++--
>>   .travis/linux-prepare.sh | 18 ++
>>   2 files changed, 17 insertions(+), 6 deletions(-)
>> 
>> diff --git a/.travis.yml b/.travis.yml index 5676d9748..c99f26815 
>> 100644
>> --- a/.travis.yml
>> +++ b/.travis.yml
>> @@ -14,7 +14,6 @@ addons:
>> apt:
>>   packages:
>> - bc
>> -  - gcc-multilib
>> - libssl-dev
>> - llvm-dev
>> - libjemalloc1
>> @@ -24,7 +23,6 @@ addons:
>> - libelf-dev
>> - selinux-policy-dev
>> - libunbound-dev
>> -  - libunbound-dev:i386
>> - libunwind-dev
>> before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh
>> @@ -50,6 +48,9 @@ matrix:
>>   - os: osx
>> compiler: clang
>> env: OPTS="--disable-ssl"
>> +- os: linux-ppc64le
>> +  compiler: gcc
>> +  env: OPTS="--disable-ssl"
>> script: ./.travis/${TRAVIS_OS_NAME}-build.sh $OPTS
>>   diff --git a/.travis/linux-prepare.sh b/.travis/linux-prepare.sh 
>> index e546d32cb..f3a9a6d44 100755
>> --- a/.travis/linux-prepare.sh
>> +++ b/.travis/linux-prepare.sh
>> @@ -15,8 +15,18 @@ cd ..
>>   pip install --disable-pip-version-check --user six flake8 hacking
>>   pip install --user --upgrade docutils
>>   -if [ "$M32" ]; then
>> -# 32-bit and 64-bit libunwind can not be installed at the same 
>> time.
>> -# This will remove the 64-bit libunwind and install 32-bit 
>> version.
>> -sudo apt-get install -y libunwind-dev:i386
>> +# Include packages 

Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

2019-09-05 Thread Yanqin Wei (Arm Technology China)
Thanks James,
I can understand the root cause now.

Best Regards,
Wei Yanqin

From: James Page 
Sent: Thursday, September 5, 2019 3:42 PM
To: Yanqin Wei (Arm Technology China) 
Cc: Ilya Maximets ; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

On Thu, Sep 5, 2019 at 8:34 AM Yanqin Wei (Arm Technology China) 
mailto:yanqin@arm.com>> wrote:
Is it possible a configuration issue for building OVS userspace program , which 
should not enable crc for xgene cpu?  DPDK library is linked with OVS program 
during OVS building. It should not import compiling configuration from DPDK, 
right?

Compiler flags will be generated by the pkg-config provided by DPDK - the CPU 
flags exceed the features of the xgene CPU.

That said, in the context of compiling a DPDK enabled version of OVS for arm64 
for general consumption, having a practical baseline for CPU features that are 
actually useful to arm64 users is important.

Canonical have some fairly earlier arm64 hardware in the Ubuntu build 
infrastructure which is probably below the sensible baseline for DPDK support.

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

2019-09-05 Thread Yanqin Wei (Arm Technology China)
Hi Ilya,
Thanks for the clarification. It looks application is strong couple with DPDK. 
I will follow your discussion under this bug.

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets 
Sent: Thursday, September 5, 2019 3:41 PM
To: Yanqin Wei (Arm Technology China) ; James Page 

Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

On 05.09.2019 10:34, Yanqin Wei (Arm Technology China) wrote:
> Is it possible a configuration issue for building OVS userspace program , 
> which should not enable crc for xgene cpu?

There is a arm64-xgene1-linux-gcc target that can be used for a legacy 
make-based build system. New meson based build doesn't support that.

> DPDK library is linked with OVS program during OVS building. It should not 
> import compiling configuration from DPDK, right?

DPDK exports its machine flags, extra defines and libraries to the application 
build process because big and important parts of DPDK code are in header 
inlines and built within the application build process.

>
> Best Regards,
> Wei Yanqin
>
> -Original Message-
> From: Ilya Maximets 
> Sent: Wednesday, September 4, 2019 8:46 PM
> To: Yanqin Wei (Arm Technology China) ; James Page
> 
> Cc: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK
>
> BTW, I submitted a bug to DPDK:
> https://bugs.dpdk.org/show_bug.cgi?id=344
>
> Best regards, Ilya Maximets.
>
> On 04.09.2019 12:53, Yanqin Wei (Arm Technology China) wrote:
>> Understood. Thanks for the information.
>>
>>
>>
>> Best Regards,
>>
>> Wei Yanqin
>>
>>
>>
>> *From:* James Page 
>> *Sent:* Wednesday, September 4, 2019 5:45 PM
>> *To:* Yanqin Wei (Arm Technology China) 
>> *Cc:* Ilya Maximets ; ovs-dev@openvswitch.org
>> *Subject:* Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK
>>
>>
>>
>> Hi
>>
>>
>>
>> On Wed, Sep 4, 2019 at 10:30 AM Yanqin Wei (Arm Technology China) 
>> mailto:yanqin@arm.com>> wrote:
>>
>>
>> CRC32 is not a mandatory feature for Armv8.0. What is the arm64 CPU used 
>> in your platform?
>> I'm not sure if it's necessary to use mandatory feature (neon) to 
>> optimize hash libraries, because I thought most arm64 CPUs supported CRC32.
>>
>>
>>
>> The builders we use in Launchpad for package builds are fully virtualized 
>> via OpenStack; I checked and we have some older X-gene CPU's which don't 
>> have crc32 support.
>>
>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are 
>> confidential and may also be privileged. If you are not the intended 
>> recipient, please notify the sender immediately and do not disclose the 
>> contents to any other person, use it for any purpose, or store or copy the 
>> information in any medium. Thank you.
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.
>
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

2019-09-05 Thread Yanqin Wei (Arm Technology China)
Is it possible a configuration issue for building OVS userspace program , which 
should not enable crc for xgene cpu?  DPDK library is linked with OVS program 
during OVS building. It should not import compiling configuration from DPDK, 
right?

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets 
Sent: Wednesday, September 4, 2019 8:46 PM
To: Yanqin Wei (Arm Technology China) ; James Page 

Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

BTW, I submitted a bug to DPDK:
https://bugs.dpdk.org/show_bug.cgi?id=344

Best regards, Ilya Maximets.

On 04.09.2019 12:53, Yanqin Wei (Arm Technology China) wrote:
> Understood. Thanks for the information.
>
>
>
> Best Regards,
>
> Wei Yanqin
>
>
>
> *From:* James Page 
> *Sent:* Wednesday, September 4, 2019 5:45 PM
> *To:* Yanqin Wei (Arm Technology China) 
> *Cc:* Ilya Maximets ; ovs-dev@openvswitch.org
> *Subject:* Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK
>
>
>
> Hi
>
>
>
> On Wed, Sep 4, 2019 at 10:30 AM Yanqin Wei (Arm Technology China) 
> mailto:yanqin@arm.com>> wrote:
>
>
> CRC32 is not a mandatory feature for Armv8.0. What is the arm64 CPU used 
> in your platform?
> I'm not sure if it's necessary to use mandatory feature (neon) to 
> optimize hash libraries, because I thought most arm64 CPUs supported CRC32.
>
>
>
> The builders we use in Launchpad for package builds are fully virtualized via 
> OpenStack; I checked and we have some older X-gene CPU's which don't have 
> crc32 support.
>
>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

2019-09-04 Thread Yanqin Wei (Arm Technology China)
Understood. Thanks for the information.

Best Regards,
Wei Yanqin

From: James Page 
Sent: Wednesday, September 4, 2019 5:45 PM
To: Yanqin Wei (Arm Technology China) 
Cc: Ilya Maximets ; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

Hi

On Wed, Sep 4, 2019 at 10:30 AM Yanqin Wei (Arm Technology China) 
mailto:yanqin@arm.com>> wrote:

CRC32 is not a mandatory feature for Armv8.0. What is the arm64 CPU used in 
your platform?
I'm not sure if it's necessary to use mandatory feature (neon) to optimize hash 
libraries, because I thought most arm64 CPUs supported CRC32.

The builders we use in Launchpad for package builds are fully virtualized via 
OpenStack; I checked and we have some older X-gene CPU's which don't have crc32 
support.

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

2019-09-04 Thread Yanqin Wei (Arm Technology China)
Hi James,

CRC32 is not a mandatory feature for Armv8.0. What is the arm64 CPU used in 
your platform?
I'm not sure if it's necessary to use mandatory feature (neon) to optimize hash 
libraries, because I thought most arm64 CPUs supported CRC32.

Best Regards,
Wei Yanqin

-Original Message-
From: ovs-dev-boun...@openvswitch.org  On 
Behalf Of James Page
Sent: Wednesday, September 4, 2019 4:50 PM
To: Ilya Maximets 
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] SIGILL ovs branch-2.12/arm64/DPDK

On Tue, Sep 3, 2019 at 5:48 PM Ilya Maximets  wrote:

> > Hi
> >
> > I've been testing non-x86 architecture builds for the upcoming 2.12
> release
> > of OVS and I'm hitting an issue with DPDK enabled builds on the
> > arm64 architecture.
> >
> > branch-2.12 includes improvements for native hashing under arm64;
> > these appear to work fine when DPDK is not in the mix, but with DPDK
> > enabled, I get a SIGILL in lib/hash.c:
>
> Hi.
>
> What is your target platform?
> One explanation I could imagine is that DPDK blindly forces
> -march=armv8-a+crc defining __ARM_FEATURE_CRC32 while your cpu doesn't
> support crc32 extension.
>
> Do you have crc32 in the list of cpu features in /proc/cpuinfo ?
>

It would appear that I do not:

Features : fp asimd evtstrm cpuid

Thanks for the pointer.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-30 Thread Yanqin Wei (Arm Technology China)
Hi 

I run a basic nic2nic case for compiler branch stripping. The improvement is 
around 0.8%.
It is observed that 0.37% branch miss in case of directly call 
miniflow_extract__(but it is not inline here)  and 0.27% branch miss in case of 
enable multi-instance for miniflow_extract.

But downside is code size gets bigger and is possible to increase I-cache miss. 
In this test, it is not observed because only miniflow_extract_firstpass is 
invoked. For some cases such as tunnel or connection track, it might has some 
impact.  But in these cases, the branch prediction failure should also increase 
if not enable branch strip.

So the impact should be platform(cache size, branch prediction, compiler) and 
deployment(L3 fwding/tunnel/...) dependency. 
On the whole, the greater difference between first pass and recirculation, the 
more benefit to enable multi-instance.

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets  
Sent: Thursday, August 29, 2019 9:46 PM
To: Yanqin Wei (Arm Technology China) ; Ben Pfaff 

Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology China) 

Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless 
optimization

On 29.08.2019 12:21, Yanqin Wei (Arm Technology China) wrote:
> Hi Ben,
> Thanks for the feedback.  It is indeed related with userspace datapath. 
> 
> Hi Ilya,
> Could you take a look at this patch when you have time?> I knew 
> first-pass and recirculating traffic share the same packet handling. It makes 
> code common and maintainable.
> But if we can introduce some data-oriented and well-defined flag to bypass 
> some time-consuming handling, it can improve performance a lot.

Hi. I had a quick look at the patch.
Few thoughts:
* 'md' is actually always valid inside 'miniflow_extract', so the
   variable 'md_valid' should be renamed to not be misleading.
   Maybe something like 'md_is_full'? I'm not sure about the name.

* How much is the performance benefit of the compiler code stripping?
  I mean, what is the difference between direct call
 miniflow_extract__(packet, dst, md_is_valid);
  where 'md_is_valid == false' and the call to
 miniflow_extract_firstpass(packet, dst);
  ?
  Asking because this complicates dfc_processing() function.
  I'm, actually have a patch locally to combine rss_hash
  calculation functions to reduce code duplication, so I'm trying
  to figure out what are the possibilities here.

Best regards, Ilya Maximets.

> 
> Best Regards,
> Wei Yanqin
> 
> -Original Message-
> From: Ben Pfaff 
> Sent: Thursday, August 29, 2019 6:11 AM
> To: Yanqin Wei (Arm Technology China) ; Ilya 
> Maximets 
> Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology 
> China) 
> Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata 
> branchless optimization
> 
> This fails to apply to current master.
> 
> Whereas most of the time I'd be comfortable with reviewing 'flow'
> patches myself, this one is closed related to the dpif-netdev code and I'd 
> prefer to have someone who understands that code well, as well as the 
> tradeoffs between performance and maintainability, review it.  Ilya (added to 
> the To line) is a good choice.
> 
> On Thu, Aug 22, 2019 at 06:09:16PM +0800, Yanqin Wei wrote:
>> "miniflow_extract" is a branch heavy implementation for packet header 
>> and metadata parsing. There is a lot of meta data handling for all traffic.
>> But this should not be applicable for packets from interface.
>> This patch adds a layer of inline encapsulation to miniflow_extract 
>> and introduces constant "md_valid" input parameter as a branch condition.
>> The new branch will be removed by the compiler at compile time. Two 
>> instances of miniflow_extract with different branches will be generated.
>>
>> This patch is tested on an arm64 platform. It improves more than 3.5% 
>> performance in P2P forwarding cases.
>>
>> Reviewed-by: Gavin Hu 
>> Signed-off-by: Yanqin Wei 
>> ---
>>  lib/dpif-netdev.c |  13 +++---
>>  lib/flow.c| 116 
>> --
>>  lib/flow.h|   2 +
>>  3 files changed, 79 insertions(+), 52 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>> d0a1c58..6686b14 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>  }
>>  }
>>  
>> -miniflow_extract(packet, >mf);
>> +if (!md_is_valid) {
>> +miniflow_extract_firstpass(packet, >mf);
>> +key->hash =
>> +dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
>&g

Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless optimization

2019-08-29 Thread Yanqin Wei (Arm Technology China)
Hi Ben,
Thanks for the feedback.  It is indeed related with userspace datapath. 

Hi Ilya, 
Could you take a look at this patch when you have time? 

I knew first-pass and recirculating traffic share the same packet handling. It 
makes code common and maintainable. 
But if we can introduce some data-oriented and well-defined flag to bypass some 
time-consuming handling, it can improve performance a lot.

Best Regards,
Wei Yanqin

-Original Message-
From: Ben Pfaff  
Sent: Thursday, August 29, 2019 6:11 AM
To: Yanqin Wei (Arm Technology China) ; Ilya Maximets 

Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology China) 

Subject: Re: [ovs-dev] [PATCH v2] flow: miniflow_extract metadata branchless 
optimization

This fails to apply to current master.

Whereas most of the time I'd be comfortable with reviewing 'flow'
patches myself, this one is closed related to the dpif-netdev code and I'd 
prefer to have someone who understands that code well, as well as the tradeoffs 
between performance and maintainability, review it.  Ilya (added to the To 
line) is a good choice.

On Thu, Aug 22, 2019 at 06:09:16PM +0800, Yanqin Wei wrote:
> "miniflow_extract" is a branch heavy implementation for packet header 
> and metadata parsing. There is a lot of meta data handling for all traffic.
> But this should not be applicable for packets from interface.
> This patch adds a layer of inline encapsulation to miniflow_extract 
> and introduces constant "md_valid" input parameter as a branch condition.
> The new branch will be removed by the compiler at compile time. Two 
> instances of miniflow_extract with different branches will be generated.
> 
> This patch is tested on an arm64 platform. It improves more than 3.5% 
> performance in P2P forwarding cases.
> 
> Reviewed-by: Gavin Hu 
> Signed-off-by: Yanqin Wei 
> ---
>  lib/dpif-netdev.c |  13 +++---
>  lib/flow.c| 116 
> --
>  lib/flow.h|   2 +
>  3 files changed, 79 insertions(+), 52 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 
> d0a1c58..6686b14 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -6508,12 +6508,15 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>  }
>  }
>  
> -miniflow_extract(packet, >mf);
> +if (!md_is_valid) {
> +miniflow_extract_firstpass(packet, >mf);
> +key->hash =
> +dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf);
> +} else {
> +miniflow_extract(packet, >mf);
> +key->hash = dpif_netdev_packet_get_rss_hash(packet, >mf);
> +}
>  key->len = 0; /* Not computed yet. */
> -key->hash =
> -(md_is_valid == false)
> -? dpif_netdev_packet_get_rss_hash_orig_pkt(packet, >mf)
> -: dpif_netdev_packet_get_rss_hash(packet, >mf);
>  
>  /* If EMC is disabled skip emc_lookup */
>  flow = (cur_min != 0) ? emc_lookup(>emc_cache, key) : 
> NULL; diff --git a/lib/flow.c b/lib/flow.c index e54fd2e..e5b554b 
> 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -707,7 +707,8 @@ ipv6_sanity_check(const struct 
> ovs_16aligned_ip6_hdr *nh, size_t size)  }
>  
>  /* Initializes 'dst' from 'packet' and 'md', taking the packet type 
> into
> - * account.  'dst' must have enough space for FLOW_U64S * 8 bytes.
> + * account.  'dst' must have enough space for FLOW_U64S * 8 bytes. 
> + Metadata
> + * initialization should be bypassed if "md_valid" is false.
>   *
>   * Initializes the layer offsets as follows:
>   *
> @@ -732,8 +733,9 @@ ipv6_sanity_check(const struct ovs_16aligned_ip6_hdr *nh, 
> size_t size)
>   *  present and the packet has at least the content used for the fields
>   *  of interest for the flow, otherwise UINT16_MAX.
>   */
> -void
> -miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> +static inline ALWAYS_INLINE void
> +miniflow_extract__(struct dp_packet *packet, struct miniflow *dst,
> +const bool md_valid)
>  {
>  /* Add code to this function (or its callees) to extract new fields. */
>  BUILD_ASSERT_DECL(FLOW_WC_SEQ == 41); @@ -752,54 +754,60 @@ 
> miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
>  ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
>  
>  /* Metadata. */
> -if (flow_tnl_dst_is_set(>tunnel)) {
> -miniflow_push_words(mf, tunnel, >tunnel,
> -offsetof(struct flow_tnl, metadata) /
> -sizeof(uint64_t));
> -
> -if (!(md->tunnel.flags & FLOW_TNL_

Re: [ovs-dev] [PATCH v2] flow: fix incorrect padding length checking and combine branch in ipv6_sanity_check

2019-08-29 Thread Yanqin Wei (Arm Technology China)
Hi Ben,

Thanks for the comments. I am sorry not to notice the risk of calculation with 
different type . 
The original reason to use 'int' for size checking is that compiler can combine 
two conditions into one and save a branch instruction here,  because "negative 
value" always more than UINT8_MAX.  
> +if (OVS_UNLIKELY(pad_len < 0 || pad_len > UINT8_MAX)) {

But now I realized it introduces the risk and not clean for C code even if we 
use type cast here. So I will remove this performance improvement and only keep 
the bug fix for padding length calculation in this patch. What do you think of 
it?

Best Regards,
Wei Yanqin

-Original Message-
From: Ben Pfaff  
Sent: Thursday, August 29, 2019 5:58 AM
To: Yanqin Wei (Arm Technology China) 
Cc: d...@openvswitch.org; nd ; Gavin Hu (Arm Technology China) 

Subject: Re: [ovs-dev] [PATCH v2] flow: fix incorrect padding length checking 
and combine branch in ipv6_sanity_check

On Thu, Aug 22, 2019 at 06:09:34PM +0800, Yanqin Wei wrote:
> The padding length is (packet size - ipv6 header length - ipv6 plen).  
> This patch fixes incorrect ipv6 size checking and improves it by combining 
> branch.
> 
> Reviewed-by: Gavin Hu 
> Signed-off-by: Yanqin Wei 
> ---
>  lib/flow.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index e5b554b..1b21f51 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -688,18 +688,16 @@ ipv4_get_nw_frag(const struct ip_header *nh)  
> static inline bool  ipv6_sanity_check(const struct 
> ovs_16aligned_ip6_hdr *nh, size_t size)  {
> -uint16_t plen;
> +int pad_len;
>  
>  if (OVS_UNLIKELY(size < sizeof *nh)) {
>  return false;
>  }
>  
> -plen = ntohs(nh->ip6_plen);
> -if (OVS_UNLIKELY(plen + IPV6_HEADER_LEN > size)) {
> -return false;
> -}
> +pad_len = size - IPV6_HEADER_LEN - ntohs(nh->ip6_plen);

The types in the above calculation worry me.  Writing the type of each
term:

int = size_t - int - uint16_t

The most likely type of the right side of the expression is size_t.
Assigning this to an 'int' might do "the right thing" for negative values, but 
it's risky--especially since size_t and int might be different widths.  I think 
it would be safer to cast the first and third terms to int, e.g.:

pad_len = (int) size - IPV6_HEADER_LEN - (int) ntohs(nh->ip6_plen);

>  /* Jumbo Payload option not supported yet. */
> -if (OVS_UNLIKELY(size - plen > UINT8_MAX)) {
> +if (OVS_UNLIKELY(pad_len < 0 || pad_len > UINT8_MAX)) {
>  return false;
>  }

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS compiling issue(gcc version 6.5.0 )

2019-06-04 Thread Yanqin Wei (Arm Technology China)
Hi Ben,

After reconfigure the compiler: CC=gcc , this issue is solved.  I tried to 
reproduce it and found reproducing condition is to switch default gcc version 
after running configure script.
1. default gcc is gcc-8, running the configure script.
2. change the default gcc to gcc-6
3. building  <--- compiling error

4. specific C compiler by configure script: compiler CC=gcc  <--  building 
success
5. change default gcc to gcc-8 and running the configure script
6  change default gcc to gcc-6 & building <-- compiling error

In the config.log after step 3, the detected gcc version is 8.2, so it choose 
some unsupported compiler option for gcc-6.
configure:4038: checking for C compiler version
configure:4047: gcc --version >&5
gcc (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.

So I think it should not be a bug.  It is a configure issue. Thanks for your 
support.

Best Regards,
Wei Yanqin

-Original Message-
From: Ben Pfaff 
Sent: Tuesday, June 4, 2019 7:11 AM
To: Yanqin Wei (Arm Technology China) 
Cc: ovs-disc...@openvswitch.org; d...@openvswitch.org
Subject: Re: [ovs-dev] OVS compiling issue(gcc version 6.5.0 )

On Mon, Jun 03, 2019 at 11:18:23AM +0000, Yanqin Wei (Arm Technology China) 
wrote:
> I am trying to compile OVS(master branch) via gcc 6.5.0, but there is gcc 
> option error.
> gcc: error: unrecognized command line option '-Wmultistatement-macros'; did 
> you mean '-Wunused-macros'?

What's in config.log?

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS compiling issue(gcc version 6.5.0 )

2019-06-03 Thread Yanqin Wei (Arm Technology China)
Hi,

I am trying to compile OVS(master branch) via gcc 6.5.0, but there is gcc 
option error.
gcc: error: unrecognized command line option '-Wmultistatement-macros'; did you 
mean '-Wunused-macros'?

My question is which versions of GCC are supported by OVS. Is it a bug of OVS 
compiling? The compiling output is attached.


gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/6/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
6.5.0-2ubuntu1~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs 
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr 
--with-as=/usr/bin/aarch64-linux-gnu-as --with-ld=/usr/bin/aarch64-linux-gnu-ld 
--program-suffix=-6 --program-prefix=aarch64-linux-gnu- --enable-shared 
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ 
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-libquadmath --enable-plugin --with-system-zlib 
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-arm64/jre --enable-java-home 
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-arm64 
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-arm64 
--with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror 
--enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu 
--target=aarch64-linux-gnu
Thread model: posix
gcc version 6.5.0 20181026 (Ubuntu/Linaro 6.5.0-2ubuntu1~16.04)


Best Regards,
Wei Yanqin



IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
:~/repo/openvswitch$ sudo -E make install -j 20 LDFLAGS="-libverbs"
make  install-recursive
make[1]: Entering directory '/home/wei/repo/openvswitch'
Making install in datapath
make[2]: Entering directory '/home/wei/repo/openvswitch/datapath'
make[3]: Entering directory '/home/wei/repo/openvswitch/datapath'
make[4]: Entering directory '/home/wei/repo/openvswitch/datapath'
make[4]: Nothing to be done for 'install-exec-am'.
make[4]: Nothing to be done for 'install-data-am'.
make[4]: Leaving directory '/home/wei/repo/openvswitch/datapath'
make[3]: Leaving directory '/home/wei/repo/openvswitch/datapath'
make[2]: Leaving directory '/home/wei/repo/openvswitch/datapath'
make[2]: Entering directory '/home/wei/repo/openvswitch'
/bin/bash ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.-I 
./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra 
-Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum 
-Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes 
-Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers 
-fno-strict-aliasing -Wswitch-bool -Wlogical-not-parentheses 
-Wsizeof-array-argument -Wbool-compare -Wshift-negative-value -Wduplicated-cond 
-Wshadow -Wmultistatement-macros -Wcast-align=strict 
-I/home/wei/dpdk-stable/arm64-armv8a-linuxapp-gcc/include 
-D_FILE_OFFSET_BITS=64  -Wno-unused -Wno-unused-parameter -O3 -march=native -MT 
lib/lib_libsflow_la-sflow_agent.lo -MD -MP -MF 
lib/.deps/lib_libsflow_la-sflow_agent.Tpo -c -o 
lib/lib_libsflow_la-sflow_agent.lo `test -f 'lib/sflow_agent.c' || echo 
'./'`lib/sflow_agent.c
/bin/bash ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.-I 
./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra 
-Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum 
-Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes 
-Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers 
-fno-strict-aliasing -Wswitch-bool -Wlogical-not-parentheses 
-Wsizeof-array-argument -Wbool-compare -Wshift-negative-value -Wduplicated-cond 
-Wshadow -Wmultistatement-macros -Wcast-align=strict 
-I/home/wei/dpdk-stable/arm64-armv8a-linuxapp-gcc/include 
-D_FILE_OFFSET_BITS=64  -Wno-unused -Wno-unused-parameter -O3 -march=native -MT 
lib/lib_libsflow_la-sflow_sampler.lo -MD -MP -MF 
lib/.deps/lib_libsflow_la-sflow_sampler.Tpo -c -o 
lib/lib_libsflow_la-sflow_sampler.lo `test -f 'lib/sflow_sampler.c' || echo 
'./'`lib/sflow_sampler.c
/bin/bash ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.-I 
./include -I ./include -I ./lib -I ./lib-Wstrict-prototypes -Wall -Wextra 
-Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum 
-Wunused-parameter -Wbad-function-cast -Wcast-align 

Re: [ovs-dev] [PATCHv9] netdev-afxdp: add new netdev type for AF_XDP.

2019-05-26 Thread Yanqin Wei (Arm Technology China)
Hi William,

I think the main objective of this patch is to introduce AF_XDP socket for OVS 
userspace datapath. This is an alternative to DPDK  PMD library.  Is it 
possible to continue using other DPDK libraries before XDP provides the 
corresponding functionality?

Best Regards,
Wei Yanqin

-Original Message-
From: ovs-dev-boun...@openvswitch.org  On 
Behalf Of William Tu
Sent: Saturday, May 25, 2019 4:37 AM
To: Ben Pfaff 
Cc:  ; Ilya Maximets 

Subject: Re: [ovs-dev] [PATCHv9] netdev-afxdp: add new netdev type for AF_XDP.

On Fri, May 24, 2019 at 11:32 AM Ben Pfaff  wrote:
>
> On Fri, May 24, 2019 at 11:03:33AM -0700, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux
> > socket type built upon the eBPF and XDP technology.  It is aims to
> > have comparable performance to DPDK but cooperate better with
> > existing kernel's networking stack.  An AF_XDP socket receives and
> > sends packets from an eBPF/XDP program attached to the netdev,
> > by-passing a couple of Linux kernel's subsystems As a result, AF_XDP
> > socket shows much better performance than AF_PACKET For more details
> > about AF_XDP, please see linux kernel's
> > Documentation/networking/af_xdp.rst. Note that by default, this feature is 
> > not compiled in.
> >
> > Signed-off-by: William Tu 
>
> How heavy is the x86(_64)-only dependency?  It seems undesirable. is

Now in my patch, the cycles_counter_update has x86-specific instructions.
Other part of the code has no x86-only dependency.

The reason I made this x86-only is that AF_XDP is rarely tested on
non-x86 system. So I'm not sure whether it works or not.

Regards,
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] dpif-netdev: dfc_process optimization by prefetching EMC entry.

2019-04-01 Thread Yanqin Wei (Arm Technology China)
So the main concern is performance drop for 1~8 flow, because EMC is effective 
here and would be enable.
On the other hand, EMC prefetching does not take effect when EMC is disable, 
and in the case of large number of flows, the EMC is likely to be disabled. So 
the performance drop here can be avoided by configuration.
Is my understanding correct?

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets  
Sent: Monday, April 1, 2019 3:15 PM
To: Yanqin Wei (Arm Technology China) ; 
d...@openvswitch.org; ian.sto...@intel.com
Cc: nd ; Gavin Hu (Arm Technology China) 
Subject: Re: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
prefetching EMC entry.

On 01.04.2019 9:52, Yanqin Wei (Arm Technology China) wrote:
> Hi Ilya,
> 
> Really appreciate your time in doing benchmarking for this patch. And many 
> thanks for your valuable comments.
> It is quite possible that the second unneeded cache line prefetching causes 
> performance drop in case of big number of flows (64K - 256K).
> This impact the real world use case, I will try to improve it . For 1-8 flows 
> cases, it should be not very important because of the lack of actual 
> deployment.

I'm not actually sure which of these cases is more real.
We're suggesting to disable EMC for cases where it's not effective.
>From the other side, if VM encapsulates all the incoming traffic and sends it 
>for further processing, there will be only few outcoming flows.

> 
> Best Regards,
> Wei Yanqin
> 
> -Original Message-
> From: Ilya Maximets 
> Sent: Monday, April 1, 2019 2:16 PM
> To: Yanqin Wei (Arm Technology China) ; 
> d...@openvswitch.org; ian.sto...@intel.com
> Cc: nd ; Gavin Hu (Arm Technology China) 
> 
> Subject: Re: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
> prefetching EMC entry.
> 
> On 29.03.2019 20:33, Ilya Maximets wrote:
>> Hi.
>> I made few tests on PVP with bonded PHY setup and found no 
>> significant difference in maximum performance with low and medium 
>> number of flows
>> (8 - 8192).
> 
> Correction: I see the a slight performance drop (~1%) for the cases with very 
> low number of flows (1-8), and a performance increase (~ 2-3%) for the medium 
> low number of flows (64 - 4096).
> In this scenario packets from VM has already calculated hash on 
> recirculation, so the prefetching optimization doesn't work for the first 
> pass through the datapath, but works for the second.
> Degradation on single or very low number of flows probably caused by 
> additional checks and instructions for prefetching the memory that is in 
> cache already.
> 
>> In case of big number of flows (64K - 256K) I see performance drop in 
>> about 2-3%. I think that because of prefetching of a second 
>> cacheline, which is unneeded because current_entry->key.hash likely 
>> != key->hash and we don't need to compare miniflows while emc_lookup. 
>> Changing the code to prefetch the first cacheline only decreases the 
>> drop to ~1% (However, this increases the consumed processing cycles 
>> for medium numbers of flows described below)
> 
> In general, this patch decreases the performance for cases where EMC is not 
> efficient and improves it for cases where EMC is efficient, except the cases 
> with very low numbers of flows, which could be the main concern.
> 
>>
>> OTOH, I see a slight decrease (~1%) of consumed cycles per packet for 
>> the thread that polls HW NICs and send packets to VM, which is good.
>> This improvement observed for a medium small number of flows: 512 - 8192.
>> For a low (1 - 256) and high (64K - 256K) numbers of flows value of 
>> consumed processing cycles per packet fro this thread was not affected by 
>> the patch.
>>
>> Tests made with average 512B packets, EMC enabled, SMC disabled, TX 
>> flushing interval 50us. Note that the bottleneck in this case is the 
>> VM --> bonded PHY part which is the case of 5tuple hash calculation.
>>
>> See review comments inline.
>>
>> Best regards, Ilya Maximets.
>>
>> On 22.03.2019 11:44, Yanqin Wei (Arm Technology China) wrote:
>>> Hi , OVS Maintainers,
>>>
>>> Could you help to have a look at this patch? Thanks a lot.
>>>
>>> Best Regards,
>>> Wei Yanqin
>>>
>>> -Original Message-
>>> From: Yanqin Wei 
>>> Sent: Wednesday, March 13, 2019 1:28 PM
>>> To: d...@openvswitch.org
>>> Cc: nd ; Gavin Hu (Arm Technology China) 
>>> ; Yanqin Wei (Arm Technology China) 
>>> 
>>> Subject: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
>>> prefetching EMC entry.
>>>
>>

Re: [ovs-dev] [PATCH v3] dpif-netdev: dfc_process optimization by prefetching EMC entry.

2019-04-01 Thread Yanqin Wei (Arm Technology China)
Hi Ilya,

Really appreciate your time in doing benchmarking for this patch. And many 
thanks for your valuable comments.
It is quite possible that the second unneeded cache line prefetching causes 
performance drop in case of big number of flows (64K - 256K).
This impact the real world use case, I will try to improve it . For 1-8 flows 
cases, it should be not very important because of the lack of actual deployment.

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets  
Sent: Monday, April 1, 2019 2:16 PM
To: Yanqin Wei (Arm Technology China) ; 
d...@openvswitch.org; ian.sto...@intel.com
Cc: nd ; Gavin Hu (Arm Technology China) 
Subject: Re: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
prefetching EMC entry.

On 29.03.2019 20:33, Ilya Maximets wrote:
> Hi.
> I made few tests on PVP with bonded PHY setup and found no significant 
> difference in maximum performance with low and medium number of flows
> (8 - 8192).

Correction: I see the a slight performance drop (~1%) for the cases with very 
low number of flows (1-8), and a performance increase (~ 2-3%) for the medium 
low number of flows (64 - 4096).
In this scenario packets from VM has already calculated hash on recirculation, 
so the prefetching optimization doesn't work for the first pass through the 
datapath, but works for the second.
Degradation on single or very low number of flows probably caused by additional 
checks and instructions for prefetching the memory that is in cache already.

> In case of big number of flows (64K - 256K) I see performance drop in 
> about 2-3%. I think that because of prefetching of a second cacheline, 
> which is unneeded because current_entry->key.hash likely != key->hash 
> and we don't need to compare miniflows while emc_lookup. Changing the 
> code to prefetch the first cacheline only decreases the drop to ~1% 
> (However, this increases the consumed processing cycles for medium 
> numbers of flows described below)

In general, this patch decreases the performance for cases where EMC is not 
efficient and improves it for cases where EMC is efficient, except the cases 
with very low numbers of flows, which could be the main concern.

> 
> OTOH, I see a slight decrease (~1%) of consumed cycles per packet for 
> the thread that polls HW NICs and send packets to VM, which is good.
> This improvement observed for a medium small number of flows: 512 - 8192.
> For a low (1 - 256) and high (64K - 256K) numbers of flows value of 
> consumed processing cycles per packet fro this thread was not affected by the 
> patch.
> 
> Tests made with average 512B packets, EMC enabled, SMC disabled, TX 
> flushing interval 50us. Note that the bottleneck in this case is the 
> VM --> bonded PHY part which is the case of 5tuple hash calculation.
> 
> See review comments inline.
> 
> Best regards, Ilya Maximets.
> 
> On 22.03.2019 11:44, Yanqin Wei (Arm Technology China) wrote:
>> Hi , OVS Maintainers,
>>
>> Could you help to have a look at this patch? Thanks a lot.
>>
>> Best Regards,
>> Wei Yanqin
>>
>> -Original Message-
>> From: Yanqin Wei 
>> Sent: Wednesday, March 13, 2019 1:28 PM
>> To: d...@openvswitch.org
>> Cc: nd ; Gavin Hu (Arm Technology China) 
>> ; Yanqin Wei (Arm Technology China) 
>> 
>> Subject: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
>> prefetching EMC entry.
>>
>> It is observed that the throughput of multi-flow is worse than single-flow 
>> in the EMC NIC to NIC cases. It is because CPU cache-miss increasing in EMC 
>> lookup. Each flow need load at least one EMC entry to CPU cache(several 
>> cache lines) and compare it with packet miniflow.
>> This patch improve it by prefetching EMC entry in advance. Hash value 
>> can be obtained from dpdk rss hash, so this step can be advanced 
>> ahead of
>> miniflow_extract() and prefetch EMC entry there. The prefetching size is 
>> defined as ROUND_UP(128,CACHE_LINE_SIZE), which can cover majority traffic 
>> including TCP/UDP protocol and need 2 cache lines in most modern CPU.
>> Performance test was run in some arm platform. 1000/1 flows NIC2NIC test 
>> achieved around 10% throughput improvement in thunderX2(aarch64 platform).
>>
>> Signed-off-by: Yanqin Wei 
>> Reviewed-by: Gavin Hu 
>> ---
>>  lib/dpif-netdev.c | 80 
>> ---
>>  1 file changed, 52 insertions(+), 28 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 
>> 4d6d0c3..982082c 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -189,6 +189,10 @@ struct netdev_flow_key {
>>  #define DEFAULT_EM_FLOW_INSERT

Re: [ovs-dev] [PATCH v3] dpif-netdev: dfc_process optimization by prefetching EMC entry.

2019-03-25 Thread Yanqin Wei (Arm Technology China)
Hi Ilya,

Thanks for your reply. We could have a look at test results in x86 by then.
I can understand patch 1054571. If both patches are apply to master, I could 
rebase prefetch EMC patch for it.
Mandatory hash computing in the ingress makes logic simple a lot, and it only 
costs a small price even in worst case(EMC/SMC disable & no hash-feature/load 
balance enable).

Best Regards,
Wei Yanqin

-Original Message-
From: Ilya Maximets  
Sent: Friday, March 22, 2019 9:12 PM
To: Yanqin Wei (Arm Technology China) ; 
d...@openvswitch.org; ian.sto...@intel.com
Cc: nd 
Subject: Re: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
prefetching EMC entry.

On 22.03.2019 11:44, Yanqin Wei (Arm Technology China) wrote:
> Hi , OVS Maintainers,
> 
> Could you help to have a look at this patch? Thanks a lot.

Hi. Thanks for improving performance and sorry for delay. Review process here 
in OVS is a bit slow due to lack of reviewers.

I have a plan to test this patch a bit on a next week. Want to check the 
performance impact on PVP cases on x86.

BTW, I have a patch that affects same code. Maybe it'll be interesting to you: 
https://patchwork.ozlabs.org/patch/1054571/

Best regards, Ilya Maximets.

> 
> Best Regards,
> Wei Yanqin
> 
> -Original Message-
> From: Yanqin Wei 
> Sent: Wednesday, March 13, 2019 1:28 PM
> To: d...@openvswitch.org
> Cc: nd ; Gavin Hu (Arm Technology China) 
> ; Yanqin Wei (Arm Technology China) 
> 
> Subject: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
> prefetching EMC entry.
> 
> It is observed that the throughput of multi-flow is worse than single-flow in 
> the EMC NIC to NIC cases. It is because CPU cache-miss increasing in EMC 
> lookup. Each flow need load at least one EMC entry to CPU cache(several cache 
> lines) and compare it with packet miniflow.
> This patch improve it by prefetching EMC entry in advance. Hash value 
> can be obtained from dpdk rss hash, so this step can be advanced ahead 
> of
> miniflow_extract() and prefetch EMC entry there. The prefetching size is 
> defined as ROUND_UP(128,CACHE_LINE_SIZE), which can cover majority traffic 
> including TCP/UDP protocol and need 2 cache lines in most modern CPU.
> Performance test was run in some arm platform. 1000/1 flows NIC2NIC test 
> achieved around 10% throughput improvement in thunderX2(aarch64 platform).
> 
> Signed-off-by: Yanqin Wei 
> Reviewed-by: Gavin Hu 
> ---
>  lib/dpif-netdev.c | 80 
> ---
>  1 file changed, 52 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 
> 4d6d0c3..982082c 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -189,6 +189,10 @@ struct netdev_flow_key {
>  #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
>  DEFAULT_EM_FLOW_INSERT_INV_PROB)
>  
> +/* DEFAULT_EMC_PREFETCH_SIZE can cover majority traffic including 
> +TCP/UDP
> + * protocol. */
> +#define DEFAULT_EMC_PREFETCH_SIZE ROUND_UP(128,CACHE_LINE_SIZE)
> +
>  struct emc_entry {
>  struct dp_netdev_flow *flow;
>  struct netdev_flow_key key;   /* key.hash used for emc hash value. */
> @@ -6166,15 +6170,20 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread 
> *pmd, struct dp_packet *packet_,  }
>  
>  static inline uint32_t
> -dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> -const struct miniflow *mf)
> +dpif_netdev_packet_get_packet_rss_hash(struct dp_packet *packet,
> +bool md_is_valid)
>  {
> -uint32_t hash;
> +uint32_t hash,recirc_depth;
>  
> -if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> -hash = dp_packet_get_rss_hash(packet);
> -} else {
> -hash = miniflow_hash_5tuple(mf, 0);
> +hash = dp_packet_get_rss_hash(packet);
> +
> +if (md_is_valid) {
> +/* The RSS hash must account for the recirculation depth to avoid
> + * collisions in the exact match cache */
> +recirc_depth = *recirc_depth_get_unsafe();
> +if (OVS_UNLIKELY(recirc_depth)) {
> +hash = hash_finish(hash, recirc_depth);
> +}
>  dp_packet_set_rss_hash(packet, hash);
>  }
>  
> @@ -6182,24 +6191,23 @@ 
> dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,  }
>  
>  static inline uint32_t
> -dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
> -const struct miniflow *mf)
> +dpif_netdev_packet_get_hash_5tuple(struct dp_packet *packet,
> +const struct miniflow *mf,
> +

Re: [ovs-dev] [PATCH v3] dpif-netdev: dfc_process optimization by prefetching EMC entry.

2019-03-24 Thread Yanqin Wei (Arm Technology China)
Hi Ian,

I also observed a minor throughput drop (around 1%) with single flow in arm 
platform, but not 25% drop.  Maybe the additional prefetch operation cause it.
Anyway, when you come back next week, let's discuss this patch again.   

Best Regards,
Wei Yanqin

-Original Message-
From: Ian Stokes  
Sent: Monday, March 25, 2019 6:16 AM
To: Yanqin Wei (Arm Technology China) ; d...@openvswitch.org
Cc: nd ; Gavin Hu (Arm Technology China) ; Ilya 
Maximets 
Subject: Re: [ovs-dev] [PATCH v3] dpif-netdev: dfc_process optimization by 
prefetching EMC entry.

On 3/13/2019 5:27 AM, Yanqin Wei wrote:
> It is observed that the throughput of multi-flow is worse than 
> single-flow in the EMC NIC to NIC cases. It is because CPU cache-miss 
> increasing in EMC lookup. Each flow need load at least one EMC entry 
> to CPU cache(several cache lines) and compare it with packet miniflow.
> This patch improve it by prefetching EMC entry in advance. Hash value 
> can be obtained from dpdk rss hash, so this step can be advanced ahead 
> of
> miniflow_extract() and prefetch EMC entry there. The prefetching size 
> is defined as ROUND_UP(128,CACHE_LINE_SIZE), which can cover majority 
> traffic including TCP/UDP protocol and need 2 cache lines in most modern CPU.
> Performance test was run in some arm platform. 1000/1 flows 
> NIC2NIC test achieved around 10% throughput improvement in 
> thunderX2(aarch64 platform).
> 

Thanks for this Wei, not a few review, please see some minor comments below WRT 
style issues.

I've also run some benchmarks on this. I was seeing typically a ~3% drop on x86 
with single flows with RFC2544. However once or twice, I saw a drop of up to 
25% on achievable lossless packet rate but I suspect it could be an anomaly in 
my setup.

Ilya, if you are testing this week on x86, it would be great you confirm if you 
see something similar in your benchmarks?

For vsperf phy2phy_scalability flow tests on x86 I saw an improvement of 
+3% after applying the patch for zero loss tests and +5% in the case of
phy2phy_scalability_cont so this looks promising.

As an FYI I'll be I'm out of office this coming week so will not have an 
opportunity to investigate further until I'm back in office. I'll be 
able to review and benchmark further then.


> Signed-off-by: Yanqin Wei 
> Reviewed-by: Gavin Hu 
Although it doesn't appear here or in patchwork, after downloading the 
patch the sign off and review tags above appear duplicated after being 
applied. Examining the mbox I can confirm they are duplicated, can you 
check this on your side also?

> ---
>   lib/dpif-netdev.c | 80 
> ---
>   1 file changed, 52 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 4d6d0c3..982082c 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -189,6 +189,10 @@ struct netdev_flow_key {
>   #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
>   DEFAULT_EM_FLOW_INSERT_INV_PROB)
>   
> +/* DEFAULT_EMC_PREFETCH_SIZE can cover majority traffic including TCP/UDP
> + * protocol. */
> +#define DEFAULT_EMC_PREFETCH_SIZE ROUND_UP(128,CACHE_LINE_SIZE)
> +
>   struct emc_entry {
>   struct dp_netdev_flow *flow;
>   struct netdev_flow_key key;   /* key.hash used for emc hash value. */
> @@ -6166,15 +6170,20 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
> struct dp_packet *packet_,
>   }
>   
>   static inline uint32_t
> -dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
> -const struct miniflow *mf)
> +dpif_netdev_packet_get_packet_rss_hash(struct dp_packet *packet,
> +bool md_is_valid)
>   {
> -uint32_t hash;
> +uint32_t hash,recirc_depth;
>   
> -if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
> -hash = dp_packet_get_rss_hash(packet);
> -} else {
> -hash = miniflow_hash_5tuple(mf, 0);
> +hash = dp_packet_get_rss_hash(packet);
> +
> +if (md_is_valid) {
> +/* The RSS hash must account for the recirculation depth to avoid
> + * collisions in the exact match cache */
Minor, comment style, missing period at end of comment.

> +recirc_depth = *recirc_depth_get_unsafe();
> +if (OVS_UNLIKELY(recirc_depth)) {
> +hash = hash_finish(hash, recirc_depth);
> +}
>   dp_packet_set_rss_hash(packet, hash);
>   }
>   
> @@ -6182,24 +6191,23 @@ dpif_netdev_packet_get_rss_hash_orig_pkt(struct 
> dp_packet *packet,
>   }
>   
>   static inline uint32_t
> -dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
> -const struct miniflow *mf)
>

Re: [ovs-dev] [PATCH v3] dpif-netdev: dfc_process optimization by prefetching EMC entry.

2019-03-22 Thread Yanqin Wei (Arm Technology China)
Hi , OVS Maintainers,

Could you help to have a look at this patch? Thanks a lot.

Best Regards,
Wei Yanqin

-Original Message-
From: Yanqin Wei  
Sent: Wednesday, March 13, 2019 1:28 PM
To: d...@openvswitch.org
Cc: nd ; Gavin Hu (Arm Technology China) ; 
Yanqin Wei (Arm Technology China) 
Subject: [ovs-dev][PATCH v3] dpif-netdev: dfc_process optimization by 
prefetching EMC entry.

It is observed that the throughput of multi-flow is worse than single-flow in 
the EMC NIC to NIC cases. It is because CPU cache-miss increasing in EMC 
lookup. Each flow need load at least one EMC entry to CPU cache(several cache 
lines) and compare it with packet miniflow.
This patch improve it by prefetching EMC entry in advance. Hash value can be 
obtained from dpdk rss hash, so this step can be advanced ahead of
miniflow_extract() and prefetch EMC entry there. The prefetching size is 
defined as ROUND_UP(128,CACHE_LINE_SIZE), which can cover majority traffic 
including TCP/UDP protocol and need 2 cache lines in most modern CPU.
Performance test was run in some arm platform. 1000/1 flows NIC2NIC test 
achieved around 10% throughput improvement in thunderX2(aarch64 platform).

Signed-off-by: Yanqin Wei 
Reviewed-by: Gavin Hu 
---
 lib/dpif-netdev.c | 80 ---
 1 file changed, 52 insertions(+), 28 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 4d6d0c3..982082c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -189,6 +189,10 @@ struct netdev_flow_key {
 #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \
 DEFAULT_EM_FLOW_INSERT_INV_PROB)
 
+/* DEFAULT_EMC_PREFETCH_SIZE can cover majority traffic including 
+TCP/UDP
+ * protocol. */
+#define DEFAULT_EMC_PREFETCH_SIZE ROUND_UP(128,CACHE_LINE_SIZE)
+
 struct emc_entry {
 struct dp_netdev_flow *flow;
 struct netdev_flow_key key;   /* key.hash used for emc hash value. */
@@ -6166,15 +6170,20 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
struct dp_packet *packet_,  }
 
 static inline uint32_t
-dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
-const struct miniflow *mf)
+dpif_netdev_packet_get_packet_rss_hash(struct dp_packet *packet,
+bool md_is_valid)
 {
-uint32_t hash;
+uint32_t hash,recirc_depth;
 
-if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
-hash = dp_packet_get_rss_hash(packet);
-} else {
-hash = miniflow_hash_5tuple(mf, 0);
+hash = dp_packet_get_rss_hash(packet);
+
+if (md_is_valid) {
+/* The RSS hash must account for the recirculation depth to avoid
+ * collisions in the exact match cache */
+recirc_depth = *recirc_depth_get_unsafe();
+if (OVS_UNLIKELY(recirc_depth)) {
+hash = hash_finish(hash, recirc_depth);
+}
 dp_packet_set_rss_hash(packet, hash);
 }
 
@@ -6182,24 +6191,23 @@ dpif_netdev_packet_get_rss_hash_orig_pkt(struct 
dp_packet *packet,  }
 
 static inline uint32_t
-dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
-const struct miniflow *mf)
+dpif_netdev_packet_get_hash_5tuple(struct dp_packet *packet,
+const struct miniflow *mf,
+bool md_is_valid)
 {
-uint32_t hash, recirc_depth;
+uint32_t hash,recirc_depth;
 
-if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
-hash = dp_packet_get_rss_hash(packet);
-} else {
-hash = miniflow_hash_5tuple(mf, 0);
-dp_packet_set_rss_hash(packet, hash);
-}
+hash = miniflow_hash_5tuple(mf, 0);
+dp_packet_set_rss_hash(packet, hash);
 
-/* The RSS hash must account for the recirculation depth to avoid
- * collisions in the exact match cache */
-recirc_depth = *recirc_depth_get_unsafe();
-if (OVS_UNLIKELY(recirc_depth)) {
-hash = hash_finish(hash, recirc_depth);
-dp_packet_set_rss_hash(packet, hash);
+if (md_is_valid) {
+/* The RSS hash must account for the recirculation depth to avoid
+ * collisions in the exact match cache */
+recirc_depth = *recirc_depth_get_unsafe();
+if (OVS_UNLIKELY(recirc_depth)) {
+hash = hash_finish(hash, recirc_depth);
+dp_packet_set_rss_hash(packet, hash);
+}
 }
 return hash;
 }
@@ -6390,6 +6398,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 bool smc_enable_db;
 size_t map_cnt = 0;
 bool batch_enable = true;
+bool is_5tuple_hash_needed;
 
 atomic_read_relaxed(>dp->smc_enable_db, _enable_db);
 pmd_perf_update_counter(>perf_stats,
@@ -6436,16 +6445,31 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
-miniflow_extract(packet, >mf);
-key->len = 0; /* Not computed yet. */
 /* If EMC and SMC disabled skip h

Re: [ovs-dev] [ovs-dev, v2] netdev-dpdk: dfc_process optimization by

2019-03-12 Thread Yanqin Wei (Arm Technology China)
Thanks for the comments. It will be fixed in the following patch v3.

-Original Message-
From: Ilya Maximets  
Sent: Tuesday, March 12, 2019 9:31 PM
To: Yanqin Wei (Arm Technology China) ; d...@openvswitch.org
Cc: nd ; Gavin Hu (Arm Technology China) 
Subject: Re: [ovs-dev,v2] netdev-dpdk: dfc_process optimization by

Hi.

Thanks for working on this.
Not a full review, just a few notes about formatting.

1. Looks like your subject line was accidentally cropped.
2. This change is local to generic parts of 'dpif-netdev', so, the "area" in
   a subject line should be 'dpif-netdev'. There is nothing DPDK specific here.

On 11.03.2019 14:44, Yanqin Wei wrote:
> It is observed that the throughput of multi-flow is worse than 
> single-flow in the EMC NIC to NIC cases. It is because CPU cache-miss 
> increasing in EMC lookup. Each flow need load at least one EMC entry 
> to CPU cache(several cache lines) and compare it with packet miniflow.
> This patch improve it by prefetching EMC entry in advance. Hash value 
> can be obtained from dpdk rss hash, so this step can be advanced ahead 
> of
> miniflow_extract() and prefetch EMC entry there. The prefetching size 
> is defined as ROUND_UP(128,CACHE_LINE_SIZE), which can cover majority 
> traffic including TCP/UDP protocol and need 2 cache lines in most modern CPU.
> Performance test was run in some arm platform. 1000/1 flows 
> NIC2NIC test achieved around 10% throughput improvement in 
> thunderX2(aarch64 platform).
> 
> Signed-off-by: Yanqin Wei 
> Reviewed-by: Gavin Hu 
> ---
>  lib/dpif-netdev.c | 80 
> ---
>  1 file changed, 52 insertions(+), 28 deletions(-)  mode change 100644 
> => 100755 lib/dpif-netdev.c
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c old mode 100644 new 
> mode 100755

3. Please, don't change the file mode.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 1/1] hash: Enable hash_bytes128 optimization for aarch64.

2019-02-26 Thread Yanqin Wei (Arm Technology China)
Hi Ben,

Sorry for the format issue. I will resubmit it by "git send-email".

Best Regards,
Wei Yanqin

-Original Message-
From: Ben Pfaff 
Sent: Saturday, February 23, 2019 5:03 AM
To: Yanqin Wei (Arm Technology China) 
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v1 1/1] hash: Enable hash_bytes128 optimization 
for aarch64.

On Mon, Feb 18, 2019 at 05:46:01AM +0000, Yanqin Wei (Arm Technology China) 
wrote:
> "hash_bytes128" has two versions for 64 bits and 32 bits system. This
> should be common optimization for their respective platforms.
> But 64 bits version was only enabled in x86-64. This patch enable it
> for
> aarch64 platform.
> Micro benchmarking test was run in two kinds of arm platform.  It was
> observed that 50% performance improvement in thunderX2 and 40%
> improvement in TaiShan(Cortex-A72).
>
> Signed-off-by: Yanqin Wei 

Thanks for working to make OVS better.

This patch is whitespace damaged.  For instance, lines that should begin with 
spaces lack them.  I cannot apply it.

You can send the patch another way (for example, with "git send-email") or use 
a Github pull request.
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1 1/1] hash: Enable hash_bytes128 optimization for aarch64.

2019-02-17 Thread Yanqin Wei (Arm Technology China)
"hash_bytes128" has two versions for 64 bits and 32 bits system. This
should be common optimization for their respective platforms.
But 64 bits version was only enabled in x86-64. This patch enable it for
aarch64 platform.
Micro benchmarking test was run in two kinds of arm platform.  It was
observed that 50% performance improvement in thunderX2 and 40%
improvement in TaiShan(Cortex-A72).

Signed-off-by: Yanqin Wei 

---
lib/hash.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/hash.c b/lib/hash.c
index c64f25e..06f8339 100644
--- a/lib/hash.c
+++ b/lib/hash.c
@@ -72,7 +72,7 @@ hash_words64__(const uint64_t p[], size_t n_words, uint32_t 
basis)
 return hash_words64_inline(p, n_words, basis);
}

-#if !(defined(__x86_64__))
+#if !(defined(__x86_64__)) && !(defined(__aarch64__))
void
hash_bytes128(const void *p_, size_t len, uint32_t basis, ovs_u128 *out)
{
@@ -233,7 +233,7 @@ hash_bytes128(const void *p_, size_t len, uint32_t basis, 
ovs_u128 *out)
 out->u32[3] = h4;
}

-#else /* __x86_64__ */
+#else /* __x86_64__ or __aarch64__*/

static inline uint64_t
hash_rot64(uint64_t x, int8_t r)
@@ -361,4 +361,4 @@ hash_bytes128(const void *p_, size_t len, uint32_t basis, 
ovs_u128 *out)
 out->u64.lo = h1;
 out->u64.hi = h2;
}
-#endif /* __x86_64__ */
+#endif /* __x86_64__ or __aarch64__*/
--
2.7.4
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v1 1/1] hash: Implement hash for aarch64 using CRC32c intrinsics.

2019-01-25 Thread Yanqin Wei (Arm Technology China)
This commit adds lib/hash-aarch64.h to implement hash for aarch64.
It is based on aarch64 built-in CRC32c intrinsics, which accelerates
hash function for datapath performance.

test:
1. "test-hash" case passed in aarch64 platform.
2.  OVS-DPDK datapth performance test was run(NIC to NIC).
Test bed: aarch64(Centriq 2400) platform.
Test case: DPCLS forwarding(disable EMC + avg 10 subtable lookups)
Test result: improve around 10%.

Signed-off-by: Yanqin Wei 

---
lib/automake.mk|   1 +
lib/hash-aarch64.h | 151 +
lib/hash.h |   5 +-
3 files changed, 156 insertions(+), 1 deletion(-)
create mode 100644 lib/hash-aarch64.h

diff --git a/lib/automake.mk b/lib/automake.mk
index b1ff495ff..ba1041095 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -100,6 +100,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/guarded-list.h \
lib/hash.c \
lib/hash.h \
+   lib/hash-aarch64.h \
lib/hindex.c \
lib/hindex.h \
lib/hmap.c \
diff --git a/lib/hash-aarch64.h b/lib/hash-aarch64.h
new file mode 100644
index 0..6993e2a66
--- /dev/null
+++ b/lib/hash-aarch64.h
@@ -0,0 +1,151 @@
+/*
+ * Copyright (c) 2019 Arm Limited
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/* This header implements HASH operation primitives on aarch64. */
+#ifndef HASH_AARCH64_H
+#define HASH_AARCH64_H 1
+
+#ifndef HASH_H
+#error "This header should only be included indirectly via hash.h."
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+static inline uint32_t hash_add(uint32_t hash, uint32_t data)
+{
+return __crc32cw(hash, data);
+}
+
+/* Add the halves of 'data' in the memory order. */
+static inline uint32_t hash_add64(uint32_t hash, uint64_t data)
+{
+return __crc32cd(hash, data);
+}
+
+static inline uint32_t hash_finish(uint32_t hash, uint64_t final)
+{
+/* The finishing multiplier 0x805204f3 has been experimentally
+ * derived to pass the testsuite hash tests. */
+hash = __crc32cd(hash, final) * 0x805204f3;
+return hash ^ hash >> 16; /* Increase entropy in LSBs. */
+}
+
+/* Returns the hash of the 'n' 32-bit words at 'p_', starting from 'basis'.
+ * We access 'p_' as a uint64_t pointer.
+ *
+ * This is inlined for the compiler to have access to the 'n_words', which
+ * in many cases is a constant. */
+static inline uint32_t
+hash_words_inline(const uint32_t p_[], size_t n_words, uint32_t basis)
+{
+const uint64_t *p = (const void *)p_;
+uint32_t hash1 = basis;
+uint32_t hash2 = 0;
+uint32_t hash3 = n_words;
+const uint32_t *endp = (const uint32_t *)p + n_words;
+const uint64_t *limit = p + n_words / 2 - 3;
+
+while (p <= limit) {
+hash1 = __crc32cd(hash1, p[0]);
+hash2 = __crc32cd(hash2, p[1]);
+hash3 = __crc32cd(hash3, p[2]);
+p += 3;
+}
+switch (endp - (const uint32_t *)p) {
+case 1:
+hash1 = __crc32cw(hash1, *(const uint32_t *)[0]);
+break;
+case 2:
+hash1 = __crc32cd(hash1, p[0]);
+break;
+case 3:
+hash1 = __crc32cd(hash1, p[0]);
+hash2 = __crc32cw(hash2, *(const uint32_t *)[1]);
+break;
+case 4:
+hash1 = __crc32cd(hash1, p[0]);
+hash2 = __crc32cd(hash2, p[1]);
+break;
+case 5:
+hash1 = __crc32cd(hash1, p[0]);
+hash2 = __crc32cd(hash2, p[1]);
+hash3 = __crc32cw(hash3, *(const uint32_t *)[2]);
+break;
+}
+return hash_finish(hash1, (uint64_t)hash2 << 32 | hash3);
+}
+
+/* A simpler version for 64-bit data.
+ * 'n_words' is the count of 64-bit words, basis is 64 bits. */
+static inline uint32_t
+hash_words64_inline(const uint64_t p[], size_t n_words, uint32_t basis)
+{
+uint32_t hash1 = basis;
+uint32_t hash2 = 0;
+uint32_t hash3 = n_words;
+const uint64_t *endp = p + n_words;
+const uint64_t *limit = endp - 3;
+
+while (p <= limit) {
+hash1 = __crc32cd(hash1, p[0]);
+hash2 = __crc32cd(hash2, p[1]);
+hash3 = __crc32cd(hash3, p[2]);
+p += 3;
+}
+switch (endp - p) {
+case 1:
+hash1 = __crc32cd(hash1, p[0]);
+break;
+case 2:
+hash1 = __crc32cd(hash1, p[0]);
+hash2 = __crc32cd(hash2, p[1]);
+break;
+}
+return hash_finish(hash1, (uint64_t)hash2 << 32 | hash3);
+}
+
+static inline uint32_t hash_uint64_basis(const uint64_t x,
+