[dpdk-dev] One pkt in mbuf chain - virtio pmd driver

2014-08-07 Thread Czaus, Tomasz
Hello,

Does virtio pmd driver support scenario when a frame fits in mbuf chain, this 
means all headers (eth/ipv4/tcp) are located in first mbuf and user data is 
located in next mbuf. I have asked the same question on dpdk-ovs mailing group, 
here is a thread and more details:

https://lists.01.org/pipermail/dpdk-ovs/2014-August/001557.html

Best Regards,
Tomasz Czaus



Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial 
Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | 
Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i 
moze zawierac informacje poufne. W razie przypadkowego otrzymania tej 
wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; 
jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). If you are not the intended recipient, please 
contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.


[dpdk-dev] One pkt in mbuf chain - virtio pmd driver

2014-08-07 Thread Xie, Huawei
Hi Tomasz:
This is a known issue in user space vhost. Will be fixed in subsequent patch 
once the vhost lib is applied.

BR.
-Huawei
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Czaus, Tomasz
> Sent: Thursday, August 07, 2014 2:20 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] One pkt in mbuf chain - virtio pmd driver
> 
> Hello,
> 
> Does virtio pmd driver support scenario when a frame fits in mbuf chain, this
> means all headers (eth/ipv4/tcp) are located in first mbuf and user data is
> located in next mbuf. I have asked the same question on dpdk-ovs mailing 
> group,
> here is a thread and more details:
> 
> https://lists.01.org/pipermail/dpdk-ovs/2014-August/001557.html
> 
> Best Regards,
> Tomasz Czaus
> 
> 
> 
> Intel Technology Poland sp. z o.o.
> ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII
> Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-
> 52-316 | Kapital zakladowy 200.000 PLN.
> 
> Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i
> moze zawierac informacje poufne. W razie przypadkowego otrzymania tej
> wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie;
> jakiekolwiek
> przegladanie lub rozpowszechnianie jest zabronione.
> This e-mail and any attachments may contain confidential material for the sole
> use of the intended recipient(s). If you are not the intended recipient, 
> please
> contact the sender and delete all copies; any review or distribution by
> others is strictly prohibited.


[dpdk-dev] One pkt in mbuf chain - virtio pmd driver

2014-08-07 Thread Ouyang, Changchun
It is one feature in developing,
scatter and mergeable RX will be support in virtio subsequent patch.
Thanks 
Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Thursday, August 7, 2014 3:07 PM
> To: Czaus, Tomasz; dev at dpdk.org
> Subject: Re: [dpdk-dev] One pkt in mbuf chain - virtio pmd driver
> 
> Hi Tomasz:
> This is a known issue in user space vhost. Will be fixed in subsequent patch
> once the vhost lib is applied.
> 
> BR.
> -Huawei
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Czaus, Tomasz
> > Sent: Thursday, August 07, 2014 2:20 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] One pkt in mbuf chain - virtio pmd driver
> >
> > Hello,
> >
> > Does virtio pmd driver support scenario when a frame fits in mbuf
> > chain, this means all headers (eth/ipv4/tcp) are located in first mbuf
> > and user data is located in next mbuf. I have asked the same question
> > on dpdk-ovs mailing group, here is a thread and more details:
> >
> > https://lists.01.org/pipermail/dpdk-ovs/2014-August/001557.html
> >
> > Best Regards,
> > Tomasz Czaus
> >
> > 
> >
> > Intel Technology Poland sp. z o.o.
> > ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII
> > Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP
> > 957-07-
> > 52-316 | Kapital zakladowy 200.000 PLN.
> >
> > Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego
> > adresata i moze zawierac informacje poufne. W razie przypadkowego
> > otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale
> > jej usuniecie; jakiekolwiek przegladanie lub rozpowszechnianie jest
> > zabronione.
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). If you are not the intended
> > recipient, please contact the sender and delete all copies; any review
> > or distribution by others is strictly prohibited.


[dpdk-dev] [PATCH v3] lib/librte_vhost: user space vhost driver library

2014-08-07 Thread Cao, Waterman
Tested-by: Waterman Cao  

This patch facilitates integration with DPDK vSwitch, and is ready to integrate 
into DPDK.org.

-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
>Sent: Tuesday, August 5, 2014 11:54 PM
>To: dev at dpdk.org
>Subject: [dpdk-dev] [PATCH v3] lib/librte_vhost: user space vhost driver 
>library
>
>This user space vhost library is provided aiming to facilitate integration 
>with DPDK accelerated vswitch. 
>
>Huawei Xie (1):
>  vhost library support to facilitate integration with DPDK accelerated 
> vswitch.
>
> config/common_linuxapp   |7 +
> lib/Makefile |1 +
> lib/librte_vhost/Makefile|   48 ++
> lib/librte_vhost/eventfd_link/Makefile   |   39 +
> lib/librte_vhost/eventfd_link/eventfd_link.c |  194 +
> lib/librte_vhost/eventfd_link/eventfd_link.h |   40 +
> lib/librte_vhost/rte_virtio_net.h|  192 +
> lib/librte_vhost/vhost-net-cdev.c|  363 ++
> lib/librte_vhost/vhost-net-cdev.h|  109 +++
> lib/librte_vhost/vhost_rxtx.c|  292 
> lib/librte_vhost/virtio-net.c| 1002 ++
> 11 files changed, 2287 insertions(+)
> create mode 100644 lib/librte_vhost/Makefile  create mode 100644 
> lib/librte_vhost/eventfd_link/Makefile
> create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.c
> create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.h
> create mode 100644 lib/librte_vhost/rte_virtio_net.h  create mode 100644 
> lib/librte_vhost/vhost-net-cdev.c  create mode 100644 
> lib/librte_vhost/vhost-net-cdev.h  create mode 100644 
> lib/librte_vhost/vhost_rxtx.c  create mode 100644 
> lib/librte_vhost/virtio-net.c
>
>--
>1.8.1.4
>


[dpdk-dev] [PATCH 0/3] vhost example based on user space vhost library.

2014-08-07 Thread Cao, Waterman
Tested-by: Waterman Cao  
This patch implements a simple vswitch by user vhost library, and is ready to 
integrate into DPDK.org.

-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
>Sent: Tuesday, August 5, 2014 11:58 PM
>To: dev at dpdk.org
>Subject: [dpdk-dev] [PATCH 0/3] vhost example based on user space vhost 
>library.
>
>This vhost example implements a simple vswitch using DPDK user space vhost 
>library(lib/librte_vhost) and VMDQ to demonstrate vhost's performance.
>- Each virtio device is bound to a VMDQ pool and each pool is assigned the 
>mac/vlan of the virtio device.
>- Packets arriving at a pool after l2 classifier will be moved to the virtio 
>device.
>- Packets whose destination is a local virtio device will be delivered either 
>by a)software switching mode b)hardware l2 switch.
>- zero copy is supported and could be configured through command line.
>
>Huawei Xie (3):
>  remove old vhost example
>  add lib/librte_vhost support in mk/rte.app.mk
>  add new vhost example
>
> examples/vhost/Makefile|   10 +-
> examples/vhost/eventfd_link/Makefile   |   39 -
> examples/vhost/eventfd_link/eventfd_link.c |  205 -
> examples/vhost/eventfd_link/eventfd_link.h |   79 --
> examples/vhost/libvirt/qemu-wrap.py|5 +-
> examples/vhost/main.c  | 1101 +-
> examples/vhost/main.h  |   85 +-
> examples/vhost/vhost-net-cdev.c|  367 -
> examples/vhost/vhost-net-cdev.h|   83 --
> examples/vhost/virtio-net.c| 1165 
> examples/vhost/virtio-net.h|  147 
> mk/rte.app.mk  |5 +
> 12 files changed, 585 insertions(+), 2706 deletions(-)  delete mode 100644 
> examples/vhost/eventfd_link/Makefile
> delete mode 100644 examples/vhost/eventfd_link/eventfd_link.c
> delete mode 100644 examples/vhost/eventfd_link/eventfd_link.h
> delete mode 100644 examples/vhost/vhost-net-cdev.c  delete mode 100644 
> examples/vhost/vhost-net-cdev.h  delete mode 100644 
> examples/vhost/virtio-net.c  delete mode 100644 examples/vhost/virtio-net.h
>
>--
>1.8.1.4
>



[dpdk-dev] [PATCH] librte_acl make it build/work for 'default' target

2014-08-07 Thread Ananyev, Konstantin


> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, August 06, 2014 7:55 PM
> To: Ananyev, Konstantin
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_acl make it build/work for 'default' 
> target
> 
> On Wed, Aug 06, 2014 at 06:53:45PM +0100, Konstantin Ananyev wrote:
> > Make ACL library to build/work on 'default' architecture:
> > - make rte_acl_classify_scalar really scalar
> >  (make sure it wouldn't use sse4 instrincts through resolve_priority()).
> > - Provide two versions of rte_acl_classify code path:
> >   rte_acl_classify_sse() - could be build and used only on systems with 
> > sse4.2
> >   and upper, return -ENOTSUP on lower arch.
> >   rte_acl_classify_scalar() - a slower version, but could be build and used
> >   on all systems.
> > - rte_acl_classify() - becomes just a macro pointing to one of the functions
> >   mentioned abovei (highest avaialbe version at build time).
> > - keep code common for both version code.
> >
> > Signed-off-by: Konstantin Ananyev 
> > ---
> >  lib/librte_acl/acl_bld.c   |   5 +-
> >  lib/librte_acl/acl_match_check.def |  92 +
> >  lib/librte_acl/acl_run.c   | 692 
> > -
> >  lib/librte_acl/acl_run_sse.h   | 629 +
> >  lib/librte_acl/rte_acl.h   |  12 +-
> >  5 files changed, 806 insertions(+), 624 deletions(-)
> >  create mode 100644 lib/librte_acl/acl_match_check.def
> >  create mode 100644 lib/librte_acl/acl_run_sse.h
> >
> This is still compile time selected.  You've gone to all the trouble to 
> separate
> the scalar and sse vector paths.  Why not make it run time selectable based on
> cpu testing?  Just because its built for the default machine doesn't mean it
> will run on the default machine.  We may as well take advantage of the faster
> paths when we're able.
> 

Yes, it is possible to make selection at run-time...
Though I suppose it might add some overhead (extra call (or jump), etc.). 
But ok, I'll see what I can do.
Konstantin



[dpdk-dev] One pkt in mbuf chain - virtio pmd driver

2014-08-07 Thread Stephen Hemminger
On Thu, 7 Aug 2014 06:19:58 +
"Czaus, Tomasz"  wrote:

> Hello,
> 
> Does virtio pmd driver support scenario when a frame fits in mbuf chain, this 
> means all headers (eth/ipv4/tcp) are located in first mbuf and user data is 
> located in next mbuf. I have asked the same question on dpdk-ovs mailing 
> group, here is a thread and more details:
> 
> https://lists.01.org/pipermail/dpdk-ovs/2014-August/001557.html
> 
> Best Regards,
> Tomasz Czaus

Existing DPDK 1.7 virtio driver does not.
I wrote a virtio DPDK driver (before the 1.6 one) that does handle it, so it is 
possible.

I have some patches to handle multi-part mbuf and messages, based on backport
from that driver but they are untested.


[dpdk-dev] [PATCH 1/2] lib/librte_vhost: vhost library support to facilitate integration with vswitch.

2014-08-07 Thread Stephen Hemminger
On Fri, 18 Jul 2014 13:39:05 +0800
Huawei Xie  wrote:

> Signed-off-by: Huawei Xie 
> Acked-by: Konstantin Ananyev 
> Acked-by: Thomos Long 

This looks good, but there are some style convention issues:

1. Please don't use really long lines. About 100 or 120 characters is maximum
   reasonable length in an editor

2. Don't put space here in function decl.

ERROR: space prohibited after that '*' (ctx:BxW)
#1183: FILE: lib/librte_vhost/vhost-net-cdev.h:102:
+   int (* set_vring_kick)(struct vhost_device_ctx, struct vhost_vring_file 
*);
 ^

3. Use BSD and kernel style brace
Not:

+void
+put_files_struct (struct files_struct *files)
+{
+   if (atomic_dec_and_test (&files->count))
+   {
+   BUG ();
+   }
+}

Instead:
+void
+put_files_struct (struct files_struct *files)
+{
+   if (atomic_dec_and_test (&files->count)) {
+   BUG ();
+   }
+}

4. All functions that are not used in other files should be marked static.
   For example put_files_struct

5. Switch should be indented at same level as case:
Not:
+   switch (ioctl)
+   {
+   case EVENTFD_COPY:
+   if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
eventfd_copy)))
+   return -EFAULT;
+

Instead:
+   switch (ioctl) {
+   case EVENTFD_COPY:
+   if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
eventfd_copy)))
+   return -EFAULT


[dpdk-dev] [PATCHv2] librte_acl make it build/work for 'default' target

2014-08-07 Thread Konstantin Ananyev
Make ACL library to build/work on 'default' architecture:
- make rte_acl_classify_scalar really scalar
 (make sure it wouldn't use sse4 instrincts through resolve_priority()).
- Provide two versions of rte_acl_classify code path:
  rte_acl_classify_sse() - could be build and used only on systems with sse4.2
  and upper, return -ENOTSUP on lower arch.
  rte_acl_classify_scalar() - a slower version, but could be build and used
  on all systems.
- keep common code shared between these two codepaths.

v2 chages:
 run-time selection of most appropriate code-path for given ISA.
 By default the highest supprted one is selected.
 User can still override that selection by manually assigning new value to 
 the global function pointer rte_acl_default_classify.
 rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
 points to.


Signed-off-by: Konstantin Ananyev 
---
 app/test-acl/main.c|  13 +-
 lib/librte_acl/Makefile|   5 +-
 lib/librte_acl/acl_bld.c   |   5 +-
 lib/librte_acl/acl_match_check.def |  92 
 lib/librte_acl/acl_run.c   | 944 -
 lib/librte_acl/acl_run.h   | 220 +
 lib/librte_acl/acl_run_scalar.c| 197 
 lib/librte_acl/acl_run_sse.c   | 630 +
 lib/librte_acl/rte_acl.c   |  15 +
 lib/librte_acl/rte_acl.h   |  24 +-
 10 files changed, 1189 insertions(+), 956 deletions(-)
 create mode 100644 lib/librte_acl/acl_match_check.def
 delete mode 100644 lib/librte_acl/acl_run.c
 create mode 100644 lib/librte_acl/acl_run.h
 create mode 100644 lib/librte_acl/acl_run_scalar.c
 create mode 100644 lib/librte_acl/acl_run_sse.c

diff --git a/app/test-acl/main.c b/app/test-acl/main.c
index d654409..45c6fa6 100644
--- a/app/test-acl/main.c
+++ b/app/test-acl/main.c
@@ -787,6 +787,10 @@ acx_init(void)
/* perform build. */
ret = rte_acl_build(config.acx, &cfg);

+   /* setup default rte_acl_classify */
+   if (config.scalar)
+   rte_acl_default_classify = rte_acl_classify_scalar;
+
dump_verbose(DUMP_NONE, stdout,
"rte_acl_build(%u) finished with %d\n",
config.bld_categories, ret);
@@ -815,13 +819,8 @@ search_ip5tuples_once(uint32_t categories, uint32_t step, 
int scalar)
v += config.trace_sz;
}

-   if (scalar != 0)
-   ret = rte_acl_classify_scalar(config.acx, data,
-   results, n, categories);
-
-   else
-   ret = rte_acl_classify(config.acx, data,
-   results, n, categories);
+   ret = rte_acl_classify(config.acx, data, results,
+   n, categories);

if (ret != 0)
rte_exit(ret, "classify for ipv%c_5tuples returns %d\n",
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index 4fe4593..65e566d 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -43,7 +43,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += tb_mem.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += rte_acl.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
-SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_sse.c
+
+CFLAGS_acl_run_sse.o += -msse4.1

 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_ACL)-include := rte_acl_osdep.h
diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c
index 873447b..09d58ea 100644
--- a/lib/librte_acl/acl_bld.c
+++ b/lib/librte_acl/acl_bld.c
@@ -31,7 +31,6 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
 #include 
 #include "tb_mem.h"
 #include "acl.h"
@@ -1480,8 +1479,8 @@ acl_calc_wildness(struct rte_acl_build_rule *head,

switch (rule->config->defs[n].type) {
case RTE_ACL_FIELD_TYPE_BITMASK:
-   wild = (size -
-   _mm_popcnt_u32(fld->mask_range.u8)) /
+   wild = (size - __builtin_popcount(
+   fld->mask_range.u8)) /
size;
break;

diff --git a/lib/librte_acl/acl_match_check.def 
b/lib/librte_acl/acl_match_check.def
new file mode 100644
index 000..8ff4ec3
--- /dev/null
+++ b/lib/librte_acl/acl_match_check.def
@@ -0,0 +1,92 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of cond

[dpdk-dev] [PATCHv2] librte_acl make it build/work for 'default' target

2014-08-07 Thread Neil Horman
On Thu, Aug 07, 2014 at 07:31:03PM +0100, Konstantin Ananyev wrote:
> Make ACL library to build/work on 'default' architecture:
> - make rte_acl_classify_scalar really scalar
>  (make sure it wouldn't use sse4 instrincts through resolve_priority()).
> - Provide two versions of rte_acl_classify code path:
>   rte_acl_classify_sse() - could be build and used only on systems with sse4.2
>   and upper, return -ENOTSUP on lower arch.
>   rte_acl_classify_scalar() - a slower version, but could be build and used
>   on all systems.
> - keep common code shared between these two codepaths.
> 
> v2 chages:
>  run-time selection of most appropriate code-path for given ISA.
>  By default the highest supprted one is selected.
>  User can still override that selection by manually assigning new value to 
>  the global function pointer rte_acl_default_classify.
>  rte_acl_classify() becomes a macro calling whatever rte_acl_default_classify
>  points to.
> 
> 
> Signed-off-by: Konstantin Ananyev 

This is alot better thank you.  A few remaining issues.

> ---
>  app/test-acl/main.c|  13 +-
>  lib/librte_acl/Makefile|   5 +-
>  lib/librte_acl/acl_bld.c   |   5 +-
>  lib/librte_acl/acl_match_check.def |  92 
>  lib/librte_acl/acl_run.c   | 944 
> -
>  lib/librte_acl/acl_run.h   | 220 +
>  lib/librte_acl/acl_run_scalar.c| 197 
>  lib/librte_acl/acl_run_sse.c   | 630 +
>  lib/librte_acl/rte_acl.c   |  15 +
>  lib/librte_acl/rte_acl.h   |  24 +-
>  10 files changed, 1189 insertions(+), 956 deletions(-)
>  create mode 100644 lib/librte_acl/acl_match_check.def
>  delete mode 100644 lib/librte_acl/acl_run.c
>  create mode 100644 lib/librte_acl/acl_run.h
>  create mode 100644 lib/librte_acl/acl_run_scalar.c
>  create mode 100644 lib/librte_acl/acl_run_sse.c
> 
> diff --git a/app/test-acl/main.c b/app/test-acl/main.c
> index d654409..45c6fa6 100644
> --- a/app/test-acl/main.c
> +++ b/app/test-acl/main.c
> @@ -787,6 +787,10 @@ acx_init(void)
>   /* perform build. */
>   ret = rte_acl_build(config.acx, &cfg);
>  
> + /* setup default rte_acl_classify */
> + if (config.scalar)
> + rte_acl_default_classify = rte_acl_classify_scalar;
> +
Exporting this variable as part of the ABI is a bad idea.  If the prototype of
the function changes you have to update all your applications.  Make the pointer
an internal symbol and set it using a get/set routine with an enum to represent
the path to choose.  That will help isolate the ABI from the internal
implementation.  It will also let you prevent things like selecting a run time
path that is incompatible with the running system, and prevent path switching
during searches, which may produce unexpected results.

>
> diff --git a/lib/librte_acl/acl_run.c b/lib/librte_acl/acl_run.c
> deleted file mode 100644
> index e3d9fc1..000
> --- a/lib/librte_acl/acl_run.c
> +++ /dev/null
> @@ -1,944 +0,0 @@
> -/*-
> - *   BSD LICENSE
> - *
> - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> - *   All rights reserved.
> - *
> - *   Redistribution and use in source and binary forms, with or without
> - *   modification, are permitted provided that the following conditions
>
> +
> +#define  __func_resolve_priority__   resolve_priority_scalar
> +#define  __func_match_check__acl_match_check_scalar
> +#include "acl_match_check.def"
> +
I get this lets you make some more code common, but its just unpleasant to trace
through.  Looking at the defintion of __func_match_check__ I don't see anything
particularly performance sensitive there.  What if instead you simply redefined
__func_match_check__ in a common internal header as acl_match_check (a generic
function), and had it accept priority resolution function as an argument?  That
would still give you all the performance enhancements without having to include
c files in the middle of other c files, and would make the code a bit more
parseable.

> +/*
> + * When processing the transition, rather than using if/else
> + * construct, the offset is calculated for DFA and QRANGE and
> + * then conditionally added to the address based on node type.
> + * This is done to avoid branch mis-predictions. Since the
> + * offset is rather simple calculation it is more efficient
> + * to do the calculation and do a condition move rather than
> + * a conditional branch to determine which calculation to do.
> + */
> +static inline uint32_t
> +scan_forward(uint32_t input, uint32_t max)
> +{
> + return (input == 0) ? max : rte_bsf32(input);
> +}
> + }
> +}
>
> +
> +#define  __func_resolve_priority__   resolve_priority_sse
> +#define  __func_match_check__acl_match_check_sse
> +#include "acl_match_check.def"
> +
Same deal as above.

> +/*
> + * Extract transitions from an XMM register and check for any matches
> + */
> +static void

[dpdk-dev] [PATCHv2] librte_acl make it build/work for 'default' target

2014-08-07 Thread Vincent JARDIN
What's about using function versioning attributes too:

https://gcc.gnu.org/wiki/FunctionMultiVersioning

?

Le 7 ao?t 2014 22:11, "Neil Horman"  a ?crit :
>
> On Thu, Aug 07, 2014 at 07:31:03PM +0100, Konstantin Ananyev wrote:
> > Make ACL library to build/work on 'default' architecture:
> > - make rte_acl_classify_scalar really scalar
> >  (make sure it wouldn't use sse4 instrincts through resolve_priority()).
> > - Provide two versions of rte_acl_classify code path:
> >   rte_acl_classify_sse() - could be build and used only on systems with
sse4.2
> >   and upper, return -ENOTSUP on lower arch.
> >   rte_acl_classify_scalar() - a slower version, but could be build and
used
> >   on all systems.
> > - keep common code shared between these two codepaths.
> >
> > v2 chages:
> >  run-time selection of most appropriate code-path for given ISA.
> >  By default the highest supprted one is selected.
> >  User can still override that selection by manually assigning new value
to
> >  the global function pointer rte_acl_default_classify.
> >  rte_acl_classify() becomes a macro calling whatever
rte_acl_default_classify
> >  points to.
> >
> >
> > Signed-off-by: Konstantin Ananyev 
>
> This is alot better thank you.  A few remaining issues.
>
> > ---
> >  app/test-acl/main.c|  13 +-
> >  lib/librte_acl/Makefile|   5 +-
> >  lib/librte_acl/acl_bld.c   |   5 +-
> >  lib/librte_acl/acl_match_check.def |  92 
> >  lib/librte_acl/acl_run.c   | 944
-
> >  lib/librte_acl/acl_run.h   | 220 +
> >  lib/librte_acl/acl_run_scalar.c| 197 
> >  lib/librte_acl/acl_run_sse.c   | 630 +
> >  lib/librte_acl/rte_acl.c   |  15 +
> >  lib/librte_acl/rte_acl.h   |  24 +-
> >  10 files changed, 1189 insertions(+), 956 deletions(-)
> >  create mode 100644 lib/librte_acl/acl_match_check.def
> >  delete mode 100644 lib/librte_acl/acl_run.c
> >  create mode 100644 lib/librte_acl/acl_run.h
> >  create mode 100644 lib/librte_acl/acl_run_scalar.c
> >  create mode 100644 lib/librte_acl/acl_run_sse.c
> >
> > diff --git a/app/test-acl/main.c b/app/test-acl/main.c
> > index d654409..45c6fa6 100644
> > --- a/app/test-acl/main.c
> > +++ b/app/test-acl/main.c
> > @@ -787,6 +787,10 @@ acx_init(void)
> >   /* perform build. */
> >   ret = rte_acl_build(config.acx, &cfg);
> >
> > + /* setup default rte_acl_classify */
> > + if (config.scalar)
> > + rte_acl_default_classify = rte_acl_classify_scalar;
> > +
> Exporting this variable as part of the ABI is a bad idea.  If the
prototype of
> the function changes you have to update all your applications.  Make the
pointer
> an internal symbol and set it using a get/set routine with an enum to
represent
> the path to choose.  That will help isolate the ABI from the internal
> implementation.  It will also let you prevent things like selecting a run
time
> path that is incompatible with the running system, and prevent path
switching
> during searches, which may produce unexpected results.
>
> >
> > diff --git a/lib/librte_acl/acl_run.c b/lib/librte_acl/acl_run.c
> > deleted file mode 100644
> > index e3d9fc1..000
> > --- a/lib/librte_acl/acl_run.c
> > +++ /dev/null
> > @@ -1,944 +0,0 @@
> > -/*-
> > - *   BSD LICENSE
> > - *
> > - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > - *   All rights reserved.
> > - *
> > - *   Redistribution and use in source and binary forms, with or without
> > - *   modification, are permitted provided that the following conditions
> >
> > +
> > +#define  __func_resolve_priority__   resolve_priority_scalar
> > +#define  __func_match_check__acl_match_check_scalar
> > +#include "acl_match_check.def"
> > +
> I get this lets you make some more code common, but its just unpleasant
to trace
> through.  Looking at the defintion of __func_match_check__ I don't see
anything
> particularly performance sensitive there.  What if instead you simply
redefined
> __func_match_check__ in a common internal header as acl_match_check (a
generic
> function), and had it accept priority resolution function as an argument?
 That
> would still give you all the performance enhancements without having to
include
> c files in the middle of other c files, and would make the code a bit more
> parseable.
>
> > +/*
> > + * When processing the transition, rather than using if/else
> > + * construct, the offset is calculated for DFA and QRANGE and
> > + * then conditionally added to the address based on node type.
> > + * This is done to avoid branch mis-predictions. Since the
> > + * offset is rather simple calculation it is more efficient
> > + * to do the calculation and do a condition move rather than
> > + * a conditional branch to determine which calculation to do.
> > + */
> > +static inline uint32_t
> > +scan_forward(uint32_t input, uint32_t max)
> > +{
> > + return (input 

[dpdk-dev] [PATCHv2] librte_acl make it build/work for 'default' target

2014-08-07 Thread Chris Wright
* Vincent JARDIN (vincent.jardin at 6wind.com) wrote:
> What's about using function versioning attributes too:
> 
> https://gcc.gnu.org/wiki/FunctionMultiVersioning
> 
> ?

Neat, hadn't seen that gcc feature before, but:

  "This support is available in GCC 4.8 and later. Support is only
   available in C++ for i386 targets."

I have 4.8.2, gcc errors, g++ works.  And entirely unclear re:
icc (which was a supported compiler).  Seems like it's not
really an option.

thanks,
-chris


[dpdk-dev] [PATCHv2] librte_acl make it build/work for 'default' target

2014-08-07 Thread Neil Horman
On Thu, Aug 07, 2014 at 02:28:47PM -0700, Chris Wright wrote:
> * Vincent JARDIN (vincent.jardin at 6wind.com) wrote:
> > What's about using function versioning attributes too:
> > 
> > https://gcc.gnu.org/wiki/FunctionMultiVersioning
> > 
> > ?
> 
> Neat, hadn't seen that gcc feature before, but:
> 
>   "This support is available in GCC 4.8 and later. Support is only
>available in C++ for i386 targets."
> 
> I have 4.8.2, gcc errors, g++ works.  And entirely unclear re:
> icc (which was a supported compiler).  Seems like it's not
> really an option.
> 
I agree, nice idea, probably not pragmatic until its supported in all generally
available compilers (from distributions).  Its also a bit questionable as to how
this works on alternate arches.

Neil

> thanks,
> -chris
>