date:20210617

Re: [ovs-dev] [DISCUSS] CI discussion: travis-ci.org / travis-ci.com

2021-06-17 Thread Ben Pfaff

On Wed, Jun 16, 2021 at 12:25:34PM -0400, Aaron Conole wrote:
> Recently, Travis-CI has retired the travis-ci.org service.  At the
> moment, it is read-only.  In the future, it may disappear completely.
> 
> Currently, Open vSwitch has public facing badges, and documentation,
> which heavily refers to the Travis-CI service. (see the
> submitting-patches.rst, testing.rst, and README.rst files).  Travis
> support was formally added for the 2.4.0 release in August 2015, and had
> been heavily used until the deprecation of the 'Free' support tier, in
> Dec. 2020, at which point the project switched to using GitHub Actions
> for most of the CI support.
> 
> Notably, we still make use of Travis for ARM related builds, but as I
> understand it the future of that service for FOSS projects is not so
> clear.
> 
> With that introduction out of the way, I think it might be time to
> adjust the documentation w.r.t. Travis CI (and add notes about the
> GitHub Actions support), and possibly deprecate / remove things related
> to Travis.
> 
> I'm opening up the discussion before simply blasting patches because I
> may have overlooked something.  Maybe there is some way to continue to
> use Travis.  Maybe I missed a program.  At a minimum, we would need to
> s/travis-ci.org/travis-ci.com/g across the documentation above, but
> perhaps we should take this moment to drop it (before the release branch
> is cut).

I think I'm in support of this change.  Lots of projects are switching
to Github Actions.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v6 1/1] ovs-numa: Support non-contiguous numa nodes and offline CPU cores

2021-06-17 Thread David Wilder

This change removes the assumption that numa nodes and cores are numbered
contiguously in linux.  This change is required to support some Power
systems.

A check has been added to verify that cores are online,
offline cores result in non-contiguously numbered cores.

Dpdk EAL option generation is updated to work with non-contiguous numa nodes.
These options can be seen in the ovs-vswitchd.log.  For example:
a system containing only numa nodes 0 and 8 will generate the following:

EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \
 --socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0

Tests for pmd and dpif-netdev have been updated to validate non-contiguous
numbered nodes.

Signed-off-by: David Wilder 
---
 lib/dpdk.c   | 57 +++--
 lib/ovs-numa.c   | 51 
 lib/ovs-numa.h   |  2 ++
 tests/dpif-netdev.at |  2 +-
 tests/pmd.at | 61 
 5 files changed, 142 insertions(+), 31 deletions(-)

diff --git a/lib/dpdk.c b/lib/dpdk.c
index 319540394..7f6f1d164 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -129,22 +129,63 @@ construct_dpdk_options(const struct smap 
*ovs_other_config, struct svec *args)
 }
 }
 
+static int
+compare_numa_node_list(const void *a_, const void *b_)
+{
+size_t a = *(const size_t *) a_;
+size_t b = *(const size_t *) b_;
+
+if (a < b) {
+return -1;
+}
+if (a > b) {
+return 1;
+}
+return 0;
+}
+
 static char *
 construct_dpdk_socket_mem(void)
 {
 const char *def_value = "1024";
-int numa, numa_nodes = ovs_numa_get_n_numas();
+struct ovs_numa_dump *dump;
+const struct ovs_numa_info_numa *node;
+size_t k = 0, last_node = 0, n_numa_nodes, *numa_node_list;
 struct ds dpdk_socket_mem = DS_EMPTY_INITIALIZER;
 
-if (numa_nodes == 0 || numa_nodes == OVS_NUMA_UNSPEC) {
-numa_nodes = 1;
-}
+/* Build a list of all numa nodes with at least one core */
+dump = ovs_numa_dump_n_cores_per_numa(1);
+n_numa_nodes = hmap_count(>numas);
+numa_node_list = xcalloc(n_numa_nodes, sizeof numa_node_list);
 
-ds_put_cstr(_socket_mem, def_value);
-for (numa = 1; numa < numa_nodes; ++numa) {
-ds_put_format(_socket_mem, ",%s", def_value);
+FOR_EACH_NUMA_ON_DUMP(node, dump) {
+if (k >= n_numa_nodes) {
+break;
+}
+numa_node_list[k++] = node->numa_id;
 }
-
+qsort(numa_node_list, k, sizeof numa_node_list, compare_numa_node_list);
+
+for (size_t i = 0; i < n_numa_nodes; i++) {
+while (numa_node_list[i] > last_node &&
+   numa_node_list[i] != OVS_NUMA_UNSPEC &&
+   numa_node_list[i] <= MAX_NUMA_NODES){
+if (last_node == 0) {
+ds_put_format(_socket_mem, "%s", "0");
+} else {
+ds_put_format(_socket_mem, ",%s", "0");
+}
+last_node++;
+}
+if (numa_node_list[i] == 0) {
+ds_put_format(_socket_mem, "%s", def_value);
+} else {
+ds_put_format(_socket_mem, ",%s", def_value);
+}
+last_node++;
+}
+free(numa_node_list);
+ovs_numa_dump_destroy(dump);
 return ds_cstr(_socket_mem);
 }
 
diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 6d0a68522..b825ecbdd 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -42,21 +42,22 @@ VLOG_DEFINE_THIS_MODULE(ovs_numa);
  * This module stores the affinity information of numa nodes and cpu cores.
  * It also provides functions to bookkeep the pin of threads on cpu cores.
  *
- * It is assumed that the numa node ids and cpu core ids all start from 0 and
- * range continuously.  So, for example, if 'ovs_numa_get_n_cores()' returns N,
- * user can assume core ids from 0 to N-1 are all valid and there is a
- * 'struct cpu_core' for each id.
+ * It is assumed that the numa node ids and cpu core ids all start from 0.
+ * There is no guarantee that node and cpu ids are numbered consecutively
+ * (this is a change from earlier version of the code). So, for example,
+ * if two nodes exist with ids 0 and 8, 'ovs_numa_get_n_nodes()' will
+ * return 2, no assumption of node numbering should be made.
  *
  * NOTE, this module should only be used by the main thread.
  *
- * NOTE, the assumption above will fail when cpu hotplug is used.  In that
- * case ovs-numa will not function correctly.  For now, add a TODO entry
- * for addressing it in the future.
+ * NOTE, if cpu hotplug is used 'all_numa_nodes' and 'all_cpu_cores' must be
+ * invalidated when ever the system topology changes. Support for detecting
+ * topology changes has not been included. For now, add a TODO entry for
+ * addressing it in the future.
  *
  * TODO: Fix ovs-numa when cpu hotplug is used.
  */
 
-#define MAX_NUMA_NODES 128
 
 /* numa node. */
 struct numa_node {
@@ -130,15 +131,14 @@ insert_new_cpu_core(struct numa_node *n,

[ovs-dev] [PATCH v6 0/1] Support for non-contiguous numa nodes and core ids.

2021-06-17 Thread David Wilder

Ovs-numa currently makes the assumption that numa node ids and cpu core ids
will be numbered contiguously. Current Power systems don't always follow this
model. Furthermore, cpus on Power may be on/off lined based the setting of
Simultaneous multithreading (SMT). The result can be gaps in the numbering of
the cores. For example, a 2 socket system with 20 Core(s) per socket configured
with 4 thread per core (smt=4) has the following topology:

NUMA node0 CPU(s):   0-79
NUMA node8 CPU(s):   80-159

When set to smt=2 the following topology is created.

NUMA node0 CPU(s): 0,1,4,5,8,9,12,13,16,17,20,21,24,25,28,29,32,33,36,37,40,
   41,44,45,48,49,52,53,56,57,60,61,64,65,68,69,72,73,76,77
NUMA node8 CPU(s): 80,81,84,85,88,89,92,93,96,97,100,101,104,105,108,109,112,
   113,116,117,120,121,124,125,128,129,132,133,136,137,140,
   141,144,145,148,149,152,153,156,157

This patch allows ovs-numa to work with non-contiguous node and cpu ids.
In addition lib/dpdk:construct_dpdk_socket_mem() is updated to correctly
build the EAL options: --socket-mem and --socket-limit on systems with
non-contiguous node ids. Pmd and dpif-netdev multi-node tests were updated
to validate a simulated numa topology with non-contiguous nodes.

v2 changes:
-0-day Robot suggested changes.
v3 changes:
-re-wrote cpu_detected() to address memory leak.
V4 changes:
-Rebased patches for:
 https://patchwork.ozlabs.org/project/openvswitch/list/?series=157389.
-Extended the automated tests to test both contiguous and non-contiguous
 configurations.
-changed the phrase "non-consecutive" to "non-contiguous" (since 0, 8 is
 consecutive but not contiguous).
V5 changes 
-Removed unused variable max_numa_id in discover_numa_and_core_dummy().
-Cleanup snprintf formatting in cpu_detected().
-Simplified test coverage of non-contiguous and contiguous nodes.
-Fixed bug in construct_dpdk_socket_mem().
V6 changes
-Sorted the list of numa nodes as hmap will not guarantee any order.
-Added a test to ensure that rxq assignment to pmds can be done when
  there are non-contiguous numa nodes (Provided by ktray...@redhat.com).
-Reduced the number of non-contiguous numa nodes tests in dpif-netdev.at.
-Squashed the patch series to a single patch.

David Wilder (1):
  ovs-numa: Support non-contiguous numa nodes and offline CPU cores

 lib/dpdk.c   | 57 +++--
 lib/ovs-numa.c   | 51 
 lib/ovs-numa.h   |  2 ++
 tests/dpif-netdev.at |  2 +-
 tests/pmd.at | 61 
 5 files changed, 142 insertions(+), 31 deletions(-)

-- 
2.27.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered

2021-06-17 Thread Ben Pfaff

All these flags for stealing, allowing stealing, blah blah, are just
ways to do some kind of dumb reference counting without actually have a
reference count.  When it gets super complex like this, maybe
introducing a reference count is the way to go.  It would be a bigger
change, but perhaps more maintainable over time.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH v1] conntrack: Add state and sequence validation

2021-06-17 Thread Aaron Conole

During testing, there was an edge condition that was found during
packet pickup where userspace can improperly advance the TCP state
machine during connection exstablishment and bypass the 3whs.  This
can pollute the TCP sequence windows.

Add a fix to ensure that we move the state machine when we see the
appropriate flags, and include a test to show the error condition.

Signed-off-by: Aaron Conole 
---
NOTE: I haven't done as much testing for 'learn existing connections' as
  I would want.  I expect that I may have missed a case there, and
  hope to include a test for it when I work on the tcp_loose mode
  support in the userspace conntrack

 lib/conntrack-tcp.c |  7 +++--
 tests/system-traffic.at | 62 +
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 8a7c98cc45..1f6cc4368d 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -224,6 +224,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 
 end = seq + p_len;
 if (tcp_flags & TCP_SYN) {
+src->state = CT_DPIF_TCPS_SYN_SENT; /* SYN_SENT by src */
 end++;
 if (dst->wscale & CT_WSCALE_FLAG) {
 src->wscale = tcp_get_wscale(tcp);
@@ -245,7 +246,6 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 }
 
 src->seqlo = seq;
-src->state = CT_DPIF_TCPS_SYN_SENT;
 /*
  * May need to slide the window (seqhi may have been set by
  * the crappy stack check or if we picked up the connection
@@ -329,7 +329,10 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 }
 if (tcp_flags & TCP_ACK) {
 if (dst->state == CT_DPIF_TCPS_SYN_SENT) {
-dst->state = CT_DPIF_TCPS_ESTABLISHED;
+if (src->state == CT_DPIF_TCPS_SYN_SENT) {
+/* Move to EST only once SRC things this is okay */
+dst->state = CT_DPIF_TCPS_ESTABLISHED;
+}
 } else if (dst->state == CT_DPIF_TCPS_CLOSING) {
 dst->state = CT_DPIF_TCPS_FIN_WAIT_2;
 }
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index c73bbc420f..4e849085bb 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -6025,6 +6025,68 @@ AT_CHECK([ovs-ofctl dump-flows br0 | grep table=2, | 
OFPROTO_CLEAR_DURATION_IDLE
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - Out of order TCP state transition])
+dnl This can happen due to buggy TCP implementations that reuse ephemeral
+dnl ports - this test will check that some invalid parameters don't advance
+dnl the state machine
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+OVS_CHECK_CT_CLEAR()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
+
+dnl setup ct flows
+AT_DATA([flows.txt], [dnl
+table=0,priority=10  arp action=normal
+table=0,priority=10  ip,tcp,ct_state=-trk action=ct(table=1)
+table=0,priority=1   action=drop
+dnl dst ns2
+table=1,priority=20  ip,ct_state=+new+trk,nw_dst=10.1.1.2 
action=ct(commit),output:ovs-p1
+table=1,priority=20  ip,ct_state=+est+trk,nw_dst=10.1.1.2 action=output:ovs-p1
+dnl dst ns1
+table=1,priority=10  ip,ct_state=+trk+est,nw_dst=10.1.1.1 action=output:ovs-p0
+table=1,priority=10  ip,ct_state=+trk+new,nw_dst=10.1.1.1 action=output:ovs-p0
+table=1,priority=1   ip,ct_state=+trk+inv action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl kill tcp packets - this will suppress RST/RST-ACK messages
+NS_CHECK_EXEC([at_ns0], [iptables -I INPUT 1 -p tcp --sport 6667 -j DROP])
+NS_CHECK_EXEC([at_ns1], [iptables -I INPUT 1 -p tcp --dport 6667 -j DROP])
+
+dnl Send TCP SYN
+NS_CHECK_EXEC([at_ns0], [$PYTHON3 $srcdir/sendpkt.py p0 f0 00 00 01 01 02 f0 
00 00 01 01 01 08 00 45 00 00 3c c9 c8 40 00 40 06 5a ef 0a 01 01 01 0a 01 01 
02 b2 2a 1a 0b d3 78 6f 81 00 00 00 00 a0 02 fa f0 7a 0b 00 00 02 04 05 b4 04 
02 08 0a 6d 35 40 9a 00 00 00 00 01 03 03 07 > /dev/null])
+
+dnl Send TCP PSH|URG
+NS_CHECK_EXEC([at_ns1], [$PYTHON3 $srcdir/sendpkt.py p1 f0 00 00 01 01 01 f0 
00 00 01 01 02 08 00 45 00 00 34 c9 c9 40 00 40 06 5a f6 0a 01 01 02 0a 01 01 
01 b2 2a 1a 0b d3 78 6f 82 0a 0b d5 33 80 28 01 f6 72 bb 00 00 01 01 08 0a 6d 
35 40 9a 11 90 3e 20 > /dev/null])
+
+dnl Check that we haven't advanced the ct state machine
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "dst=10.1.1.1" | sort | 
uniq], [0], [dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=45610,dport=6667),reply=(src=10.1.1.2,dst=10.1.1.1,sport=6667,dport=45610),protoinfo=(state=SYN_SENT)
+])
+
+dnl Send TCP ACK without syn, and with garbage ack values
+NS_CHECK_EXEC([at_ns1], [$PYTHON3 $srcdir/sendpkt.py p1 f0 00 00 01 01 01 f0 
00 00 01 01 02 08 00 45 00 00 34 73 7a 40 00 40 06 b1 45 0a 01 01 02 0a 01 01 
01 1a 0b b2 2a 0a 0b

Re: [ovs-dev] [PATCH ovn v2] ovn.at: Fix test "virtual ports -- ovn-northd-ddlog".

2021-06-17 Thread Mark Michelson


On 6/14/21 2:44 PM, Ben Pfaff wrote:

On Fri, Jun 11, 2021 at 03:48:52PM -0700, Han Zhou wrote:

The test case fails quite often for northd-ddlog because of the tunnel
keys mismatch when comparing OpenFlow rules. Keys can change in
different runs. This patch fixes it by extracting the expected keys from
SB DB before comparison instead of hardcoding.

There are some other potential timing issues in this test and this
patch fixes them as well by replacing AT_CHECK with OVS_WAIT_UNTIL.

Signed-off-by: Han Zhou 


Awesome!  Thank you.


-AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \
+OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find port_binding \
  logical_port=sw0-vir) = x], [0], [])


I think the above can be better written:
 wait_row_count Port_Binding 0 logical_port=sw0-vir


I don't think this is correct. The test is not attempting to wait for 
the Port_Binding record to be deleted. It's waiting for the chassis 
column in the Port_Binding to contain an empty string. I think 
wait_column() could work:


wait_column "" Port_Binding chassis logical_port=sw0-vir

(assuming that testing for an empty string works)





  # Cleanup hv1-vif3.
  as hv1
  ovs-vsctl del-port hv1-vif3
  
-AT_CHECK([test x$(ovn-sbctl --bare --columns chassis find port_binding \

+OVS_WAIT_UNTIL([test x$(ovn-sbctl --bare --columns chassis find port_binding \
  logical_port=sw0-vir) = x], [0], [])


Ditto?


+sw0_dp_key=$(fetch_column Datapath_Binding tunnel_key 
external_ids:name=sw0)
+lr0_dp_key=$(fetch_column Datapath_Binding tunnel_key 
external_ids:name=lr0)
+lr0_public_dp_key=$(fetch_column Port_Binding tunnel_key 
logical_port=lr0-public)


I think that the above will retrieve tunnel keys in decimal...


+AT_CHECK_UNQUOTED([as $hv ovs-ofctl dump-flows br-int table=44 | ofctl_strip_all | 
grep "priority=2000"], [0], [dnl
+ table=44, priority=2000,ip,metadata=0x$sw0_dp_key actions=resubmit(,45)
+ table=44, priority=2000,ipv6,metadata=0x$sw0_dp_key actions=resubmit(,45)
  ])


...therefore I think that the above 0x should not be there.  (I guess
the test passes because the numbers in the test are all under 10.)


Yeah, it should probably be fine for this test to not worry about this.



Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered

2021-06-17 Thread Ilya Maximets

On 6/17/21 7:47 PM, Ilya Maximets wrote:
> On 6/16/21 11:54 PM, Tony van der Peet wrote:
>> Thanks Ilya. For what it's worth, besides running the OVS unit tests, I put 
>> this new code through our (enhanced) version of oftest (500 test cases) 
>> including a couple I wrote just for this situation.
>>
>> Tony
>>
>> On Thu, Jun 17, 2021 at 8:05 AM Ilya Maximets > > wrote:
>>
>> On 6/16/21 2:04 AM, Tony van der Peet wrote:
>> > From: Tony van der Peet > >
>> >
>> > When a PACKET_OUT has output port of OFPP_TABLE, and the rule
>> > table includes a meter and this causes the packet to be deleted,
>> > stop the packet from being deleted twice by cloning it and setting
>> > it up to be stolen in execution.
>> >
>> > Add a test to verify this condition.
>> >
>> > Signed-off-by: Tony van der Peet > >
>>
>> Thanks for the patch!  OVS seems to work fine with this change,
>> but for some reason several OVN unit tests are failing if it
>> uses OVS with this change applied.
>>
>> Trying to figure out why...
>>
>> Best regards, Ilya Maximets.
>>
> 
> OK, I've spent most of a day trying to figure out what is going wrong there
> (mostly because OVN tests are insanely complex and very hard to debug).
> 
> In short: dpif_execute() expected to modify the packet by higher layers and
> by cloning the packet inside dp_netdev_execute_actions() all the
> modifications applied to the copy and not propagated to the original packet.
> 
> The call stack looks something like this:
> 
> 1. handle_packet_out()
> 2. --> ofproto_packet_out_finish()
> 3. --> packet_execute()
> 4. --> dpif_execute()
> 5. --> dpif_operate()
> 6. --> dpif_execute_with_help()
> 7. --> odp_execute_actions()
> 8. ** For each action on a list **:
> 9. --> dpif_execute_helper_cb()
> 10.--> dpif_execute()
> 11.--> dpif_operate()
> 12.--> dp_netdev_execute()
> 
> The problem is on a line 8.  odp_execute_actions() executes actions one by
> one expecting the datapath to modify the packet.  In case of OVN unit tests
> PACKET_OUT resulted in a flow with 2 actions: tunnel push + output.
> So, the first dp_netdev_execute() is called to push the tunnel header to the
> packet and the second time it's called to execute OUTPUT action and send the
> packet to the destination.  And since we're cloning the packet, tunnel header
> was lost and bare packet was sent out.  This caused failure of OVN unit tests.
> 
> It's understandable why odp_execute_actions() executes actions one by one.
> Some actions requires datapath assistance and others could be executed in
> userspace, some actions could be executed in userspace only.  So, we have
> to use it this way.  But that means that we need to propagate information
> about stolen packet at least to the level of odp_execute_actions().  I tried
> to implement that but, it doesn't look good...
> 
> So, technically, what I'm suggesting is to copy the content of the packet
> back after the execution.  Still kind of ugly, but, at least localized.
> We also need to report an error condition if the copy was stolen, so upper
> layers will be aware that actions was not successful.
> Something like this:
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 08d93c277..860a6c3e7 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -199,6 +199,7 @@ struct dp_packet 
> *dp_packet_clone_data_with_headroom(const void *, size_t,
>  void dp_packet_resize(struct dp_packet *b, size_t new_headroom,
>size_t new_tailroom);
>  static inline void dp_packet_delete(struct dp_packet *);
> +static inline void dp_packet_swap(struct dp_packet *, struct dp_packet *);
>  
>  static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
>   size_t size);
> @@ -256,6 +257,16 @@ dp_packet_delete(struct dp_packet *b)
>  }
>  }
>  
> +/* Swaps content of two packets. */
> +static inline void
> +dp_packet_swap(struct dp_packet *a, struct dp_packet *b)
> +{

Ugh.  This function can not be used for packets originated from
afxdp or dpdk.  Luckily, I think, that dpif_netdev_execute() should
never work with such packets, but a bunch of assertions here is
needed.  DPBUF_STACK also doesn't sound like a good idea, because
it's intended to be immutable.  So:

ovs_assert(a->source == DPBUF_MALLOC || a->source == DPBUF_STUB);
ovs_assert(b->source == DPBUF_MALLOC || b->source == DPBUF_STUB);

> +struct dp_packet c = *a;
> +
> +*a = *b;
> +*b = c;
> +}
> +
>  /* If 'b' contains at least 'offset + size' bytes of data, returns a pointer 
> to
>   * byte 'offset'.  Otherwise, returns a null pointer. */
>  static inline void *
>

Re: [ovs-dev] [PATCH ovn v2 3/3] tests: Add check-perf target

2021-06-17 Thread 0-day Robot

Bleep bloop.  Greetings Mark Gray, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 80 characters long (recommended limit is 79)
#44 FILE: Documentation/topics/testing.rst:263:
OVN includes a suite of micro-benchmarks to aid a developer in understanding the

WARNING: Line is 80 characters long (recommended limit is 79)
#70 FILE: Documentation/topics/testing.rst:289:
   rebuilt, the complexity of the tests and the performance of the test machine.

WARNING: Line is 80 characters long (recommended limit is 79)
#71 FILE: Documentation/topics/testing.rst:290:
   If you are only using one test, you can specify the test to run by adding the

WARNING: Line is 139 characters long (recommended limit is 79)
#76 FILE: Documentation/topics/testing.rst:295:
   benchmarking against. If you are only using one test, you can specify the 
test to run by adding the test number to the ``make`` command.

Lines checked: 420, Warnings: 4, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] dpif-netdev: Fix crash when PACKET_OUT is metered

2021-06-17 Thread Ilya Maximets

On 6/16/21 11:54 PM, Tony van der Peet wrote:
> Thanks Ilya. For what it's worth, besides running the OVS unit tests, I put 
> this new code through our (enhanced) version of oftest (500 test cases) 
> including a couple I wrote just for this situation.
> 
> Tony
> 
> On Thu, Jun 17, 2021 at 8:05 AM Ilya Maximets  > wrote:
> 
> On 6/16/21 2:04 AM, Tony van der Peet wrote:
> > From: Tony van der Peet  >
> >
> > When a PACKET_OUT has output port of OFPP_TABLE, and the rule
> > table includes a meter and this causes the packet to be deleted,
> > stop the packet from being deleted twice by cloning it and setting
> > it up to be stolen in execution.
> >
> > Add a test to verify this condition.
> >
> > Signed-off-by: Tony van der Peet  >
> 
> Thanks for the patch!  OVS seems to work fine with this change,
> but for some reason several OVN unit tests are failing if it
> uses OVS with this change applied.
> 
> Trying to figure out why...
> 
> Best regards, Ilya Maximets.
> 

OK, I've spent most of a day trying to figure out what is going wrong there
(mostly because OVN tests are insanely complex and very hard to debug).

In short: dpif_execute() expected to modify the packet by higher layers and
by cloning the packet inside dp_netdev_execute_actions() all the
modifications applied to the copy and not propagated to the original packet.

The call stack looks something like this:

1. handle_packet_out()
2. --> ofproto_packet_out_finish()
3. --> packet_execute()
4. --> dpif_execute()
5. --> dpif_operate()
6. --> dpif_execute_with_help()
7. --> odp_execute_actions()
8. ** For each action on a list **:
9. --> dpif_execute_helper_cb()
10.--> dpif_execute()
11.--> dpif_operate()
12.--> dp_netdev_execute()

The problem is on a line 8.  odp_execute_actions() executes actions one by
one expecting the datapath to modify the packet.  In case of OVN unit tests
PACKET_OUT resulted in a flow with 2 actions: tunnel push + output.
So, the first dp_netdev_execute() is called to push the tunnel header to the
packet and the second time it's called to execute OUTPUT action and send the
packet to the destination.  And since we're cloning the packet, tunnel header
was lost and bare packet was sent out.  This caused failure of OVN unit tests.

It's understandable why odp_execute_actions() executes actions one by one.
Some actions requires datapath assistance and others could be executed in
userspace, some actions could be executed in userspace only.  So, we have
to use it this way.  But that means that we need to propagate information
about stolen packet at least to the level of odp_execute_actions().  I tried
to implement that but, it doesn't look good...

So, technically, what I'm suggesting is to copy the content of the packet
back after the execution.  Still kind of ugly, but, at least localized.
We also need to report an error condition if the copy was stolen, so upper
layers will be aware that actions was not successful.
Something like this:

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 08d93c277..860a6c3e7 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -199,6 +199,7 @@ struct dp_packet *dp_packet_clone_data_with_headroom(const 
void *, size_t,
 void dp_packet_resize(struct dp_packet *b, size_t new_headroom,
   size_t new_tailroom);
 static inline void dp_packet_delete(struct dp_packet *);
+static inline void dp_packet_swap(struct dp_packet *, struct dp_packet *);

 static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
  size_t size);
@@ -256,6 +257,16 @@ dp_packet_delete(struct dp_packet *b)
 }
 }

+/* Swaps content of two packets. */
+static inline void
+dp_packet_swap(struct dp_packet *a, struct dp_packet *b)
+{
+struct dp_packet c = *a;
+
+*a = *b;
+*b = c;
+}
+
 /* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
  * byte 'offset'.  Otherwise, returns a null pointer. */
 static inline void *
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8fa7eb6d4..a660a2fd6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4168,7 +4168,11 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
flow_hash_5tuple(execute->flow, 0));
 }

-dp_packet_batch_init_packet(, execute->packet);
+/* Making a copy because the packet might be stolen during the execution
+ * and caller might still need it.  */
+struct dp_packet *packet_clone = dp_packet_clone(execute->packet);
+dp_packet_batch_init_packet(, packet_clone);
+
 dp_netdev_execute_actions(pmd, , false,

[ovs-dev] [PATCH ovn v2 3/3] tests: Add check-perf target

2021-06-17 Thread Mark Gray

Add a suite of micro-benchmarks to aid a developer in understanding the
performance impact of any changes that they are making. They can be used to
help to understand the relative performance between two test runs on the same
test machine, but are not intended to give the absolute performance of OVN.

To invoke the performance testsuite, run:

$ make check-perf

This will run all available performance tests.

Additional metrics (e.g. memory, coverage, perf counters) may be added
in the future. Additional tests (e.g. additional topologies,  ovn-controller
tests) may be added in the future.

Signed-off-by: Mark Gray 
---

Notes:
v2:  create results directory to fix build error

 Documentation/topics/testing.rst |  49 
 tests/.gitignore |   3 +
 tests/automake.mk|  26 
 tests/perf-northd.at | 207 +++
 tests/perf-testsuite.at  |  26 
 5 files changed, 311 insertions(+)
 create mode 100644 tests/perf-northd.at
 create mode 100644 tests/perf-testsuite.at

diff --git a/Documentation/topics/testing.rst b/Documentation/topics/testing.rst
index be9e7c57331c..ccd3278437b1 100644
--- a/Documentation/topics/testing.rst
+++ b/Documentation/topics/testing.rst
@@ -256,3 +256,52 @@ the following::
 All the features documented under `Unit Tests`_ are available for the
 datapath testsuites, except that the datapath testsuites do not
 support running tests in parallel.
+
+Performance testing
+~~~
+
+OVN includes a suite of micro-benchmarks to aid a developer in understanding 
the
+performance impact of any changes that they are making. They can be used to
+help to understand the relative performance between two test runs on the same
+test machine, but are not intended to give the absolute performance of OVN.
+
+To invoke the performance testsuite, run::
+
+$ make check-perf
+
+This will run all available performance tests. Some of these tests may be
+long-running as they need to build complex logical network topologies. In order
+to speed up subsequent test runs, some objects (e.g. the Northbound DB) may be
+cached. In order to force the tests to rebuild all these objects, run::
+
+$ make check-perf TESTSUITEFLAGS="--rebuild"
+
+A typical workflow for a developer trying to improve the performance of OVN
+would be the following:
+
+0. Optional: Modify/add a performance test to buld the topology that you are
+   benchmarking, if required.
+1. Run ``make check-perf TESTSUITEFLAGS="--rebuild"`` to generate cached
+   databases.
+
+.. note::
+   This step may take some time depending on the number of tests that are being
+   rebuilt, the complexity of the tests and the performance of the test 
machine.
+   If you are only using one test, you can specify the test to run by adding 
the
+   test number to the ``make`` command.
+   (e.g. ``make check-perf TESTSUITEFLAGS="--rebuild "``)
+
+2. Run ``make check-perf`` to measure the performance metric that you are
+   benchmarking against. If you are only using one test, you can specify the 
test to run by adding the test number to the ``make`` command.
+   (e.g. ``make check-perf TESTSUITEFLAGS="--rebuild "``)
+3. Modify OVN code to implement the change that you believe will improve the
+   performance.
+4. Go to Step 2. to continue making improvements.
+
+If, as a developer, you modify a performance test in a way that may change one
+of these cached objects, be sure to rebuild the test.
+
+The results of each test run are displayed on the screen at the end of the test
+run but are also saved in the file ``tests/perf-testsuite.dir/results``. The
+cached objects are stored under the relevant folder in
+``tests/perf-testsuite.dir/cached``.
diff --git a/tests/.gitignore b/tests/.gitignore
index 8479f9bb0f8f..65cb1c6e4fad 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -22,6 +22,9 @@
 /system-offloads-testsuite
 /system-offloads-testsuite.dir/
 /system-offloads-testsuite.log
+/perf-testsuite
+/perf-testsuite.dir/
+/perf-testsuite.log
 /test-aes128
 /test-atomic
 /test-bundle
diff --git a/tests/automake.mk b/tests/automake.mk
index 742e5cff28cc..ab04461eef69 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -4,9 +4,11 @@ EXTRA_DIST += \
$(SYSTEM_TESTSUITE_AT) \
$(SYSTEM_KMOD_TESTSUITE_AT) \
$(SYSTEM_USERSPACE_TESTSUITE_AT) \
+   $(PERF_TESTSUITE_AT) \
$(TESTSUITE) \
$(SYSTEM_KMOD_TESTSUITE) \
$(SYSTEM_USERSPACE_TESTSUITE) \
+   $(PERF_TESTSUITE) \
tests/atlocal.in \
$(srcdir)/package.m4 \
$(srcdir)/tests/testsuite \
@@ -52,6 +54,10 @@ SYSTEM_TESTSUITE_AT = \
tests/system-ovn.at \
tests/system-ovn-kmod.at
 
+PERF_TESTSUITE_AT = \
+   tests/perf-testsuite.at \
+   tests/perf-northd.at
+
 check_SCRIPTS += tests/atlocal
 
 TESTSUITE = $(srcdir)/tests/testsuite
@@ -59,6 +65,9 @@ TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
 TESTSUITE_DIR =

[ovs-dev] [PATCH ovn v2 2/3] ovn-northd: Add useful stopwatches

2021-06-17 Thread Mark Gray

For performance measurement, it is useful to understand the
length of time required to complete a number of key code paths
in ovn-northd.c. Add stopwatches to measure these timings.

Signed-off-by: Mark Gray 
---
 northd/ovn-northd-ddlog.c | 15 +++
 northd/ovn-northd.c   | 20 
 2 files changed, 35 insertions(+)

diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c
index a4f2960bdcb8..7c552d516550 100644
--- a/northd/ovn-northd-ddlog.c
+++ b/northd/ovn-northd-ddlog.c
@@ -37,6 +37,7 @@
 #include "ovsdb-parser.h"
 #include "ovsdb-types.h"
 #include "simap.h"
+#include "stopwatch.h"
 #include "stream-ssl.h"
 #include "stream.h"
 #include "unixctl.h"
@@ -50,6 +51,10 @@ VLOG_DEFINE_THIS_MODULE(ovn_northd);
 #include "northd/ovn-northd-ddlog-nb.inc"
 #include "northd/ovn-northd-ddlog-sb.inc"
 
+#define NORTHD_LOOP_STOPWATCH_NAME "ovn-northd-loop"
+#define OVNNB_DB_RUN_STOPWATCH_NAME "ovnnb_db_run"
+#define OVNSB_DB_RUN_STOPWATCH_NAME "ovnsb_db_run"
+
 struct northd_status {
 bool locked;
 bool pause;
@@ -1259,6 +1264,10 @@ main(int argc, char *argv[])
 
 daemonize_complete();
 
+stopwatch_create(NORTHD_LOOP_STOPWATCH_NAME, SW_MS);
+stopwatch_create(OVNNB_DB_RUN_STOPWATCH_NAME, SW_MS);
+stopwatch_create(OVNSB_DB_RUN_STOPWATCH_NAME, SW_MS);
+
 /* Main loop. */
 exiting = false;
 while (!exiting) {
@@ -1285,8 +1294,12 @@ main(int argc, char *argv[])
 status.locked = has_lock;
 status.pause = sb_ctx->paused;
 
+stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
 northd_run(nb_ctx);
+stopwatch_stop(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
+stopwatch_start(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
 northd_run(sb_ctx);
+stopwatch_stop(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
 northd_update_probe_interval(nb_ctx, sb_ctx);
 if (ovsdb_cs_has_lock(sb_ctx->cs) &&
 sb_ctx->state == S_UPDATE &&
@@ -1297,6 +1310,8 @@ main(int argc, char *argv[])
 northd_send_deltas(sb_ctx);
 }
 
+stopwatch_stop(NORTHD_LOOP_STOPWATCH_NAME, time_msec());
+stopwatch_start(NORTHD_LOOP_STOPWATCH_NAME, time_msec());
 unixctl_server_run(unixctl);
 
 northd_wait(nb_ctx);
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index d872f6a3cc1d..bffa18de5c2d 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -49,6 +49,7 @@
 #include "smap.h"
 #include "sset.h"
 #include "svec.h"
+#include "stopwatch.h"
 #include "stream.h"
 #include "stream-ssl.h"
 #include "timeval.h"
@@ -59,6 +60,10 @@
 
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
 
+#define NORTHD_LOOP_STOPWATCH_NAME "ovn-northd-loop"
+#define OVNNB_DB_RUN_STOPWATCH_NAME "ovnnb_db_run"
+#define OVNSB_DB_RUN_STOPWATCH_NAME "ovnsb_db_run"
+
 static unixctl_cb_func ovn_northd_exit;
 static unixctl_cb_func ovn_northd_pause;
 static unixctl_cb_func ovn_northd_resume;
@@ -13238,6 +13243,9 @@ ovnnb_db_run(struct northd_context *ctx,
 if (!ctx->ovnsb_txn || !ctx->ovnnb_txn) {
 return;
 }
+
+stopwatch_start(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
+
 struct hmap port_groups;
 struct hmap mcast_groups;
 struct hmap igmp_groups;
@@ -13379,6 +13387,8 @@ ovnnb_db_run(struct northd_context *ctx,
  * as well.
  */
 cleanup_macam();
+
+stopwatch_stop(OVNNB_DB_RUN_STOPWATCH_NAME, time_msec());
 }
 
 /* Stores the list of chassis which references an ha_chassis_group.
@@ -13970,6 +13980,8 @@ ovnsb_db_run(struct northd_context *ctx,
 return;
 }
 
+stopwatch_start(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
+
 struct shash ha_ref_chassis_map = SHASH_INITIALIZER(_ref_chassis_map);
 handle_port_binding_changes(ctx, ports, _ref_chassis_map);
 update_northbound_cfg(ctx, sb_loop, loop_start_time);
@@ -13977,6 +13989,8 @@ ovnsb_db_run(struct northd_context *ctx,
 update_sb_ha_group_ref_chassis(_ref_chassis_map);
 }
 shash_destroy(_ref_chassis_map);
+
+stopwatch_stop(OVNSB_DB_RUN_STOPWATCH_NAME, time_msec());
 }
 
 static void
@@ -14425,6 +14439,10 @@ main(int argc, char *argv[])
 char *ovn_internal_version = ovn_get_internal_version();
 VLOG_INFO("OVN internal version is : [%s]", ovn_internal_version);
 
+stopwatch_create(NORTHD_LOOP_STOPWATCH_NAME, SW_MS);
+stopwatch_create(OVNNB_DB_RUN_STOPWATCH_NAME, SW_MS);
+stopwatch_create(OVNSB_DB_RUN_STOPWATCH_NAME, SW_MS);
+
 /* Main loop. */
 exiting = false;
 
@@ -14508,6 +14526,8 @@ main(int argc, char *argv[])
 ovsdb_idl_wait(ovnsb_idl_loop.idl);
 }
 
+stopwatch_stop(NORTHD_LOOP_STOPWATCH_NAME, time_msec());
+stopwatch_start(NORTHD_LOOP_STOPWATCH_NAME, time_msec());
 unixctl_server_run(unixctl);
 unixctl_server_wait(unixctl);
 memory_wait();
-- 
2.27.0

___
dev mailing list
d...@openvswitch.org

[ovs-dev] [PATCH ovn v2 0/3] tests: Add check-perf target

2021-06-17 Thread Mark Gray

This is a proposal to add some micro-benchmarks to aid developers
in benchmarking optimizations to OVN. It starts by adding simple
metrics for northd but could be expanded in future patches.

Mark Gray (3):
  ovn-macros.at: fix typo
  ovn-northd: Add useful stopwatches
  tests: Add check-perf target

 Documentation/topics/testing.rst |  49 
 northd/ovn-northd-ddlog.c|  15 +++
 northd/ovn-northd.c  |  20 +++
 tests/.gitignore |   3 +
 tests/automake.mk|  26 
 tests/ovn-macros.at  |   2 +-
 tests/perf-northd.at | 207 +++
 tests/perf-testsuite.at  |  26 
 8 files changed, 347 insertions(+), 1 deletion(-)
 create mode 100644 tests/perf-northd.at
 create mode 100644 tests/perf-testsuite.at

-- 
2.27.0


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH ovn v2 1/3] ovn-macros.at: fix typo

2021-06-17 Thread Mark Gray

Signed-off-by: Mark Gray 
---
 tests/ovn-macros.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index cd02b6986cc2..c92c07720112 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -184,7 +184,7 @@ ovn_start_northd() {
 # ovn-sbctl and ovn-nbctl use them by default, and starts ovn-northd running
 # against them.
 #
-# Normally this starts an active northd and a backup norhtd.  The following
+# Normally this starts an active northd and a backup northd.  The following
 # options are accepted to adjust that:
 #   --backup-northd=noneDon't start a backup northd.
 #   --backup-northd=paused  Start the backup northd in the paused state.
-- 
2.27.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-06-17 Thread 0-day Robot

Bleep bloop.  Greetings Kumar Amber, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 dpif-netdev: Add command line and function pointer for 
miniflow extract
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [v13 01/12] dpif-netdev: Refactor to multiple header files.

2021-06-17 Thread 0-day Robot

Bleep bloop.  Greetings Cian Ferriter, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Improper whitespace around control block
#374 FILE: lib/dpif-netdev-private-dfc.h:111:
#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH) \

ERROR: Improper whitespace around control block
#540 FILE: lib/dpif-netdev-private-dpcls.h:95:
#define NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP(VALUE, KEY, FLOWMAP)   \

ERROR: Inappropriate bracing around statement
#541 FILE: lib/dpif-netdev-private-dpcls.h:96:
MINIFLOW_FOR_EACH_IN_FLOWMAP (VALUE, &(KEY)->mf, FLOWMAP)

Lines checked: 1736, Warnings: 0, Errors: 3


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search info logs

2021-06-17 Thread Kumar Amber

From: Harry van Haaren 

This commit avoids many instances of "using subtable X for miniflow (x,y)"
in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
when no specialized subtable is found, and the generic "_any" version of
the avx512 subtable search implementation was used. This change logs the
subtable usage once, avoiding duplicates.

Signed-off-by: Harry van Haaren 
---
 lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index 2e754c89f..deed527b0 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
uint32_t u1_bits)
  */
 if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
 f = dpcls_avx512_gather_mf_any;
-VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
+VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable (%d,%d)\n",
   u0_bits, u1_bits);
 }
 
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-06-17 Thread Kumar Amber

From: Harry van Haaren 

This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small quantity instructions.

The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.

The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.

Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.

Signed-off-by: Harry van Haaren 
---
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-avx512.c  | 416 ++
 lib/dpif-netdev-private-extract.c |  15 ++
 lib/dpif-netdev-private-extract.h |  19 ++
 4 files changed, 451 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-avx512.c

diff --git a/lib/automake.mk b/lib/automake.mk
index 3080bb04a..2b95d6f92 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
$(AM_CFLAGS)
 lib_libopenvswitchavx512_la_SOURCES = \
lib/dpif-netdev-lookup-avx512-gather.c \
+   lib/dpif-netdev-extract-avx512.c \
lib/dpif-netdev-avx512.c
 lib_libopenvswitchavx512_la_LDFLAGS = \
-static
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
new file mode 100644
index 0..1145ac8a9
--- /dev/null
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -0,0 +1,416 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifdef __x86_64__
+/* Sparse cannot handle the AVX512 instructions. */
+#if !defined(__CHECKER__)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "flow.h"
+#include "dpdk.h"
+
+#include "dpif-netdev-private-dpcls.h"
+#include "dpif-netdev-private-extract.h"
+#include "dpif-netdev-private-flow.h"
+
+/* AVX512-BW level permutex2var_epi8 emulation. */
+static inline __m512i
+__attribute__((target("avx512bw")))
+_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
+   __m512i v_data_0,
+   __m512i v_shuf_idxs,
+   __m512i v_data_1)
+{
+/* Manipulate shuffle indexes for u16 size. */
+__mmask64 k_mask_odd_lanes = 0x;
+/* clear away ODD lane bytes. Cannot be done above due to no u8 shift */
+__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
+v_shuf_idxs, _mm512_setzero_si512());
+v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
+
+__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
+
+/* Shuffle each half at 16-bit width */
+__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
+v_data_1);
+__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
+v_data_1);
+
+/* Find if the shuffle index was odd, via mask and compare */
+uint16_t index_odd_mask = 0x1;
+const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
+
+/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask */
+__m512i v_idx_even_masked = _mm512_and_si512(v_shuf_idxs,
+ v_index_mask_u16);
+__mmask32 evn_rotate_mask = _mm512_cmpeq_epi16_mask(v_idx_even_masked,
+v_index_mask_u16);
+
+/* ODD lanes, find if u8 index was odd, result as u16 bitmask */
+__m512i v_shuf_idx_srli8 = _mm512_srli_epi16(v_shuf_idxs, 8);
+__m512i v_idx_odd_masked = _mm512_and_si512(v_shuf_idx_srli8,
+v_index_mask_u16);
+__mmask32 odd_rotate_mask = _mm512_cmpeq_epi16_mask(v_idx_odd_masked,
+v_index_mask_u16);
+odd_rotate_mask = ~odd_rotate_mask;
+
+/* Rotate and blend results from each index */
+__m512i v_shuf_res_evn = _mm512_mask_srli_epi16(v_shuf1, evn_rotate_mask,
+v_shuf1, 8);
+__m512i v_shuf_res_odd = _mm512_mask_slli_epi16(v_shuf2, odd_rotate_mask,
+

[ovs-dev] [v4 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles

2021-06-17 Thread Kumar Amber

From: Harry van Haaren 

This commit adds 3 new traffic profile implementations to the
existing avx512 miniflow extract infrastructure. The profiles added are:
- Ether()/IP()/TCP()
- Ether()/Dot1Q()/IP()/UDP()
- Ether()/Dot1Q()/IP()/TCP()

The design of the avx512 code here is for scalability to add more
traffic profiles, as well as enabling CPU ISA. Note that an implementation
is primarily adding static const data, which the compiler then specializes
away when the profile specific function is declared below.

As a result, the code is relatively maintainable, and scalable for new
traffic profiles as well as new ISA, and does not lower performance
compared with manually written code for each profile/ISA.

Note that confidence in the correctness of each implementation is
achieved through autovalidation, unit tests with known packets, and
fuzz tested packets.

Signed-off-by: Harry van Haaren 

---

Hi Readers,

If you have a traffic profile you'd like to see accelerated using
avx512 code, please send me an email and we can collaborate on adding
support for it!

Regards, -Harry
---
 lib/dpif-netdev-extract-avx512.c  | 155 ++
 lib/dpif-netdev-private-extract.c |  31 ++
 lib/dpif-netdev-private-extract.h |   4 +
 3 files changed, 190 insertions(+)

diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index 1145ac8a9..0e0f6e295 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -117,6 +117,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
 
 #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF)
 #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00)
+#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00)
+
+/* VLAN (Dot1Q) patterns and masks. */
+#define PATTERN_DT1Q_MASK   \
+  0x00, 0x00, 0xFF, 0xFF,
+#define PATTERN_DT1Q_IPV4   \
+  0x00, 0x00, 0x08, 0x00,
 
 /* Generator for checking IPv4 ver, ihl, and proto */
 #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \
@@ -142,6 +149,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
   34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */   \
   NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
+/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */
+#define PATTERN_IPV4_TCP_SHUFFLE \
+   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, NU, NU, /* Ether */ \
+  26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
+
+#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  38, 39, 40, 41, NU, NU, NU, NU, /* UDP */
+
+#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
 /* Generation of K-mask bitmask values, to zero out data in result. Note that
  * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be
@@ -151,12 +181,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
  * Note the ULL suffix allows shifting by 32 or more without integer overflow.
  */
 #define KMASK_ETHER 0x1FFFULL
+#define KMASK_DT1Q  0x000FULL
 #define KMASK_IPV4  0xF0FFULL
 #define KMASK_UDP   0x000FULL
+#define KMASK_TCP   0x0F00ULL
 
 #define PATTERN_IPV4_UDP_KMASK \
 (KMASK_ETHER | (KMASK_IPV4 << 16) | (KMASK_UDP << 32))
 
+#define PATTERN_IPV4_TCP_KMASK \
+(KMASK_ETHER | (KMASK_IPV4 << 16) | (KMASK_TCP << 32))
+
+#define PATTERN_DT1Q_IPV4_UDP_KMASK \
+(KMASK_ETHER | (KMASK_DT1Q << 16) | (KMASK_IPV4 << 24) | (KMASK_UDP << 40))
+
+#define PATTERN_DT1Q_IPV4_TCP_KMASK \
+(KMASK_ETHER | (KMASK_DT1Q << 16) | (KMASK_IPV4 << 24) | (KMASK_TCP << 40))
 
 /* This union allows

[ovs-dev] [v4 09/12] dpdk: add additional CPU ISA detection strings

2021-06-17 Thread Kumar Amber

From: Harry van Haaren 

This commit enables OVS to at runtime check for more detailed
AVX512 capabilities, specifically Byte and Word (BW) extensions,
and Vector Bit Manipulation Instructions (VBMI).

These instructions will be used in the CPU ISA optimized
implementations of traffic profile aware miniflow extract.

Signed-off-by: Harry van Haaren 
---
 lib/dpdk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/dpdk.c b/lib/dpdk.c
index a9494a40f..9d13e4ab7 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -655,6 +655,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature)
 #if __x86_64__
 /* CPU flags only defined for the architecture that support it. */
 CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
+CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW);
+CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI);
 CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ);
 CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
 #endif
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-17 Thread Kumar Amber

Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy

Added a new directory to store the PCAP file used
in the tests and a script to generate the fuzzy traffic
type pcap to be used in fuzzy unit test.

Signed-off-by: Kumar Amber 
---
 tests/automake.mk|   5 +
 tests/pcap/fuzzy.py  |  32 ++
 tests/pcap/mfex_test | Bin 0 -> 416 bytes
 tests/system-dpdk.at |  46 +++
 4 files changed, 83 insertions(+)
 create mode 100755 tests/pcap/fuzzy.py
 create mode 100644 tests/pcap/mfex_test

diff --git a/tests/automake.mk b/tests/automake.mk
index 1a528aa39..532875971 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -142,6 +142,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@
 
+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+   tests/pcap/mfex_test \
+   tests/pcap/fuzzy.py
+
 OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
diff --git a/tests/pcap/fuzzy.py b/tests/pcap/fuzzy.py
new file mode 100755
index 0..a8051ba2b
--- /dev/null
+++ b/tests/pcap/fuzzy.py
@@ -0,0 +1,32 @@
+#!/usr/bin/python3
+try:
+   from scapy.all import *
+except ModuleNotFoundError as err:
+   print(err + ": Scapy")
+import sys
+import os
+
+path = os.environ['OVS_DIR'] + "/tests/pcap/fuzzy"
+pktdump = PcapWriter(path, append=False, sync=True)
+
+for i in range(0, 2000):
+
+   # Generate random protocol bases, use a fuzz() over the combined packet for 
full fuzzing.
+   eth = Ether(src=RandMAC(), dst=RandMAC())
+   vlan = Dot1Q()
+   ipv4 = IP(src=RandIP(), dst=RandIP())
+   ipv6 = IPv6(src=RandIP6(), dst=RandIP6())
+   udp = UDP()
+   tcp = TCP()
+
+   # IPv4 packets with fuzzing
+   pktdump.write(fuzz(eth/ipv4/udp))
+   pktdump.write(fuzz(eth/ipv4/tcp))
+   pktdump.write(fuzz(eth/vlan/ipv4/udp))
+   pktdump.write(fuzz(eth/vlan/ipv4/tcp))
+
+# IPv6 packets with fuzzing
+   pktdump.write(fuzz(eth/ipv6/udp))
+   pktdump.write(fuzz(eth/ipv6/tcp))
+   pktdump.write(fuzz(eth/vlan/ipv6/udp))
+   pktdump.write(fuzz(eth/vlan/ipv6/tcp))
\ No newline at end of file
diff --git a/tests/pcap/mfex_test b/tests/pcap/mfex_test
new file mode 100644
index 
..1aac67b8d643ecb016c758cba4cc32212a80f52a
GIT binary patch
literal 416
zcmca|c+)~A1{MYw`2U}Qff2}QK`M68ITRa|G@yFii5$Gfk6YL%z>@uY&}o|
z2s4N<1VH2&7y^V87$)XGOtD~MV$cFgfG~zBGGJ2#YtF$KST_NTIwYriok6N4Vm)gX-Q@c^{cp<7_5LgK^UuU{2>VS0RZ!RQ+EIW

literal 0
HcmV?d1

diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 802895488..46eaea35a 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -232,3 +232,49 @@ OVS_VSWITCHD_STOP(["\@does not exist. The Open vSwitch 
kernel module is probably
 \@EAL: No free hugepages reported in hugepages-1048576kB@d"])
 AT_CLEANUP
 dnl --
+
+dnl --
+dnl Add standard DPDK PHY port
+AT_SETUP([OVS-DPDK - MFEX Autovalidator])
+AT_KEYWORDS([dpdk])
+
+OVS_DPDK_START()
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk 
options:dpdk-devargs=net_pcap1,rx_pcap=$OVS_DIR/tests/pcap/mfex_test,infinite_rx=1],
 [], [stdout], [stderr])
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+
+AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], [0], [dnl
+Miniflow implementation set to autovalidator.
+])
+sleep 5
+
+dnl Clean up
+AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
+AT_CLEANUP
+dnl --
+
+dnl --
+dnl Add standard DPDK PHY port
+AT_SETUP([OVS-DPDK - MFEX Autovalidator Fuzzy])
+AT_KEYWORDS([dpdk])
+AT_CHECK([$PYTHON3 $OVS_DIR/tests/pcap/fuzzy.py], [], [stdout])
+OVS_DPDK_START()
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk 
options:dpdk-devargs=net_pcap1,rx_pcap=$OVS_DIR/tests/pcap/fuzzy,infinite_rx=1],
 [], [stdout], [stderr])
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+
+AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], [0], [dnl
+Miniflow implementation set to autovalidator.
+])
+sleep 20
+
+dnl Clean up
+AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
+AT_CLEANUP
+dnl --
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v4 08/12] dpif/stats: add miniflow extract opt hits counter

2021-06-17 Thread Kumar Amber

From: Harry van Haaren 

This commit adds a new counter to be displayed to the user when
requesting datapath packet statistics. It counts the number of
packets that are parsed and a miniflow built up from it by the
optimized miniflow extract parsers.

The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
extra entry indicating if the optimized MFEX was hit:

  - MFEX Opt hits:6786432  (100.0 %)

Signed-off-by: Harry van Haaren 
---
 lib/dpif-netdev-avx512.c |  2 ++
 lib/dpif-netdev-perf.c   |  3 +++
 lib/dpif-netdev-perf.h   |  1 +
 lib/dpif-netdev.c| 14 +-
 tests/pmd.at |  6 --
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index bb99b23ff..f55786f8c 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -297,8 +297,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 
 /* At this point we don't return error anymore, so commit stats here. */
+uint32_t mfex_hit = __builtin_popcountll(mf_mask);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_RECV, batch_size);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
+pmd_perf_update_counter(>perf_stats, PMD_STAT_MFEX_OPT_HIT, mfex_hit);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, smc_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_MASKED_HIT,
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 7103a2d4d..d7676ea2b 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
 "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
+"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
 "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
@@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
 stats[PMD_STAT_PHWOL_HIT],
 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
+stats[PMD_STAT_MFEX_OPT_HIT],
+100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
 stats[PMD_STAT_SMC_HIT],
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 8b1a52387..834c26260 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -57,6 +57,7 @@ extern "C" {
 
 enum pmd_stat_type {
 PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
+PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match. */
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
 PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 35c927d55..7a8f15415 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -660,6 +660,7 @@ pmd_info_show_stats(struct ds *reply,
   "  packet recirculations: %"PRIu64"\n"
   "  avg. datapath passes per packet: %.02f\n"
   "  phwol hits: %"PRIu64"\n"
+  "  mfex opt hits: %"PRIu64"\n"
   "  emc hits: %"PRIu64"\n"
   "  smc hits: %"PRIu64"\n"
   "  megaflow hits: %"PRIu64"\n"
@@ -669,10 +670,9 @@ pmd_info_show_stats(struct ds *reply,
   "  avg. packets per output batch: %.02f\n",
   total_packets, stats[PMD_STAT_RECIRC],
   passes_per_pkt, stats[PMD_STAT_PHWOL_HIT],
-  stats[PMD_STAT_EXACT_HIT],
-  stats[PMD_STAT_SMC_HIT],
-  stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
-  stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
+  stats[PMD_STAT_MFEX_OPT_HIT], stats[PMD_STAT_EXACT_HIT],
+  stats[PMD_STAT_SMC_HIT], stats[PMD_STAT_MASKED_HIT],
+  lookups_per_hit, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
   packets_per_batch);
 
 if (total_cycles == 0) {
@@ -6863,7 +6863,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
bool md_is_valid, odp_port_t port_no)
 {
 struct netdev_flow_key *key = [0];
-size_t n_missed = 0, n_emc_hit = 0, n_phwol_hit = 0;
+size_t n_missed = 0, n_emc_hit = 0, n_phwol_hit = 0, n_mfex_opt_hit = 0;
 struct dp_packet_batch single_packet;
 struct dfc_cache *cache = >flow_cache;

[ovs-dev] [v4 06/12] dpif-netdev: Add additional packet count parameter for study function

2021-06-17 Thread Kumar Amber

This commit introduces additonal command line paramter
for mfex study function. If user provides additional packet out
it is used in study to compare minimum packets which must be processed
else a default value is choosen.

$ OVS_DIR/utilities/ovs-appctl dpif-netdev/miniflow-parser-set study 500

Signed-off-by: Kumar Amber 
---
 Documentation/topics/dpdk/bridge.rst |  8 ++-
 lib/dpif-netdev-extract-study.c  | 15 +++-
 lib/dpif-netdev-private-extract.h|  8 +++
 lib/dpif-netdev.c| 34 +++-
 4 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 1c78adc75..e7e91289a 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -288,7 +288,13 @@ An implementation can be selected manually by the 
following command ::
 Also user can select the study implementation which studies the traffic for
 a specific number of packets by applying all availbale implementaions of
 miniflow extract and than chooses the one with most optimal result for that
-traffic pattern.
+traffic pattern. User can also provide additonal parameter as packet count
+which is minimum packets which OVS must study before choosing optimal
+implementation, If no packet count is provided than default value is choosen.
+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
 
 Miniflow Extract Validation
 ~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
index d063d040c..c48fb125e 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -55,6 +55,19 @@ get_study_stats(void)
 return stats;
 }
 
+static uint32_t pkt_compare_count = 0;
+
+uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
+struct dpif_miniflow_extract_impl *opt)
+{
+if ((opt->extract_func == mfex_study_traffic) && (pkt_cmp_count != 0)) {
+pkt_compare_count = pkt_cmp_count;
+return 0;
+}
+pkt_compare_count = MFEX_MAX_COUNT;
+return -EINVAL;
+}
+
 uint32_t
 mfex_study_traffic(struct dp_packet_batch *packets,
struct netdev_flow_key *keys,
@@ -87,7 +100,7 @@ mfex_study_traffic(struct dp_packet_batch *packets,
 
 /* Choose the best implementation after a minimum packets have been
  * processed. */
-if (stats->pkt_count >= MFEX_MAX_COUNT) {
+if (stats->pkt_count >= pkt_compare_count) {
 uint32_t best_func_index = MFEX_IMPL_START_IDX;
 uint32_t max_hits = 0;
 for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
diff --git a/lib/dpif-netdev-private-extract.h 
b/lib/dpif-netdev-private-extract.h
index d8a284db7..0ec74bef9 100644
--- a/lib/dpif-netdev-private-extract.h
+++ b/lib/dpif-netdev-private-extract.h
@@ -127,5 +127,13 @@ dpif_miniflow_extract_get_default(void);
  * overridden at runtime. */
 void
 dpif_miniflow_extract_set_default(miniflow_extract_func func);
+/* Sets the packet count from user to the stats for use in
+ * study function to match against the classified packets to choose
+ * the optimal implementation.
+ * On error, returns EINVAL.
+ * On success, returns 0.
+ */
+uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
+struct dpif_miniflow_extract_impl *opt);
 
 #endif /* DPIF_NETDEV_AVX512_EXTRACT */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 716e0debf..35c927d55 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1141,14 +1141,29 @@ dpif_miniflow_extract_impl_set(struct unixctl_conn 
*conn, int argc,
 return;
 }
 new_func = opt->extract_func;
-/* argv[2] is optional datapath instance. If no datapath name is provided.
+
+/* argv[2] is optional packet count, which user can provide along with
+ * study function to set the minimum packet that must be matched in order
+ * to choose the optimal function. */
+uint32_t pkt_cmp_count = 0;
+uint32_t study_ret;
+if (argc == 3) {
+char *err_str;
+pkt_cmp_count = strtoul(argv[2], _str, 10);
+study_ret = mfex_set_study_pkt_cnt(pkt_cmp_count, opt);
+} else {
+/* Default packet compare count when packets count not provided. */
+study_ret = mfex_set_study_pkt_cnt(0, opt);
+}
+
+/* argv[3] is optional datapath instance. If no datapath name is provided.
  * and only one datapath exists, the one existing datapath is reprobed.
  */
 ovs_mutex_lock(_netdev_mutex);
 struct dp_netdev *dp = NULL;
 
-if (argc == 3) {
-dp = shash_find_data(_netdevs, argv[2]);
+if (argc == 4) {
+dp = shash_find_data(_netdevs, argv[3]);
 } else if (shash_count(_netdevs) == 1) {
 dp = shash_first(_netdevs)->data;
 }
@@ -1182,7 +1197,14 @@ dpif_miniflow_extract_impl_set(struct

[ovs-dev] [v4 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-06-17 Thread Kumar Amber

This commit adds a new command to allow the user to enable
autovalidatior by default at build time thus allowing for
runnig unit test by default.

 $ ./configure --enable-mfex-default-autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 Documentation/topics/dpdk/bridge.rst |  5 +
 NEWS | 12 +++-
 acinclude.m4 | 16 
 configure.ac |  1 +
 lib/dpif-netdev-private-extract.c| 24 
 lib/dpif-netdev-private-extract.h| 10 ++
 lib/dpif-netdev.c|  7 +--
 7 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index b262b98f8..1c78adc75 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -307,6 +307,11 @@ To set the Miniflow autovalidator, use this command ::
 
 $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
 
+A compile time option is available in order to test it with the OVS unit
+test suite. Use the following configure option ::
+
+$ ./configure --enable-mfex-default-autovalidator
+
 Unit Test Miniflow Extract
 ++
 
diff --git a/NEWS b/NEWS
index 63a485309..ed9f4d4c4 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,17 @@ Post-v2.15.0
  * An optimized miniflow extract (mfex) implementation is now available,
which uses CPU SIMD ISA to parse specific traffic profiles efficiently.
Refer to the documentation for details on how to enable it at runtime.
+ * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
+ * Add command line option to switch between mfex function pointers.
+ * Add miniflow extract auto-validator function to compare different
+   miniflow extract implementations against default implementation.
+ * Add study function to miniflow function table which studies packet
+   and automatically chooses the best miniflow implementation for that
+   traffic.
+ * Add AVX512 based optimized miniflow extract function for traffic type
+   IP/UDP.
+ * Add build time configure command to enable auto-validatior as default
+   miniflow implementation at build time.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
@@ -35,7 +46,6 @@ Post-v2.15.0
  * New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
 
-
 v2.15.0 - 15 Feb 2021
 -
- OVSDB:
diff --git a/acinclude.m4 b/acinclude.m4
index 5fbcd9872..e2704cfda 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -14,6 +14,22 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
+dnl This enables automatically running all unit tests with all MFEX
+dnl implementations.
+AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
+  AC_ARG_ENABLE([mfex-default-autovalidator],
+[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable 
MFEX autovalidator as default miniflow_extract implementation.])],
+[autovalidator=yes],[autovalidator=no])
+  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
+  if test "$autovalidator" != yes; then
+AC_MSG_RESULT([no])
+  else
+OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
+AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
 dnl This enables automatically running all unit tests with all DPCLS
 dnl implementations.
diff --git a/configure.ac b/configure.ac
index e45685a6c..46c402892 100644
--- a/configure.ac
+++ b/configure.ac
@@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
 OVS_CHECK_DPIF_AVX512_DEFAULT
+OVS_CHECK_MFEX_AUTOVALIDATOR
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index d86268a1d..2008e5ee5 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -230,3 +230,27 @@ dpif_miniflow_extract_autovalidator(struct dp_packet_batch 
*packets,
  */
 return 0;
 }
+
+/* Variable to hold the defaualt mfex implementation. */
+static miniflow_extract_func default_mfex_func = NULL;
+
+void
+dpif_miniflow_extract_set_default(miniflow_extract_func func)
+{
+default_mfex_func = func;
+}
+
+miniflow_extract_func
+dpif_miniflow_extract_get_default(void)
+{
+
+#ifdef MFEX_AUTOVALIDATOR_DEFAULT
+ovs_assert(mfex_impls[0].extract_func ==
+   dpif_miniflow_extract_autovalidator);
+VLOG_INFO("Default miniflow Extract implementation %s \n",
+

[ovs-dev] [v4 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-06-17 Thread Kumar Amber

This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added miniflow functionality. The newly added commands are
documented, and sample output is provided.

The use of auto-validator and special study function is also described
in detail as well as running fuzzy tests.

Signed-off-by: Kumar Amber 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 Documentation/topics/dpdk/bridge.rst | 105 +++
 NEWS |   3 +
 2 files changed, 108 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index f59e26cbe..b262b98f8 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -256,3 +256,108 @@ The following line should be seen in the configure output 
when the above option
 is used ::
 
 checking whether DPIF AVX512 is default implementation... yes
+
+Miniflow Extract
+
+
+Miniflow extract (MFEX) performs parsing of the raw packets and extracts the
+important header information into a compressed miniflow. This miniflow is
+composed of bits and blocks where the bits signify which blocks are set or
+have values where as the blocks hold the metadata, ip, udp, vlan, etc. These
+values are used by the datapath for switching decisions later.
+
+Most modern CPUs are have SIMD capabilities. These SIMD instructions are able
+to process a vector rather than act on one single data. OVS provides multiple
+implementations of miniflow extract. This allows the user to take advantage
+of SIMD instructions like AVX512 to gain additional performance.
+
+A list of implementations can be obtained by the following command. The
+command also shows whether the CPU supports each implementation ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-get
+Available Optimized Miniflow Extracts:
+  autovalidator (available: True)
+  disable (available: True)
+  study (available: True)
+  avx512_ip_udp (available: True)
+
+An implementation can be selected manually by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study
+
+Also user can select the study implementation which studies the traffic for
+a specific number of packets by applying all availbale implementaions of
+miniflow extract and than chooses the one with most optimal result for that
+traffic pattern.
+
+Miniflow Extract Validation
+~~~
+
+As multiple versions of miniflow extract can co-exist, each with different
+CPU ISA optimizations, it is important to validate that they all give the
+exact same results. To easily test all miniflow implementations, an
+``autovalidator`` implementation of the miniflow exists. This implementation
+runs all other available miniflow extract implementations, and verifies that
+the results are identical.
+
+Running the OVS unit tests with the autovalidator enabled ensures all
+implementations provide the same results.
+
+To set the Miniflow autovalidator, use this command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+Unit Test Miniflow Extract
+++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS=6
+6: OVS-DPDK - MFEX Autovalidator
+
+The unit test uses mulitple traffic type to test the correctness of the
+implementaions.
+
+Running Fuzzy test with Autovalidator
++
+
+Fuzzy tests can also be done on minfilow extract with the help of
+auto-validator and Scapy. The steps below describes the steps to
+reproduce the setup with IP being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+$ ovs-vsctl add-port br0 pcap0 -- \
+set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
+"rx_pcap=fuzzy.pcap"
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
++
+
+The prerquiste before running the unit test is to run the script provided ::
+
+tests/pcap/fuzzy.py
+
+This script generates a pcap with mulitple type of fuzzed packets to be used
+in the below unit test-case.
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in

[ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-17 Thread Kumar Amber

The study function runs all the available implementations
of miniflow_extract and makes a choice whose hitmask has
maximum hits and sets the mfex to that function.

Study can be run at runtime using the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set study

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-study.c   | 119 ++
 lib/dpif-netdev-private-extract.c |   5 ++
 lib/dpif-netdev-private-extract.h |  14 +++-
 4 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 lib/dpif-netdev-extract-study.c

diff --git a/lib/automake.mk b/lib/automake.mk
index 6657b9ae5..3080bb04a 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -114,6 +114,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev.c \
lib/dpif-netdev.h \
lib/dpif-netdev-private-dfc.c \
+   lib/dpif-netdev-extract-study.c \
lib/dpif-netdev-private-dfc.h \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-dpif.c \
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
new file mode 100644
index 0..d063d040c
--- /dev/null
+++ b/lib/dpif-netdev-extract-study.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "dpif-netdev-private-extract.h"
+#include "dpif-netdev-private-thread.h"
+#include "openvswitch/vlog.h"
+#include "ovs-thread.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
+
+/* Max size of packets to be compared. */
+#define MFEX_MAX_COUNT (128)
+
+/* This value is the threshold for the amount of packets that
+ * must hit on the optimized miniflow extract before it will be
+ * accepted and used in the datapath after the study phase. */
+#define MFEX_MIN_HIT_COUNT_FOR_USE (MFEX_MAX_COUNT / 2)
+
+/* Struct to hold miniflow study stats. */
+struct study_stats {
+uint32_t pkt_count;
+uint32_t impl_hitcount[MFEX_IMPLS_MAX_SIZE];
+};
+
+/* Define per thread data to hold the study stats. */
+DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
+
+/* Allocate per thread PMD pointer space for study_stats. */
+static inline struct study_stats *
+get_study_stats(void)
+{
+struct study_stats *stats = study_stats_get();
+if (OVS_UNLIKELY(!stats)) {
+   stats = xzalloc(sizeof *stats);
+   study_stats_set_unsafe(stats);
+}
+return stats;
+}
+
+uint32_t
+mfex_study_traffic(struct dp_packet_batch *packets,
+   struct netdev_flow_key *keys,
+   uint32_t keys_size, odp_port_t in_port,
+   void *pmd_handle)
+{
+uint32_t hitmask = 0;
+uint32_t mask = 0;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+uint32_t impl_count = dpif_miniflow_extract_info_get(_funcs);
+struct study_stats *stats = get_study_stats();
+
+/* Run traffic optimized miniflow_extract to collect the hitmask
+ * to be compared after certain packets have been hit to choose
+ * the best miniflow_extract version for that traffic. */
+for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
+if (miniflow_funcs[i].available) {
+hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
+ in_port, pmd_handle);
+stats->impl_hitcount[i] += count_1bits(hitmask);
+
+/* If traffic is not classified than we dont overwrite the keys
+ * array in minfiflow implementations so its safe to create a
+ * mask for all those packets whose miniflow have been created. */
+mask |= hitmask;
+}
+}
+stats->pkt_count += dp_packet_batch_size(packets);
+
+/* Choose the best implementation after a minimum packets have been
+ * processed. */
+if (stats->pkt_count >= MFEX_MAX_COUNT) {
+uint32_t best_func_index = MFEX_IMPL_START_IDX;
+uint32_t max_hits = 0;
+for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
+if (stats->impl_hitcount[i] > max_hits) {
+max_hits = stats->impl_hitcount[i];
+best_func_index = i;
+}
+}
+
+if (max_hits >= MFEX_MIN_HIT_COUNT_FOR_USE) {
+/* Set

[ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-17 Thread Kumar Amber

This patch introduced the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovaidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 lib/dpif-netdev-private-extract.c | 141 ++
 lib/dpif-netdev-private-extract.h |  15 
 lib/dpif-netdev.c |   2 +-
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);
 
 /* Implementations of available extract options. */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
+   {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator",
+},
 {
 .probe = NULL,
 .extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)
 *out_ptr = mfex_impls;
 return ARRAY_SIZE(mfex_impls);
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t in_port,
+void *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+int32_t mfunc_count = dpif_miniflow_extract_info_get(_funcs);
+if (mfunc_count < 0) {
+pmd->miniflow_extract_opt = NULL;
+VLOG_ERR("failed to get miniflow extract function implementations\n");
+return 0;
+}
+ovs_assert(keys_size >= cnt);
+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(>md, in_port);
+miniflow_extract(packet, [i].mf);
+
+/* Store known good metadata to compare with optimized metadata. */
+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
+if (!mfex_impls[j].available) {
+continue;
+}
+
+/* Reset keys and offsets before each implementation. */
+memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+dp_packet_reset_offsets(packet);
+}
+/* Call optimized miniflow for each batch of packet. */
+uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+keys_size, in_port, pmd_handle);
+
+/* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+while (hit_mask) {
+/* Index for the set bit. */
+uint32_t i = __builtin_ctz(hit_mask);
+/* Set the index in hitmask to Zero. */
+hit_mask &= (hit_mask - 1);
+
+uint32_t failed = 0;
+
+/* Check miniflow bits are equal. */
+if ((keys[i].mf.map.bits[0] != test_keys[i].mf.map.bits[0]) ||
+(keys[i].mf.map.bits[1] != test_keys[i].mf.map.bits[1])) {
+VLOG_ERR("Good 0x%llx 0x%llx\tTest 0x%llx 0x%llx\n",
+ keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
+ test_keys[i].mf.map.bits[0],
+ test_keys[i].mf.map.bits[1]);
+failed = 1;
+}
+
+if (!miniflow_equal([i].mf, _keys[i].mf)) {
+uint32_t block_cnt = miniflow_n_values([i].mf);
+VLOG_ERR("Autovalidation blocks failed for %s pkt %d",
+ mfex_impls[j].name, i);
+VLOG_ERR("  Good hexdump:\n");
+uint64_t *good_block_ptr = (uint64_t *)[i].buf;
+uint64_t *test_block_ptr = (uint64_t *)_keys[i].buf;
+for (uint32_t b = 0; b < block_cnt; b++) {
+VLOG_ERR("

[ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-06-17 Thread Kumar Amber

This patch introduces the mfex function pointers which allows
the user to switch between different miniflow extract implementations
which are provided by the OVS based on optimized ISA CPU.

The user can query for the available minflow extract variants available
for that CPU by following commands:

$ovs-appctl dpif-netdev/miniflow-parser-get

Similarly an user can set the miniflow implementation by the following
command :

$ ovs-appctl dpif-netdev/miniflow-parser-set name

This allow for more performance and flexibility to the user to choose
the miniflow implementation according to the needs.

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 lib/automake.mk   |   2 +
 lib/dpif-netdev-avx512.c  |  32 ++--
 lib/dpif-netdev-private-extract.c |  86 
 lib/dpif-netdev-private-extract.h |  94 ++
 lib/dpif-netdev-private-thread.h  |   4 +
 lib/dpif-netdev.c | 126 +-
 6 files changed, 337 insertions(+), 7 deletions(-)
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 49f42c2a3..6657b9ae5 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-dpif.c \
lib/dpif-netdev-private-dpif.h \
+   lib/dpif-netdev-private-extract.c \
+   lib/dpif-netdev-private-extract.h \
lib/dpif-netdev-private-flow.h \
lib/dpif-netdev-private-hwol.h \
lib/dpif-netdev-private-thread.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index f9b199637..bb99b23ff 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -148,6 +148,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
  * // do all processing (HWOL->MFEX->EMC->SMC)
  * }
  */
+
+/* Do a batch minfilow extract into keys. */
+uint32_t mf_mask = 0;
+if (pmd->miniflow_extract_opt) {
+mf_mask = pmd->miniflow_extract_opt(packets, keys,
+batch_size, in_port,
+(void *) pmd);
+}
+/* Perform first packet interation */
 uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
 uint32_t iter = lookup_pkts_bitmask;
 while (iter) {
@@ -159,6 +168,12 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 pkt_metadata_init(>md, in_port);
 
 struct dp_netdev_flow *f = NULL;
+struct netdev_flow_key *key = [i];
+
+/* Check the minfiflow mask to see if the packet was correctly
+* classifed by vector mfex else do a scalar miniflow extract
+* for that packet. */
+uint32_t mfex_hit = (mf_mask & (1 << i));
 
 /* Check for partial hardware offload mark. */
 uint32_t mark;
@@ -166,7 +181,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 f = mark_to_flow_find(pmd, mark);
 if (f) {
 rules[i] = >cr;
-pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+/* If AVX512 MFEX already classified the packet, use it. */
+if (mfex_hit) {
+pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(>mf);
+} else {
+pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+}
+
 pkt_meta[i].bytes = dp_packet_size(packet);
 phwol_hits++;
 hwol_emc_smc_hitmask |= (1 << i);
@@ -174,11 +195,12 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 }
 
-/* Do miniflow extract into keys. */
-struct netdev_flow_key *key = [i];
-miniflow_extract(packet, >mf);
+if (!mfex_hit) {
+/* Do a scalar miniflow extract into keys */
+miniflow_extract(packet, >mf);
+}
 
-/* Cache TCP and byte values for all packets. */
+/* Cache TCP and byte values for all packets */
 pkt_meta[i].bytes = dp_packet_size(packet);
 pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(>mf);
 
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
new file mode 100644
index 0..fcc56ef26
--- /dev/null
+++ b/lib/dpif-netdev-private-extract.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR

[ovs-dev] [v4 00/12] MFEX Infrastructure + Optimizations

2021-06-17 Thread Kumar Amber

v4 updates:
- rebase on to latest DPIF v13
- fix fuzzy.py script with random mac/ip

v3 updates:
- rebase on to latest DPIF v12
- add additonal AVX512 traffic profiles for tcp and vlan
- add new command line for study function to add packet count
- add unit tests for fuzzy testing and auto-validation of mfex
- add mfex option hit stats to perf-show command

v2 updates:
- rebase on to latest DPIF v11

This patchset introduces miniflow extract Infrastructure changes
which allows user to choose different type of ISA based optimized
miniflow extract variants which can be user choosen or set based on 
packets studies automatically by OVS using different commands.

The Infrastructure also provides a way to check the correctness of
different ISA optimized miniflow extract variants against the scalar
version.

This Patchset depends on the DPIF patchsets :
http://patchwork.ozlabs.org/project/openvswitch/list/?series=249467

Harry van Haaren (5):
  dpif/stats: add miniflow extract opt hits counter
  dpdk: add additional CPU ISA detection strings
  dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
  dpif-netdev/mfex: add more AVX512 traffic profiles
  dpif/dpcls: limit count subtable search info logs

Kumar Amber (7):
  dpif-netdev: Add command line and function pointer for miniflow
extract
  dpif-netdev: Add auto validation function for miniflow extract
  dpif-netdev: Add study function to select the best mfex function
  docs/dpdk/bridge: add miniflow extract section.
  dpif-netdev: Add configure to enable autovalidator at build time.
  dpif-netdev: Add additional packet count parameter for study function
  test/sytem-dpdk: Add unit test for mfex autovalidator

 Documentation/topics/dpdk/bridge.rst   | 116 +
 NEWS   |  15 +-
 acinclude.m4   |  16 +
 configure.ac   |   1 +
 lib/automake.mk|   4 +
 lib/dpdk.c |   2 +
 lib/dpif-netdev-avx512.c   |  34 +-
 lib/dpif-netdev-extract-avx512.c   | 571 +
 lib/dpif-netdev-extract-study.c| 132 ++
 lib/dpif-netdev-lookup-avx512-gather.c |   2 +-
 lib/dpif-netdev-perf.c |   3 +
 lib/dpif-netdev-perf.h |   1 +
 lib/dpif-netdev-private-extract.c  | 302 +
 lib/dpif-netdev-private-extract.h  | 162 +++
 lib/dpif-netdev-private-thread.h   |   4 +
 lib/dpif-netdev.c  | 165 ++-
 tests/automake.mk  |   5 +
 tests/pcap/fuzzy.py|  32 ++
 tests/pcap/mfex_test   | Bin 0 -> 416 bytes
 tests/pmd.at   |   6 +-
 tests/system-dpdk.at   |  46 ++
 21 files changed, 1603 insertions(+), 16 deletions(-)
 create mode 100644 lib/dpif-netdev-extract-avx512.c
 create mode 100644 lib/dpif-netdev-extract-study.c
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h
 create mode 100755 tests/pcap/fuzzy.py
 create mode 100644 tests/pcap/mfex_test

-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit enables the AVX512-VPOPCNTDQ Vector Popcount
instruction. This instruction is not available on every CPU
that supports the AVX512-F Foundation ISA, hence it is enabled
only when the additional VPOPCNTDQ ISA check is passed.

The vector popcount instruction is used instead of the AVX512
popcount emulation code present in the avx512 optimized DPCLS today.
It provides higher performance in the SIMD miniflow processing
as that requires the popcount to calculate the miniflow block indexes.

Signed-off-by: Harry van Haaren 

---

v13:
- Rebased and Improved comment on use_vpop variable (Ian)
---
 NEWS   |  3 +
 lib/dpdk.c |  1 +
 lib/dpif-netdev-lookup-avx512-gather.c | 85 --
 3 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/NEWS b/NEWS
index f03bfeb5d..bc1db7948 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,9 @@ Post-v2.15.0
  * Enable AVX512 optimized DPCLS to search subtables with larger miniflows.
  * Add more specialized DPCLS subtables to cover common rules, enhancing
the lookup performance.
+ * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
+   CPU supports it. This enhances performance by using the native vpopcount
+   instructions, instead of the emulated version of vpopcount.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index c883a4b8b..a9494a40f 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -655,6 +655,7 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature)
 #if __x86_64__
 /* CPU flags only defined for the architecture that support it. */
 CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
+CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ);
 CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
 #endif
 
diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index 0b51ef9dc..bc359dc4a 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -53,6 +53,15 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_lookup_avx512_gather);
 
+
+/* Wrapper function required to enable ISA. */
+static inline __m512i
+__attribute__((__target__("avx512vpopcntdq")))
+_mm512_popcnt_epi64_wrapper(__m512i v_in)
+{
+return _mm512_popcnt_epi64(v_in);
+}
+
 static inline __m512i
 _mm512_popcnt_epi64_manual(__m512i v_in)
 {
@@ -131,6 +140,7 @@ netdev_rule_matches_key(const struct dpcls_rule *rule,
  *   pkt_mf_u0_pop: population count of bits in u0 of the packet.
  *   zero_mask: bitmask of lanes to zero as packet doesn't have mf bits set.
  *   u64_lanes_mask: bitmask of lanes to process.
+ *   use_vpop: compile-time constant indicating if VPOPCNT instruction allowed.
  */
 static inline ALWAYS_INLINE __m512i
 avx512_blocks_gather(__m512i v_u0,
@@ -141,7 +151,8 @@ avx512_blocks_gather(__m512i v_u0,
  __mmask64 u1_bcast_msk,
  const uint64_t pkt_mf_u0_pop,
  __mmask64 zero_mask,
- __mmask64 u64_lanes_mask)
+ __mmask64 u64_lanes_mask,
+ const uint32_t use_vpop)
 {
 /* Suggest to compiler to load tbl blocks ahead of gather(). */
 __m512i v_tbl_blocks = _mm512_maskz_loadu_epi64(u64_lanes_mask,
@@ -155,8 +166,15 @@ avx512_blocks_gather(__m512i v_u0,
   tbl_mf_masks);
 __m512i v_masks = _mm512_and_si512(v_pkt_bits, v_tbl_masks);
 
-/* Manual AVX512 popcount for u64 lanes. */
-__m512i v_popcnts = _mm512_popcnt_epi64_manual(v_masks);
+/* Calculate AVX512 popcount for u64 lanes using the native instruction
+ * if available, or using emulation if not available.
+ */
+__m512i v_popcnts;
+if (use_vpop) {
+v_popcnts = _mm512_popcnt_epi64_wrapper(v_masks);
+} else {
+v_popcnts = _mm512_popcnt_epi64_manual(v_masks);
+}
 
 /* Add popcounts and offset for u1 bits. */
 __m512i v_idx_u0_offset = _mm512_maskz_set1_epi64(u1_bcast_msk,
@@ -181,7 +199,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable,
const struct netdev_flow_key *keys[],
struct dpcls_rule **rules,
const uint32_t bit_count_u0,
-   const uint32_t bit_count_u1)
+   const uint32_t bit_count_u1,
+   const uint32_t use_vpop)
 {
 OVS_ALIGNED_VAR(CACHE_LINE_SIZE)uint64_t block_cache[BLOCKS_CACHE_SIZE];
 uint32_t hashes[NETDEV_MAX_BURST];
@@ -233,7 +252,8 @@ avx512_lookup_impl(struct dpcls_subtable *subtable,
 u1_bcast_mask,
 pkt_mf_u0_pop,

[ovs-dev] [v13 11/12] dpdk: Cache result of CPU ISA checks.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

As a small optimization, this patch caches the result of a CPU ISA
check from DPDK. Particularly in the case of running the DPCLS
autovalidator (which repeatedly probes subtables) this reduces
the amount of CPU ISA lookups from the DPDK level.

By caching them at the OVS/dpdk.c level, the ISA checks remain
runtime for the CPU where they are executed, but subsequent checks
for the same ISA feature become much cheaper.

Signed-off-by: Harry van Haaren 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
---
 lib/dpdk.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/lib/dpdk.c b/lib/dpdk.c
index 319540394..c883a4b8b 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -614,13 +614,33 @@ print_dpdk_version(void)
 puts(rte_version());
 }
 
+/* Avoid calling rte_cpu_get_flag_enabled() excessively, by caching the
+ * result of the call for each CPU flag in a static variable. To avoid
+ * allocating large numbers of static variables, use a uint8 as a bitfield.
+ * Note the macro must only return if the ISA check is done and available.
+ */
+#define ISA_CHECK_DONE_BIT (1 << 0)
+#define ISA_AVAILABLE_BIT  (1 << 1)
+
 #define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG)   \
 do {\
 if (strncmp(feature, name_str, strlen(name_str)) == 0) {\
-int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG);\
-VLOG_DBG("CPU flag %s, available %s\n", name_str,   \
-  has_isa ? "yes" : "no");  \
-return true;\
+static uint8_t isa_check_##RTE_CPUFLAG; \
+int check = isa_check_##RTE_CPUFLAG & ISA_CHECK_DONE_BIT;   \
+if (OVS_UNLIKELY(!check)) { \
+int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG);\
+VLOG_DBG("CPU flag %s, available %s\n", \
+ name_str, has_isa ? "yes" : "no"); \
+isa_check_##RTE_CPUFLAG = ISA_CHECK_DONE_BIT;   \
+if (has_isa) {  \
+isa_check_##RTE_CPUFLAG |= ISA_AVAILABLE_BIT;   \
+}   \
+}   \
+if (isa_check_##RTE_CPUFLAG & ISA_AVAILABLE_BIT) {  \
+return true;\
+} else {\
+return false;   \
+}   \
 }   \
 } while (0)
 
-- 
2.32.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v13 10/12] dpif-netdev/dpcls: Specialize more subtable signatures.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit adds more subtables to be specialized. The traffic
pattern here being matched is VXLAN traffic subtables, which commonly
have (5,3), (9,1) and (9,4) subtable fingerprints.

Signed-off-by: Harry van Haaren 

---

v8: Add NEWS entry.
---
 NEWS   | 2 ++
 lib/dpif-netdev-lookup-avx512-gather.c | 6 ++
 lib/dpif-netdev-lookup-generic.c   | 6 ++
 3 files changed, 14 insertions(+)

diff --git a/NEWS b/NEWS
index 523ff4b37..f03bfeb5d 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,8 @@ Post-v2.15.0
  * Add a partial HWOL PMD statistic counting hits similar to existing
EMC/SMC/DPCLS stats.
  * Enable AVX512 optimized DPCLS to search subtables with larger miniflows.
+ * Add more specialized DPCLS subtables to cover common rules, enhancing
+   the lookup performance.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index f1b320bb6..0b51ef9dc 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -314,6 +314,9 @@ avx512_lookup_impl(struct dpcls_subtable *subtable,
 return avx512_lookup_impl(subtable, keys_map, keys, rules, U0, U1);   \
 } \
 
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3)
 DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
 DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
 DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
@@ -346,6 +349,9 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
uint32_t u1_bits)
 return NULL;
 }
 
+CHECK_LOOKUP_FUNCTION(9, 4);
+CHECK_LOOKUP_FUNCTION(9, 1);
+CHECK_LOOKUP_FUNCTION(5, 3);
 CHECK_LOOKUP_FUNCTION(5, 1);
 CHECK_LOOKUP_FUNCTION(4, 1);
 CHECK_LOOKUP_FUNCTION(4, 0);
diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
index e3b6be4b6..6c74ac3a1 100644
--- a/lib/dpif-netdev-lookup-generic.c
+++ b/lib/dpif-netdev-lookup-generic.c
@@ -282,6 +282,9 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable 
*subtable,
 return lookup_generic_impl(subtable, keys_map, keys, rules, U0, U1);  \
 } \
 
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3)
 DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
 DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
 DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
@@ -303,6 +306,9 @@ dpcls_subtable_generic_probe(uint32_t u0_bits, uint32_t 
u1_bits)
 {
 dpcls_subtable_lookup_func f = NULL;
 
+CHECK_LOOKUP_FUNCTION(9, 4);
+CHECK_LOOKUP_FUNCTION(9, 1);
+CHECK_LOOKUP_FUNCTION(5, 3);
 CHECK_LOOKUP_FUNCTION(5, 1);
 CHECK_LOOKUP_FUNCTION(4, 1);
 CHECK_LOOKUP_FUNCTION(4, 0);
-- 
2.32.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v13 09/12] dpif-netdev/dpcls-avx512: Enable 16 block processing.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit implements larger subtable searches in avx512. A limitation
of the previous implementation was that up to 8 blocks of miniflow
data could be matched on (so a subtable with 8 blocks was handled
in avx, but 9 blocks or more would fall back to scalar/generic).
This limitation is removed in this patch, where up to 16 blocks
of subtable can be matched on.

>From an implementation perspective, the key to enabling 16 blocks
over 8 blocks was to do bitmask calculation up front, and then use
the pre-calculated bitmasks for 2x passes of the "blocks gather"
routine. The bitmasks need to be shifted for k-mask usage in the
upper (8-15) block range, but it is relatively trivial. This also
helps in case expanding to 24 blocks is desired in future.

The implementation of the 2nd iteration to handle > 8 blocks is
behind a conditional branch which checks the total number of bits.
This helps the specialized versions of the function that have a
miniflow fingerprint of less-than-or-equal 8 blocks, as the code
can be statically stripped out of those functions. Specialized
functions that do require more than 8 blocks will have the branch
removed and unconditionally execute the 2nd blocks gather routine.

Lastly, the _any() flavour will have the conditional branch, and
the branch predictor may mispredict a bit, but per burst will
likely get most packets correct (particularly towards the middle
and end of a burst).

The code has been run with unit tests under autovalidation and
passes all cases, and unit test coverage has been checked to
ensure the 16 block code paths are executing.

Signed-off-by: Harry van Haaren 

---

v13:
- Improve function comment including variable usage (Ian)
- Comment scope bracket usage (Ian)
---
 NEWS   |   1 +
 lib/dpif-netdev-lookup-avx512-gather.c | 218 ++---
 2 files changed, 162 insertions(+), 57 deletions(-)

diff --git a/NEWS b/NEWS
index 5c740e378..523ff4b37 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.15.0
  * Add commands to get and set the dpif implementations.
  * Add a partial HWOL PMD statistic counting hits similar to existing
EMC/SMC/DPCLS stats.
+ * Enable AVX512 optimized DPCLS to search subtables with larger miniflows.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index 8fc1cdfa5..f1b320bb6 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -34,7 +34,21 @@
  * AVX512 code at a time.
  */
 #define NUM_U64_IN_ZMM_REG (8)
-#define BLOCKS_CACHE_SIZE (NETDEV_MAX_BURST * NUM_U64_IN_ZMM_REG)
+
+/* This implementation of AVX512 gather allows up to 16 blocks of MF data to be
+ * present in the blocks_cache, hence the multiply by 2 in the blocks count.
+ */
+#define MF_BLOCKS_PER_PACKET (NUM_U64_IN_ZMM_REG * 2)
+
+/* Blocks cache size is the maximum number of miniflow blocks that this
+ * implementation of lookup can handle.
+ */
+#define BLOCKS_CACHE_SIZE (NETDEV_MAX_BURST * MF_BLOCKS_PER_PACKET)
+
+/* The gather instruction can handle a scale for the size of the items to
+ * gather. For uint64_t data, this scale is 8.
+ */
+#define GATHER_SCALE_8 (8)
 
 
 VLOG_DEFINE_THIS_MODULE(dpif_lookup_avx512_gather);
@@ -69,22 +83,98 @@ netdev_rule_matches_key(const struct dpcls_rule *rule,
 {
 const uint64_t *keyp = miniflow_get_values(>flow.mf);
 const uint64_t *maskp = miniflow_get_values(>mask->mf);
-const uint32_t lane_mask = (1 << mf_bits_total) - 1;
+const uint32_t lane_mask = (1ULL << mf_bits_total) - 1;
 
 /* Always load a full cache line from blocks_cache. Other loads must be
  * trimmed to the amount of data required for mf_bits_total blocks.
  */
-__m512i v_blocks = _mm512_loadu_si512(_cache[0]);
-__m512i v_mask   = _mm512_maskz_loadu_epi64(lane_mask, [0]);
-__m512i v_key= _mm512_maskz_loadu_epi64(lane_mask, [0]);
+uint32_t res_mask;
 
-__m512i v_data = _mm512_and_si512(v_blocks, v_mask);
-uint32_t res_mask = _mm512_mask_cmpeq_epi64_mask(lane_mask, v_data, v_key);
+/* To avoid a loop, we have two iterations of a block of code here.
+ * Note the scope brackets { } are used to avoid accidental variable usage
+ * in the second iteration.
+ */
+{
+__m512i v_blocks = _mm512_loadu_si512(_cache[0]);
+__m512i v_mask   = _mm512_maskz_loadu_epi64(lane_mask, [0]);
+__m512i v_key= _mm512_maskz_loadu_epi64(lane_mask, [0]);
+__m512i v_data = _mm512_and_si512(v_blocks, v_mask);
+res_mask = _mm512_mask_cmpeq_epi64_mask(lane_mask, v_data, v_key);
+}
+
+if (mf_bits_total > 8) {
+uint32_t lane_mask_gt8 = lane_mask >> 8;
+__m512i v_blocks = _mm512_loadu_si512(_cache[8]);
+__m512i v_mask   = _mm512_maskz_loadu_epi64(lane_mask_gt8,

[ovs-dev] [v13 08/12] dpif-netdev-unixctl.man: Document subtable-lookup-* CMDs

2021-06-17 Thread Cian Ferriter

Signed-off-by: Cian Ferriter 

---

v13:
- New commit to update manpages with more commands that are missing.
---
 lib/dpif-netdev-unixctl.man | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 45a1bd669..d77f5d9a4 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -228,6 +228,16 @@ When this is the case, the above command prints the 
load-balancing information
 of the bonds configured in datapath \fIdp\fR showing the interface associated
 with each bucket (hash).
 .
+.IP "\fBdpif-netdev/subtable-lookup-prio-get\fR"
+Lists the DPCLS implementations or lookup functions that are available as well
+as their priorities.
+.
+.IP "\fBdpif-netdev/subtable-lookup-prio-set\fR \fIlookup_function\fR \
+\fIprio\fR"
+Sets the priority of a lookup function by the name, \fIlookup_function\fR, and
+the priority, \fIprio\fR, which should be a positive integer value. The highest
+priority lookup function is used for classification.
+.
 .IP "\fBdpif-netdev/dpif-get\fR
 Lists the DPIF implementations that are available.
 .
-- 
2.32.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v13 07/12] dpif-netdev: Add a partial HWOL PMD statistic.

2021-06-17 Thread Cian Ferriter

It is possible for packets traversing the userspace datapath to match a
flow before hitting on EMC by using a mark ID provided by a NIC. Add a
PMD statistic for this hit.

Signed-off-by: Cian Ferriter 

---

Cc: Gaetan Rivet 
Cc: Sriharsha Basavapatna 

v13:
- Minor refactoring to address review comments.
- Update manpages to reflect the new format of the pmd-perf-show
  command.
---
 NEWS| 2 ++
 lib/dpif-netdev-avx512.c| 3 +++
 lib/dpif-netdev-perf.c  | 3 +++
 lib/dpif-netdev-perf.h  | 1 +
 lib/dpif-netdev-unixctl.man | 1 +
 lib/dpif-netdev.c   | 9 +++--
 tests/pmd.at| 6 --
 7 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index c47ab349e..5c740e378 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,8 @@ Post-v2.15.0
  * Add avx512 implementation of dpif which can process non recirculated
packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
  * Add commands to get and set the dpif implementations.
+ * Add a partial HWOL PMD statistic counting hits similar to existing
+   EMC/SMC/DPCLS stats.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index f3f66fc60..f9b199637 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -126,6 +126,7 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 
 uint32_t emc_hits = 0;
 uint32_t smc_hits = 0;
+uint32_t phwol_hits = 0;
 
 /* A 1 bit in this mask indicates a hit, so no DPCLS lookup on the pkt. */
 uint32_t hwol_emc_smc_hitmask = 0;
@@ -167,6 +168,7 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 rules[i] = >cr;
 pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
 pkt_meta[i].bytes = dp_packet_size(packet);
+phwol_hits++;
 hwol_emc_smc_hitmask |= (1 << i);
 continue;
 }
@@ -274,6 +276,7 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 
 /* At this point we don't return error anymore, so commit stats here. */
 pmd_perf_update_counter(>perf_stats, PMD_STAT_RECV, batch_size);
+pmd_perf_update_counter(>perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, smc_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_MASKED_HIT,
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 9560e7c3c..7103a2d4d 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -246,6 +246,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 ds_put_format(str,
 "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
+"  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
 "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
@@ -255,6 +256,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 rx_packets, (rx_packets / duration) / 1000,
 1.0 * stats[PMD_CYCLES_ITER_BUSY] / rx_packets,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
+stats[PMD_STAT_PHWOL_HIT],
+100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
 stats[PMD_STAT_SMC_HIT],
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 72645b6b3..8b1a52387 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -56,6 +56,7 @@ extern "C" {
 /* Set of counter types maintained in pmd_perf_stats. */
 
 enum pmd_stat_type {
+PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
 PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 534823879..45a1bd669 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -135,6 +135,7 @@ pmd thread numa_id 0 core_id 1:
   - busy iterations:86009  ( 84.1 % of used cycles)
   Rx packets: 2399607  (2381 Kpps, 848 cycles/pkt)
   Datapath passes:3599415  (1.50 passes/pkt)
+  - PHWOL hits: 0  (  0.0 %)
   - EMC hits:  336472  (  9.3 %)
   - SMC hits:   0  ( 0.0 %)
   - Megaflow hits:3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
diff --git a/lib/dpif-netdev.c

[ovs-dev] [v13 04/12] dpif-avx512: Add ISA implementation of dpif.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit adds the AVX512 implementation of DPIF functionality,
specifically the dp_netdev_input_outer_avx512 function. This function
only handles outer (no re-circulations), and is optimized to use the
AVX512 ISA for packet batching and other DPIF work.

Sparse is not able to handle the AVX512 intrinsics, causing compile
time failures, so it is disabled for this file.

Signed-off-by: Harry van Haaren 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
Co-authored-by: Kumar Amber 
Signed-off-by: Kumar Amber 

---

v13:
- Squash "Add HWOL support" commit into this commit.
- Add NEWS item about this feature here rather than in a later commit.
- Add #define NUM_U64_IN_ZMM_REG 8.
- Add comment describing operation of while loop handling HWOL->EMC->SMC
  lookups in dp_netdev_input_outer_avx512().
- Add EMC and SMC batch insert functions for better handling of EMC and
  SMC in AVX512 DPIF.
- Minor code refactor to address review comments.
---
 NEWS |   2 +
 lib/automake.mk  |   5 +-
 lib/dpif-netdev-avx512.c | 327 +++
 lib/dpif-netdev-private-dfc.h|  25 +++
 lib/dpif-netdev-private-dpif.h   |  32 +++
 lib/dpif-netdev-private-thread.h |  11 +-
 lib/dpif-netdev-private.h|  25 +++
 lib/dpif-netdev.c| 103 --
 8 files changed, 514 insertions(+), 16 deletions(-)
 create mode 100644 lib/dpif-netdev-avx512.c
 create mode 100644 lib/dpif-netdev-private-dpif.h

diff --git a/NEWS b/NEWS
index 96b3a61c8..6a4a7b76d 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,8 @@ Post-v2.15.0
  * Auto load balancing of PMDs now partially supports cross-NUMA polling
cases, e.g if all PMD threads are running on the same NUMA node.
  * Refactor lib/dpif-netdev.c to multiple header files.
+ * Add avx512 implementation of dpif which can process non recirculated
+   packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 3a33cdd5c..660cd07f0 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -33,11 +33,13 @@ lib_libopenvswitchavx512_la_CFLAGS = \
-mavx512f \
-mavx512bw \
-mavx512dq \
+   -mbmi \
-mbmi2 \
-fPIC \
$(AM_CFLAGS)
 lib_libopenvswitchavx512_la_SOURCES = \
-   lib/dpif-netdev-lookup-avx512-gather.c
+   lib/dpif-netdev-lookup-avx512-gather.c \
+   lib/dpif-netdev-avx512.c
 lib_libopenvswitchavx512_la_LDFLAGS = \
-static
 endif
@@ -114,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dfc.c \
lib/dpif-netdev-private-dfc.h \
lib/dpif-netdev-private-dpcls.h \
+   lib/dpif-netdev-private-dpif.h \
lib/dpif-netdev-private-flow.h \
lib/dpif-netdev-private-hwol.h \
lib/dpif-netdev-private-thread.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
new file mode 100644
index 0..0e55b0be2
--- /dev/null
+++ b/lib/dpif-netdev-avx512.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (c) 2021 Intel Corporation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifdef __x86_64__
+/* Sparse cannot handle the AVX512 instructions. */
+#if !defined(__CHECKER__)
+
+#include 
+
+#include "dpif-netdev.h"
+#include "dpif-netdev-perf.h"
+
+#include "dpif-netdev-private.h"
+#include "dpif-netdev-private-dpcls.h"
+#include "dpif-netdev-private-flow.h"
+#include "dpif-netdev-private-thread.h"
+#include "dpif-netdev-private-hwol.h"
+
+#include "dp-packet.h"
+#include "netdev.h"
+
+#include "immintrin.h"
+
+/* Each AVX512 register (zmm register in assembly notation) can contain up to
+ * 512 bits, which is equivalent to 8 uint64_t variables. This is the maximum
+ * number of miniflow blocks that can be processed in a single pass of the
+ * AVX512 code at a time.
+ */
+#define NUM_U64_IN_ZMM_REG (8)
+
+/* Structure to contain per-packet metadata that must be attributed to the
+ * dp netdev flow. This is unfortunate to have to track per packet, however
+ * it's a bit awkward to maintain them in a performant way. This structure
+ * helps to keep two variables on a single cache line per packet.
+ */
+struct pkt_flow_meta {
+uint16_t bytes;
+uint16_t tcp_flags;
+};
+
+/* Structure of heap allocated memory for DPIF internals. */
+struct

[ovs-dev] [v13 06/12] dpif-netdev: Add command to get dpif implementations.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit adds a new command to retrieve the list of available
DPIF implementations. This can be used by to check what implementations
of the DPIF are available in any given OVS binary.

Usage:
 $ ovs-appctl dpif-netdev/dpif-get

Signed-off-by: Harry van Haaren 

---

v13:
- Add NEWS item about DPIF get and set commands here rather than in a
  later commit.
- Add documentation items about DPIF set commands here rather than in a
  later commit.
---
 Documentation/topics/dpdk/bridge.rst |  8 
 NEWS |  1 +
 lib/dpif-netdev-private-dpif.c   |  8 
 lib/dpif-netdev-private-dpif.h   |  6 ++
 lib/dpif-netdev-unixctl.man  |  3 +++
 lib/dpif-netdev.c| 24 
 6 files changed, 50 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index fafa8c821..f59e26cbe 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -226,6 +226,14 @@ stats associated with the datapath.
 Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
 improve performance.
 
+OVS provides multiple implementations of the DPIF. The available
+implementations can be listed with the following command ::
+
+$ ovs-appctl dpif-netdev/dpif-get
+Available DPIF implementations:
+  dpif_scalar
+  dpif_avx512
+
 By default, dpif_scalar is used. The DPIF implementation can be selected by
 name ::
 
diff --git a/NEWS b/NEWS
index 6a4a7b76d..c47ab349e 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,7 @@ Post-v2.15.0
  * Refactor lib/dpif-netdev.c to multiple header files.
  * Add avx512 implementation of dpif which can process non recirculated
packets. It supports partial HWOL, EMC, SMC and DPCLS lookups.
+ * Add commands to get and set the dpif implementations.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
index d829a7ee5..3649e775d 100644
--- a/lib/dpif-netdev-private-dpif.c
+++ b/lib/dpif-netdev-private-dpif.c
@@ -73,6 +73,14 @@ dp_netdev_impl_set_default(dp_netdev_input_func func)
 default_dpif_func = func;
 }
 
+uint32_t
+dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls)
+{
+ovs_assert(out_impls);
+*out_impls = dpif_impls;
+return ARRAY_SIZE(dpif_impls);
+}
+
 /* This function checks all available DPIF implementations, and selects the
  * returns the function pointer to the one requested by "name".
  */
diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
index a6db3c7f2..717e9e2f9 100644
--- a/lib/dpif-netdev-private-dpif.h
+++ b/lib/dpif-netdev-private-dpif.h
@@ -48,6 +48,12 @@ struct dpif_netdev_impl_info_t {
 const char *name;
 };
 
+/* This function returns all available implementations to the caller. The
+ * quantity of implementations is returned by the int return value.
+ */
+uint32_t
+dp_netdev_impl_get(const struct dpif_netdev_impl_info_t **out_impls);
+
 /* This function checks all available DPIF implementations, and selects the
  * returns the function pointer to the one requested by "name".
  */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index b348940b0..534823879 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -227,5 +227,8 @@ When this is the case, the above command prints the 
load-balancing information
 of the bonds configured in datapath \fIdp\fR showing the interface associated
 with each bucket (hash).
 .
+.IP "\fBdpif-netdev/dpif-get\fR
+Lists the DPIF implementations that are available.
+.
 .IP "\fBdpif-netdev/dpif-set\fR \fIdpif_impl\fR"
 Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9c234ef3d..59a44a848 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -991,6 +991,27 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, 
int argc,
 ds_destroy();
 }
 
+static void
+dpif_netdev_impl_get(struct unixctl_conn *conn, int argc OVS_UNUSED,
+ const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
+{
+const struct dpif_netdev_impl_info_t *dpif_impls;
+uint32_t count = dp_netdev_impl_get(_impls);
+if (count == 0) {
+unixctl_command_reply_error(conn, "error getting dpif names");
+return;
+}
+
+/* Add all dpif functions to reply string. */
+struct ds reply = DS_EMPTY_INITIALIZER;
+ds_put_cstr(, "Available DPIF implementations:\n");
+for (uint32_t i = 0; i < count; i++) {
+ds_put_format(, "  %s\n", dpif_impls[i].name);
+}
+unixctl_command_reply(conn, ds_cstr());
+ds_destroy();
+}
+
 static void
 dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
  const char *argv[], void *aux

[ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit adds a new command to allow the user to switch
the active DPIF implementation at runtime. A probe function
is executed before switching the DPIF implementation, to ensure
the CPU is capable of running the ISA required. For example, the
below code will switch to the AVX512 enabled DPIF assuming
that the runtime CPU is capable of running AVX512 instructions:

 $ ovs-appctl dpif-netdev/dpif-set dpif_avx512

A new configuration flag is added to allow selection of the
default DPIF. This is useful for running the unit-tests against
the available DPIF implementations, without modifying each unit test.

The design of the testing & validation for ISA optimized DPIF
implementations is based around the work already upstream for DPCLS.
Note however that a DPCLS lookup has no state or side-effects, allowing
the auto-validator implementation to perform multiple lookups and
provide consistent statistic counters.

The DPIF component does have state, so running two implementations in
parallel and comparing output is not a valid testing method, as there
are changes in DPIF statistic counters (side effects). As a result, the
DPIF is tested directly against the unit-tests.

Signed-off-by: Harry van Haaren 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 

---

v13:
- Add Docs items about the switch DPIF command here rather than in
  later commit.
- Document operation in manpages as well as rST.
- Minor code refactoring to address review comments.
---
 Documentation/topics/dpdk/bridge.rst |  34 +
 acinclude.m4 |  15 
 configure.ac |   1 +
 lib/automake.mk  |   1 +
 lib/dpif-netdev-avx512.c |  14 
 lib/dpif-netdev-private-dpif.c   | 103 +++
 lib/dpif-netdev-private-dpif.h   |  49 -
 lib/dpif-netdev-private-thread.h |  11 +--
 lib/dpif-netdev-unixctl.man  |   3 +
 lib/dpif-netdev.c|  89 +--
 10 files changed, 304 insertions(+), 16 deletions(-)
 create mode 100644 lib/dpif-netdev-private-dpif.c

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 526d5c959..fafa8c821 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -214,3 +214,37 @@ implementation ::
 
 Compile OVS in debug mode to have `ovs_assert` statements error out if
 there is a mis-match in the DPCLS lookup implementation.
+
+Datapath Interface Performance
+--
+
+The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
+packets through the major components of the userspace datapath; such as
+miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
+stats associated with the datapath.
+
+Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
+improve performance.
+
+By default, dpif_scalar is used. The DPIF implementation can be selected by
+name ::
+
+$ ovs-appctl dpif-netdev/dpif-set dpif_avx512
+DPIF implementation set to dpif_avx512.
+
+$ ovs-appctl dpif-netdev/dpif-set dpif_scalar
+DPIF implementation set to dpif_scalar.
+
+Running Unit Tests with AVX512 DPIF
+~~~
+
+Since the AVX512 DPIF is disabled by default, a compile time option is
+available in order to test it with the OVS unit test suite. When building with
+a CPU that supports AVX512, use the following configure option ::
+
+$ ./configure --enable-dpif-default-avx512
+
+The following line should be seen in the configure output when the above option
+is used ::
+
+checking whether DPIF AVX512 is default implementation... yes
diff --git a/acinclude.m4 b/acinclude.m4
index 15a54d636..5fbcd9872 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
   fi
 ])
 
+dnl Set OVS DPIF default implementation at configure time for running the unit
+dnl tests on the whole codebase without modifying tests per DPIF impl
+AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
+  AC_ARG_ENABLE([dpif-default-avx512],
+[AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF 
AVX512 implementation as default.])],
+[dpifavx512=yes],[dpifavx512=no])
+  AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
+  if test "$dpifavx512" != yes; then
+AC_MSG_RESULT([no])
+  else
+OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
+AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl OVS_ENABLE_WERROR
 AC_DEFUN([OVS_ENABLE_WERROR],
   [AC_ARG_ENABLE(
diff --git a/configure.ac b/configure.ac
index c077034d4..e45685a6c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
 OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
+OVS_CHECK_DPIF_AVX512_DEFAULT
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git

[ovs-dev] [v13 01/12] dpif-netdev: Refactor to multiple header files.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

Split the very large file dpif-netdev.c and the datastructures
it contains into multiple header files. Each header file is
responsible for the datastructures of that component.

This logical split allows better reuse and modularity of the code,
and reduces the very large file dpif-netdev.c to be more managable.

Due to dependencies between components, it is not possible to
move component in smaller granularities than this patch.

To explain the dependencies better, eg:

DPCLS has no deps (from dpif-netdev.c file)
FLOW depends on DPCLS (struct dpcls_rule)
DFC depends on DPCLS (netdev_flow_key) and FLOW (netdev_flow_key)
THREAD depends on DFC (struct dfc_cache)

DFC_PROC depends on THREAD (struct pmd_thread)

DPCLS lookup.h/c require only DPCLS
DPCLS implementations require only dpif-netdev-lookup.h.
- This change was made in 2.12 release with function pointers
- This commit only refactors the name to "private-dpcls.h"

netdev_flow_key_equal_mf() is renamed to emc_flow_key_equal_mf().

Rename functions specific to dpcls from netdev_* namespace to the
dpcls_* namespace, as they are only used by dpcls code.

'inline' is added to the dp_netdev_flow_hash() when it is moved
definition to fix a compiler error.

One valid checkpatch issue with the use of the
EMC_FOR_EACH_POS_WITH_HASH() macro was fixed.

Signed-off-by: Harry van Haaren 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 

---

Cc: Gaetan Rivet 
Cc: Sriharsha Basavapatna 

v13:
- Add NEWS item in this commit rather than later.
- Add lib/dpif-netdev-private-dfc.c file and move non fast path dfc
  related functions there.
- Squash commit which renames functions specific to dpcls from netdev_*
  namespace to the dpcls_* namespace, as they are only used by dpcls
  code into this commit.
- Minor fixes from review comments.
---
 NEWS   |   1 +
 lib/automake.mk|   5 +
 lib/dpif-netdev-lookup-autovalidator.c |   1 -
 lib/dpif-netdev-lookup-avx512-gather.c |   1 -
 lib/dpif-netdev-lookup-generic.c   |   1 -
 lib/dpif-netdev-lookup.h   |   2 +-
 lib/dpif-netdev-private-dfc.c  | 110 +
 lib/dpif-netdev-private-dfc.h  | 176 
 lib/dpif-netdev-private-dpcls.h| 127 ++
 lib/dpif-netdev-private-flow.h | 162 
 lib/dpif-netdev-private-thread.h   | 206 ++
 lib/dpif-netdev-private.h  | 100 +
 lib/dpif-netdev.c  | 539 +
 13 files changed, 811 insertions(+), 620 deletions(-)
 create mode 100644 lib/dpif-netdev-private-dfc.c
 create mode 100644 lib/dpif-netdev-private-dfc.h
 create mode 100644 lib/dpif-netdev-private-dpcls.h
 create mode 100644 lib/dpif-netdev-private-flow.h
 create mode 100644 lib/dpif-netdev-private-thread.h

diff --git a/NEWS b/NEWS
index ebba17b22..96b3a61c8 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,7 @@ Post-v2.15.0
- Userspace datapath:
  * Auto load balancing of PMDs now partially supports cross-NUMA polling
cases, e.g if all PMD threads are running on the same NUMA node.
+ * Refactor lib/dpif-netdev.c to multiple header files.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index db9017591..fdba3c6c0 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -111,6 +111,11 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-lookup-generic.c \
lib/dpif-netdev.c \
lib/dpif-netdev.h \
+   lib/dpif-netdev-private-dfc.c \
+   lib/dpif-netdev-private-dfc.h \
+   lib/dpif-netdev-private-dpcls.h \
+   lib/dpif-netdev-private-flow.h \
+   lib/dpif-netdev-private-thread.h \
lib/dpif-netdev-private.h \
lib/dpif-netdev-perf.c \
lib/dpif-netdev-perf.h \
diff --git a/lib/dpif-netdev-lookup-autovalidator.c 
b/lib/dpif-netdev-lookup-autovalidator.c
index 97b59fdd0..475e1ab1e 100644
--- a/lib/dpif-netdev-lookup-autovalidator.c
+++ b/lib/dpif-netdev-lookup-autovalidator.c
@@ -17,7 +17,6 @@
 #include 
 #include "dpif-netdev.h"
 #include "dpif-netdev-lookup.h"
-#include "dpif-netdev-private.h"
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_lookup_autovalidator);
diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index 5e3634249..8fc1cdfa5 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -21,7 +21,6 @@
 
 #include "dpif-netdev.h"
 #include "dpif-netdev-lookup.h"
-#include "dpif-netdev-private.h"
 #include "cmap.h"
 #include "flow.h"
 #include "pvector.h"
diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
index b1a0cfc36..e3b6be4b6 100644
--- a/lib/dpif-netdev-lookup-generic.c
+++ b/lib/dpif-netdev-lookup-generic.c
@@ -17,7 +17,6 @@
 
 #include 
 #include "dpif-netdev.h"
-#include "dpif-netdev-private.h"

[ovs-dev] [v13 02/12] dpif-netdev: Split HWOL out to own header file.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit moves the datapath lookup functions required for
hardware offload to a seperate file. This allows other DPIF
implementations to access the lookup functions, encouraging
code reuse.

Signed-off-by: Harry van Haaren 

---

Cc: Gaetan Rivet 
Cc: Sriharsha Basavapatna 

v13:
- Minor code refactor to address review comments.
---
 lib/automake.mk|  1 +
 lib/dpif-netdev-private-hwol.h | 63 ++
 lib/dpif-netdev.c  | 38 ++--
 3 files changed, 66 insertions(+), 36 deletions(-)
 create mode 100644 lib/dpif-netdev-private-hwol.h

diff --git a/lib/automake.mk b/lib/automake.mk
index fdba3c6c0..3a33cdd5c 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -115,6 +115,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dfc.h \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-flow.h \
+   lib/dpif-netdev-private-hwol.h \
lib/dpif-netdev-private-thread.h \
lib/dpif-netdev-private.h \
lib/dpif-netdev-perf.c \
diff --git a/lib/dpif-netdev-private-hwol.h b/lib/dpif-netdev-private-hwol.h
new file mode 100644
index 0..b93297a74
--- /dev/null
+++ b/lib/dpif-netdev-private-hwol.h
@@ -0,0 +1,63 @@
+/*
+ * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc.
+ * Copyright (c) 2021 Intel Corporation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DPIF_NETDEV_PRIVATE_HWOL_H
+#define DPIF_NETDEV_PRIVATE_HWOL_H 1
+
+#include "dpif-netdev-private-flow.h"
+
+#define MAX_FLOW_MARK   (UINT32_MAX - 1)
+#define INVALID_FLOW_MARK   0
+/* Zero flow mark is used to indicate the HW to remove the mark. A packet
+ * marked with zero mark is received in SW without a mark at all, so it
+ * cannot be used as a valid mark.
+ */
+
+struct megaflow_to_mark_data {
+const struct cmap_node node;
+ovs_u128 mega_ufid;
+uint32_t mark;
+};
+
+struct flow_mark {
+struct cmap megaflow_to_mark;
+struct cmap mark_to_flow;
+struct id_pool *pool;
+};
+
+/* allocated in dpif-netdev.c */
+extern struct flow_mark flow_mark;
+
+static inline struct dp_netdev_flow *
+mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
+  const uint32_t mark)
+{
+struct dp_netdev_flow *flow;
+
+CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0),
+ _mark.mark_to_flow) {
+if (flow->mark == mark && flow->pmd_id == pmd->core_id &&
+flow->dead == false) {
+return flow;
+}
+}
+
+return NULL;
+}
+
+
+#endif /* dpif-netdev-private-hwol.h */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index affeeacdc..e913f4efc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -18,6 +18,7 @@
 #include "dpif-netdev.h"
 #include "dpif-netdev-private.h"
 #include "dpif-netdev-private-dfc.h"
+#include "dpif-netdev-private-hwol.h"
 
 #include 
 #include 
@@ -1953,26 +1954,8 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread 
*pmd,
 return cls;
 }
 
-#define MAX_FLOW_MARK   (UINT32_MAX - 1)
-#define INVALID_FLOW_MARK   0
-/* Zero flow mark is used to indicate the HW to remove the mark. A packet
- * marked with zero mark is received in SW without a mark at all, so it
- * cannot be used as a valid mark.
- */
 
-struct megaflow_to_mark_data {
-const struct cmap_node node;
-ovs_u128 mega_ufid;
-uint32_t mark;
-};
-
-struct flow_mark {
-struct cmap megaflow_to_mark;
-struct cmap mark_to_flow;
-struct id_pool *pool;
-};
-
-static struct flow_mark flow_mark = {
+struct flow_mark flow_mark = {
 .megaflow_to_mark = CMAP_INITIALIZER,
 .mark_to_flow = CMAP_INITIALIZER,
 };
@@ -2141,23 +2124,6 @@ flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
 }
 }
 
-static struct dp_netdev_flow *
-mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
-  const uint32_t mark)
-{
-struct dp_netdev_flow *flow;
-
-CMAP_FOR_EACH_WITH_HASH (flow, mark_node, hash_int(mark, 0),
- _mark.mark_to_flow) {
-if (flow->mark == mark && flow->pmd_id == pmd->core_id &&
-flow->dead == false) {
-return flow;
-}
-}
-
-return NULL;
-}
-
 static struct dp_flow_offload_item *
 dp_netdev_alloc_flow_offload(struct dp_netdev_pmd_thread *pmd,
  struct dp_netdev_flow *flow,
-- 
2.32.0

[ovs-dev] [v13 03/12] dpif-netdev: Add function pointer for netdev input.

2021-06-17 Thread Cian Ferriter

From: Harry van Haaren 

This commit adds a function pointer to the pmd thread data structure,
giving the pmd thread flexibility in its dpif-input function choice.
This allows choosing of the implementation based on ISA capabilities
of the runtime CPU, leading to optimizations and higher performance.

Signed-off-by: Harry van Haaren 

---

v13:
- Minor code refactor to address review comments.
---
 lib/dpif-netdev-private-thread.h | 13 +
 lib/dpif-netdev.c|  7 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
index 5e5308b96..0d674ab83 100644
--- a/lib/dpif-netdev-private-thread.h
+++ b/lib/dpif-netdev-private-thread.h
@@ -47,6 +47,13 @@ struct dp_netdev_pmd_thread_ctx {
 uint32_t emc_insert_min;
 };
 
+/* Forward declaration for typedef. */
+struct dp_netdev_pmd_thread;
+
+typedef void (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
+ struct dp_packet_batch *packets,
+ odp_port_t port_no);
+
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
@@ -101,6 +108,12 @@ struct dp_netdev_pmd_thread {
 /* Current context of the PMD thread. */
 struct dp_netdev_pmd_thread_ctx ctx;
 
+/* Function pointer to call for dp_netdev_input() functionality. */
+dp_netdev_input_func netdev_input_func;
+
+/* Pointer for per-DPIF implementation scratch space. */
+void *netdev_input_func_userdata;
+
 struct seq *reload_seq;
 uint64_t last_reload_seq;
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e913f4efc..e6486417e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4231,8 +4231,9 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread 
*pmd,
 }
 }
 }
+
 /* Process packet batch. */
-dp_netdev_input(pmd, , port_no);
+pmd->netdev_input_func(pmd, , port_no);
 
 /* Assign processing cycles to rx queue. */
 cycles = cycle_timer_stop(>perf_stats, );
@@ -6029,6 +6030,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread 
*pmd, struct dp_netdev *dp,
 hmap_init(>tnl_port_cache);
 hmap_init(>send_port_cache);
 cmap_init(>tx_bonds);
+
+/* Initialize the DPIF function pointer to the default scalar version. */
+pmd->netdev_input_func = dp_netdev_input;
+
 /* init the 'flow_cache' since there is no
  * actual thread created for NON_PMD_CORE_ID. */
 if (core_id == NON_PMD_CORE_ID) {
-- 
2.32.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [v13 00/12] DPIF Framework + Optimizations

2021-06-17 Thread Cian Ferriter

v13 Summary:
- Squash DPCLS function rename commit into the first refactor commit.
- Add NEWS items in the commits where the features are added.
- Add documentation in the commits where the features are added.
- Squash commit which adds HWOL support to AVX512 DPIF into commit which
  adds the AVX512 DPIF.
- Add EMC and SMC batch insert functions for better handling of EMC and
  SMC in AVX512 DPIF.
- Document added commands in manpages as well as rST.

v12 Summary:
- Add a partial HWOL PMD statistic. This is added for both the scalar
  and AVX512 DPIFs.

v11 Summary:
- Improve the dp_netdev_impl_get_default() function so PMD threads created
  after running "dpif-set" command will use the DPIF implementation that was
  set.
- Fix small comment formatting issues.

v10 Summary:
- Removed AVX512 POC work for DPIF and MFEX which was added in v9
-- MFEX patches will be sent separately
- Rebase additions to NEWS entries
- Update copyright notices

v9 Summary:
- Added AVX512 POC work for DPIF and MFEX in single patch at end
-- Note that the AVX512 MFEX is for Ether()/IP()/UDP() traffic.
-- A significant performance boost is possible with these optimizations.

v8 Summary:
- Added NEWS entries for significant changes
- Added scalar optimizations for datapath TX
- Patchset is now ready for merge in my opinion.

v7 summary:
- OVS Conference included DPIF overview, youtube link:
--- https://youtu.be/5dWyPxiXEhg
- Rebased and tested on the DPDK 20.11 v4 patch
--- Link: https://patchwork.ozlabs.org/project/openvswitch/list/?series=220645
--- Tested this series for shared/static builds
--- Tested this series with/without -march=
- Minor code improvements in DPIF component (see commits for details)
- Improved CPU ISA checks, caching results
- Commit message improvements (.'s etc)
- Added performance data of patchset
--- Note that the benchmark below does not utilize the AVX512-vpopcntdq
--- optimizations, and performance is expected to improve when used.
--- Further optimizations are planned that continue.

Benchmark Details & Results
===

Intel® Xeon® Gold 6230 CPU @2.10GHz
OVS*-DPDK* Phy-Phy Performance 4x 25G Ports - Total 1 million flows
1C1T-4P, 64-byte frame size, performance in mpps:

Results Table:
---
DPIF  | Scalar | Scalar | AVX512 | AVX512 |
DPCLS | Scalar | AVX512 | Scalar | AVX512 |
---
mpps  |  6.955 |  7.530 |  7.530 |  7.962 |

By enabling both AVX512 DPIF and DPCLS, packet forwarding
is  7.962 / 6.955 = 1.1447x faster, aka 14% speedup.



v6 summary:
- Rebase to DPDK 20.11 enabling patch
--- This creates a dependency, expect CI build failures on the last
patch in this series if it is not applied!
- Small improvements to DPIF layer
--- EMC/SMC enabling in AVX512 DPIF cleanups
- CPU ISA flags are cached, lowering overhead
- Wilcard Classifier DPCLS
--- Refactor and cleanups for function names
--- Enable more subtable specializations
--- Enable AVX512 vpopcount instruction


v5 summary:
- Dropped MFEX optimizations, re-targetting to a later release
--- This allows focus of community reviews & development on DPIF
--- Note OVS Conference talk still introduces both DPIF and MFEX topics
- DPIF improvements
--- Better EMC/SMC handling
--- HWOL is enabled in the avx512 DPIF
--- Documentation & NEWS items added
--- Various smaller improvements

v4 summary:
- Updated and improve DPIF component
--- SMC now implemented
--- EMC handling improved
--- Novel batching method using AVX512 implemented
--- see commits for details
- Updated Miniflow Extract component
--- Improved AVX512 code path performance
--- Implemented multiple TODO item's in v3
--- Add "disable" implementation to return to scalar miniflow only
--- More fixes planned for v5/future revisions:
 Rename command to better reflect usage
 Improve dynamicness of patterns
 Add more demo protocols to show usage
- Future work
--- Documentation/NEWS items
--- Statistics for optimized MFEX
- Note that this patchset will be discussed/presented at OvsConf soon :)

v3 update summary:
(Cian Ferriter helping with rebases, review and code cleanups)
- Split out partially related changes (these will be sent separately)
--- netdev output action optimization
--- avx512 dpcls 16-block support optimization
- Squash commit which moves netdev struct flow into the refactor commit:
--- Squash dpif-netdev: move netdev flow struct to header
--- Into dpif-netdev: Refactor to multiple header files
- Implement Miniflow extract for AVX-512 DPIF
--- A generic method of matching patterns and packets is implemented,
providing traffic-pattern specific miniflow-extract acceleration.
--- The patterns today are hard-coded, however in a future patchset it
is intended to make these runtime configurable, allowing users to
optimize the SIMD miniflow extract for active traffic types.
- Notes:
--- 32 bit builds will be fixed in next release by adding flexible
miniflow

Re: [ovs-dev] [PATCH ovn 2/2] ovn-macros.at: Enable northd parallelization

2021-06-17 Thread Fabrizio D'Angelo

There's another small mistake here...

> +# Test parallelization with dp groups enabled and disabled
> +m4_define([OVN_NORTHD_PARALLELIZATION_DUMMY], [
> +m4_pushdef([NORTHD_TYPE], [ovn_northd])
> +m4_pushdef(NORTHD_DUMMY_NUMA, [yes])
> +[m4_foreach([NORTHD_USE_DP_GROUPS], [yes, no],
> +[[NORTHD_USE_PARALLELIZATION], [yes]
> +])]])


> +m4_define([OVN_NORTHD_PARALLELIZATION_NO_DUMMY], [
> +m4_pushdef([NORTHD_TYPE], [ovn_northd])
> +m4_pushdef(NORTHD_DUMMY_NUMA, [yes])
> +[m4_foreach([NORTHD_USE_DP_GROUPS], [yes, no],
> +[[NORTHD_USE_PARALLELIZATION], [yes]
> +])]])

The two lines defining NORTH_DUMMY_NUMA are set to yes when the second
should be set to no.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn 2/2] ovn-macros.at: Enable northd parallelization

2021-06-17 Thread Fabrizio D'Angelo

On Wed, Jun 16, 2021 at 4:57 PM Mark Michelson  wrote:
>
> For this and patch 1:
>
> Acked-by: Mark Michelson 
>
> I have a small question below that may or may not require a change to
> the series. If a change is needed, it's so small that it's not worth
> having you upload a new version of the patch. Whoever merges this series
> can make the change.

Thanks Mark.

> > +# Use --dummy-numa if system has low cores
> > +m4_define([HOST_HAS_LOW_CORES], [
> > +if test $(nproc) -le 4; then
>
> Shouldn't this be -lt instead of -le?

I agree.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] datapath-windows: Specify external include paths

2021-06-17 Thread Ilya Maximets

On 6/17/21 11:31 AM, Alin-Gabriel Serdean wrote:
> On Wed, 2021-06-16 at 18:06 +0300, Alin-Gabriel Serdean wrote:
>> On Tue, 2021-06-15 at 18:06 +0200, Ilya Maximets wrote:
>>> On 6/15/21 3:43 PM, Alin Gabriel Serdean wrote:
 VStudio 16.10 adds usermode includes before including the driver
 kit ones.

 Bug tracked at:
 https://developercommunity.visualstudio.com/t/error-lnk2019-unresolved-external-symbol-stdio-com/1434674

 Fixes appveyor build reported by forcing external includes.
>>>
>>> Thanks, Alin.  I know nothing about the windows build process, but
>>> I
>>> see
>>> that this patch fixes the issue with the current AppVeyor CI,
>>> therefore:
>>>
>>> Acked-by: Ilya Maximets 
>>
>> Thank you!
>>
>>> Out of curiosity, is this change backward compatible?  I mean,
>>> is it possible to build on older platform (older VS) with this
>>> change?
>>
>> It should be.
>> Usually we do not need to force the order of include directories. For
>> kernel projects it should default to the kernel includes.
>> I test with the last two versions of VS (2019, 2017).
>>
>> We should add a build matrix for different versions of VS images to
>> appveyor / GHA so we could be sure.
>> I'll try to update the appveyor side.
>>
>> FWIW a new version of VS was launched yesterday (
>> https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes#visual-studio-2019-version-1610-releases
>> ), I will try to compile
>> without the patch to see if they hotfixed the issue.
> 
> It did not. Applying the patch.
> 

OK.  CI is green now.  Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] datapath-windows: Specify external include paths

2021-06-17 Thread Alin-Gabriel Serdean

On Wed, 2021-06-16 at 18:06 +0300, Alin-Gabriel Serdean wrote:
> On Tue, 2021-06-15 at 18:06 +0200, Ilya Maximets wrote:
> > On 6/15/21 3:43 PM, Alin Gabriel Serdean wrote:
> > > VStudio 16.10 adds usermode includes before including the driver
> > > kit ones.
> > > 
> > > Bug tracked at:
> > > https://developercommunity.visualstudio.com/t/error-lnk2019-unresolved-external-symbol-stdio-com/1434674
> > > 
> > > Fixes appveyor build reported by forcing external includes.
> > 
> > Thanks, Alin.  I know nothing about the windows build process, but
> > I
> > see
> > that this patch fixes the issue with the current AppVeyor CI,
> > therefore:
> > 
> > Acked-by: Ilya Maximets 
> 
> Thank you!
> 
> > Out of curiosity, is this change backward compatible?  I mean,
> > is it possible to build on older platform (older VS) with this
> > change?
> 
> It should be.
> Usually we do not need to force the order of include directories. For
> kernel projects it should default to the kernel includes.
> I test with the last two versions of VS (2019, 2017).
> 
> We should add a build matrix for different versions of VS images to
> appveyor / GHA so we could be sure.
> I'll try to update the appveyor side.
> 
> FWIW a new version of VS was launched yesterday (
> https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes#visual-studio-2019-version-1610-releases
> ), I will try to compile
> without the patch to see if they hotfixed the issue.

It did not. Applying the patch.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] [PATCH ovn v2] ovn-controller: Propagate nb-cfg-ts to local OVSDB.

2021-06-17 Thread Dumitru Ceara

Also store the timestamp when ovn-controller started up.  This helps
implementing alerts on the CMS side to detect whether ovn-controller is
still alive and functioning well.

Reported-at: https://bugzilla.redhat.com/1924751
Reported-by: Casey Callendrello 
Signed-off-by: Dumitru Ceara 
---
v2:
- Addressed Mark's comments:
  - added units to documentation of timestamp fields.
  - rephrased test comment.
  - did *not* implement the micro optimization suggestion because
there's a chance the local ovsdb gets out of sync (e.g., txns fail
or values are changed externally) and ovn-controller should
reconciliate the database.
---
 controller/ovn-controller.8.xml | 25 +
 controller/ovn-controller.c | 29 +++--
 tests/ovn-controller.at | 11 +++
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 8886df568..77067c3a3 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -418,6 +418,18 @@
 
   
 
+  
+external-ids:ovn-startup-ts in the Bridge
+table
+  
+
+  
+
+  This key represents the timestamp (in milliseconds) at which
+  ovn-controller process was started.
+
+  
+
   
 external-ids:ovn-nb-cfg in the Bridge table
   
@@ -429,6 +441,19 @@
   flows have been successfully installed in OVS.
 
   
+
+  
+external-ids:ovn-nb-cfg-ts in the Bridge
+table
+  
+
+  
+
+  This key represents the timestamp (in milliseconds) of the last known
+  OVN_Southbound.SB_Global.nb_cfg value for which all
+  flows have been successfully installed in OVS.
+
+  
 
 
 OVN Southbound Database Usage
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index addb08755..2f8ceff9f 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -94,6 +94,8 @@ static unixctl_cb_func debug_delay_nb_cfg_report;
 #define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-flow-generation"
 
 #define OVS_NB_CFG_NAME "ovn-nb-cfg"
+#define OVS_NB_CFG_TS_NAME "ovn-nb-cfg-ts"
+#define OVS_STARTUP_TS_NAME "ovn-startup-ts"
 
 static char *parse_options(int argc, char *argv[]);
 OVS_NO_RETURN static void usage(void);
@@ -788,19 +790,30 @@ static void
 store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
  const struct sbrec_chassis_private *chassis,
  const struct ovsrec_bridge *br_int,
- unsigned int delay_nb_cfg_report)
+ unsigned int delay_nb_cfg_report, int64_t startup_ts)
 {
 struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
 ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
 uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
 
+if (ovs_txn && br_int
+&& startup_ts != smap_get_ullong(_int->external_ids,
+ OVS_STARTUP_TS_NAME, 0)) {
+char *startup_ts_str = xasprintf("%"PRId64, startup_ts);
+ovsrec_bridge_update_external_ids_setkey(br_int, OVS_STARTUP_TS_NAME,
+ startup_ts_str);
+free(startup_ts_str);
+}
+
 if (!cur_cfg) {
 goto done;
 }
 
+long long ts_now = time_wall_msec();
+
 if (sb_txn && chassis && cur_cfg != chassis->nb_cfg) {
 sbrec_chassis_private_set_nb_cfg(chassis, cur_cfg);
-sbrec_chassis_private_set_nb_cfg_timestamp(chassis, time_wall_msec());
+sbrec_chassis_private_set_nb_cfg_timestamp(chassis, ts_now);
 
 if (delay_nb_cfg_report) {
 VLOG_INFO("Sleep for %u sec", delay_nb_cfg_report);
@@ -808,12 +821,15 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct 
ovsdb_idl_txn *ovs_txn,
 }
 }
 
-if (ovs_txn && br_int &&
-cur_cfg != smap_get_ullong(_int->external_ids,
-   OVS_NB_CFG_NAME, 0)) {
+if (ovs_txn && br_int && cur_cfg != smap_get_ullong(_int->external_ids,
+OVS_NB_CFG_NAME, 0)) {
+char *cur_cfg_ts_str = xasprintf("%lld", ts_now);
 char *cur_cfg_str = xasprintf("%"PRId64, cur_cfg);
 ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_NAME,
  cur_cfg_str);
+ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_TS_NAME,
+ cur_cfg_ts_str);
+free(cur_cfg_ts_str);
 free(cur_cfg_str);
 }
 
@@ -2987,6 +3003,7 @@ main(int argc, char *argv[])
 /* Main loop. */
 exiting = false;
 restart = false;
+int64_t startup_ts = time_wall_msec();
 bool sb_monitor_all = false;
 while (!exiting) {
 memory_run();
@@ -3234,7 +3251,7 @@ main(int argc, char

Re: [ovs-dev] [PATCH] system-dpdk: Negotiation Tests for TSO

2021-06-17 Thread Meher Chinwala

This patch is based on this original patch 
https://patchwork.ozlabs.org/project/openvswitch/patch/3c2f1fc66cb1d3281db9001c6a0babb6d162c27a.1594312857.git.gmuth...@redhat.com/ 
which was not merged.

I have addressed the review comments and made the required changes.

On 6/15/21 3:20 PM, root wrote:

From: Meher Chinwala 

system-dpdk: Negotiation Tests for TSO

This patch adds negotiation tests for checking whether TSO is enabled or not in 
OVS and in TestPMD for 4 diferent scenarios.

Signed-off-by: Meher Chinwala 
---
  tests/system-dpdk-macros.at | 43 ++
  tests/system-dpdk.at| 71 +
  2 files changed, 114 insertions(+)

diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index c6708caaf..bdfdac1b0 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -63,3 +63,46 @@ m4_define([OVS_DPDK_START],
 AT_CAPTURE_FILE([ovs-vswitchd.log])
 on_exit "kill_ovs_vswitchd `cat ovs-vswitchd.pid`"
  ])
+
+
+# NEGOTIATION_TEST_TSO([testpmd], [ovs])
+#
+# Test whether TSO is being enabled for OVS and TestPMD. The arguments denote 
whether
+# TSO is enabled for testpmd and ovs or not, respectively.
+#
+m4_define([NEGOTIATION_TEST_TSO],
+  [
+   OVS_DPDK_START()
+   AS_IF([test $1 -eq 1], [AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . 
other_config:userspace-tso-enable=true])], [AT_CHECK([ovs-vsctl --no-wait set 
Open_vSwitch . other_config:userspace-tso-enable=false])])
+   AS_IF([test $1 -eq 1], [OVS_WAIT_UNTIL([grep 'Userspace TCP Segmentation 
Offloading support enabled' ovs-vswitchd.log])], [])
+   AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
+   AT_CHECK([ovs-vsctl add-port br10 dpdkvhostuserclient0 -- set Interface 
dpdkvhostuserclient0 type=dpdkvhostuserclient 
options:vhost-server-path=$OVS_RUNDIR/dpdkvhostclient0], [], [stdout], [stderr])
+   AT_CHECK([ovs-vsctl show], [], [stdout])
+   on_exit "pkill -f -x -9 'tail -f /dev/null'"
+   AT_CHECK([echo "show device info all" > CMDFILE])
+   AT_CHECK([echo "stop" >> CMDFILE])
+   AT_CHECK([echo "port stop 0" >> CMDFILE])
+   AS_IF([test $2 -eq 1], [AT_CHECK([echo "tso set 1500 0" >> CMDFILE], [])], 
[])
+   AS_IF([test $2 -eq 1], [AT_CHECK([echo "csum set tcp hw 0" >> CMDFILE], 
[])], [])
+   AT_CHECK([echo "port start 0" >> CMDFILE])
+   AT_CHECK([echo "start" >> CMDFILE])
+   AT_CHECK([echo "show port 0 tx_offload capabilities" >> CMDFILE])
+   AT_CHECK([echo "show port 0 tx_offload configuration" >> CMDFILE])
+   AT_CHECK([lscpu], [], [stdout])
+   AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) {printf 
"512,"}; print "512"}' > NUMA_NODE])
+   tail -f /dev/null | dpdk-testpmd --socket-mem="$(cat NUMA_NODE)" --no-pci\
+   --vdev="net_virtio_user,path=$OVS_RUNDIR/dpdkvhostclient0,server=1" \
+   --vdev="net_tap0,iface=tap0" --file-prefix page0 \
+   --single-file-segments -- --cmdline-file=CMDFILE \
+   -a >$OVS_RUNDIR/testpmd-dpdkvhostuserclient0.log 2>&1 &
+
+   sleep 10
+   pkill -f -x -9 'tail -f /dev/null'
+
+   sleep 1
+   AT_CHECK([awk '/negotiated Virtio features/ {a=$NF} END{print a}' 
ovs-vswitchd.log],[],[stdout])
+
+   AS_IF([test $1 -eq 1 && test $2 -eq 1], [AT_CHECK([printf "%X" $(( $(cat stdout) & ((1<<0)|(1<<11)|(1<<12)) 
))],[],[1801])], [AT_CHECK([printf "%X" $(( $(cat stdout) & ((1<<0)|(1<<11)|(1<<12)) ))],[],[0])])
+   AS_IF([test $2 -eq 1], [AT_CHECK([grep "Port : TCP_CKSUM TCP_TSO" 
$OVS_RUNDIR/testpmd-dpdkvhostuserclient0.log], [0], [stdout], [stderr])], [])
+   AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], 
[stderr])
+   ])
diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 802895488..d9399ecb9 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -232,3 +232,74 @@ OVS_VSWITCHD_STOP(["\@does not exist. The Open vSwitch 
kernel module is probably
  \@EAL: No free hugepages reported in hugepages-1048576kB@d"])
  AT_CLEANUP
  dnl --
+
+
+dnl --
+dnl Negotiation tests for TSO - both OVS and TestPMD have TSO turned on
+
+AT_SETUP([NEGOTIATION TEST FOR TSO ENABLED FOR TESTPMD AND OVS])
+AT_KEYWORDS([dpdk])
+OVS_DPDK_PRE_CHECK()
+OVS_DB_START()
+NEGOTIATION_TEST_TSO([1],[1])
+OVS_VSWITCHD_STOP(["
+\@EAL: No available hugepages reported in hugepages-1048576kB@d
+\@EAL:   Invalid NUMA socket, default to 0@d
+\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or 
directory@d"])
+AT_CLEANUP
+
+dnl --
+
+
+
+dnl --
+dnl Negotiation tests - TSO enabled for OVS, disabled for TestPMD
+
+AT_SETUP([NEGOTIATION TEST FOR TSO ENABLED FOR OVS, DISABLED FOR TESTPMD])
+AT_KEYWORDS([dpdk])
+OVS_DPDK_PRE_CHECK()
+OVS_DB_START()

47 matches

Mail list logo