Re: [ovs-dev] [PATCH] ovs-rcu: Remove unneeded mutex from struct ovsrcu_perthread.

2021-06-29 Thread David Marchand
Hello Ben,

On Tue, Jun 29, 2021 at 5:36 PM Ben Pfaff  wrote:
>
> It was not really used.

If I am not mistaken, this is the same patch as
https://patchwork.ozlabs.org/project/openvswitch/patch/d89cc03128a6e449ce49f729b9eeafe687356b4e.1621517561.git.gr...@u256.net/.

>
> Signed-off-by: Ben Pfaff 
> Reported-by: 贺鹏 


-- 
David Marchand



Re: [ovs-dev] [RFC PATCH ovn] Introduce representor port plugging support

2021-06-29 Thread Numan Siddique
On Thu, Jun 10, 2021 at 10:13 AM Frode Nordahl
 wrote:
>
> On Thu, Jun 10, 2021 at 1:46 PM Ilya Maximets  wrote:
> >
> > On 6/10/21 8:36 AM, Han Zhou wrote:
> > >
> > >
> > > On Thu, May 13, 2021 at 9:25 AM Frode Nordahl wrote:
> > >>
> > >> On Thu, May 13, 2021 at 5:12 PM Ilya Maximets wrote:
> > >> >
> > >> > On 5/9/21 4:03 PM, Frode Nordahl wrote:
> > >> > > Introduce plugging module that adds and removes ports on the
> > >> > > integration bridge, as directed by Port_Binding options.
> > >> > >
> > >> > > Traditionally it has been the CMS's responsibility to create Virtual
> > >> > > Interfaces (VIFs) as part of instance (Container, Pod, Virtual
> > >> > > Machine etc.) life cycle, and subsequently manage plug/unplug
> > >> > > operations on the Open vSwitch integration bridge.
> > >> > >
> > >> > > With the advent of NICs connected to multiple distinct CPUs we can
> > >> > > have a topology where the instance runs on one host and Open
> > >> > > vSwitch and OVN run on a different host, the smartnic CPU.
> > >> > >
> > >> > > The act of plugging and unplugging the representor port in Open
> > >> > > vSwitch running on the smartnic host CPU would be the same for
> > >> > > every smartnic variant (thanks to the devlink-port[0][1]
> > >> > > infrastructure) and every CMS (Kubernetes, LXD, OpenStack, etc.).
> > >> > > As such it is natural to extend OVN to provide this common
> > >> > > functionality through its CMS facing API.
> > >> >
> > >> > Hi, Frode.  Thanks for putting this together, but it doesn't look
> > >> > natural to me.  OVN, AFAIK, never touched physical devices or
> > >> > interacted with the kernel directly.  This change introduces completely
> > >> > new functionality inside OVN.  With the same effect we can run a fully
> > >> > separate service on these smartnic CPUs that will do plugging
> > >> > and configuration job for CMS.  You may even make it independent
> > >> > from a particular CMS by creating a REST API for it or whatever.
> > >> > This will additionally allow using the same service for non-OVN setups.
> > >>
> > >> Ilya,
> > >>
> > >> Thank you for taking the time to comment, much appreciated.
> > >>
> > >> Yes, this is new functionality, NICs with separate control plane CPUs
> > >> and isolation from the host are also new, so this is one proposal for
> > >> how we could go about enabling the use of them.
> > >>
> > >> The OVN controller already gets pretty close to the physical realm today
> > >> by maintaining patch ports in Open vSwitch based on bridge mapping
> > >> configuration and the presence of bridges to physical interfaces. It also
> > >> reacts to events of physical interfaces being plugged into the
> > >> Open vSwitch instance it manages, albeit to date some other entity has
> > >> been doing the act of adding the port into the bridge.
> > >>
> > >> The rationale for proposing to use the OVN database for coordinating
> > >> this is that the information about which ports to bind, and where to
> > >> bind them is already there. The timing of the information flow from
> > >> the CMS is also suitable for the task.
> > >>
> > >> OVN relies on OVS library code, and all the necessary libraries for
> > >> interfacing with the kernel through netlink and friends are there or
> > >> would be easy to add. The rationale for using the netlink-devlink
> > >> interface is that it provides a generic infrastructure for these types
> > >> of NICs. So by using this interface we should be able to support most
> > >> if not all of the variants of these cards.
> > >>
> > >>
> > >> Providing a separate OVN service to do the task could work, but would
> > >> have the cost of an extra SB DB connection, IDL and monitors.
> >
> > IMHO, CMS should never connect to Southbound DB.  It's just because the
> > Southbound DB is not meant to be a public interface, it just happened
> > to be available for connections.  I know that OpenStack has metadata
> > agents that connect to the Sb DB, but if it's really required for them, I
> > think, there should be a different way to get/set the required information
> > without connecting to the Southbound.
>
> The CMS-facing API is the Northbound DB; I was not suggesting direct
> use of the Southbound DB by services external to OVN. My suggestion
> was to have a separate OVN process do this if your objection was to
> handle it as part of the ovn-controller process.
>
> > >>
> > >> I fear it would be quite hard to build a whole separate project with
> > >> its own API; it feels like a lot of duplicated effort when the flow of
> > >> data and APIs in OVN already align so well with CMSs interested in
> > >> using this?
> > >>
> > >> > Interactions with physical devices also makes OVN linux-dependent
> > >> > at least for this use case, IIUC.
> > >>
> > >> This specific bit would be linux-specific in the first iteration, yes.
> > >> But the vendors manufacturing and distributing the hardware do 

Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-06-29 Thread Ben Pfaff
On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> >>
> >> [...]
> >>
> 
>  Thanks!  I can download them now.  It's back on my to-do list.
> >>>
> >>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> >>> a nasty performance problem in ovn-nbctl that became really apparent
> >>> when working with your databases:
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> >>
> >> I was wondering if you had a chance to look at this since.
> > 
> > I haven't kept going.  I consider my series that gives a 5x performance
> > improvement a kind of checkpoint along the way.  I assumed at first that
> > it would get reviewed quickly so I could move on to other things, but no
> > one has looked at it yet.
> > 
> 
> 
> Hi Ben,
> 
> Just a note, I've tried this again with ovn-northd-ddlog built from
> current OVN master branch, running against the same DBs:

I've identified the problem.  It's because of the ReachableLogicalRouter
relation, which holds all pairs of routers (A,B) such that a packet at
router A can transit switches and routers to arrive at router B.  This
is inherently O(n**2) and in this example n is about 8,000.

I'll fix it.
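
A back-of-the-envelope illustration of that blow-up (this snippet is not from
the thread or from ovn-northd-ddlog itself; it only restates the O(n**2) growth
using the n quoted above):

    /* ReachableLogicalRouter can hold up to one tuple per ordered (A, B)
     * router pair, so its size grows quadratically with the router count. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long n = 8000;       /* routers in the reported setup */
        unsigned long long pairs = n * n;  /* upper bound on (A, B) tuples  */

        printf("%llu routers -> up to %llu reachability tuples\n", n, pairs);
        return 0;
    }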


Re: [ovs-dev] [PATCH v2 ovn 0/9] northd: rework ovn-northd lb flow installation

2021-06-29 Thread Numan Siddique
On Fri, Jun 18, 2021 at 9:04 AM Lorenzo Bianconi
 wrote:
>
> Rework the lb flow logic to visit each load_balancer first and then the
> related datapaths during lb flow installation.
> This patch reduces the memory footprint and CPU utilization of
> ovn-northd.
>

Hi Lorenzo,

The ovsrobot CI runs show failures with memory leaks.  Please take a
look at them - https://github.com/ovsrobot/ovn/runs/2859118644

Numan


northd: move snat_type out of vip loop

> Testing environment:
> ovn-nbctl lr-list |wc -l
> 308
> ovn-nbctl ls-list |wc -l
> 615
> ovn-nbctl lb-list |wc -l
> 14524
>
> Time needed for build_lrouter_lb_flows() to run for all datapaths/lbs 
> (logical routers)
> Total samples: 22
> Maximum: 6937 msec
> Minimum: 6869 msec
> 95th percentile: 6933.00 msec
> Short term average: 6916.599206 msec
> Long term average: 6914.809656 msec
>
> Time needed for build_pre_lb()/build_stateful()[lb-only] to run for all 
> datapaths/lbs (logical switches)
>   Total samples: 20
>   Maximum: 1735 msec
>   Minimum: 1693 msec
>   95th percentile: 1735.00 msec
>   Short term average: 1731.136610 msec
>   Long term average: 1698.853040 msec
>
> Time needed for build_lrouter_flows_for_lb() to run for all lbs/datapaths 
> (logical routers)
>Total samples: 22
>Maximum: 2745 msec
>Minimum: 2674 msec
>95th percentile: 2742.00 msec
>Short term average: 2724.775973 msec
>Long term average: 2681.334522 msec
>
> Time needed for build_lswitch_flows_for_lb() to run for all lbs/datapaths 
> (logical switches)
>   Total samples: 20
>   Maximum: 406 msec
>   Minimum: 354 msec
>   95th percentile: 406.00 msec
>   Short term average: 383.915676 msec
>   Long term average: 363.318006 msec
>
>
> This series does not introduce any new feature to ovn-northd.
>
> Changes since v1:
> - rebase ontop of ovn-master
> - add build_lswitch_flows_for_lb routine
>
> Lorenzo Bianconi (9):
>   northd: move snat_type out of vip loop
>   lib: link logical routers assigned for the same lb
>   northd: move build_empty_lb_event_flow in build_lrouter_flows_for_lb
>   northd: move lb_{skip,force}_snat code in
> build_lrouter_snat_flows_for_lb
>   northd: get rid of add_router_lb_flow
>   northd: remove dead code in build_lrouter_nat_defrag_and_lb
>   lb: link logical switches assigned for the same lb
>   northd: move build_empty_lb_event_flow in build_lswitch_flows_for_lb
>   northd: move build_lb_rules in build_lswitch_flows_for_lb
>
>  lib/lb.c|  22 ++
>  lib/lb.h|  12 +
>  northd/ovn-northd.c | 606 +++-
>  3 files changed, 403 insertions(+), 237 deletions(-)
>
> --
> 2.31.1
>


Re: [ovs-dev] [PATCH ovn] Set release date for 21.06.0.

2021-06-29 Thread Numan Siddique
On Tue, Jun 29, 2021 at 4:57 PM Mark Michelson  wrote:
>
> Thanks for fixing my error, Numan.
>
> Acked-by: Mark Michelson 

Thanks.  I applied to the main branch.

Numan

>
> On 6/25/21 8:00 PM, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > And also prepare for 21.06.90.  This was missed when v21.06.0 was
> > released.
> >
> > Signed-off-by: Numan Siddique 
> > ---
> >   NEWS | 6 +-
> >   configure.ac | 2 +-
> >   debian/changelog | 8 +++-
> >   3 files changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 0da7d8f97c..e779892085 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -1,4 +1,8 @@
> > -OVN v21.06.0 - 11 May 2021
> > +Post-v21.06.0
> > +-
> > +
> > +
> > +OVN v21.06.0 - 18 Jun 2021
> >   -
> > - ovn-northd-ddlog: New implementation of northd, based on DDlog.  This
> >   implementation is incremental, meaning that it only recalculates what 
> > is
> > diff --git a/configure.ac b/configure.ac
> > index 53034388a3..df0b982952 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -13,7 +13,7 @@
> >   # limitations under the License.
> >
> >   AC_PREREQ(2.63)
> > -AC_INIT(ovn, 21.06.0, b...@openvswitch.org)
> > +AC_INIT(ovn, 21.06.90, b...@openvswitch.org)
> >   AC_CONFIG_MACRO_DIR([m4])
> >   AC_CONFIG_AUX_DIR([build-aux])
> >   AC_CONFIG_HEADERS([config.h])
> > diff --git a/debian/changelog b/debian/changelog
> > index 1667407305..81aaed3079 100644
> > --- a/debian/changelog
> > +++ b/debian/changelog
> > @@ -1,8 +1,14 @@
> > +ovn (21.06.90-1) unstable; urgency=low
> > +
> > +   * New upstream version
> > +
> > + -- OVN team   Fri, 18 Jun 2021 13:21:08 -0400
> > +
> >   ovn (21.06.0-1) unstable; urgency=low
> >
> >  * New upstream version
> >
> > - -- OVN team   Fri, 11 May 2021 12:00:00 -0500
> > + -- OVN team   Fri, 18 Jun 2021 13:21:08 -0400
> >
> >   ovn (21.03.0-1) unstable; urgency=low
> >
> >
>


Re: [ovs-dev] [PATCH ovn v8 4/6] northd: Add IP routing and ARP resolution flows for NAT/LB addresses.

2021-06-29 Thread Numan Siddique
On Thu, Jun 3, 2021 at 2:50 PM Mark Michelson  wrote:
>
> Dealing with NAT and load balancer IPs has been a bit of a pain point.
> It requires creating static routes if east-west traffic to those
> addresses is desired. Further, it requires ARPs to be sent between the
> logical routers in order to create MAC Bindings.
>
> This commit seeks to make things easier. NAT and load balancer addresses
> automatically have IP routing logical flows and ARP resolution logical
> flows created for reachable routers. This eliminates the need to create
> static routes, and it also eliminates the need for ARPs to be sent
> between logical routers.
>
> In this commit, the behavior is not optional. The next commit will
> introduce configuration to make the behavior optional.
>
> Signed-off-by: Mark Michelson 
> ---
>  northd/ovn-northd.c  | 133 ++-
>  northd/ovn_northd.dl |  57 
>  tests/ovn-northd.at  | 214 +++
>  3 files changed, 399 insertions(+), 5 deletions(-)
>

Hi Mark,

Thanks for the patch series.

Overall the patch looks good to me.

There are a couple of things which need to be addressed:
  - You need to rebase the whole series.

  -  Some test cases are failing and libasan is complaining about
memory leaks - https://github.com/ovsrobot/ovn/runs/2740075247
**
   ==348806==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 912 byte(s) in 57 object(s) allocated from:
#0 0x7f01a1b40d1f in __interceptor_malloc (/lib64/libasan.so.6+0xaed1f)
#1 0x5c2c6d in xmalloc__ ../lib/util.c:137
#2 0x5c2c6d in xmalloc ../lib/util.c:172
#3 0x4102a5 in assign_routable_addresses ../northd/ovn-northd.c:1488
#4 0x418933 in join_logical_ports ../northd/ovn-northd.c:2468
#5 0x422f91 in build_ports ../northd/ovn-northd.c:3688
#6 0x461f31 in ovnnb_db_run ../northd/ovn-northd.c:13452
#7 0x464ffb in ovn_db_run ../northd/ovn-northd.c:14123
#8 0x46860c in main ../northd/ovn-northd.c:14612
#9 0x7f01a129fb74 in __libc_start_main (/lib64/libc.so.6+0x27b74)

Direct leak of 57 byte(s) in 57 object(s) allocated from:
#0 0x7f01a1b40d1f in __interceptor_malloc (/lib64/libasan.so.6+0xaed1f)
#1 0x5c2f79 in xcalloc__ ../lib/util.c:121
#2 0x5c2f79 in xcalloc ../lib/util.c:158
#3 0x418933 in join_logical_ports ../northd/ovn-northd.c:2468
#4 0x422f91 in build_ports ../northd/ovn-northd.c:3688
#5 0x461f31 in ovnnb_db_run ../northd/ovn-northd.c:13452
#6 0x464ffb in ovn_db_run ../northd/ovn-northd.c:14123
#7 0x46860c in main ../northd/ovn-northd.c:14612
#8 0x7f01a129fb74 in __libc_start_main (/lib64/libc.so.6+0x27b74)
***

  - You need to add missing documentation in ovn-northd.8.xml for the
newly added lflows.


Thanks
Numan


> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index ef4f5b790..3b9cad80b 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -1353,6 +1353,21 @@ build_datapaths(struct northd_context *ctx, struct 
> hmap *datapaths,
>  }
>  }
>
> +/* Structure representing logical router port
> + * routable addresses. This includes DNAT and Load Balancer
> + * addresses. This structure will only be filled in if the
> + * router port is a gateway router port. Otherwise, all pointers
> + * will be NULL and n_addrs will be 0.
> + */
> +struct ovn_port_routable_addresses {
> +/* Array of address strings suitable for writing to a database table */
> +char **addresses;
> +/* The addresses field parsed into component parts */
> +struct lport_addresses *laddrs;
> +/* Number of items in each of the above arrays */
> +size_t n_addrs;
> +};
> +
>  /* A logical switch port or logical router port.
>   *
>   * In steady state, an ovn_port points to a northbound Logical_Switch_Port
> @@ -1396,6 +1411,8 @@ struct ovn_port {
>
>  struct lport_addresses lrp_networks;
>
> +struct ovn_port_routable_addresses routables;
> +
>  /* Logical port multicast data. */
>  struct mcast_port_info mcast_info;
>
> @@ -1422,6 +1439,48 @@ struct ovn_port {
>  struct ovs_list list;   /* In list of similar records. */
>  };
>
> +static void
> +destroy_routable_addresses(struct ovn_port_routable_addresses *ra)
> +{
> +if (ra->n_addrs == 0) {
> +return;
> +}
> +
> +for (size_t i = 0; i < ra->n_addrs; i++) {
> +free(ra->addresses[i]);
> > +destroy_lport_addresses(&ra->laddrs[i]);
> +}
> +free(ra->addresses);
> +free(ra->laddrs);
> +}
> +
> +static char **get_nat_addresses(const struct ovn_port *op, size_t *n);
> +
> +static void
> +assign_routable_addresses(struct ovn_port *op)
> +{
> +size_t n;
> > +char **nats = get_nat_addresses(op, &n);
> +
> +if (!nats) {
> +return;
> +}
> +
> +struct lport_addresses *laddrs = xcalloc(n, sizeof(*laddrs));
> +for (size_t i = 0; i < n; i++) {
> +int ofs;
> > +if (!extract_addresses(nats[i], &laddrs[i], &ofs)) {
> +continue;

Re: [ovs-dev] [PATCH ovn] Set release date for 21.06.0.

2021-06-29 Thread Mark Michelson

Thanks for fixing my error, Numan.

Acked-by: Mark Michelson 

On 6/25/21 8:00 PM, num...@ovn.org wrote:

From: Numan Siddique 

And also prepare for 21.06.90.  This was missed when v21.06.0 was
released.

Signed-off-by: Numan Siddique 
---
  NEWS | 6 +-
  configure.ac | 2 +-
  debian/changelog | 8 +++-
  3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 0da7d8f97c..e779892085 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,8 @@
-OVN v21.06.0 - 11 May 2021
+Post-v21.06.0
+-
+
+
+OVN v21.06.0 - 18 Jun 2021
  -
- ovn-northd-ddlog: New implementation of northd, based on DDlog.  This
  implementation is incremental, meaning that it only recalculates what is
diff --git a/configure.ac b/configure.ac
index 53034388a3..df0b982952 100644
--- a/configure.ac
+++ b/configure.ac
@@ -13,7 +13,7 @@
  # limitations under the License.
  
  AC_PREREQ(2.63)

-AC_INIT(ovn, 21.06.0, b...@openvswitch.org)
+AC_INIT(ovn, 21.06.90, b...@openvswitch.org)
  AC_CONFIG_MACRO_DIR([m4])
  AC_CONFIG_AUX_DIR([build-aux])
  AC_CONFIG_HEADERS([config.h])
diff --git a/debian/changelog b/debian/changelog
index 1667407305..81aaed3079 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,14 @@
+ovn (21.06.90-1) unstable; urgency=low
+
+   * New upstream version
+
+ -- OVN team   Fri, 18 Jun 2021 13:21:08 -0400
+
  ovn (21.06.0-1) unstable; urgency=low
  
 * New upstream version
  
- -- OVN team   Fri, 11 May 2021 12:00:00 -0500

+ -- OVN team   Fri, 18 Jun 2021 13:21:08 -0400
  
  ovn (21.03.0-1) unstable; urgency=low
  





Re: [ovs-dev] [PATCH v10] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-29 Thread Vasu Dasari
Hi Eelco/Ben,

I found that the documentation for ovs-vswitchd did not reflect the last
minute change I made to the fdb/del syntax. This latest patch, v11,
reflects that change.

Thanks
-Vasu

*Vasu Dasari*


On Tue, Jun 29, 2021 at 3:54 PM Vasu Dasari  wrote:

> Thank you Eelco for your careful review. I appreciate and thank you for
> all your comments.
>
> -Vasu
>
> *Vasu Dasari*
>
>
> On Tue, Jun 29, 2021 at 11:08 AM Eelco Chaudron 
> wrote:
>
>>
>>
>> On 29 Jun 2021, at 15:19, Vasu Dasari wrote:
>>
>> > Currently there is an option to add/flush/show ARP/ND neighbor. This
>> covers L3
>> > side.  For L2 side, there is only fdb show command. This patch gives an
>> option
>> > to add/del an fdb entry via ovs-appctl.
>> >
>> > CLI command looks like:
>> >
>> > To add:
>> > ovs-appctl fdb/add <bridge> <port> <vlan> <mac>
>> > ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
>> >
>> > To del:
>> > ovs-appctl fdb/del <bridge> <vlan> <mac>
>> > ovs-appctl fdb/del br0 0 50:54:00:00:00:05
>> >
>> > Added two new APIs to provide convenient interface to add and delete
>> static-macs.
>> > bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t
>> in_port,
>> >struct eth_addr dl_src, int vlan);
>> > bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
>> >   struct eth_addr dl_src, int vlan);
>> >
>> > 1. Static entry should not age. To indicate that entry being programmed
>> is a static entry,
>> >'expires' field in 'struct mac_entry' will be set to a
>> MAC_ENTRY_AGE_STATIC_ENTRY. A
>> >check for this value is made while deleting mac entry as part of
>> regular aging process.
>> > 2. Another change to of mac-update logic, when a packet with same
>> dl_src as that of a
>> >static-mac entry arrives on any port, the logic will not modify the
>> expires field.
>> > 3. While flushing fdb entries, made sure static ones are not evicted.
>> > 4. Updated "ovs-appctl fdb/stats-show br0" to display number of static
>> entries in switch
>> >
>> > Added following tests:
>> >   ofproto-dpif - static-mac add/del/flush
>> >   ofproto-dpif - static-mac mac moves
>> >
>> > Signed-off-by: Vasu Dasari 
>> > Reported-at:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
>> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
>> > Tested-by: Eelco Chaudron 
>> > Acked-by: Eelco Chaudron 
>> > ---
>>
>> Thanks for your patience to follow this through!
>>
>> Acked-by: Eelco Chaudron 
>>
>> //Eelco
>>
>>


[ovs-dev] [PATCH v11] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-29 Thread Vasu Dasari
Currently there is an option to add/flush/show ARP/ND neighbors. This covers the
L3 side.  For the L2 side, there is only the fdb show command. This patch gives
an option to add/del an fdb entry via ovs-appctl.

CLI command looks like:

To add:
ovs-appctl fdb/add <bridge> <port> <vlan> <mac>
ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05

To del:
ovs-appctl fdb/del <bridge> <vlan> <mac>
ovs-appctl fdb/del br0 0 50:54:00:00:00:05

Added two new APIs to provide convenient interface to add and delete 
static-macs.
bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t in_port,
   struct eth_addr dl_src, int vlan);
bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
  struct eth_addr dl_src, int vlan);

1. A static entry should not age. To indicate that the entry being programmed
   is a static entry, the 'expires' field in 'struct mac_entry' will be set to
   MAC_ENTRY_AGE_STATIC_ENTRY. A check for this value is made while deleting
   mac entries as part of the regular aging process (a rough sketch follows
   this list).
2. Another change to the mac-update logic: when a packet with the same dl_src
   as that of a static-mac entry arrives on any port, the logic will not modify
   the expires field.
3. While flushing fdb entries, made sure static ones are not evicted.
4. Updated "ovs-appctl fdb/stats-show br0" to display the number of static
   entries in the switch.
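
A rough sketch of how point 1 could look in code (the struct and the marker
value here are simplified stand-ins for illustration, not the actual patch
code):

    /* Sketch only: an aging check that treats a static marker stored in
     * 'expires' as "never expires".  The real patch stores
     * MAC_ENTRY_AGE_STATIC_ENTRY in 'struct mac_entry'; this standalone
     * version only illustrates the idea. */
    #include <stdbool.h>
    #include <time.h>

    #define MAC_ENTRY_AGE_STATIC_ENTRY ((time_t) -1)   /* stand-in value */

    struct mac_entry_sketch {
        time_t expires;            /* expiry time, or the static marker */
    };

    bool
    mac_entry_sketch_expired(const struct mac_entry_sketch *e, time_t now)
    {
        if (e->expires == MAC_ENTRY_AGE_STATIC_ENTRY) {
            return false;          /* static entries never age out */
        }
        return e->expires <= now;
    }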

Added following tests:
  ofproto-dpif - static-mac add/del/flush
  ofproto-dpif - static-mac mac moves

Signed-off-by: Vasu Dasari 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
Tested-by: Eelco Chaudron 
Acked-by: Eelco Chaudron 
---
v1:
 - Fixed 0-day robot warnings
v2:
 - Fix valgrind error in the modified code in mac_learning_insert() where a
   read is performed on e->expires which is not initialized
v3:
 - Addressed code review comments
 - Added more documentation
 - Fixed mac_entry_age() and is_mac_learning_update_needed() to have common
   understanding of return values when mac_entry is a static one.
 - Added NEWS item
v4:
 - Addressed code review comments
 - Static entries will not be purged when fdb/flush is performed.
 - Static entries will not be overwritten when a packet with the same dl_src
   arrives on any port of the switch
 - Provided a bit more detail while doing fdb/add, to indicate if a static-mac
   is overriding an already present entry
 - Separated test cases for a bit more clarity
 v5:
 - Addressed code review comments
 - Added new total_static counter to count number of static entries.
 - Removed mac_entry_set_idle_time()
 - Added mac_learning_add_static_entry() and mac_learning_del_static_entry()
 - Modified APIs xlate_add_static_mac_entry() and xlate_delete_static_mac_entry()
   to return 0 on success, else a failure code
 v6:
 - Fixed a probable bug with Eelco's code review comments in
   is_mac_learning_update_needed()
 v7:
 - Added a ovs-vswitchd.8 man page entry for fdb add/del commands
 v8:
 - Updated with code review comments from Eelco.
 - Renamed total_static to static_entries
 - Added coverage counter mac_learning_static_none_move
 - Fixed a possible bug with static_entries getting cleared via
   fdb/stats-clear command
 - Initialize static_entries in mac_learning_create()
 - Modified fdb/del command by removing option to specify port-name
 - Breakup ofproto_unixctl_fdb_update into ofproto_unixctl_fdb_add
   and ofproto_unixctl_fdb_delete
 - Updated test "static-mac add/del/flush" to have interleaved mac
   entries before fdb/flush
 - Updated test "static-mac mac move" to check for newly added
   coverage counter mac_learning_static_none_move
v9:
 - Updated source code comments and addressed code review comments
 v10:
 - Simplified error code paths in ofproto_unixctl_fdb_{add,delete}
   functions
v11:
 - Fix ovs-vswitchd man page documentation to reflect changed syntax
---
 NEWS |   4 +
 lib/mac-learning.c   | 155 +++
 lib/mac-learning.h   |  17 
 ofproto/ofproto-dpif-xlate.c |  48 +--
 ofproto/ofproto-dpif-xlate.h |   5 ++
 ofproto/ofproto-dpif.c   | 111 -
 tests/ofproto-dpif.at|  99 ++
 vswitchd/ovs-vswitchd.8.in   |   6 ++
 8 files changed, 416 insertions(+), 29 deletions(-)

diff --git a/NEWS b/NEWS
index f02f07cdf..909e88c6d 100644
--- a/NEWS
+++ b/NEWS
@@ -25,6 +25,10 @@ Post-v2.15.0
- ovsdb-tool:
  * New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
+   - ovs-appctl:
+ * Added ability to add and delete static mac entries using:
+   'ovs-appctl fdb/add <bridge> <port> <vlan> <mac>'
+   'ovs-appctl fdb/del <bridge> <vlan> <mac>'
 
 
 v2.15.0 - 15 Feb 2021
diff --git a/lib/mac-learning.c b/lib/mac-learning.c
index 3d5293d3b..dd3f46a8b 100644
--- a/lib/mac-learning.c
+++ b/lib/mac-learning.c
@@ -34,13 +34,25 @@ COVERAGE_DEFINE(mac_learning_learned);
 

Re: [ovs-dev] [PATCH ovn] northd: avoid memory reallocation while building ACL and QoS rules

2021-06-29 Thread Numan Siddique
On Tue, Jun 22, 2021 at 5:22 PM Mark Michelson  wrote:
>
> On 6/22/21 1:14 PM, Dan Williams wrote:
> > On Fri, 2021-06-18 at 21:49 +0200, Dumitru Ceara wrote:
> >> On 6/4/21 10:00 PM, Dan Williams wrote:
> >>> Inspired by:
> >>>
> >>> 3b6362d64e86b northd: Avoid memory reallocation while building lb
> >>> rules.
> >>>
> >>> Signed-off-by: Dan Williams 
> >>> ---
> >>> NOTE: this is driven by visual inspection not perf data. But it
> >>> shouldn't be worse than current code and should be better for
> >>> large numbers of ACLs I think.
> >>
> >> The changes look OK to me.
> >>
> >> Acked-by: Dumitru Ceara 
> >>
> >> However, I wonder how many such optimizations we can implement
> >> without
> >> affecting maintainability.  Mark suggested an approach [0].
> >
> > I'm happy to drop my patch in favor of Mark's. I think mine is a subset
> > of his.
> >
> > Dan
>
> Funny because I'm not even 100% behind my own approach.

Thanks Dan, Dumitru and Mark.  I applied this patch to the main branch.

Numan

>
> >
> >>
> >> CC-ing Ilya too, maybe he has some more suggestions, maybe there's a
> >> way
> >> to better use the OVS dynamic strings.
>
> I did some brainstorming and came up with a test program:
>
> #include <stdio.h>
> #include <stdarg.h>
>
> static void
> my_crazy_printf(char *fmt1, char *fmt2, ...)
> {
>  va_list ap;
>
>  va_start(ap, fmt2);
>  vprintf(fmt1, ap);
>
>  va_list aq;
>  va_copy(aq, ap);
>
>  vprintf(fmt2, aq);
>
>  va_end(aq);
>  va_end(ap);
> }
>
> int main(void)
> {
>  my_crazy_printf("%s=%d\n", "%s=%d\n", "howdy", 4, "byebye", 3);
>  return 0;
> }
>
> On my system, the output is:
> howdy=4
> byebye=3
>
> I came up with this as a test to see how feasible it is to have two
> format strings in the same parameter list. The idea here is to translate
> that into something like this:
>
> /* I've omitted unimportant parameters */
> static void
> ovn_lflow_add(match_fmt, actions_fmt, ...)
> {
>   static struct ds match = DS_EMPTY_INITIALIZER;
>   static struct ds actions = DS_EMPTY_INITIALIZER;
>
>   va_list ap;
>   va_start(ap, actions_fmt);
>   ds_clear(&match);
>   ds_put_format_valist(&match, match_fmt, ap);
>
>   va_list aq;
>   va_copy(aq, ap);
>   ds_clear(&actions);
>   ds_put_format_valist(&actions, actions_fmt, aq);
>
>   va_end(aq);
>   va_end(ap);
>
>   /* The rest of the function */
> }
>
> With this, the dynamic string handling is done entirely within
> ovn_lflow_add(), meaning that the same buffers are reused. The problem
> (aside from the fact that it's weird) is that ds_put_format_valist()
> performs its own va_copy() operation of the passed-in va_list. This
> means that the two ds_put_format_valist() operations operate on
> identical va_lists. Pursuing this problem any further means essentially
> re-implementing dynamic strings to allow for this unorthodox usage. It
> feels like a dead end to me.
>
> >>
> >> Regards,
> >> Dumitru
> >>
> >> [0]
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384043.html
> >>
> >
> >
>
> [1] Test code:
>
> If you run this code, then the output is:
> howdy=4
> byebye=3
>


Re: [ovs-dev] [External] : [PATCH ovn] northd-ddlog: Add proxy arp flows for configured addresses in lsp router port.

2021-06-29 Thread Numan Siddique
On Tue, Jun 29, 2021 at 2:29 PM Brendan Doyle  wrote:
>
>
> Thanks for doing the ddlog for this, comment below
>
> On 29/06/2021 17:08, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > The commit [1] didn't add the ddlog part.
> >
> > [1] - 8087cbc7462("ovn-northd.c: Add proxy ARP support to OVN")
> >
> > Signed-off-by: Numan Siddique 
> > ---
> >   northd/ovn.dl|  1 +
> >   northd/ovn.rs| 13 +
> >   northd/ovn_northd.dl | 38 ++
> >   tests/ovn.at |  4 ++--
> >   4 files changed, 54 insertions(+), 2 deletions(-)
> >
> > diff --git a/northd/ovn.dl b/northd/ovn.dl
> > index f23ea3b9e1..3c7a734ddb 100644
> > --- a/northd/ovn.dl
> > +++ b/northd/ovn.dl
> > @@ -364,6 +364,7 @@ extern function is_dynamic_lsp_address(addr: string): 
> > bool
> >   extern function extract_lsp_addresses(address: string): 
> > Option
> >   extern function extract_addresses(address: string): 
> > Option
> >   extern function extract_lrp_networks(mac: string, networks: Set): 
> > Option
> > +extern function extract_ip_addresses(address: string): 
> > Option
> >
> >   extern function split_addresses(addr: string): (Set, Set)
> >
> > diff --git a/northd/ovn.rs b/northd/ovn.rs
> > index d44f83bc75..5f0939409c 100644
> > --- a/northd/ovn.rs
> > +++ b/northd/ovn.rs
> > @@ -184,6 +184,18 @@ pub fn extract_lrp_networks(mac: , networks: 
> > _std::Set) ->
> >   }
> >   }
> >
> > +pub fn extract_ip_addresses(address: ) -> 
> > ddlog_std::Option {
> > +unsafe {
> > +let mut laddrs: lport_addresses_c = Default::default();
> > +if ovn_c::extract_ip_addresses(string2cstr(address).as_ptr(),
> > +&mut laddrs as *mut 
> > lport_addresses_c) {
> > +ddlog_std::Option::Some{x: laddrs.into_ddlog()}
> > +} else {
> > +ddlog_std::Option::None
> > +}
> > +}
> > +}
> > +
> >   pub fn ovn_internal_version() -> String {
> >   unsafe {
> >   let s = ovn_c::ovn_get_internal_version();
> > @@ -623,6 +635,7 @@ mod ovn_c {
> >   pub fn extract_addresses(address: *const raw::c_char, laddrs: 
> > *mut lport_addresses_c, ofs: *mut raw::c_int) -> bool;
> >   pub fn extract_lrp_networks__(mac: *const raw::c_char, networks: 
> > *const *const raw::c_char,
> > n_networks: libc::size_t, laddrs: 
> > *mut lport_addresses_c) -> bool;
> > +pub fn extract_ip_addresses(address: *const raw::c_char, laddrs: 
> > *mut lport_addresses_c) -> bool;
> >   pub fn destroy_lport_addresses(addrs: *mut lport_addresses_c);
> >   pub fn is_dynamic_lsp_address(address: *const raw::c_char) -> 
> > bool;
> >   pub fn split_addresses(addresses: *const raw::c_char, ip4_addrs: 
> > *mut ovs_svec, ipv6_addrs: *mut ovs_svec);
> > diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> > index 52a6206a18..a7a327c7f0 100644
> > --- a/northd/ovn_northd.dl
> > +++ b/northd/ovn_northd.dl
> > @@ -3360,6 +3360,44 @@ for (CheckLspIsUp[check_lsp_is_up]) {
> >   }
> >   }
> >
> > +Flow(.logical_datapath = sw._uuid,
> > + .stage= s_SWITCH_IN_ARP_ND_RSP(),
> > + .priority = 50,
> > + .__match  = __match,
> > + .actions  = __actions,
> > + .external_ids = stage_hint(sp.lsp._uuid)) :-
> > +
> > +sp in (.sw = sw, .peer = Some{rp}),
> > +rp.is_enabled(),
> > +var proxy_ips = {
> > +match (sp.lsp.options.get("arp_proxy")) {
> > +None -> "",
> > +Some {addresses} -> {
> > +match (extract_ip_addresses(addresses)) {
> > +None -> "",
> > +Some{addr} -> {
> > +var ip4_addrs = vec_empty();
> > +for (ip4 in addr.ipv4_addrs) {
> > +ip4_addrs.push("${ip4.addr}")
> > +};
> > +string_join(ip4_addrs, ",")
> > +}
> > +}
> > +}
> > +}
> > +},
> > +proxy_ips != "",
> > +var __match = "arp.op == 1 && arp.tpa == {" ++ proxy_ips ++ "}",
> > +var __actions = "eth.dst = eth.src; "
> > +"eth.src = ${rp.networks.ea}; "
> > +"arp.op = 2; /* ARP reply */ "
> > +"arp.tha = arp.sha; "
> > +"arp.sha = %s; "
> > +"arp.tpa <-> arp.spa; "
> > +"outport = inport; "
> > +"flags.loopback = 1; "
> > +"output;".
> > +
> >   /* For ND solicitations, we need to listen for both the
> >* unicast IPv6 address and its all-nodes multicast address,
> >* but always respond with the unicast IPv6 address. */
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index db1a0a35c2..31f0b90996 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ 

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 09:57:44PM +0200, Ilya Maximets wrote:
> For the more or less automatic ways of solving the disbalance there are
> few more ideas that we can explore:
> 
> - Try to measure the load on the ovsdb-server process and report it
>   somehow in the _Server database, so the client might make a decision
>   to re-connect to a less loaded server.  This might be some metric
>   based on total number of clients or the time it takes to run a
>   single event processing loop (poll interval).

The servers could report their individual loads to each other via the
leader, and report all the server loads in _Server.  This has pitfalls
too, of course.


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ilya Maximets
On 6/29/21 8:05 PM, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 10:29:59AM -0700, Han Zhou wrote:
>> On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff  wrote:
>>>
>>> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
 If a new database server added to the cluster, or if one of the
 database servers changed its IP address or port, then you need to
 update the list of remotes for the client.  For example, if a new
 OVN_Southbound database server is added, you need to update the
 ovn-remote for the ovn-controller.

 However, in the current implementation, the ovsdb-cs module always
 closes the current connection and creates a new one.  This can lead
 to a storm of re-connections if all ovn-controllers will be updated
 simultaneously.  They can also start re-downloading the database
 content, creating even more load on the database servers.

 Correct this by saving an existing connection if it is still in the
 list of remotes after the update.

 'reconnect' module will report connection state updates, but that
 is OK since no real re-connection happened and we only updated the
 state of a new 'reconnect' instance.

 If required, re-connection can be forced after the update of remotes
 with ovsdb_cs_force_reconnect().
>>>
>>> I think one of the goals here was to keep the load balanced as servers
>>> are added.

Yes, I thought about that and that is a valid point.  It's more like
a trade-off here between stability of connections and trying to keep
the load balanced in some way.

>>> Maybe that's not a big deal, or maybe it would make sense to
>>> flip a coin for each of the new servers and switch over to it with
>>> probability 1/n where n is the number of servers.

That seems like an interesting approach, but I think the resulting
probability of keeping the connection would be low.

>>
>> A similar load-balancing problem exists also when a server is down and then
>> recovered. Connections will obviously move away when it is down but they
>> won't automatically connect back when it is recovered. Apart from the
>> flipping-a-coin approach suggested by Ben, I saw a proposal [0] [1] in the
>> past that provides a CLI to reconnect to a specific server which leaves
>> this burden to CMS/operators. It is not ideal but still could be an
>> alternative to solve the problem.

I remember these patches.  And I think that disbalance after one of the
servers went down and up again (e.g. temporary disconnection of one of
the cluster nodes) is a more important issue and at the same time harder
to solve, because this happens automatically without intervention from
user/CMS's side.  And to some extent it's inevitable. E.g. the cluster will
almost always be disbalanced if 3 server nodes are restarted for
upgrade one by one.  Luckily, worker nodes with ovn-controllers need
maintenance too, so eventual load balance will be achieved.

One interesting side effect of the current patch is that you can mimic
behavior of patches [0][1] like this:
  set ovn-remote=
  set ovn-remote=
After the first command, the ovn-controller will re-connect to a new
server and it will not re-connect again after addition of all other
servers back to the list.  But I agree that this looks more like a hack
than an actual way to do that.

For the more or less automatic ways of solving the disbalance there are
few more ideas that we can explore:

- Try to measure the load on the ovsdb-server process and report it
  somehow in the _Server database, so the client might make a decision
  to re-connect to a less loaded server.  This might be some metric
  based on total number of clients or the time it takes to run a
  single event processing loop (poll interval).

- A bit more controlled way is to limit number of clients per server,
  so the server will decline connection attempts.  CMS might have an
  idea how many clients one server is able/allowed to handle.
  E.g. for N servers and M clients, it might be reasonable to allow
  not more than 2M/N connections per server to still be able to serve
  all clients if half of the servers are down.  Of course, it's up to
  CMS/user to decide on the exact number.  This could be implemented
  as an extra column for connection row in the database.
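
A small numeric sketch of that last idea (the 2M/N cap comes from the text
above; the concrete client and server counts below are invented purely for
illustration):

    /* Illustration only: capping each server at 2*M/N connections keeps
     * enough total capacity to serve all M clients even if half of the
     * servers are down. */
    #include <stdio.h>

    int main(void)
    {
        int m_clients = 600;    /* hypothetical number of ovn-controllers */
        int n_servers = 3;      /* hypothetical cluster size */
        int cap = 2 * m_clients / n_servers;    /* 400 connections/server */

        printf("per-server cap: %d; two surviving servers hold %d >= %d\n",
               cap, 2 * cap, m_clients);
        return 0;
    }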

>>
>> I think both approaches have their pros and cons. The smart way doesn't
>> require human intervention in theory, but when operating at scale people
>> usually want to be cautious and have more control over the changes. For
>> example, they may want to add the server to the cluster first, and then
>> gradually move 1/n connections to the new server after a graceful period,
>> or they could be more conservative and only let the new server take new
>> connections without moving any existing connections. I'd support both
>> options and let the operators decide according to their requirements.

This sounds reasonable.

>>
>> Regarding the current patch, I think it's better to add a test 

Re: [ovs-dev] [PATCH v10] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-29 Thread Vasu Dasari
Thank you Eelco for your careful review. I appreciate and thank you for all
your comments.

-Vasu

*Vasu Dasari*


On Tue, Jun 29, 2021 at 11:08 AM Eelco Chaudron  wrote:

>
>
> On 29 Jun 2021, at 15:19, Vasu Dasari wrote:
>
> > Currently there is an option to add/flush/show ARP/ND neighbor. This
> covers L3
> > side.  For L2 side, there is only fdb show command. This patch gives an
> option
> > to add/del an fdb entry via ovs-appctl.
> >
> > CLI command looks like:
> >
> > To add:
> > ovs-appctl fdb/add <bridge> <port> <vlan> <mac>
> > ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
> >
> > To del:
> > ovs-appctl fdb/del <bridge> <vlan> <mac>
> > ovs-appctl fdb/del br0 0 50:54:00:00:00:05
> >
> > Added two new APIs to provide convenient interface to add and delete
> static-macs.
> > bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t
> in_port,
> >struct eth_addr dl_src, int vlan);
> > bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
> >   struct eth_addr dl_src, int vlan);
> >
> > 1. Static entry should not age. To indicate that entry being programmed
> is a static entry,
> >'expires' field in 'struct mac_entry' will be set to a
> MAC_ENTRY_AGE_STATIC_ENTRY. A
> >check for this value is made while deleting mac entry as part of
> regular aging process.
> > 2. Another change to of mac-update logic, when a packet with same dl_src
> as that of a
> >static-mac entry arrives on any port, the logic will not modify the
> expires field.
> > 3. While flushing fdb entries, made sure static ones are not evicted.
> > 4. Updated "ovs-appctl fdb/stats-show br0" to display number of static
> entries in switch
> >
> > Added following tests:
> >   ofproto-dpif - static-mac add/del/flush
> >   ofproto-dpif - static-mac mac moves
> >
> > Signed-off-by: Vasu Dasari 
> > Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
> > Tested-by: Eelco Chaudron 
> > Acked-by: Eelco Chaudron 
> > ---
>
> Thanks for your patience to follow this through!
>
> Acked-by: Eelco Chaudron 
>
> //Eelco
>
>


Re: [ovs-dev] [PATCH ovn v3 3/3] ovn-controller: Fix incremental processing for logical port references.

2021-06-29 Thread Han Zhou
On Thu, Jun 24, 2021 at 10:33 AM Dumitru Ceara  wrote:
>
> On 6/24/21 7:20 PM, Han Zhou wrote:
> > For the reason mentioned above, we can't make this change. In fact, I
> > wouldn't worry much about lflow_ref_lookup()'s cost. It is O(1)
operation.
> > If it really turns out to be a bottleneck, we could optimize the
> > function/data-structure, without worrying about the logic.
> > The real performance impact part is probably not being able to cache the
> > "match" for lflows that have logical port references, but I will work on
> > some other solutions to optimize that.
>
> OTOH, on real deployments the lflow cache limits should be enforced by
> the CMS.  Therefore I would expect some of these flows to not make it in
> the cache anyway (even without your change).  I don't have data to back
> this up but I'm guessing the impact of the change in this patch will be
> minimal.
>
> Regards,
> Dumitru
>

Thanks Dumitru.
Numan, I sent v4 that adds more coverage in the test case. Please take a
look:
https://patchwork.ozlabs.org/project/ovn/patch/20210629192257.1699504-1-hz...@ovn.org/

Thanks,
Han


[ovs-dev] [PATCH ovn v4] ovn-controller: Fix incremental processing for logical port references.

2021-06-29 Thread Han Zhou
If an lflow has an lport name in its match, but the port-binding is not
seen by ovn-controller when the lflow is processed, the corresponding
openflow rule will not be created. Later, if the port-binding is
created/monitored by ovn-controller, the lflow is not reprocessed,
because the lflow didn't change and ovn-controller doesn't know that the
port-binding affects the lflow. This patch fixes the problem by tracking
the references when parsing the lflow, even if the port-binding is not
found when the lflow is first parsed. A test case is also added to
cover the scenario.

Signed-off-by: Han Zhou 
---
v3 -> v4: Updated the test case to cover:
- when the referenced lport is removed
- when the referenced lport is remote

 controller/lflow.c  | 63 +-
 controller/lflow.h  |  3 ++
 controller/ovn-controller.c | 35 +
 include/ovn/expr.h  |  2 +-
 lib/expr.c  | 14 +++
 tests/ovn.at| 76 +
 tests/test-ovn.c|  4 +-
 utilities/ovn-trace.c   |  2 +-
 8 files changed, 161 insertions(+), 38 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 34eca135a..b7699a309 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -61,6 +61,7 @@ struct lookup_port_aux {
 
 struct condition_aux {
 struct ovsdb_idl_index *sbrec_port_binding_by_name;
+const struct sbrec_datapath_binding *dp;
 const struct sbrec_chassis *chassis;
 const struct sset *active_tunnels;
 const struct sbrec_logical_flow *lflow;
@@ -98,6 +99,12 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *portp)
 
 const struct lookup_port_aux *aux = aux_;
 
+/* Store the name that used to lookup the lport to lflow reference, so that
+ * in the future when the lport's port binding changes, the logical flow
+ * that references this lport can be reprocessed. */
+lflow_resource_add(aux->lfrr, REF_TYPE_PORTBINDING, port_name,
+   &aux->lflow->header_.uuid);
+
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(aux->sbrec_port_binding_by_name, port_name);
 if (pb && pb->datapath == aux->dp) {
@@ -149,19 +156,18 @@ is_chassis_resident_cb(const void *c_aux_, const char 
*port_name)
 {
 const struct condition_aux *c_aux = c_aux_;
 
+/* Store the port name that used to lookup the lport to lflow reference, so
+ * that in the future when the lport's port-binding changes the logical
+ * flow that references this lport can be reprocessed. */
+lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, port_name,
+   &c_aux->lflow->header_.uuid);
+
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(c_aux->sbrec_port_binding_by_name, port_name);
 if (!pb) {
 return false;
 }
 
-/* Store the port_name to lflow reference. */
-int64_t dp_id = pb->datapath->tunnel_key;
-char buf[16];
-get_unique_lport_key(dp_id, pb->tunnel_key, buf, sizeof(buf));
-lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, buf,
-   &c_aux->lflow->header_.uuid);
-
 if (strcmp(pb->type, "chassisredirect")) {
 /* for non-chassisredirect ports */
 return pb->chassis && pb->chassis == c_aux->chassis;
@@ -623,8 +629,6 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 int64_t dp_id = dp->tunnel_key;
 char buf[16];
 get_unique_lport_key(dp_id, port_id, buf, sizeof(buf));
-lflow_resource_add(l_ctx_out->lfrr, REF_TYPE_PORTBINDING, buf,
-   &lflow->header_.uuid);
 if (!sset_contains(l_ctx_in->local_lport_ids, buf)) {
 VLOG_DBG("lflow "UUID_FMT
  " port %s in match is not local, skip",
@@ -788,6 +792,7 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 };
 struct condition_aux cond_aux = {
 .sbrec_port_binding_by_name = l_ctx_in->sbrec_port_binding_by_name,
+.dp = dp,
 .chassis = l_ctx_in->chassis,
 .active_tunnels = l_ctx_in->active_tunnels,
 .lflow = lflow,
@@ -805,7 +810,6 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 struct hmap *matches = NULL;
 size_t matches_size = 0;
 
-bool is_cr_cond_present = false;
 bool pg_addr_set_ref = false;
 uint32_t n_conjs = 0;
 
@@ -843,8 +847,8 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 case LCACHE_T_NONE:
 case LCACHE_T_CONJ_ID:
 case LCACHE_T_EXPR:
-expr = expr_evaluate_condition(expr, is_chassis_resident_cb, &cond_aux,
-   &is_cr_cond_present);
+expr = expr_evaluate_condition(expr, is_chassis_resident_cb,
+   &cond_aux);
 expr = expr_normalize(expr);
 break;
 case 

Re: [ovs-dev] [PATCH ovn] ovs-sandbox: Allow specifying initial contents for NB and SB database.

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line has non-spaces leading whitespace
#73 FILE: tutorial/ovs-sandbox:284:
--nbdb-so*=*)

WARNING: Line has non-spaces leading whitespace
#74 FILE: tutorial/ovs-sandbox:285:
ovnnb_source=$optarg

WARNING: Line has non-spaces leading whitespace
#75 FILE: tutorial/ovs-sandbox:286:
;;

WARNING: Line has non-spaces leading whitespace
#76 FILE: tutorial/ovs-sandbox:287:
--nbdb-so*)

WARNING: Line has non-spaces leading whitespace
#77 FILE: tutorial/ovs-sandbox:288:
prev=ovnnb_source

WARNING: Line has non-spaces leading whitespace
#78 FILE: tutorial/ovs-sandbox:289:
;;

WARNING: Line has non-spaces leading whitespace
#92 FILE: tutorial/ovs-sandbox:304:
--sbdb-so*=*)

WARNING: Line has non-spaces leading whitespace
#93 FILE: tutorial/ovs-sandbox:305:
ovnsb_source=$optarg

WARNING: Line has non-spaces leading whitespace
#94 FILE: tutorial/ovs-sandbox:306:
;;

WARNING: Line has non-spaces leading whitespace
#95 FILE: tutorial/ovs-sandbox:307:
--sbdb-so*)

WARNING: Line has non-spaces leading whitespace
#96 FILE: tutorial/ovs-sandbox:308:
prev=ovnsb_source

WARNING: Line has non-spaces leading whitespace
#97 FILE: tutorial/ovs-sandbox:309:
;;

WARNING: Line has non-spaces leading whitespace
#109 FILE: tutorial/ovs-sandbox:397:
: ${ovnnb_source:=$srcdir/ovn-nb.ovsschema}

WARNING: Line has non-spaces leading whitespace
#110 FILE: tutorial/ovs-sandbox:398:
if test ! -e "$ovnnb_source"; then

WARNING: Line is 111 characters long (recommended limit is 79)
WARNING: Line has non-spaces leading whitespace
#111 FILE: tutorial/ovs-sandbox:399:
echo >&2 "OVN northbound database source $ovnnb_source not found, 
please check --srcdir or --ovnnb-source"

WARNING: Line has non-spaces leading whitespace
#112 FILE: tutorial/ovs-sandbox:400:
exit 1

WARNING: Line has non-spaces leading whitespace
#118 FILE: tutorial/ovs-sandbox:402:
: ${ovnsb_source:=$srcdir/ovn-sb.ovsschema}

WARNING: Line has non-spaces leading whitespace
#119 FILE: tutorial/ovs-sandbox:403:
if test ! -e "$ovnsb_source"; then

WARNING: Line is 111 characters long (recommended limit is 79)
WARNING: Line has non-spaces leading whitespace
#120 FILE: tutorial/ovs-sandbox:404:
echo >&2 "OVN southbound database source $ovnsb_source not found, 
please check --srcdir or --ovnsb-source"

WARNING: Line has non-spaces leading whitespace
#121 FILE: tutorial/ovs-sandbox:405:
exit 1

WARNING: Line has non-spaces leading whitespace
#136 FILE: tutorial/ovs-sandbox:506:
source_type=schema

WARNING: Line has non-spaces leading whitespace
#138 FILE: tutorial/ovs-sandbox:508:
source_type=database

WARNING: Line has non-spaces leading whitespace
#140 FILE: tutorial/ovs-sandbox:510:
echo "$source is not an OVSDB schema or database" >&2

WARNING: Line has non-spaces leading whitespace
#141 FILE: tutorial/ovs-sandbox:511:
exit 1

WARNING: Line has non-spaces leading whitespace
#151 FILE: tutorial/ovs-sandbox:546:
case $source_type in

WARNING: Line has non-spaces leading whitespace
#152 FILE: tutorial/ovs-sandbox:547:
database) run cp "$source" ${db}1.db ;;

WARNING: Line has non-spaces leading whitespace
#153 FILE: tutorial/ovs-sandbox:548:
schema) run ovsdb-tool create ${db}1.db "$source" ;;

WARNING: Line has non-spaces leading whitespace
#154 FILE: tutorial/ovs-sandbox:549:
esac

WARNING: Line has non-spaces leading whitespace
#161 FILE: tutorial/ovs-sandbox:555:
case $source_type in

WARNING: Line has non-spaces leading whitespace
#162 FILE: tutorial/ovs-sandbox:556:
database) run cp "$source" $db$i.db ;;

WARNING: Line has non-spaces leading whitespace
#163 FILE: tutorial/ovs-sandbox:557:
schema) run ovsdb-tool create $db$i.db "$source" ;;

WARNING: Line has non-spaces leading whitespace
#164 FILE: tutorial/ovs-sandbox:558:
esac

WARNING: Line is 87 characters long (recommended limit is 79)
#173 FILE: tutorial/ovs-sandbox:573:
run ovsdb-tool create-cluster ${db}1.db "$source" 
unix:${db}1.raft;

Lines checked: 190, Warnings: 35, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Flavio Leitner
On Tue, Jun 29, 2021 at 10:19:41PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found, and the generic "_any" version of
> the avx512 subtable search implementation was used. This change logs the
> subtable usage once, avoiding duplicates.
> 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
> b/lib/dpif-netdev-lookup-avx512-gather.c
> index bc359dc4a..f1b44deb3 100644
> --- a/lib/dpif-netdev-lookup-avx512-gather.c
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
> uint32_t u1_bits)
>   */
>  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
>  f = dpcls_avx512_gather_mf_any;
> -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> +VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable (%d,%d)\n",
>u0_bits, u1_bits);

This will log only one time, but there are multiple subtables, so we
won't see other subtables changing. If the subtable information is not
relevant, then it shouldn't be in the message.
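
For context, a "log once" macro typically latches per call site rather than per
(u0_bits, u1_bits) pair; the following is only a generic sketch of that
pattern, not OVS's actual VLOG implementation:

    /* Sketch of a log-once pattern: the static flag belongs to the call
     * site, so only the first subtable that reaches this line is logged. */
    #include <stdbool.h>
    #include <stdio.h>

    #define LOG_INFO_ONCE_SKETCH(FMT, ...)              \
        do {                                            \
            static bool logged_once_ = false;           \
            if (!logged_once_) {                        \
                logged_once_ = true;                    \
                printf(FMT "\n", __VA_ARGS__);          \
            }                                           \
        } while (0)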

Also, the log only exists for *_mf_any, not for other specialized
functions.

Do we need that information at runtime? Unless I am missing
other callers, dpcls_subtable_get_best_impl() has a VLOG_DBG()
logging all cases with the same information. 

-- 
fbl


[ovs-dev] [PATCH ovn] ovs-sandbox: Allow specifying initial contents for NB and SB database.

2021-06-29 Thread Ben Pfaff
This makes it easier to test northd behavior with particular database
contents, like the ones that Dumitru posted to the mailing list:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384519.html

You just do something like this:
make sandbox SANDBOXFLAGS="--nbdb-source=$HOME/Downloads/ovnnb_db.db 
--sbdb-source=$HOME/Downloads/ovnsb_db.db --ddlog"

Signed-off-by: Ben Pfaff 
CC: Dumitru Ceara 
---
 tutorial/ovs-sandbox | 71 +++-
 1 file changed, 50 insertions(+), 21 deletions(-)

diff --git a/tutorial/ovs-sandbox b/tutorial/ovs-sandbox
index 08a3629be7f6..847bff224b2d 100755
--- a/tutorial/ovs-sandbox
+++ b/tutorial/ovs-sandbox
@@ -74,8 +74,6 @@ built=false
 ovn=true
 ddlog=false
 ddlog_record=true
-ovnsb_schema=
-ovnnb_schema=
 ic_sb_schema=
 ic_nb_schema=
 ovn_rbac=true
@@ -84,8 +82,10 @@ n_ics=1
 n_controllers=1
 nbdb_model=standalone
 nbdb_servers=3
+nbdb_source=
 sbdb_model=backup
 sbdb_servers=3
+sbdb_source=
 ic_nb_model=clustered
 ic_nb_servers=3
 ic_sb_model=clustered
@@ -152,8 +152,10 @@ OVN options:
   --n-ics=NUMBER   run NUMBER copies of ic (default: 1)
   --nbdb-model=standalone|backup|clusterednorthbound database model
   --nbdb-servers=N number of servers in nbdb cluster (default: 3)
+  --nbdb-source=FILE   database or schema to copy NBDB from
   --sbdb-model=standalone|backup|clusteredsouthbound database model
   --sbdb-servers=N number of servers in sbdb cluster (default: 3)
+  --sbdb-source=FILE database or schema to copy SBDB from
   --ic-nb-model=standalone|backup|clustered   ic-northbound database model
   --ic-nb-servers=N number of servers in IC NB cluster (default: 3)
   --ic-sb-model=standalone|backup|clustered   ic-southbound database model
@@ -265,11 +267,11 @@ EOF
 --n-controller*)
 prev=n_controllers
 ;;
---nbdb-s*=*)
+--nbdb-se*=*)
 nbdb_servers=$optarg
 nbdb_model=clustered
 ;;
---nbdb-s*)
+--nbdb-se*)
 prev=nbdb_servers
 nbdb_model=clustered
 ;;
@@ -279,11 +281,17 @@ EOF
 --nbdb-m*)
 prev=nbdb_model
 ;;
---sbdb-s*=*)
+   --nbdb-so*=*)
+   ovnnb_source=$optarg
+   ;;
+   --nbdb-so*)
+   prev=ovnnb_source
+   ;;
+--sbdb-se*=*)
 sbdb_servers=$optarg
 sbdb_model=clustered
 ;;
---sbdb-s*)
+--sbdb-se*)
 prev=sbdb_servers
 sbdb_model=clustered
 ;;
@@ -293,6 +301,12 @@ EOF
 --sbdb-m*)
 prev=sbdb_model
 ;;
+   --sbdb-so*=*)
+   ovnsb_source=$optarg
+   ;;
+   --sbdb-so*)
+   prev=ovnsb_source
+   ;;
 --ic-nb-s*=*)
 ic_nb_servers=$optarg
 ic_nb_model=clustered
@@ -380,15 +394,15 @@ if $built; then
 exit 1
 fi
 if $ovn; then
-ovnsb_schema=$srcdir/ovn-sb.ovsschema
-if test ! -e "$ovnsb_schema"; then
-echo >&2 'source directory not found, please use --srcdir'
-exit 1
+   : ${ovnnb_source:=$srcdir/ovn-nb.ovsschema}
+   if test ! -e "$ovnnb_source"; then
+   echo >&2 "OVN northbound database source $ovnnb_source not found, 
please check --srcdir or --ovnnb-source"
+   exit 1
 fi
-ovnnb_schema=$srcdir/ovn-nb.ovsschema
-if test ! -e "$ovnnb_schema"; then
-echo >&2 'source directory not found, please use --srcdir'
-exit 1
+   : ${ovnsb_source:=$srcdir/ovn-sb.ovsschema}
+   if test ! -e "$ovnsb_source"; then
+   echo >&2 "OVN southbound database source $ovnsb_source not found, 
please check --srcdir or --ovnsb-source"
+   exit 1
 fi
 ic_sb_schema=$srcdir/ovn-ic-sb.ovsschema
 if test ! -e "$ic_sb_schema"; then
@@ -484,9 +498,18 @@ rungdb $gdb_ovsdb $gdb_ovsdb_ex ovsdb-server --detach 
--no-chdir --pidfile -vcon
$ovsdb_server_args
 
 ovn_start_db() {
-local db=$1 model=$2 servers=$3 schema=$4
+local db=$1 model=$2 servers=$3 source=$4
 local DB=$(echo $db | tr a-z A-Z)
-local schema_name=$(ovsdb-tool schema-name $schema)
+
+local schema_name source_type
+if schema_name=$(ovsdb-tool schema-name "$source" 2>/dev/null); then
+   source_type=schema
+elif schema_name=$(ovsdb-tool db-name "$source" 2>/dev/null); then
+   source_type=database
+else
+   echo "$source is not an OVSDB schema or database" >&2
+   exit 1
+fi
 
 case $model in
 standalone | backup) ;;
@@ -520,13 +543,19 @@ ovn_start_db() {
 
 case $model in
 standalone)
-run ovsdb-tool create ${db}1.db "$schema"
+   case $source_type in
+   database) run cp "$source" ${db}1.db ;;
+   schema) run ovsdb-tool create ${db}1.db "$source" ;;
+   esac
   

Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Amber, Kumar
Hi Flavio,

Comments Inline.

> -Original Message-
> From: Flavio Leitner 
> Sent: Tuesday, June 29, 2021 11:49 PM
> To: Amber, Kumar 
> Cc: Eelco Chaudron ; d...@openvswitch.org;
> i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex
> autovalidator
> 
> On Tue, Jun 29, 2021 at 05:11:00PM +, Amber, Kumar wrote:
> > Hi Eelco, Flavio,
> >
> > Pls find my replies Inline
> >
> > > -Original Message-
> > > From: Flavio Leitner 
> > > Sent: Tuesday, June 29, 2021 7:51 PM
> > > To: Eelco Chaudron 
> > > Cc: Amber, Kumar ; Van Haaren, Harry
> > > ; d...@openvswitch.org;
> > > i.maxim...@ovn.org
> > > Subject: Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for
> > > mfex autovalidator
> > >
> > > On Tue, Jun 29, 2021 at 03:50:22PM +0200, Eelco Chaudron wrote:
> > > >
> > > >
> > > > On 28 Jun 2021, at 4:57, Flavio Leitner wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >
> > > > > On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> > > > >> Tests:
> > > > >>   6: OVS-DPDK - MFEX Autovalidator
> > > > >>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> > > > >>
> > > > >> Added a new directory to store the PCAP file used in the tests
> > > > >> and a script to generate the fuzzy traffic type pcap to be used
> > > > >> in fuzzy unit test.
> > > > >
> > > > >
> > > > > I haven't tried this yet but am I right that these tests are
> > > > > going to pass a pcap to send traffic in a busy loop for 5
> > > > > seconds in the first case and 20 seconds in the second case?
> > > > >
> > > > > I see that when autovalidator is set OVS will crash if one
> > > > > implementation returns a different value, so I wonder why we
> > > > > need to run for that long.
> > > >
> > > > I think we should remove the assert (already suggested by Harry),
> > > > so it will not crass by accident if someone selects autovalidator
> > > > in the field (and runs into an issue).
> > > > Failure will then be detected by the ERROR log entries on shutdown.
> > >
> > > That's true for the testsuite, but not in production as there is
> > > nothing to disable that.
> > >
> > > Perhaps if autovalidator detects an issue, it should log an ERROR
> > > level log to report to testsuite, disable the failing mode and make
> > > sure OVS is either in default or in another functional mode.
> >
> > So I have put the following :
> > Removed the assert
> > Allow the Auto-validator to run for all implementation and for a full
> batch
> > Document error via Vlog_Error
> > Set the auto-validator to default {Scalar} when returning out in case
> of failure.
> 
> Sounds like a plan to me.
> Is that okay with you Eelco?
> 
> 
> > > > I’m wondering if there is another way than a simple delay, as
> > > > these tend to
> > > cause issues later on. Can we check packets processed or something?
> > >
> > > Yeah, maybe we can pass all packets like 5x at least.
> >
> > Sure I will try to find something to do it more nicely.
> > But just a thought keeping it 20sec allows for a full-stabilization and also
> thorough testing of stability as well.
> > So keeping it may not be just a bad idea.
> 
> The issue is that if every test decides to delay seconds, the testsuite
> becomes impractical. We have removed 'sleep' over time. Instead, we have
> functions to wait for a certain cmdline output, or some event.
> Yes, there are still some left to be fixed.
> 
> Back to the point, maybe there is a signal of some sort we can get that
> indicates the stability you're looking for.
> 

I agree with the point and I am looking for a signal, but currently, due to
the assert removal, we don't have any marker.
To minimize the time, I did an analysis of the time taken by each test case:

1) For the simple test case we don't need the 5 sec wait time, as the PCAP
only contains one of each traffic type.
2) For fuzzy we do need at least 5 sec for all 10k packets to be sent at
least 2x, and also for stability.

Will these reductions suffice for now, until I find a way to remove the
delay completely from the fuzzy test?
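
For reference, a rough sketch of the wait-based check being suggested, using
the existing OVS_WAIT_UNTIL macro from the testsuite (the interface name and
packet count below are placeholders, not taken from the actual test):

    dnl Instead of a fixed sleep, poll the rx counter until the pcap has
    dnl been replayed at least twice (20k packets for the 10k-packet pcap).
    OVS_WAIT_UNTIL([
        rx=$(ovs-vsctl get Interface dpdk0 statistics:rx_packets)
        test "$rx" -ge 20000
    ])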

Regards
Amber
> --
> fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [External] : [PATCH ovn] northd-ddlog: Add proxy arp flows for configured addresses in lsp router port.

2021-06-29 Thread Brendan Doyle



Thanks for doing the ddlog for this, comment below

On 29/06/2021 17:08, num...@ovn.org wrote:

From: Numan Siddique 

The commit [1] didn't add the ddlog part.

[1] - 8087cbc7462("ovn-northd.c: Add proxy ARP support to OVN")

Signed-off-by: Numan Siddique 
---
  northd/ovn.dl|  1 +
  northd/ovn.rs| 13 +
  northd/ovn_northd.dl | 38 ++
  tests/ovn.at |  4 ++--
  4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/northd/ovn.dl b/northd/ovn.dl
index f23ea3b9e1..3c7a734ddb 100644
--- a/northd/ovn.dl
+++ b/northd/ovn.dl
@@ -364,6 +364,7 @@ extern function is_dynamic_lsp_address(addr: string): bool
  extern function extract_lsp_addresses(address: string): 
Option
  extern function extract_addresses(address: string): Option
  extern function extract_lrp_networks(mac: string, networks: Set): 
Option
+extern function extract_ip_addresses(address: string): Option
  
  extern function split_addresses(addr: string): (Set, Set)
  
diff --git a/northd/ovn.rs b/northd/ovn.rs

index d44f83bc75..5f0939409c 100644
--- a/northd/ovn.rs
+++ b/northd/ovn.rs
@@ -184,6 +184,18 @@ pub fn extract_lrp_networks(mac: , networks: 
_std::Set) ->
  }
  }
  
+pub fn extract_ip_addresses(address: ) -> ddlog_std::Option {

+unsafe {
+let mut laddrs: lport_addresses_c = Default::default();
+if ovn_c::extract_ip_addresses(string2cstr(address).as_ptr(),
+laddrs as *mut lport_addresses_c) {
+ddlog_std::Option::Some{x: laddrs.into_ddlog()}
+} else {
+ddlog_std::Option::None
+}
+}
+}
+
  pub fn ovn_internal_version() -> String {
  unsafe {
  let s = ovn_c::ovn_get_internal_version();
@@ -623,6 +635,7 @@ mod ovn_c {
  pub fn extract_addresses(address: *const raw::c_char, laddrs: *mut 
lport_addresses_c, ofs: *mut raw::c_int) -> bool;
  pub fn extract_lrp_networks__(mac: *const raw::c_char, networks: 
*const *const raw::c_char,
n_networks: libc::size_t, laddrs: *mut 
lport_addresses_c) -> bool;
+pub fn extract_ip_addresses(address: *const raw::c_char, laddrs: *mut 
lport_addresses_c) -> bool;
  pub fn destroy_lport_addresses(addrs: *mut lport_addresses_c);
  pub fn is_dynamic_lsp_address(address: *const raw::c_char) -> bool;
  pub fn split_addresses(addresses: *const raw::c_char, ip4_addrs: *mut 
ovs_svec, ipv6_addrs: *mut ovs_svec);
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 52a6206a18..a7a327c7f0 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3360,6 +3360,44 @@ for (CheckLspIsUp[check_lsp_is_up]) {
  }
  }
  
+Flow(.logical_datapath = sw._uuid,

+ .stage= s_SWITCH_IN_ARP_ND_RSP(),
+ .priority = 50,
+ .__match  = __match,
+ .actions  = __actions,
+ .external_ids = stage_hint(sp.lsp._uuid)) :-
+
+sp in (.sw = sw, .peer = Some{rp}),
+rp.is_enabled(),
+var proxy_ips = {
+match (sp.lsp.options.get("arp_proxy")) {
+None -> "",
+Some {addresses} -> {
+match (extract_ip_addresses(addresses)) {
+None -> "",
+Some{addr} -> {
+var ip4_addrs = vec_empty();
+for (ip4 in addr.ipv4_addrs) {
+ip4_addrs.push("${ip4.addr}")
+};
+string_join(ip4_addrs, ",")
+}
+}
+}
+}
+},
+proxy_ips != "",
+var __match = "arp.op == 1 && arp.tpa == {" ++ proxy_ips ++ "}",
+var __actions = "eth.dst = eth.src; "
+"eth.src = ${rp.networks.ea}; "
+"arp.op = 2; /* ARP reply */ "
+"arp.tha = arp.sha; "
+"arp.sha = %s; "
+"arp.tpa <-> arp.spa; "
+"outport = inport; "
+"flags.loopback = 1; "
+"output;".
+
  /* For ND solicitations, we need to listen for both the
   * unicast IPv6 address and its all-nodes multicast address,
   * but always respond with the unicast IPv6 address. */
diff --git a/tests/ovn.at b/tests/ovn.at
index db1a0a35c2..31f0b90996 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26940,7 +26940,7 @@ ovs-vsctl -- add-port br-int vif1 -- \
  # And proxy ARP flows for 69.254.239.254 and 169.254.239.2
  # and check that SB flows have been added.
  ovn-nbctl --wait=hv add Logical_Switch_Port rp-ls1 \
-options arp_proxy='"169.254.239.254 169.254.239.2"'
+options arp_proxy='"169.254.239.254,169.254.239.2"'
  ovn-sbctl dump-flows > sbflows
  AT_CAPTURE_FILE([sbflows])
  
@@ -26957,7 +26957,7 @@ AT_CHECK([ovn-sbctl dump-flows | grep ls_in_arp_rsp | grep "169.254.239.2"], [1]
  
  # Add the flows back send arp request and 

Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Flavio Leitner
On Tue, Jun 29, 2021 at 05:11:00PM +, Amber, Kumar wrote:
> Hi Eelco, Flavio,
> 
> Pls find my replies Inline
> 
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Tuesday, June 29, 2021 7:51 PM
> > To: Eelco Chaudron 
> > Cc: Amber, Kumar ; Van Haaren, Harry
> > ; d...@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex
> > autovalidator
> > 
> > On Tue, Jun 29, 2021 at 03:50:22PM +0200, Eelco Chaudron wrote:
> > >
> > >
> > > On 28 Jun 2021, at 4:57, Flavio Leitner wrote:
> > >
> > > > Hi,
> > > >
> > > >
> > > > On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> > > >> Tests:
> > > >>   6: OVS-DPDK - MFEX Autovalidator
> > > >>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> > > >>
> > > >> Added a new directory to store the PCAP file used in the tests and
> > > >> a script to generate the fuzzy traffic type pcap to be used in
> > > >> fuzzy unit test.
> > > >
> > > >
> > > > I haven't tried this yet but am I right that these tests are going
> > > > to pass a pcap to send traffic in a busy loop for 5 seconds in the
> > > > first case and 20 seconds in the second case?
> > > >
> > > > I see that when autovalidator is set OVS will crash if one
> > > > implementation returns a different value, so I wonder why we need to
> > > > run for that long.
> > >
> > > I think we should remove the assert (already suggested by Harry), so
> > > it will not crass by accident if someone selects autovalidator in the
> > > field (and runs into an issue).
> > > Failure will then be detected by the ERROR log entries on shutdown.
> > 
> > That's true for the testsuite, but not in production as there is nothing to
> > disable that.
> > 
> > Perhaps if autovalidator detects an issue, it should log an ERROR level log 
> > to
> > report to testsuite, disable the failing mode and make sure OVS is either in
> > default or in another functional mode.
> 
> So I have put the following : 
>   Removed the assert 
>   Allow the Auto-validator to run for all implementation and for a full 
> batch
>   Document error via Vlog_Error
>   Set the auto-validator to default {Scalar} when returning out in case 
> of failure. 

Sounds like a plan to me.
Is that okay with you Eelco?


> > > I’m wondering if there is another way than a simple delay, as these tend 
> > > to
> > cause issues later on. Can we check packets processed or something?
> > 
> > Yeah, maybe we can pass all packets like 5x at least.
> 
> Sure I will try to find something to do it more nicely.
> But just a thought keeping it 20sec allows for a full-stabilization and also 
> thorough testing of stability as well.
> So keeping it may not be just a bad idea.

The issue is that if every test decides to delay seconds, the testsuite
becomes impractical. We have removed 'sleep' over time. Instead, we
have functions to wait for a certain cmdline output, or some event.
Yes, there are still some left to be fixed.

Back to the point, maybe there is a signal of some sort we can get
that indicates the stability you're looking for. 

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-29 Thread Flavio Leitner
On Tue, Jun 29, 2021 at 04:32:05PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev  On Behalf Of Eelco Chaudron
> > Sent: Tuesday, June 29, 2021 1:38 PM
> > To: Amber, Kumar 
> > Cc: d...@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select 
> > the best
> > mfex function
> > 
> > More comments below. FYI I’m only reviewing right now, no testing.
> 
> Sure, thanks for reviews.
> 
> > On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> 
> 
> 
> > > +/* Allocate per thread PMD pointer space for study_stats. */
> > > +static inline struct study_stats *
> > > +get_study_stats(void)
> > > +{
> > > +struct study_stats *stats = study_stats_get();
> > > +if (OVS_UNLIKELY(!stats)) {
> > > +   stats = xzalloc(sizeof *stats);
> > > +   study_stats_set_unsafe(stats);
> > > +}
> > > +return stats;
> > > +}
> > > +
> > 
> > Just got a mind-meld with the code, and realized that the function might be 
> > different
> > per PMD thread due to this auto mode (and autovalidator mode in the previous
> > patch).
> > 
> > This makes it only stronger that we need a way to see the currently 
> > selected mode,
> > and not per datapath, but per PMD per datapath!
> 
> Study depends on the traffic pattern, so yes you're correct that it depends.
> The study command was added after community suggested user-experience
> would improve if the user doesn't have to provide an exact miniflow profile 
> name.
> 
> Study studies the traffic running on that PMD, compares all MFEX impls, and 
> prints out
> hits. It selects the _first_ implementation that surpasses the threshold of 
> packets.
> 
> Users are free to use the more specific names of MFEX impls instead of "study"
> for fine-grained control over the MFEX impl in use, e.g.
> 
> ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp
> 
> > Do we also need a way to set this per PMD?
> 
> I don't feel there is real value here, but we could investigate adding an
> optional parameter to the command indicating a PMD thread IDX to set?
> We have access to "pmd->core_id" in our set() function, so limiting changes
> to a specific PMD thread can be done ~ easily... but is it really required?

I think the concern here (at least from my side) is that users can
set the algorithm globally or per DP, not per PMD. However, the
study can set different algorithms per PMD. For example, say that
'study' indicates that alg#1 for PMD#1 and alg#2 for PMD#2 in the
lab. Now we want to move to production and make that selection
static, how can we do that?

If we set study, how do we tell from the cmdline the algorithm
chose for each PMD? Another example of the same situation: if
we always start with 'study' and suddenly there is a traffic
processing difference. How one can check what is different in
the settings? The logs don't tell which PMD was affected.
 
> Perfect is the enemy of good... I'd prefer focus on getting existing code 
> changes merged,
> and add additional (optional) parameters in future if deemed useful in real 
> world testing?

True. Perhaps we have different use cases in mind. How do you expect
users to use this feature? Do you think production users will always
start with 'study'?
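
For concreteness, the flow under discussion looks roughly like this (the
command is from this patch series; the profile name is just the example
Harry gave above):

    ovs-appctl dpif-netdev/miniflow-parser-set study
    # ... check the per-PMD log messages for the implementation study picked ...
    ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp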

Thanks,
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 1/3] ovn-northd: Remove lflow_add_unique.

2021-06-29 Thread Han Zhou
On Tue, Jun 29, 2021 at 7:04 AM Dumitru Ceara  wrote:
>
> On 6/21/21 8:51 AM, Han Zhou wrote:
> > This patch removes the workaround when adding multicast group related
> > lflows, because the multicast group dependency problem is fixed in
> > ovn-controller in the previous commit.
> >
> > This patch also removes the UniqueFlow/AnnotatedFlow usage in northd
> > DDlog implementation for the same reason.
> >
> > Signed-off-by: Han Zhou 
> > ---
>
> Hi Han,
>
> The changes look good to me.
>
> Acked-by: Dumitru Ceara 
>
> Thanks,
> Dumitru
>

Thanks Dumitru. I applied patch 1/3 - 2/3 of the series to master. I will
update the test case of patch 3/3 with v4.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 10:29:59AM -0700, Han Zhou wrote:
> On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff  wrote:
> >
> > On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> > > If a new database server added to the cluster, or if one of the
> > > database servers changed its IP address or port, then you need to
> > > update the list of remotes for the client.  For example, if a new
> > > OVN_Southbound database server is added, you need to update the
> > > ovn-remote for the ovn-controller.
> > >
> > > However, in the current implementation, the ovsdb-cs module always
> > > closes the current connection and creates a new one.  This can lead
> > > to a storm of re-connections if all ovn-controllers will be updated
> > > simultaneously.  They can also start re-dowloading the database
> > > content, creating even more load on the database servers.
> > >
> > > Correct this by saving an existing connection if it is still in the
> > > list of remotes after the update.
> > >
> > > 'reconnect' module will report connection state updates, but that
> > > is OK since no real re-connection happened and we only updated the
> > > state of a new 'reconnect' instance.
> > >
> > > If required, re-connection can be forced after the update of remotes
> > > with ovsdb_cs_force_reconnect().
> >
> > I think one of the goals here was to keep the load balanced as servers
> > are added.  Maybe that's not a big deal, or maybe it would make sense to
> > flip a coin for each of the new servers and switch over to it with
> > probability 1/n where n is the number of servers.
> 
> A similar load-balancing problem exists also when a server is down and then
> recovered. Connections will obviously move away when it is down but they
> won't automatically connect back when it is recovered. Apart from the
> flipping-a-coin approach suggested by Ben, I saw a proposal [0] [1] in the
> past that provides a CLI to reconnect to a specific server which leaves
> this burden to CMS/operators. It is not ideal but still could be an
> alternative to solve the problem.
> 
> I think both approaches have their pros and cons. The smart way doesn't
> require human intervention in theory, but when operating at scale people
> usually want to be cautious and have more control over the changes. For
> example, they may want to add the server to the cluster first, and then
> gradually move 1/n connections to the new server after a graceful period,
> or they could be more conservative and only let the new server take new
> connections without moving any existing connections. I'd support both
> options and let the operators decide according to their requirements.
> 
> Regarding the current patch, I think it's better to add a test case to
> cover the scenario and confirm that existing connections didn't reset. With
> that:
> Acked-by: Han Zhou 

This seems reasonable; to be sure, I'm not arguing against Ilya's
appproach, just trying to explain my recollection of why it was done
this way.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd-ddlog: Add proxy arp flows for configured addresses in lsp router port.

2021-06-29 Thread Numan Siddique
Hi Ben,

This patch is not working as expected.   Need your help here.  Not very urgent.

Please see below.

Thanks
Numan

On Tue, Jun 29, 2021 at 12:09 PM  wrote:
>
> From: Numan Siddique 
>
> The commit [1] didn't add the ddlog part.
>
> [1] - 8087cbc7462("ovn-northd.c: Add proxy ARP support to OVN")
>
> Signed-off-by: Numan Siddique 
> ---
>  northd/ovn.dl|  1 +
>  northd/ovn.rs| 13 +
>  northd/ovn_northd.dl | 38 ++
>  tests/ovn.at |  4 ++--
>  4 files changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/northd/ovn.dl b/northd/ovn.dl
> index f23ea3b9e1..3c7a734ddb 100644
> --- a/northd/ovn.dl
> +++ b/northd/ovn.dl
> @@ -364,6 +364,7 @@ extern function is_dynamic_lsp_address(addr: string): bool
>  extern function extract_lsp_addresses(address: string): 
> Option
>  extern function extract_addresses(address: string): Option
>  extern function extract_lrp_networks(mac: string, networks: Set): 
> Option
> +extern function extract_ip_addresses(address: string): 
> Option
>
>  extern function split_addresses(addr: string): (Set, Set)
>
> diff --git a/northd/ovn.rs b/northd/ovn.rs
> index d44f83bc75..5f0939409c 100644
> --- a/northd/ovn.rs
> +++ b/northd/ovn.rs
> @@ -184,6 +184,18 @@ pub fn extract_lrp_networks(mac: , networks: 
> _std::Set) ->
>  }
>  }
>
> +pub fn extract_ip_addresses(address: ) -> 
> ddlog_std::Option {
> +unsafe {
> +let mut laddrs: lport_addresses_c = Default::default();
> +if ovn_c::extract_ip_addresses(string2cstr(address).as_ptr(),
> +laddrs as *mut 
> lport_addresses_c) {
> +ddlog_std::Option::Some{x: laddrs.into_ddlog()}
> +} else {
> +ddlog_std::Option::None
> +}
> +}
> +}
> +
>  pub fn ovn_internal_version() -> String {
>  unsafe {
>  let s = ovn_c::ovn_get_internal_version();
> @@ -623,6 +635,7 @@ mod ovn_c {
>  pub fn extract_addresses(address: *const raw::c_char, laddrs: *mut 
> lport_addresses_c, ofs: *mut raw::c_int) -> bool;
>  pub fn extract_lrp_networks__(mac: *const raw::c_char, networks: 
> *const *const raw::c_char,
>n_networks: libc::size_t, laddrs: *mut 
> lport_addresses_c) -> bool;
> +pub fn extract_ip_addresses(address: *const raw::c_char, laddrs: 
> *mut lport_addresses_c) -> bool;
>  pub fn destroy_lport_addresses(addrs: *mut lport_addresses_c);
>  pub fn is_dynamic_lsp_address(address: *const raw::c_char) -> bool;
>  pub fn split_addresses(addresses: *const raw::c_char, ip4_addrs: 
> *mut ovs_svec, ipv6_addrs: *mut ovs_svec);
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 52a6206a18..a7a327c7f0 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -3360,6 +3360,44 @@ for (CheckLspIsUp[check_lsp_is_up]) {
>  }
>  }
>
> +Flow(.logical_datapath = sw._uuid,
> + .stage= s_SWITCH_IN_ARP_ND_RSP(),
> + .priority = 50,
> + .__match  = __match,
> + .actions  = __actions,
> + .external_ids = stage_hint(sp.lsp._uuid)) :-
> +
> +sp in (.sw = sw, .peer = Some{rp}),
> +rp.is_enabled(),
> +var proxy_ips = {
> +match (sp.lsp.options.get("arp_proxy")) {
> +None -> "",
> +Some {addresses} -> {
> +match (extract_ip_addresses(addresses)) {
> +None -> "",
> +Some{addr} -> {
> +var ip4_addrs = vec_empty();
> +for (ip4 in addr.ipv4_addrs) {
> +ip4_addrs.push("${ip4.addr}")
> +};
> +string_join(ip4_addrs, ",")
> +}
> +}
> +}

If sp.lsp.options.get("arp_proxy") has 2 IP addresses - "10.0.0.4, 10.0.0.5" -
then the above code sets only "10.0.0.4" into the variable "proxy_ips".  I was
expecting it to have "10.0.0.4,10.0.0.5".

I'm not sure if the issue is in "extract_ip_addresses" or somewhere else.


> +}
> +},
> +proxy_ips != "",
> +var __match = "arp.op == 1 && arp.tpa == {" ++ proxy_ips ++ "}",
> +var __actions = "eth.dst = eth.src; "
> +"eth.src = ${rp.networks.ea}; "
> +"arp.op = 2; /* ARP reply */ "
> +"arp.tha = arp.sha; "
> +"arp.sha = %s; "
> +"arp.tpa <-> arp.spa; "
> +"outport = inport; "
> +"flags.loopback = 1; "
> +"output;".
> +
>  /* For ND solicitations, we need to listen for both the
>   * unicast IPv6 address and its all-nodes multicast address,
>   * but always respond with the unicast IPv6 address. */
> diff --git a/tests/ovn.at b/tests/ovn.at
> index db1a0a35c2..31f0b90996 100644
> --- a/tests/ovn.at
> +++ 

Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-06-29 Thread Amber, Kumar
Hi Ian,

Pls find my replies inline and Thanks again for review.

BR
Amber

> -Original Message-
> From: Stokes, Ian 
> Sent: Tuesday, June 29, 2021 10:11 PM
> To: Amber, Kumar ; d...@openvswitch.org
> Cc: i.maxim...@ovn.org
> Subject: RE: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based
> optimized miniflow extract
> 
> > From: Harry van Haaren 
> >
> > This commit adds AVX512 implementations of miniflow extract.
> > By using the 64 bytes available in an AVX512 register, it is possible
> > to convert a packet to a miniflow data-structure in a small quantity
> > instructions.
> >
> > The implementation here probes for Ether()/IP()/UDP() traffic, and
> > builds the appropriate miniflow data-structure for packets that match
> > the probe.
> >
> > The implementation here is auto-validated by the miniflow extract
> > autovalidator, hence its correctness can be easily tested and
> > verified.
> >
> > Note that this commit is designed to easily allow addition of new
> > traffic profiles in a scalable way, without code duplication for each
> > traffic profile.
> >
> 
> Thanks Harry/Amber.
> 
> Agree with what Flavio has proposed so far as well. A few more comments
> inline below.
> 
> Note: A few comments refer to Comment coding style, I haven't called out
> every instance as there are quite a few, but would recommend giving the
> comments in particularly a look over to ensure they meet standards.
> 
> BR
> Ian
> > Signed-off-by: Harry van Haaren 
> > ---
> >  lib/automake.mk   |   1 +
> >  lib/dpif-netdev-extract-avx512.c  | 416
> > ++  lib/dpif-netdev-private-extract.c |
> > 15 ++  lib/dpif-netdev-private-extract.h |  19 ++
> >  4 files changed, 451 insertions(+)
> >  create mode 100644 lib/dpif-netdev-extract-avx512.c
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk index
> > 3080bb04a..2b95d6f92 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
> >  $(AM_CFLAGS)
> >  lib_libopenvswitchavx512_la_SOURCES = \
> > lib/dpif-netdev-lookup-avx512-gather.c \
> > +lib/dpif-netdev-extract-avx512.c \
> >  lib/dpif-netdev-avx512.c
> >  lib_libopenvswitchavx512_la_LDFLAGS = \  -static diff --git
> > a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
> > new file mode 100644
> > index 0..1145ac8a9
> > --- /dev/null
> > +++ b/lib/dpif-netdev-extract-avx512.c
> > @@ -0,0 +1,416 @@
> > +/*
> > + * Copyright (c) 2021 Intel.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + * http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing,
> > +software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > + * See the License for the specific language governing permissions
> > + and
> > + * limitations under the License.
> > + */
> > +
> > +#ifdef __x86_64__
> > +/* Sparse cannot handle the AVX512 instructions. */ #if
> > +!defined(__CHECKER__)
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "flow.h"
> > +#include "dpdk.h"
> > +
> > +#include "dpif-netdev-private-dpcls.h"
> > +#include "dpif-netdev-private-extract.h"
> > +#include "dpif-netdev-private-flow.h"
> > +
> > +/* AVX512-BW level permutex2var_epi8 emulation. */ static inline
> > +__m512i
> > +__attribute__((target("avx512bw")))
> > +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> > +   __m512i v_data_0,
> > +   __m512i v_shuf_idxs,
> > +   __m512i v_data_1) {
> > +/* Manipulate shuffle indexes for u16 size. */
> > +__mmask64 k_mask_odd_lanes = 0x;
> > +/* clear away ODD lane bytes. Cannot be done above due to no u8
> > +shift */
> Coding standard for comments. Capitalize Clear and add period at end of
> comment.
> 

Fixed all in v5.
> > +__m512i v_shuf_idx_evn =
> _mm512_mask_blend_epi8(k_mask_odd_lanes,
> > +v_shuf_idxs, _mm512_setzero_si512());
> Alignment of arguments above seems a bit odd. Can we align vertically
> under k_mask_odd_lanes?
> 
> > +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> > +
> > +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> > +
> > +/* Shuffle each half at 16-bit width */
> For the comment above and multiple comments below, please add period at
> end of comment to keep with standard.
> 

Fixed all below as well in V5.
> > +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0,
> > v_shuf_idx_evn,
> > +v_data_1);
> > +__m512i v_shuf2 = 

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Flavio Leitner

Hi,

On Tue, Jun 29, 2021 at 03:20:36PM +, Amber, Kumar wrote:
> Hi Flavio,
> 
> Replies inline.
> 
> > >
> > > Guess the above needs to be atomic.
> > >
> > > Removed based on Flavio comments.
> > 
> > I asked to initialize that using an API and Eelco is asking to set it 
> > atomically.
> > The requests are complementary, right?
> > 
> 
> Yes True sorry for confusion so we have refactored the code a bit to use 
> Atomic set and get along with the API 
> Wherever applicable since here on any failure we would want to fall back to 
> Scalar we would not need the API
> To find default implementation.

OK, no problem. Looking forward to the next version.

> > >
> > > + VLOG_ERR("failed to get miniflow extract function
> > > + implementations\n");
> > >
> > > Capital F to be in sync with your other error messages?
> > >
> > > Removed based on Flavio comments.
> > 
> > Not sure if I got this. I mentioned that the '\n' is not needed at the end 
> > of all
> > VLOG_* calls. Eelco is asking to start with capital 'F'. So the requests are
> > complementary, unless with the refactor the message went away.
> > 
> > Just make sure to follow the logging style convention in OVS.
> 
> Sorry for confusion I have fixed all the VLOGS with this convention.

great!

fbl

> > 
> > fbl
> > 
> > 
> > 
> > >
> > > + return 0;
> > > + }
> > > + ovs_assert(keys_size >= cnt);
> > >
> > > I don’t think we should assert here. Just return an error like above, so 
> > > in
> > production, we get notified, and this implementation gets disabled.
> > >
> > > Actually we do else one would most likely be overwriting the assigned
> > array space for keys and will hit a Seg fault at some point.
> > >
> > > And hence we would like to know at the compile time if this is the case.
> > >
> > > + struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> > > +
> > > + /* Run scalar miniflow_extract to get default result. */
> > > + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > > + pkt_metadata_init(>md, in_port); miniflow_extract(packet,
> > > + [i].mf);
> > > +
> > > + /* Store known good metadata to compare with optimized metadata. */
> > > + good_l2_5_ofs[i] = packet->l2_5_ofs; good_l3_ofs[i] =
> > > + packet->l3_ofs; good_l4_ofs[i] = packet->l4_ofs; good_l2_pad_size[i]
> > > + = packet->l2_pad_size; }
> > > +
> > > + /* Iterate through each version of miniflow implementations. */ for
> > > + (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) { if
> > > + (!mfex_impls[j].available) { continue; }
> > > +
> > > + /* Reset keys and offsets before each implementation. */
> > > + memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
> > > + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > > + dp_packet_reset_offsets(packet); }
> > > + /* Call optimized miniflow for each batch of packet. */ uint32_t
> > > + hit_mask = mfex_impls[j].extract_func(packets, test_keys, keys_size,
> > > + in_port, pmd_handle);
> > > +
> > > + /* Do a miniflow compare for bits, blocks and offsets for all the
> > > + * classified packets in the hitmask marked by set bits. */ while
> > > + (hit_mask) {
> > > + /* Index for the set bit. */
> > > + uint32_t i = __builtin_ctz(hit_mask);
> > > + /* Set the index in hitmask to Zero. */ hit_mask &= (hit_mask - 1);
> > > +
> > > + uint32_t failed = 0;
> > > +
> > > + /* Check miniflow bits are equal. */ if ((keys[i].mf.map.bits[0] !=
> > > + test_keys[i].mf.map.bits[0]) || (keys[i].mf.map.bits[1] !=
> > > + test_keys[i].mf.map.bits[1])) { VLOG_ERR("Good 0x%llx 0x%llx\tTest
> > > + 0x%llx 0x%llx\n", keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
> > > + test_keys[i].mf.map.bits[0], test_keys[i].mf.map.bits[1]); failed =
> > > + 1; }
> > > +
> > > + if (!miniflow_equal([i].mf, _keys[i].mf)) { uint32_t
> > > + block_cnt = miniflow_n_values([i].mf); VLOG_ERR("Autovalidation
> > > + blocks failed for %s pkt %d", mfex_impls[j].name, i); VLOG_ERR("
> > > + Good hexdump:\n"); uint64_t *good_block_ptr = (uint64_t
> > > + *)[i].buf; uint64_t *test_block_ptr = (uint64_t
> > > + *)_keys[i].buf; for (uint32_t b = 0; b < block_cnt; b++) {
> > > + VLOG_ERR(" %"PRIx64"\n", good_block_ptr[b]); } VLOG_ERR(" Test
> > > + hexdump:\n"); for (uint32_t b = 0; b < block_cnt; b++) { VLOG_ERR("
> > > + %"PRIx64"\n", test_block_ptr[b]); } failed = 1; }
> > > +
> > > + if ((packets->packets[i]->l2_pad_size != good_l2_pad_size[i]) ||
> > > + (packets->packets[i]->l2_5_ofs != good_l2_5_ofs[i]) ||
> > > + (packets->packets[i]->l3_ofs != good_l3_ofs[i]) ||
> > > + (packets->packets[i]->l4_ofs != good_l4_ofs[i])) {
> > > + VLOG_ERR("Autovalidation packet offsets failed for %s pkt %d",
> > > + mfex_impls[j].name, i); VLOG_ERR(" Good offsets: l2_pad_size %u,
> > > + l2_5_ofs : %u"
> > > + " l3_ofs %u, l4_ofs %u\n",
> > > + good_l2_pad_size[i], good_l2_5_ofs[i], good_l3_ofs[i],
> > > + good_l4_ofs[i]); VLOG_ERR(" Test offsets: l2_pad_size %u, l2_5_ofs :
> > > + %u"
> > > + " l3_ofs %u, l4_ofs %u\n",
> > > + 

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Han Zhou
On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff  wrote:
>
> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> > If a new database server added to the cluster, or if one of the
> > database servers changed its IP address or port, then you need to
> > update the list of remotes for the client.  For example, if a new
> > OVN_Southbound database server is added, you need to update the
> > ovn-remote for the ovn-controller.
> >
> > However, in the current implementation, the ovsdb-cs module always
> > closes the current connection and creates a new one.  This can lead
> > to a storm of re-connections if all ovn-controllers will be updated
> > simultaneously.  They can also start re-dowloading the database
> > content, creating even more load on the database servers.
> >
> > Correct this by saving an existing connection if it is still in the
> > list of remotes after the update.
> >
> > 'reconnect' module will report connection state updates, but that
> > is OK since no real re-connection happened and we only updated the
> > state of a new 'reconnect' instance.
> >
> > If required, re-connection can be forced after the update of remotes
> > with ovsdb_cs_force_reconnect().
>
> I think one of the goals here was to keep the load balanced as servers
> are added.  Maybe that's not a big deal, or maybe it would make sense to
> flip a coin for each of the new servers and switch over to it with
> probability 1/n where n is the number of servers.

A similar load-balancing problem exists also when a server is down and then
recovered. Connections will obviously move away when it is down but they
won't automatically connect back when it is recovered. Apart from the
flipping-a-coin approach suggested by Ben, I saw a proposal [0] [1] in the
past that provides a CLI to reconnect to a specific server which leaves
this burden to CMS/operators. It is not ideal but still could be an
alternative to solve the problem.

I think both approaches have their pros and cons. The smart way doesn't
require human intervention in theory, but when operating at scale people
usually want to be cautious and have more control over the changes. For
example, they may want to add the server to the cluster first, and then
gradually move 1/n connections to the new server after a graceful period,
or they could be more conservative and only let the new server take new
connections without moving any existing connections. I'd support both
options and let the operators decide according to their requirements.
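
For illustration, one way to generalize Ben's 1/n coin flip (a hypothetical
helper, not part of Ilya's patch; uses random_range() from lib/random.h):

    /* When the remote set grows from old_n to new_n servers, keep the
     * current connection with probability old_n/new_n, otherwise re-home
     * to one of the new servers, so load stays roughly balanced. */
    static bool
    should_rehome_to_new_server(uint32_t old_n, uint32_t new_n)
    {
        if (new_n <= old_n) {
            return false;           /* Nothing was added. */
        }
        return random_range(new_n) >= old_n;
    }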

Regarding the current patch, I think it's better to add a test case to
cover the scenario and confirm that existing connections didn't reset. With
that:
Acked-by: Han Zhou 

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/373895.html
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/374237.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] docs: fix git format-patch command for backports

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 12:50:54PM -0400, Ihar Hrachyshka wrote:
> On Tue, Jun 29, 2021 at 12:46 PM Ben Pfaff  wrote:
> >
> > On Tue, Jun 29, 2021 at 12:24:11PM -0400, Ihar Hrachyshka wrote:
> > > One, HEAD~, not HEAD, should be used to generate any patches. Two, add
> > > "ovn" to the generated mail topic. Third, update branch name to a
> > > fresh one.
> > >
> > > Signed-off-by: Ihar Hrachyshka 
> >
> > Both of these look odd to me:
> > > -$ git format-patch HEAD --subject-prefix="PATCH branch-2.7"
> > > +$ git format-patch HEAD~ --subject-prefix="PATCH ovn branch-21.06"
> >
> > I think the idea here is to just generate one patch from the tip of the
> > current branch.  HEAD~ works but I'd normally write -1 instead.
> >
> 
> Yes. But HEAD doesn't generate any patches since it's a head and there
> are no patches "above" it. Am I missing something?

You're correct.  -1 and HEAD~ have the same effect, but -1 is the more
common way to write it.
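
For example, either of these produces the same single patch for the tip
commit:

    $ git format-patch -1 --subject-prefix="PATCH ovn branch-21.06"
    $ git format-patch HEAD~ --subject-prefix="PATCH ovn branch-21.06"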
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Amber, Kumar
Hi Eelco, Flavio,

Pls find my replies Inline

> -Original Message-
> From: Flavio Leitner 
> Sent: Tuesday, June 29, 2021 7:51 PM
> To: Eelco Chaudron 
> Cc: Amber, Kumar ; Van Haaren, Harry
> ; d...@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex
> autovalidator
> 
> On Tue, Jun 29, 2021 at 03:50:22PM +0200, Eelco Chaudron wrote:
> >
> >
> > On 28 Jun 2021, at 4:57, Flavio Leitner wrote:
> >
> > > Hi,
> > >
> > >
> > > On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> > >> Tests:
> > >>   6: OVS-DPDK - MFEX Autovalidator
> > >>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> > >>
> > >> Added a new directory to store the PCAP file used in the tests and
> > >> a script to generate the fuzzy traffic type pcap to be used in
> > >> fuzzy unit test.
> > >
> > >
> > > I haven't tried this yet but am I right that these tests are going
> > > to pass a pcap to send traffic in a busy loop for 5 seconds in the
> > > first case and 20 seconds in the second case?
> > >
> > > I see that when autovalidator is set OVS will crash if one
> > > implementation returns a different value, so I wonder why we need to
> > > run for that long.
> >
> > I think we should remove the assert (already suggested by Harry), so
> > it will not crass by accident if someone selects autovalidator in the
> > field (and runs into an issue).
> > Failure will then be detected by the ERROR log entries on shutdown.
> 
> That's true for the testsuite, but not in production as there is nothing to
> disable that.
> 
> Perhaps if autovalidator detects an issue, it should log an ERROR level log to
> report to testsuite, disable the failing mode and make sure OVS is either in
> default or in another functional mode.

So I have put in the following:
- Removed the assert.
- Allow the autovalidator to run for all implementations and for a full batch.
- Document the error via VLOG_ERR.
- Set the autovalidator to fall back to the default (scalar) implementation
  when bailing out on a failure (rough sketch below).
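
Roughly, the failure path then becomes something like the following (names
follow this v4 series; the final code may differ slightly):

    if (failed) {
        VLOG_ERR("Autovalidation failed in %s pkt %d, disabling.",
                 mfex_impls[j].name, i);
        /* Fall back to the scalar miniflow_extract() path instead of
         * asserting; NULL means no optimized extract is selected. */
        pmd->miniflow_extract_opt = NULL;
        break;
    }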
> 
> > I’m wondering if there is another way than a simple delay, as these tend to
> cause issues later on. Can we check packets processed or something?
> 
> Yeah, maybe we can pass all packets like 5x at least.

Sure, I will try to find something to do it more nicely.
But just a thought: keeping it at 20 sec allows for full stabilization and also
thorough testing of stability.
So keeping it may not be such a bad idea.
> 
> fbl
> 
> 
> >
> > > It is storing a python tool in the pcap directory. I think the fuzzy
> > > tool could be called 'mfex_fuzzy.py' and stay in tests/ with other
> > > similar testing tools.
> > >
> > > Also, I don't think the test environment sets OVS_DIR. The 'tests/'
> > > is actually $srcdir, but I could be wrong here.
> > >
> > > BTW, scapy is not mandatory to build or test OVS, so if that tool is
> > > not available, the test should be skipped and not fail.
> > >
> > > Thanks,
> > > fbl
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Stokes, Ian
> Hi Ian,
> 
> Pls find the separated patch for DPCLS at :
> http://patchwork.ozlabs.org/project/openvswitch/patch/20210629164941.1563
> 52-1-kumar.am...@intel.com/
> 
> Regards
> Amber

Just spotted it, thanks.

Regards
Ian
> 
> > -Original Message-
> > From: Van Haaren, Harry 
> > Sent: Tuesday, June 29, 2021 10:16 PM
> > To: Stokes, Ian ; Amber, Kumar
> > ; d...@openvswitch.org
> > Cc: i.maxim...@ovn.org; Ferriter, Cian 
> > Subject: RE: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search 
> > info
> > logs
> >
> > > -Original Message-
> > > From: Stokes, Ian 
> > > Sent: Tuesday, June 29, 2021 5:40 PM
> > > To: Amber, Kumar ; d...@openvswitch.org
> > > Cc: i.maxim...@ovn.org; Ferriter, Cian ; Van
> > > Haaren, Harry 
> > > Subject: RE: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable
> > > search info logs
> > >
> > > > From: Harry van Haaren 
> > > >
> > > > This commit avoids many instances of "using subtable X for miniflow
> > (x,y)"
> > > > in the ovs-vswitchd log when using the DPCLS Autovalidator. This
> > > > occurs when no specialized subtable is found, and the generic "_any"
> > > > version of the avx512 subtable search implementation was used. This
> > > > change logs the subtable usage once, avoiding duplicates.
> > > >
> > >
> > > Good point here, I think people forget there is a cost to logs and no
> > > need to flood them.
> > >
> > > Just to confirm, I think this log is already upstream? What I mean is
> > > that it is not added by either the DPIF or MFEX patch series so this
> > > is the earliest we can make the change on it?
> >
> > This change can be made earlier. The logs spam gets worse if we use the
> > autovalidator, so it was identified as an issue to fix when testing with 
> > MFEX
> > autovalidator && DPCLS autovalidator, hence why here in the patchset.
> >
> > Can submit separately if preferred.
> >
> >
> > > Regards
> > > Ian
> >
> > Thanks for review, -Harry
> >
> > 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Amber, Kumar
Hi Ian,

Pls find the separated patch for DPCLS at :
http://patchwork.ozlabs.org/project/openvswitch/patch/20210629164941.156352-1-kumar.am...@intel.com/

Regards
Amber

> -Original Message-
> From: Van Haaren, Harry 
> Sent: Tuesday, June 29, 2021 10:16 PM
> To: Stokes, Ian ; Amber, Kumar
> ; d...@openvswitch.org
> Cc: i.maxim...@ovn.org; Ferriter, Cian 
> Subject: RE: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search info
> logs
> 
> > -Original Message-
> > From: Stokes, Ian 
> > Sent: Tuesday, June 29, 2021 5:40 PM
> > To: Amber, Kumar ; d...@openvswitch.org
> > Cc: i.maxim...@ovn.org; Ferriter, Cian ; Van
> > Haaren, Harry 
> > Subject: RE: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable
> > search info logs
> >
> > > From: Harry van Haaren 
> > >
> > > This commit avoids many instances of "using subtable X for miniflow
> (x,y)"
> > > in the ovs-vswitchd log when using the DPCLS Autovalidator. This
> > > occurs when no specialized subtable is found, and the generic "_any"
> > > version of the avx512 subtable search implementation was used. This
> > > change logs the subtable usage once, avoiding duplicates.
> > >
> >
> > Good point here, I think people forget there is a cost to logs and no
> > need to flood them.
> >
> > Just to confirm, I think this log is already upstream? What I mean is
> > that it is not added by either the DPIF or MFEX patch series so this
> > is the earliest we can make the change on it?
> 
> This change can be made earlier. The logs spam gets worse if we use the
> autovalidator, so it was identified as an issue to fix when testing with MFEX
> autovalidator && DPCLS autovalidator, hence why here in the patchset.
> 
> Can submit separately if preferred.
> 
> 
> > Regards
> > Ian
> 
> Thanks for review, -Harry
> 
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Kumar Amber
From: Harry van Haaren 

This commit avoids many instances of "using subtable X for miniflow (x,y)"
in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
when no specialized subtable is found, and the generic "_any" version of
the avx512 subtable search implementation was used. This change logs the
subtable usage once, avoiding duplicates.

Signed-off-by: Harry van Haaren 
---
 lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index bc359dc4a..f1b44deb3 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
uint32_t u1_bits)
  */
 if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
 f = dpcls_avx512_gather_mf_any;
-VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
+VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable (%d,%d)\n",
   u0_bits, u1_bits);
 }
 
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-29 Thread Stokes, Ian
> > -Original Message-
> > From: Stokes, Ian 
> > Sent: Thursday, June 24, 2021 2:20 PM
> > To: Amber, Kumar ; d...@openvswitch.org; Van
> > Haaren, Harry 
> > Cc: Amber, Kumar ; i.maxim...@ovn.org
> > Subject: RE: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select 
> > the
> > best mfex function
> >
> > > The study function runs all the available implementations
> > > of miniflow_extract and makes a choice whose hitmask has
> > > maximum hits and sets the mfex to that function.
> >
> > Hi Amber/Harry,
> >
> > Thanks for the patch, a few comments inline below.
> 
> Thanks for review. Just addressing the stats get/TLS topic here.
> 
> 
> > > +/* Struct to hold miniflow study stats. */
> > > +struct study_stats {
> > > +uint32_t pkt_count;
> > > +uint32_t impl_hitcount[MFEX_IMPLS_MAX_SIZE];
> > > +};
> > > +
> > > +/* Define per thread data to hold the study stats. */
> > > +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
> > > +
> > > +/* Allocate per thread PMD pointer space for study_stats. */
> > > +static inline struct study_stats *
> > > +get_study_stats(void)
> >
> > Would maybe suggest a name change here, get_study_stats sounds as if info is
> > being returned whereas whats actually happening is that the memory for the
> > stats are being provisioned.
> 
> More context for explaining below...
> 
> > > +{
> > > +struct study_stats *stats = study_stats_get();
> > > +if (OVS_UNLIKELY(!stats)) {
> > > +   stats = xzalloc(sizeof *stats);
> > > +   study_stats_set_unsafe(stats);
> > Can you explain why above is set unsafe? Where does that function originate
> > from?
> 
> Yes, this is how the OVS "per thread data" (also called "Thread Local 
> Storage" or
> TLS)
> is implemented. The "get()" function indeed allocates the memory first time 
> that
> this
> thread actually accesses it, and any time after that it just returns the 
> per-thread
> allocated
> data pointer.
> 

Ah, that makes more sense. I have since followed up on the existing code and
indeed it follows the same logic.

> The "unsafe" is essentially the API used to change a TLS variable. It must 
> only be
> called
> by the thread that's using it itself, hence the unsafe() AFAIK.
> 
> The same function naming etc is used in DPCLS already, where this was the
> recommended
> method of getting/using TLS data.
> 
> dpif-netdev-lookup-generic.c +47   function has "get_blocks_scratch()" which
> performs
> approximately the same functionality as here.
> 
> Hope that clears up that topic, regards, -Harry

Thanks for clarifying.

BR
Ian
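
(For reference, a minimal sketch of the per-thread data pattern described
above, based on the DEFINE_PER_THREAD_MALLOCED_DATA macros in
lib/ovs-thread.h; the struct name follows the patch, the wrapper name is
arbitrary:)

    /* Declares the TLS slot plus the generated study_stats_get() and
     * study_stats_set_unsafe() accessors. */
    DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);

    static inline struct study_stats *
    mfex_study_stats_ensure(void)
    {
        struct study_stats *stats = study_stats_get();

        if (OVS_UNLIKELY(!stats)) {
            /* First access from this PMD thread: allocate and publish the
             * pointer.  The "_unsafe" setter may only be called by the
             * owning thread itself, which is the case here. */
            stats = xzalloc(sizeof *stats);
            study_stats_set_unsafe(stats);
        }
        return stats;
    }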
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] docs: fix git format-patch command for backports

2021-06-29 Thread Ihar Hrachyshka
On Tue, Jun 29, 2021 at 12:46 PM Ben Pfaff  wrote:
>
> On Tue, Jun 29, 2021 at 12:24:11PM -0400, Ihar Hrachyshka wrote:
> > One, HEAD~, not HEAD, should be used to generate any patches. Two, add
> > "ovn" to the generated mail topic. Third, update branch name to a
> > fresh one.
> >
> > Signed-off-by: Ihar Hrachyshka 
>
> Both of these look odd to me:
> > -$ git format-patch HEAD --subject-prefix="PATCH branch-2.7"
> > +$ git format-patch HEAD~ --subject-prefix="PATCH ovn branch-21.06"
>
> I think the idea here is to just generate one patch from the tip of the
> current branch.  HEAD~ works but I'd normally write -1 instead.
>

Yes. But HEAD doesn't generate any patches since it's a head and there
are no patches "above" it. Am I missing something?

Ihar

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Stokes, Ian
> > -Original Message-
> > From: Amber, Kumar 
> > Sent: Thursday, June 24, 2021 12:27 PM
> > To: Stokes, Ian ; d...@openvswitch.org; Van Haaren,
> Harry
> > 
> > Cc: i.maxim...@ovn.org
> > Subject: RE: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function 
> > for
> > miniflow extract
> 
> 
> 
> > > >  #endif /* DPIF_NETDEV_AVX512_EXTRACT */ diff --git
> > > > a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 567ebd952..4f4ab2790
> > > > 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -1181,8 +1181,8 @@ dpif_miniflow_extract_impl_set(struct
> > > > unixctl_conn *conn, int argc,
> > > >  struct ds reply = DS_EMPTY_INITIALIZER;
> > > >  ds_put_format(, "Miniflow implementation set to %s.\n",
> > > mfex_name);
> > > >  const char *reply_str = ds_cstr();
> > > > -unixctl_command_reply(conn, reply_str);
> > > >  VLOG_INFO("%s", reply_str);
> > > > +unixctl_command_reply(conn, reply_str);
> > >
> > > Is there a reason for swapping the order above?
> > >
> >
> > This looks more logical .
> 
> Hi All,
> 
> Actually yes there's a good reason, by logging internally in Vswitchd first,
> and then sending the reply to the user, the order of prints in the logs is
> easier to understand.
> 
> This is particularly true when e.g. MFEX enabling logs can come *after* the 
> PMD
> log
> print of study having chosen a specific MFEX impl.
> 
> (pseudo) Example of bad logging behaviour:
> 
> PMD: MFEX study chose profile "eth_ip_udp" (128/128 hits)
> DPIF: MFEX optimized functionality set to "study"
> 
> (pseudo) Example of good logging behaviour (with switched log/conn_reply):
> 
> DPIF: MFEX optimized functionality set to "study"
> PMD: MFEX study chose profile "eth_ip_udp" (128/128 hits)
> 
> Hope the helps clarify the change! -Harry

Thanks Harry, that makes it clearer for sure. Is it worth documenting the change
in behaviour here, so that it does not get reverted by accident in the future?

Regards
Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Van Haaren, Harry
> -Original Message-
> From: Stokes, Ian 
> Sent: Tuesday, June 29, 2021 5:40 PM
> To: Amber, Kumar ; d...@openvswitch.org
> Cc: i.maxim...@ovn.org; Ferriter, Cian ; Van Haaren,
> Harry 
> Subject: RE: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search 
> info logs
> 
> > From: Harry van Haaren 
> >
> > This commit avoids many instances of "using subtable X for miniflow (x,y)"
> > in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> > when no specialized subtable is found, and the generic "_any" version of
> > the avx512 subtable search implementation was used. This change logs the
> > subtable usage once, avoiding duplicates.
> >
> 
> Good point here, I think people forget there is a cost to logs and no need to 
> flood
> them.
> 
> Just to confirm, I think this log is already upstream? What I mean is that it 
> is not
> added by either the DPIF or MFEX patch series so this is the earliest we can 
> make the
> change on it?

This change can be made earlier. The log spam gets worse if we use the
autovalidator, so it was identified as an issue to fix when testing with the
MFEX autovalidator && DPCLS autovalidator, hence why it is here in the patchset.
autovalidator,
hence why here in the patchset.

Can submit separately if preferred.


> Regards
> Ian

Thanks for review, -Harry


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] docs: fix git format-patch command for backports

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 12:24:11PM -0400, Ihar Hrachyshka wrote:
> One, HEAD~, not HEAD, should be used to generate any patches. Two, add
> "ovn" to the generated mail topic. Third, update branch name to a
> fresh one.
> 
> Signed-off-by: Ihar Hrachyshka 

Both of these look odd to me:
> -$ git format-patch HEAD --subject-prefix="PATCH branch-2.7"
> +$ git format-patch HEAD~ --subject-prefix="PATCH ovn branch-21.06"

I think the idea here is to just generate one patch from the tip of the
current branch.  HEAD~ works but I'd normally write -1 instead.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Stokes, Ian
> Hi Ian,
> 
> Thanks for reviews, replies are inlined.
> 

Thanks Amber, looking forward to the v5.

BR
Ian
> 
> 
> 
> > > +/* Call optimized miniflow for each batch of packet. */
> > > +uint32_t hit_mask = mfex_impls[j].extract_func(packets, 
> > > test_keys,
> > > +keys_size, in_port,
> > > + pmd_handle);
> >
> > Alignment above is out, should be aligned under first argument passed ie.
> > packets.
> 
> Fixed in v5.
> >
> > > +
> > > +/* Do a miniflow compare for bits, blocks and offsets for all the
> > > + * classified packets in the hitmask marked by set bits. */
> > > +while (hit_mask) {
> > > +/* Index for the set bit. */
> > > +uint32_t i = __builtin_ctz(hit_mask);
> > > +/* Set the index in hitmask to Zero. */
> > > +hit_mask &= (hit_mask - 1);
> > > +
> > > +uint32_t failed = 0;
> > > +
> > > +/* Check miniflow bits are equal. */
> > > +if ((keys[i].mf.map.bits[0] != test_keys[i].mf.map.bits[0]) 
> > > ||
> > > +(keys[i].mf.map.bits[1] != test_keys[i].mf.map.bits[1])) 
> > > {
> > > +VLOG_ERR("Good 0x%llx 0x%llx\tTest 0x%llx 0x%llx\n",
> > > + keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
> > > + test_keys[i].mf.map.bits[0],
> > > + test_keys[i].mf.map.bits[1]);
> > > +failed = 1;
> > > +}
> > > +
> > > +if (!miniflow_equal(&keys[i].mf, &test_keys[i].mf)) {
> > > +uint32_t block_cnt = miniflow_n_values([i].mf);
> > > +VLOG_ERR("Autovalidation blocks failed for %s pkt %d",
> > > + mfex_impls[j].name, i);
> > > +VLOG_ERR("  Good hexdump:\n");
> > > +uint64_t *good_block_ptr = (uint64_t *)&keys[i].buf;
> > > +uint64_t *test_block_ptr = (uint64_t *)&test_keys[i].buf;
> > > +for (uint32_t b = 0; b < block_cnt; b++) {
> > > +VLOG_ERR("%"PRIx64"\n", good_block_ptr[b]);
> >
> > For this and other VLOG Errs  rather than using spaces to have you thought
> > of using pad left?
> 
> Fixed in v5.
> > > +}
> > > +VLOG_ERR("  Test hexdump:\n");
> > > +for (uint32_t b = 0; b < block_cnt; b++) {
> > > +VLOG_ERR("%"PRIx64"\n", test_block_ptr[b]);
> > > +}
> > > +failed = 1;
> > > +}
> > > +
> > > +if ((packets->packets[i]->l2_pad_size != 
> > > good_l2_pad_size[i]) ||
> > > +(packets->packets[i]->l2_5_ofs != good_l2_5_ofs[i]) 
> > > ||
> > > +(packets->packets[i]->l3_ofs != good_l3_ofs[i]) ||
> > > +(packets->packets[i]->l4_ofs != good_l4_ofs[i])) {
> > > +VLOG_ERR("Autovalidation packet offsets failed for %s 
> > > pkt %d",
> > > + mfex_impls[j].name, i);
> > > +VLOG_ERR("  Good offsets: l2_pad_size %u, l2_5_ofs : %u"
> > > + " l3_ofs %u, l4_ofs %u\n",
> > > + good_l2_pad_size[i], good_l2_5_ofs[i],
> > > + good_l3_ofs[i], good_l4_ofs[i]);
> > > +VLOG_ERR("  Test offsets: l2_pad_size %u, l2_5_ofs : %u"
> > > + " l3_ofs %u, l4_ofs %u\n",
> > > + packets->packets[i]->l2_pad_size,
> > > + packets->packets[i]->l2_5_ofs,
> > > + packets->packets[i]->l3_ofs,
> > > + packets->packets[i]->l4_ofs);
> > > +failed = 1;
> > > +}
> > > +
> > > +if (failed) {
> > > +/* Having dumped the debug info, disable autovalidator. 
> > > */
> > > +VLOG_ERR("Autovalidation failed in %s pkt %d, 
> > > disabling.\n",
> > > + mfex_impls[j].name, i);
> > > +/* Halt OVS here on debug builds. */
> > > +ovs_assert(0);
> > > +pmd->miniflow_extract_opt = NULL;
> > > +break;
> > > +}
> > > +}
> > > +}
> > > +
> > > +/* preserve packet correctness by storing back the good offsets in
> > > + * packets back. */
> >
> > Minor, capitalize Preserve above.
> 
> Fixed in v5.
> >
> > > +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > > +packet->l2_5_ofs = good_l2_5_ofs[i];
> > > +packet->l3_ofs = good_l3_ofs[i];
> > > +packet->l4_ofs = good_l4_ofs[i];
> > > +packet->l2_pad_size = good_l2_pad_size[i];
> > > +}
> > > +
> > > +/* Returning zero implies no packets were hit by autovalidation. This
> > > + * simplifies unit-tests as changing 
> > > --enable-mfex-default-autovalidator
> > > + * would pass/fail. By always returning zero, 

Re: [ovs-dev] [PATCH ovn branch-21.06] Disable ARP/NA responders for vlan-passthru switches

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings Ihar Hrachyshka, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Numan Siddique 
Lines checked: 204, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-06-29 Thread Stokes, Ian
> From: Harry van Haaren 
> 
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small quantity of instructions.
> 
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
> 
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
> 
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
> 

Thanks Harry/Amber.

Agree with what Flavio has proposed so far as well. A few more comments inline 
below.

Note: a few comments refer to comment coding style. I haven't called out every
instance as there are quite a few, but I would recommend giving the comments in
particular a look over to ensure they meet the standard.
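To make the style point concrete, the expected comment form is roughly the
following (a sketch, not text taken from the patch):

/* Clear away the odd lane bytes.  This cannot be done earlier because
 * no u8 shift is available. */

i.e. a capitalized sentence ending in a period, wrapped in the usual
block-comment style.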

BR
Ian
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-avx512.c  | 416 ++
>  lib/dpif-netdev-private-extract.c |  15 ++
>  lib/dpif-netdev-private-extract.h |  19 ++
>  4 files changed, 451 insertions(+)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3080bb04a..2b95d6f92 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
>   lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-extract-avx512.c \
>   lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
> diff --git a/lib/dpif-netdev-extract-avx512.c 
> b/lib/dpif-netdev-extract-avx512.c
> new file mode 100644
> index 0..1145ac8a9
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-avx512.c
> @@ -0,0 +1,416 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "flow.h"
> +#include "dpdk.h"
> +
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-flow.h"
> +
> +/* AVX512-BW level permutex2var_epi8 emulation. */
> +static inline __m512i
> +__attribute__((target("avx512bw")))
> +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> +   __m512i v_data_0,
> +   __m512i v_shuf_idxs,
> +   __m512i v_data_1)
> +{
> +/* Manipulate shuffle indexes for u16 size. */
> +__mmask64 k_mask_odd_lanes = 0x;
> +/* clear away ODD lane bytes. Cannot be done above due to no u8 shift */
Coding standard for comments. Capitalize Clear and add period at end of comment.

> +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
> +v_shuf_idxs, _mm512_setzero_si512());
Alignment of arguments above seems a bit odd. Can we align vertically under 
k_mask_odd_lanes?

> +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> +
> +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> +
> +/* Shuffle each half at 16-bit width */
For the comment above and multiple comments below, please add period at end of 
comment to keep with standard.

> +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0,
> v_shuf_idx_evn,
> +v_data_1);
> +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0,
> v_shuf_idx_odd,
> +v_data_1);
> +
> +/* Find if the shuffle index was odd, via mask and compare */
> +uint16_t index_odd_mask = 0x1;
> +const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
> +
> +/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask */
> +__m512i v_idx_even_masked = _mm512_and_si512(v_shuf_idxs,
> + v_index_mask_u16);
> +__mmask32 evn_rotate_mask =
> _mm512_cmpeq_epi16_mask(v_idx_even_masked,
> +

Re: [ovs-dev] [v4 08/12] dpif/stats: add miniflow extract opt hits counter

2021-06-29 Thread Stokes, Ian
> From: Harry van Haaren 
> 
> This commit adds a new counter to be displayed to the user when
> requesting datapath packet statistics. It counts the number of
> packets that are parsed and a miniflow built up from it by the
> optimized miniflow extract parsers.
> 
> The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
> extra entry indicating if the optimized MFEX was hit:
> 
>   - MFEX Opt hits:6786432  (100.0 %)
> 

Hi Amber,

Agree with the changes/updates suggested by Flavio, other than that LGTM. Will 
await your v5.

Regards
Ian

> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpif-netdev-avx512.c |  2 ++
>  lib/dpif-netdev-perf.c   |  3 +++
>  lib/dpif-netdev-perf.h   |  1 +
>  lib/dpif-netdev.c| 14 +-
>  tests/pmd.at |  6 --
>  5 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index bb99b23ff..f55786f8c 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -297,8 +297,10 @@ dp_netdev_input_outer_avx512(struct
> dp_netdev_pmd_thread *pmd,
>  }
> 
>  /* At this point we don't return error anymore, so commit stats here. */
> +uint32_t mfex_hit = __builtin_popcountll(mf_mask);
>  pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size);
>  pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_PHWOL_HIT,
> phwol_hits);
> +pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MFEX_OPT_HIT,
> mfex_hit);
>  pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> emc_hits);
>  pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT,
> smc_hits);
>  pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
> index 7103a2d4d..d7676ea2b 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct
> pmd_perf_stats *s,
>  "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f 
> cycles/pkt)\n"
>  "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
>  "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
> +"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
>  "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
>  "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
>  "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
> @@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct
> pmd_perf_stats *s,
>  passes, rx_packets ? 1.0 * passes / rx_packets : 0,
>  stats[PMD_STAT_PHWOL_HIT],
>  100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
> +stats[PMD_STAT_MFEX_OPT_HIT],
> +100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
>  stats[PMD_STAT_EXACT_HIT],
>  100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
>  stats[PMD_STAT_SMC_HIT],
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 8b1a52387..834c26260 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -57,6 +57,7 @@ extern "C" {
> 
>  enum pmd_stat_type {
>  PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). 
> */
> +PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match.
> */
>  PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
>  PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
>  PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 35c927d55..7a8f15415 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -660,6 +660,7 @@ pmd_info_show_stats(struct ds *reply,
>"  packet recirculations: %"PRIu64"\n"
>"  avg. datapath passes per packet: %.02f\n"
>"  phwol hits: %"PRIu64"\n"
> +  "  mfex opt hits: %"PRIu64"\n"
>"  emc hits: %"PRIu64"\n"
>"  smc hits: %"PRIu64"\n"
>"  megaflow hits: %"PRIu64"\n"
> @@ -669,10 +670,9 @@ pmd_info_show_stats(struct ds *reply,
>"  avg. packets per output batch: %.02f\n",
>total_packets, stats[PMD_STAT_RECIRC],
>passes_per_pkt, stats[PMD_STAT_PHWOL_HIT],
> -  stats[PMD_STAT_EXACT_HIT],
> -  stats[PMD_STAT_SMC_HIT],
> -  stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
> -  stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
> +  stats[PMD_STAT_MFEX_OPT_HIT], stats[PMD_STAT_EXACT_HIT],
> +  stats[PMD_STAT_SMC_HIT], stats[PMD_STAT_MASKED_HIT],
> +  lookups_per_hit, stats[PMD_STAT_MISS], 
> stats[PMD_STAT_LOST],
>packets_per_batch);
> 
>  if (total_cycles == 0) {
> @@ -6863,7 +6863,7 @@ dfc_processing(struct 

Re: [ovs-dev] [v4 09/12] dpdk: add additional CPU ISA detection strings

2021-06-29 Thread Stokes, Ian
> From: Harry van Haaren 
> 
> This commit enables OVS to at runtime check for more detailed
> AVX512 capabilities, specifically Byte and Word (BW) extensions,
> and Vector Bit Manipulation Instructions (VBMI).
> 
> These instructions will be used in the CPU ISA optimized
> implementations of traffic profile aware miniflow extract.
> 
> Signed-off-by: Harry van Haaren 

LGTM.

Ian
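For illustration, a probe for the new MFEX implementations could consume these
flags roughly as below; this is a sketch under the assumption that
dpdk_get_cpu_has_isa() returns a boolean, not the series' actual probe code:

static bool
mfex_avx512_vbmi_probe(void)
{
    /* All three features must be present before the VBMI-based
     * miniflow extract implementation is considered available. */
    return dpdk_get_cpu_has_isa("x86_64", "avx512f")
           && dpdk_get_cpu_has_isa("x86_64", "avx512bw")
           && dpdk_get_cpu_has_isa("x86_64", "avx512vbmi");
}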
> ---
>  lib/dpdk.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index a9494a40f..9d13e4ab7 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -655,6 +655,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char
> *feature)
>  #if __x86_64__
>  /* CPU flags only defined for the architecture that support it. */
>  CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
> +CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW);
> +CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI);
>  CHECK_CPU_FEATURE(feature, "avx512vpopcntdq",
> RTE_CPUFLAG_AVX512VPOPCNTDQ);
>  CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
>  #endif
> --
> 2.25.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Stokes, Ian
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.

Hi Amber,

I don't have much to add here beyond what Flavio and Eelco have already flagged.
Looking forward to reviewing/testing the v5.

Regards
Ian
> 
> Signed-off-by: Kumar Amber 
> ---
>  tests/automake.mk|   5 +
>  tests/pcap/fuzzy.py  |  32 ++
>  tests/pcap/mfex_test | Bin 0 -> 416 bytes
>  tests/system-dpdk.at |  46 +++
>  4 files changed, 83 insertions(+)
>  create mode 100755 tests/pcap/fuzzy.py
>  create mode 100644 tests/pcap/mfex_test
> 
> diff --git a/tests/automake.mk b/tests/automake.mk
> index 1a528aa39..532875971 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -142,6 +142,11 @@ $(srcdir)/tests/fuzz-regression-list.at:
> tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
> 
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + tests/pcap/mfex_test \
> + tests/pcap/fuzzy.py
> +
>  OVSDB_CLUSTER_TESTSUITE_AT = \
>   tests/ovsdb-cluster-testsuite.at \
>   tests/ovsdb-execution.at \
> diff --git a/tests/pcap/fuzzy.py b/tests/pcap/fuzzy.py
> new file mode 100755
> index 0..a8051ba2b
> --- /dev/null
> +++ b/tests/pcap/fuzzy.py
> @@ -0,0 +1,32 @@
> +#!/usr/bin/python3
> +try:
> +   from scapy.all import *
> +except ModuleNotFoundError as err:
> +   print(err + ": Scapy")
> +import sys
> +import os
> +
> +path = os.environ['OVS_DIR'] + "/tests/pcap/fuzzy"
> +pktdump = PcapWriter(path, append=False, sync=True)
> +
> +for i in range(0, 2000):
> +
> +   # Generate random protocol bases, use a fuzz() over the combined packet 
> for
> full fuzzing.
> +   eth = Ether(src=RandMAC(), dst=RandMAC())
> +   vlan = Dot1Q()
> +   ipv4 = IP(src=RandIP(), dst=RandIP())
> +   ipv6 = IPv6(src=RandIP6(), dst=RandIP6())
> +   udp = UDP()
> +   tcp = TCP()
> +
> +   # IPv4 packets with fuzzing
> +   pktdump.write(fuzz(eth/ipv4/udp))
> +   pktdump.write(fuzz(eth/ipv4/tcp))
> +   pktdump.write(fuzz(eth/vlan/ipv4/udp))
> +   pktdump.write(fuzz(eth/vlan/ipv4/tcp))
> +
> +# IPv6 packets with fuzzing
> +   pktdump.write(fuzz(eth/ipv6/udp))
> +   pktdump.write(fuzz(eth/ipv6/tcp))
> +   pktdump.write(fuzz(eth/vlan/ipv6/udp))
> +   pktdump.write(fuzz(eth/vlan/ipv6/tcp))
> \ No newline at end of file
> diff --git a/tests/pcap/mfex_test b/tests/pcap/mfex_test
> new file mode 100644
> index
> ..1aac67b8d643ecb016c758cb
> a4cc32212a80f52a
> GIT binary patch
> literal 416
> zcmca|c+)~A1{MYw`2U}Qff2}QK`M68ITRa|G@yFii5$Gfk6YL%z>@uY&}o
> |
> z2s4N<1VH2&7y^V87$)XGOtD~MV$cFgfG~zBGGJ2#YtF$ z
> xK>KST_NTIwYriok6N4Vm)gX-
> Q@c^{cp<7_5LgK^UuU{2>VS0RZ!RQ+EIW
> 
> literal 0
> HcmV?d1
> 
> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
> index 802895488..46eaea35a 100644
> --- a/tests/system-dpdk.at
> +++ b/tests/system-dpdk.at
> @@ -232,3 +232,49 @@ OVS_VSWITCHD_STOP(["\@does not exist. The Open
> vSwitch kernel module is probably
>  \@EAL: No free hugepages reported in hugepages-1048576kB@d"])
>  AT_CLEANUP
>  dnl 
> --
> +
> +dnl 
> --
> +dnl Add standard DPDK PHY port
> +AT_SETUP([OVS-DPDK - MFEX Autovalidator])
> +AT_KEYWORDS([dpdk])
> +
> +OVS_DPDK_START()
> +
> +dnl Add userspace bridge and attach it to OVS
> +AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk
> options:dpdk-
> devargs=net_pcap1,rx_pcap=$OVS_DIR/tests/pcap/mfex_test,infinite_rx=1], [],
> [stdout], [stderr])
> +AT_CHECK([ovs-vsctl show], [], [stdout])
> +
> +
> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], [0], 
> [dnl
> +Miniflow implementation set to autovalidator.
> +])
> +sleep 5
> +
> +dnl Clean up
> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> +AT_CLEANUP
> +dnl 
> --
> +
> +dnl 
> --
> +dnl Add standard DPDK PHY port
> +AT_SETUP([OVS-DPDK - MFEX Autovalidator Fuzzy])
> +AT_KEYWORDS([dpdk])
> +AT_CHECK([$PYTHON3 $OVS_DIR/tests/pcap/fuzzy.py], [], [stdout])
> +OVS_DPDK_START()
> +
> +dnl Add userspace bridge and attach it to OVS
> +AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk
> options:dpdk-
> devargs=net_pcap1,rx_pcap=$OVS_DIR/tests/pcap/fuzzy,infinite_rx=1], [],
> [stdout], [stderr])
> +AT_CHECK([ovs-vsctl show], [], [stdout])
> +
> +
> +AT_CHECK([ovs-appctl 

Re: [ovs-dev] [v4 12/12] dpif/dpcls: limit count subtable search info logs

2021-06-29 Thread Stokes, Ian
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found, and the generic "_any" version of
> the avx512 subtable search implementation was used. This change logs the
> subtable usage once, avoiding duplicates.
> 

Good point here, I think people forget there is a cost to logs and no need to 
flood them.

Just to confirm, I think this log is already upstream? What I mean is that it 
is not added by either the DPIF or MFEX patch series so this is the earliest we 
can make the change on it?

Regards
Ian
> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpif-netdev-lookup-avx512-gather.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-
> avx512-gather.c
> index 2e754c89f..deed527b0 100644
> --- a/lib/dpif-netdev-lookup-avx512-gather.c
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -411,7 +411,7 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits,
> uint32_t u1_bits)
>   */
>  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
>  f = dpcls_avx512_gather_mf_any;
> -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> +VLOG_INFO_ONCE("Using avx512_gather_mf_any for subtable
> (%d,%d)\n",
>u0_bits, u1_bits);
>  }
> 
> --
> 2.25.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-29 Thread Van Haaren, Harry
> -Original Message-
> From: dev  On Behalf Of Eelco Chaudron
> Sent: Tuesday, June 29, 2021 1:38 PM
> To: Amber, Kumar 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select 
> the best
> mfex function
> 
> More comments below. FYI I’m only reviewing right now, no testing.

Sure, thanks for reviews.

> On 17 Jun 2021, at 18:27, Kumar Amber wrote:



> > +/* Allocate per thread PMD pointer space for study_stats. */
> > +static inline struct study_stats *
> > +get_study_stats(void)
> > +{
> > +struct study_stats *stats = study_stats_get();
> > +if (OVS_UNLIKELY(!stats)) {
> > +   stats = xzalloc(sizeof *stats);
> > +   study_stats_set_unsafe(stats);
> > +}
> > +return stats;
> > +}
> > +
> 
> Just got a mind-meld with the code, and realized that the function might be 
> different
> per PMD thread due to this auto mode (and autovalidator mode in the previous
> patch).
> 
> This makes it only stronger that we need a way to see the currently selected 
> mode,
> and not per datapath, but per PMD per datapath!

Study depends on the traffic pattern, so yes you're correct that it depends.
The study command was added after community suggested user-experience
would improve if the user doesn't have to provide an exact miniflow profile 
name.

Study studies the traffic running on that PMD, compares all MFEX impls, and 
prints out
hits. It selects the _first_ implementation that surpasses the threshold of 
packets.

Users are free to use the more specific names of MFEX impls instead of "study"
for fine-grained control over the MFEX impl in use, e.g.

ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp

> Do we also need a way to set this per PMD?

I don't feel there is real value here, but we could investigate adding an
optional parameter to the command indicating a PMD thread IDX to set?
We have access to "pmd->core_id" in our set() function, so limiting changes
to a specific PMD thread can be done ~ easily... but is it really required?

Perfect is the enemy of good... I'd prefer to focus on getting the existing code
changes merged, and add additional (optional) parameters in future if deemed
useful in real-world testing?
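To make that concrete, a per-PMD filter in the set() path could look roughly
like the sketch below. This is purely hypothetical (the helper name and shape
are assumptions, not part of this series); it only illustrates that
pmd->core_id gives us what would be needed:

/* Hypothetical helper: apply the chosen MFEX function only to the PMD
 * thread whose core id matches, leaving the others untouched. */
static void
mfex_set_on_pmd(struct dp_netdev_pmd_thread *pmd,
                miniflow_extract_func func,
                bool filter_by_core, unsigned core_id)
{
    if (filter_by_core && pmd->core_id != core_id) {
        return;
    }
    pmd->miniflow_extract_opt = func;
}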


> > +uint32_t
> > +mfex_study_traffic(struct dp_packet_batch *packets,
> > +   struct netdev_flow_key *keys,
> > +   uint32_t keys_size, odp_port_t in_port,
> > +   void *pmd_handle)
> > +{
> > +uint32_t hitmask = 0;
> > +uint32_t mask = 0;
> > +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> > +struct dpif_miniflow_extract_impl *miniflow_funcs;
> > +uint32_t impl_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
> > +struct study_stats *stats = get_study_stats();
> > +
> > +/* Run traffic optimized miniflow_extract to collect the hitmask
> > + * to be compared after certain packets have been hit to choose
> > + * the best miniflow_extract version for that traffic. */
> > +for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> > +if (miniflow_funcs[i].available) {
> > +hitmask = miniflow_funcs[i].extract_func(packets, keys, 
> > keys_size,
> > + in_port, pmd_handle);
> > +stats->impl_hitcount[i] += count_1bits(hitmask);
> > +
> > +/* If traffic is not classified than we dont overwrite the keys
> > + * array in minfiflow implementations so its safe to create a
> > + * mask for all those packets whose miniflow have been 
> > created. */
> > +mask |= hitmask;
> > +}
> > +}
> > +stats->pkt_count += dp_packet_batch_size(packets);
> > +
> > +/* Choose the best implementation after a minimum packets have been
> > + * processed. */
> > +if (stats->pkt_count >= MFEX_MAX_COUNT) {
> > +uint32_t best_func_index = MFEX_IMPL_START_IDX;
> > +uint32_t max_hits = 0;
> > +for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> > +if (stats->impl_hitcount[i] > max_hits) {
> > +max_hits = stats->impl_hitcount[i];
> > +best_func_index = i;
> > +}
> > +}
> > +
> > +if (max_hits >= MFEX_MIN_HIT_COUNT_FOR_USE) {
> > +/* Set the implementation to index with max_hits. */
> > +pmd->miniflow_extract_opt =
> > +miniflow_funcs[best_func_index].extract_func;
> > +VLOG_INFO("MFEX study chose impl %s: (hits %d/%d pkts)\n",
> > +  miniflow_funcs[best_func_index].name, max_hits,
> > +  stats->pkt_count);
> 
> We have no idea which PMD the mode is selected for guess we might need to add
> this?
> 
> Maybe we should report the numbers/hits for the other methods, as they might 
> be
> equal, and some might be faster in execution time?

As above, the implementations are sorted in 

[ovs-dev] [PATCH ovn] docs: fix git format-patch command for backports

2021-06-29 Thread Ihar Hrachyshka
One, HEAD~, not HEAD, should be used to generate any patches. Two, add
"ovn" to the generated mail topic. Third, update branch name to a
fresh one.

Signed-off-by: Ihar Hrachyshka 
---
 Documentation/internals/contributing/backporting-patches.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/internals/contributing/backporting-patches.rst 
b/Documentation/internals/contributing/backporting-patches.rst
index b10c9d7b0..b5232bb51 100644
--- a/Documentation/internals/contributing/backporting-patches.rst
+++ b/Documentation/internals/contributing/backporting-patches.rst
@@ -68,7 +68,7 @@ patch, for example:
 
 ::
 
-$ git format-patch HEAD --subject-prefix="PATCH branch-2.7"
+$ git format-patch HEAD~ --subject-prefix="PATCH ovn branch-21.06"
 
 If a maintainer is backporting a change to older branches and the backport is
 not a trivial cherry-pick, then the maintainer may opt to submit the backport
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd-ddlog: Add proxy arp flows for configured addresses in lsp router port.

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings Numan Siddique, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 85 characters long (recommended limit is 79)
#38 FILE: northd/ovn.rs:187:
pub fn extract_ip_addresses(address: ) -> 
ddlog_std::Option {

WARNING: Line is 105 characters long (recommended limit is 79)
#57 FILE: northd/ovn.rs:638:
pub fn extract_ip_addresses(address: *const raw::c_char, laddrs: *mut 
lport_addresses_c) -> bool;

Lines checked: 134, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-21.06] Disable ARP/NA responders for vlan-passthru switches

2021-06-29 Thread Ihar Hrachyshka
When vlan-passthru is on, VIFs may attach different VLAN tags. In this
case, VIFs are not guaranteed to belong to the same L2 broadcast domain.
Because of that, we don't know if a peer port on the switch has the same
tag used and should not allow the local responder to generate neighbour
traffic. Instead, pass ARP and ND requests to the peer port owner and
allow it to reply, if needed.

Signed-off-by: Ihar Hrachyshka 
Signed-off-by: Numan Siddique 
(cherry picked from commit ea57f666f6eef1eb1d578f0e975baa14c5d23ec9)
---
 northd/ovn-northd.8.xml |   6 ++-
 northd/ovn-northd.c |   4 ++
 northd/ovn_northd.dl|   6 ++-
 tests/ovn.at| 112 
 4 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 407464602..21ae0ca60 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -1072,8 +1072,10 @@ output;
   localport ports) that are down (unless 
   ignore_lsp_down is configured as true in options
   column of NB_Global table of the Northbound
-  database), for logical ports of type virtual and for
-  logical ports with 'unknown' address set.
+  database), for logical ports of type virtual, for
+  logical ports with 'unknown' address set and for logical ports of
+  a logical switch configured with
+  other_config:vlan-passthru=true.
 
   
 
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 3dae7bb1c..17bcede5a 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -7007,6 +7007,10 @@ build_lswitch_arp_nd_responder_known_ips(struct ovn_port 
*op,
 return;
 }
 
+if (is_vlan_transparent(op->od)) {
+return;
+}
+
 for (size_t i = 0; i < op->n_lsp_addrs; i++) {
 for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) {
 ds_clear(match);
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 3afa80a3b..a09aea6ee 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3309,7 +3309,8 @@ for (CheckLspIsUp[check_lsp_is_up]) {
 ((lsp_is_up(lsp) or not check_lsp_is_up)
  or lsp.__type == "router" or lsp.__type == "localport") and
 lsp.__type != "external" and lsp.__type != "virtual" and
-not lsp.addresses.contains("unknown"))
+not lsp.addresses.contains("unknown") and
+not sw.is_vlan_transparent)
 {
 var __match = "arp.tpa == ${addr.addr} && arp.op == 1" in
 {
@@ -3359,7 +3360,8 @@ for (SwitchPortIPv6Address(.port = {.lsp = 
lsp, .json_name = json_nam
.ea = ea, .addr = addr)
  if lsp.is_enabled() and
 (lsp_is_up(lsp) or lsp.__type == "router" or lsp.__type == 
"localport") and
-lsp.__type != "external" and lsp.__type != "virtual")
+lsp.__type != "external" and lsp.__type != "virtual" and
+not sw.is_vlan_transparent)
 {
 var __match = "nd_ns && ip6.dst == {${addr.addr}, 
${addr.solicited_node()}} && nd.target == ${addr.addr}" in
 var actions = "${if (lsp.__type == \"router\") \"nd_na_router\" else 
\"nd_na\"} { "
diff --git a/tests/ovn.at b/tests/ovn.at
index b6523c328..811a05c5a 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -3169,6 +3169,118 @@ OVN_CLEANUP([hv-1],[hv-2])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- VLAN transparency, passthru=true, ARP responder disabled])
+ovn_start
+
+net_add net
+check ovs-vsctl add-br br-phys
+ovn_attach net br-phys 192.168.0.1
+
+check ovn-nbctl ls-add ls
+check ovn-nbctl --wait=sb add Logical-Switch ls other_config vlan-passthru=true
+
+for i in 1 2; do
+check ovn-nbctl lsp-add ls lsp$i
+check ovn-nbctl lsp-set-addresses lsp$i "f0:00:00:00:00:0$i 10.0.0.$i"
+done
+
+for i in 1 2; do
+check ovs-vsctl add-port br-int vif$i -- set Interface vif$i 
external-ids:iface-id=lsp$i \
+  options:tx_pcap=vif$i-tx.pcap \
+  options:rxq_pcap=vif$i-rx.pcap \
+  ofport-request=$i
+done
+
+wait_for_ports_up
+
+ovn-sbctl dump-flows ls > lsflows
+AT_CAPTURE_FILE([lsflows])
+
+AT_CHECK([grep -w "ls_in_arp_rsp" lsflows | sort], [0], [dnl
+  table=16(ls_in_arp_rsp  ), priority=0, match=(1), action=(next;)
+])
+
+test_arp() {
+local inport=$1 outport=$2 sha=$3 spa=$4 tpa=$5 reply_ha=$6
+tag=8100fefe
+local 
request=${sha}${tag}08060001080006040001${sha}${spa}${tpa}
+ovs-appctl netdev-dummy/receive vif$inport $request
+echo $request >> $outport.expected
+
+local 
reply=${sha}${reply_ha}${tag}08060001080006040002${reply_ha}${tpa}${sha}${spa}
+ovs-appctl netdev-dummy/receive vif$outport $reply
+echo $reply >> $inport.expected
+}
+
+test_arp 1 2 f001 0a01 0a02 f002
+test_arp 2 1 

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Amber, Kumar
Hi Eelco,

Sorry, the formatting seems broken in this email thread.
Replies are inlined.

From: Eelco Chaudron 
Sent: Tuesday, June 29, 2021 7:36 PM
To: Amber, Kumar 
Cc: Van Haaren, Harry ; d...@openvswitch.org; 
i.maxim...@ovn.org; Stokes, Ian ; Flavio Leitner 

Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for 
miniflow extract


Not sure how you replied, but it’s hard to see which comments are mine, and 
which are yours.

On 29 Jun 2021, at 14:27, Amber, Kumar wrote:

Hi Eelco,

Thanks Again for reviews , Pls find my replies inline.

From: Eelco Chaudron <echau...@redhat.com>
Sent: Tuesday, June 29, 2021 5:14 PM
To: Van Haaren, Harry <harry.van.haa...@intel.com>; Amber, Kumar <kumar.am...@intel.com>
Cc: d...@openvswitch.org; i.maxim...@ovn.org; Stokes, Ian <ian.sto...@intel.com>; Flavio Leitner <f...@sysclose.org>
Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for 
miniflow extract

On 17 Jun 2021, at 18:27, Kumar Amber wrote:

This patch introduced the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovaidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber <kumar.am...@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haa...@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
---
lib/dpif-netdev-private-extract.c | 141 ++
lib/dpif-netdev-private-extract.h | 15 
lib/dpif-netdev.c | 2 +-
3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

/* Implementations of available extract options. */
static struct dpif_miniflow_extract_impl mfex_impls[] = {
+ {
+ .probe = NULL,
+ .extract_func = dpif_miniflow_extract_autovalidator,
+ .name = "autovalidator",
+ },
{
.probe = NULL,
.extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)
*out_ptr = mfex_impls;
return ARRAY_SIZE(mfex_impls);
}
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+ struct netdev_flow_key *keys,
+ uint32_t keys_size, odp_port_t in_port,
+ void *pmd_handle)
+{
+ const size_t cnt = dp_packet_batch_size(packets);
+ uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+ struct dp_packet *packet;
+ struct dp_netdev_pmd_thread *pmd = pmd_handle;
+ struct dpif_miniflow_extract_impl *miniflow_funcs;
+
> + int32_t mfunc_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
+ if (mfunc_count < 0) {

In theory 0 could not be returned, but just to cover the corner case can we 
change this to include zero.

The code has been adapted as per Flavio comments so will not be a concern.

+ pmd->miniflow_extract_opt = NULL;

Guess the above needs to be atomic.

Removed based on Flavio comments.

+ VLOG_ERR("failed to get miniflow extract function implementations\n");

Capital F to be in sync with your other error messages?

Removed based on Flavio comments.

+ return 0;
+ }
+ ovs_assert(keys_size >= cnt);


I don’t think we should assert here. Just return an error like above, so in 
production, we get notified, and this implementation gets disabled.

Actually we do, else one would most likely overwrite the assigned array
space for keys and hit a segfault at some point.

And hence we would like to know at compile time if this is the case.

But this is not a compile time check, it will crash OVS. You could just do this:

if (keys_size < cnt) {
pmd->miniflow_extract_opt = NULL;
VLOG_ERR("Invalid key size supplied etc. etc.\n");
return 0;
}

Or you could process up to key_size packets

Reply: sure, I have taken the first approach in v5 as it is safe and avoids any
risk of a segfault.

+ struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+ /* Run scalar miniflow_extract to get default result. */
+ DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+ pkt_metadata_init(&packet->md, in_port);
+ miniflow_extract(packet, &keys[i].mf);
+
+ /* Store known good metadata to compare with optimized metadata. */
+ good_l2_5_ofs[i] = packet->l2_5_ofs;
+ good_l3_ofs[i] = packet->l3_ofs;
+ good_l4_ofs[i] = packet->l4_ofs;
+ good_l2_pad_size[i] = packet->l2_pad_size;
+ }
+
+ /* Iterate through each version of miniflow implementations. */
+ for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
+ if 

[ovs-dev] [PATCH ovn] northd-ddlog: Add proxy arp flows for configured addresses in lsp router port.

2021-06-29 Thread numans
From: Numan Siddique 

The commit [1] didn't add the ddlog part.

[1] - 8087cbc7462("ovn-northd.c: Add proxy ARP support to OVN")

Signed-off-by: Numan Siddique 
---
 northd/ovn.dl|  1 +
 northd/ovn.rs| 13 +
 northd/ovn_northd.dl | 38 ++
 tests/ovn.at |  4 ++--
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/northd/ovn.dl b/northd/ovn.dl
index f23ea3b9e1..3c7a734ddb 100644
--- a/northd/ovn.dl
+++ b/northd/ovn.dl
@@ -364,6 +364,7 @@ extern function is_dynamic_lsp_address(addr: string): bool
 extern function extract_lsp_addresses(address: string): Option
 extern function extract_addresses(address: string): Option
 extern function extract_lrp_networks(mac: string, networks: Set): 
Option
+extern function extract_ip_addresses(address: string): Option
 
 extern function split_addresses(addr: string): (Set, Set)
 
diff --git a/northd/ovn.rs b/northd/ovn.rs
index d44f83bc75..5f0939409c 100644
--- a/northd/ovn.rs
+++ b/northd/ovn.rs
@@ -184,6 +184,18 @@ pub fn extract_lrp_networks(mac: , networks: 
_std::Set) ->
 }
 }
 
+pub fn extract_ip_addresses(address: ) -> 
ddlog_std::Option {
+unsafe {
+let mut laddrs: lport_addresses_c = Default::default();
+if ovn_c::extract_ip_addresses(string2cstr(address).as_ptr(),
+laddrs as *mut lport_addresses_c) {
+ddlog_std::Option::Some{x: laddrs.into_ddlog()}
+} else {
+ddlog_std::Option::None
+}
+}
+}
+
 pub fn ovn_internal_version() -> String {
 unsafe {
 let s = ovn_c::ovn_get_internal_version();
@@ -623,6 +635,7 @@ mod ovn_c {
 pub fn extract_addresses(address: *const raw::c_char, laddrs: *mut 
lport_addresses_c, ofs: *mut raw::c_int) -> bool;
 pub fn extract_lrp_networks__(mac: *const raw::c_char, networks: 
*const *const raw::c_char,
   n_networks: libc::size_t, laddrs: *mut 
lport_addresses_c) -> bool;
+pub fn extract_ip_addresses(address: *const raw::c_char, laddrs: *mut 
lport_addresses_c) -> bool;
 pub fn destroy_lport_addresses(addrs: *mut lport_addresses_c);
 pub fn is_dynamic_lsp_address(address: *const raw::c_char) -> bool;
 pub fn split_addresses(addresses: *const raw::c_char, ip4_addrs: *mut 
ovs_svec, ipv6_addrs: *mut ovs_svec);
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 52a6206a18..a7a327c7f0 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3360,6 +3360,44 @@ for (CheckLspIsUp[check_lsp_is_up]) {
 }
 }
 
+Flow(.logical_datapath = sw._uuid,
+ .stage= s_SWITCH_IN_ARP_ND_RSP(),
+ .priority = 50,
+ .__match  = __match,
+ .actions  = __actions,
+ .external_ids = stage_hint(sp.lsp._uuid)) :-
+
+sp in (.sw = sw, .peer = Some{rp}),
+rp.is_enabled(),
+var proxy_ips = {
+match (sp.lsp.options.get("arp_proxy")) {
+None -> "",
+Some {addresses} -> {
+match (extract_ip_addresses(addresses)) {
+None -> "",
+Some{addr} -> {
+var ip4_addrs = vec_empty();
+for (ip4 in addr.ipv4_addrs) {
+ip4_addrs.push("${ip4.addr}")
+};
+string_join(ip4_addrs, ",")
+}
+}
+}
+}
+},
+proxy_ips != "",
+var __match = "arp.op == 1 && arp.tpa == {" ++ proxy_ips ++ "}",
+var __actions = "eth.dst = eth.src; "
+"eth.src = ${rp.networks.ea}; "
+"arp.op = 2; /* ARP reply */ "
+"arp.tha = arp.sha; "
+"arp.sha = %s; "
+"arp.tpa <-> arp.spa; "
+"outport = inport; "
+"flags.loopback = 1; "
+"output;".
+
 /* For ND solicitations, we need to listen for both the
  * unicast IPv6 address and its all-nodes multicast address,
  * but always respond with the unicast IPv6 address. */
diff --git a/tests/ovn.at b/tests/ovn.at
index db1a0a35c2..31f0b90996 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26940,7 +26940,7 @@ ovs-vsctl -- add-port br-int vif1 -- \
 # And proxy ARP flows for 69.254.239.254 and 169.254.239.2
 # and check that SB flows have been added.
 ovn-nbctl --wait=hv add Logical_Switch_Port rp-ls1 \
-options arp_proxy='"169.254.239.254 169.254.239.2"'
+options arp_proxy='"169.254.239.254,169.254.239.2"'
 ovn-sbctl dump-flows > sbflows
 AT_CAPTURE_FILE([sbflows])
 
@@ -26957,7 +26957,7 @@ AT_CHECK([ovn-sbctl dump-flows | grep ls_in_arp_rsp | 
grep "169.254.239.2"], [1]
 
 # Add the flows back send arp request and check we see an ARP response
 ovn-nbctl --wait=hv add Logical_Switch_Port rp-ls1 \
-options arp_proxy='"169.254.239.254 169.254.239.2"'

Re: [ovs-dev] Openvswitch patch doubts:conntrack: Fix missed 'conn' lookup checks.

2021-06-29 Thread Ben Pfaff
On Mon, Jun 28, 2021 at 11:45:11AM +0800, user wrote:
> I think this situation may not happen, because if there are two pkts
> are going to create the same conntrack, their headers will be roughly
> the same, the rss of the hardware will assign the packets to the same
> cpu, so there is no chance for two threads to try to insert the same
> conntrack at the same time, so I am confused about the reason for
> doing this and what are the missing cases.

We can't rely on RSS for mutual exclusion.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 1/4] conntrack: handle already natted packets

2021-06-29 Thread Paolo Valerio
Dumitru Ceara  writes:

> On 6/25/21 2:01 PM, Paolo Valerio wrote:
>> Dumitru Ceara  writes:
>> 
>>> On 6/21/21 12:06 PM, Paolo Valerio wrote:
 when a packet gets dnatted and then recirculated, it could be possible
 that it matches another rule that performs another nat action.
 The kernel datapath handles this situation turning to a no-op the
 second nat action, so natting only once the packet.  In the userspace
 datapath instead, when the ct action gets executed, an initial lookup
 of the translated packet fails to retrieve the connection related to
 the packet, leading to the creation of a new entry in ct for the src
 nat action with a subsequent failure of the connection establishment.

 with the following flows:

 table=0,priority=30,in_port=1,ip,nw_dst=192.168.2.100,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
 table=0,priority=20,in_port=2,ip,actions=ct(nat,table=1)
 table=0,priority=10,ip,actions=resubmit(,2)
 table=0,priority=10,arp,actions=NORMAL
 table=0,priority=0,actions=drop
 table=1,priority=5,ip,actions=ct(commit,nat(src=10.1.1.240),table=2)
 table=2,in_port=ovs-l0,actions=2
 table=2,in_port=ovs-r0,actions=1

 establishing a connection from 10.1.1.1 to 192.168.2.100 the outcome is:

 tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.240,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
 tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)

 with this patch applied the outcome is:

 tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)

 The patch performs, for already natted packets, a lookup of the
 reverse key in order to retrieve the related entry, it also adds a
 test case that besides testing the scenario ensures that the other ct
 actions are executed.

 Reported-by: Dumitru Ceara 
 Signed-off-by: Paolo Valerio 
 ---
>>>
>>> Hi Paolo,
>>>
>>> Thanks for the patch!  I tested it and it works fine for OVN.  I have a
>>> few comments/questions below.
>>>
>> 
>> Thanks for the test and for the review.
>> 
  lib/conntrack.c |   30 +-
  tests/system-traffic.at |   35 +++
  2 files changed, 64 insertions(+), 1 deletion(-)

 diff --git a/lib/conntrack.c b/lib/conntrack.c
 index 99198a601..7e8b16a3e 100644
 --- a/lib/conntrack.c
 +++ b/lib/conntrack.c
 @@ -1281,6 +1281,33 @@ process_one_fast(uint16_t zone, const uint32_t 
 *setmark,
  }
  }
  
 +static void
 +initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx,
 + long long now, bool natted)
>>>
>>> Nit: indentation.
>>>
>> 
>> ACK
>> 
 +{
 +bool found;
 +
 +if (natted) {
 +/* if the packet has been already natted (e.g. a previous
 + * action took place), retrieve it performing a lookup of its
 + * reverse key. */
 +conn_key_reverse(&ctx->key);
 +}
 +
 +found = conn_key_lookup(ct, &ctx->key, ctx->hash,
 +now, &ctx->conn, &ctx->reply);
 +if (natted) {
 +if (OVS_LIKELY(found)) {
 +ctx->reply = !ctx->reply;
 +ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key;
 +ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis);
 +} else {
 +/* in case of failure restore the initial key. */
 +conn_key_reverse(&ctx->key);
>>>
>>> Can the lookup actually fail?  I mean, if the packet was natted, there
>>> must have been a connection on which it got natted.  Anyway, I think we
>>> should probably also increment a coverage counter.  I guess dropping
>>> such packets would be hard, right?
>>>
>> 
>> I agree, it should not fail. If I'm not missing something, if it
>> happens, it should be because there's been a problem somewhere else
>> (e.g. a polluted ct_state value or more in general an unexpected
>> scenario). For this reason, I think it's better not to drop it or even
>> set it as invalid.
>
> I'm not sure, won't this create horrible to debug bugs when packets get
> forwarded in an unexpected way?  Setting it as invalid isn't good enough
> in my opinion because there might be flows later in the pipeline that
> perform actions (other than drop) on packets with ct_state +inv.
>
> The problem I have (because I don't know the conntrack code) is that I
> see no easy way to drop the packet.
>
>> 
>> Yes, the coverage counter gives more meaning to the else branch.
>> Alternatively, we could probably even remove it. I would leave the NULL
>> check or equivalent.
>> 
>> I have no strong preference.
>> WDYT?
>> 
>
> I would prefer a 

Re: [ovs-dev] [PATCH] reconnect: Add graceful reconnect.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 01:20:35PM +0200, Dumitru Ceara wrote:
> Until now clients that needed to reconnect immediately could only use
> reconnect_force_reconnect().  However, reconnect_force_reconnect()
> doesn't reset the backoff for connections that were alive long enough
> (more than backoff seconds).
> 
> Moreover, the reconnect library cannot determine the exact reason why a
> client wishes to initiate a reconnection.  In most cases reconnection
> happens because of a fatal error when communicating with the remote,
> e.g., in the ovsdb-cs layer, when invalid messages are received from
> ovsdb-server.  In such cases it makes sense to not reset the backoff
> because the remote seems to be unhealthy.
> 
> There are however cases when reconnection is needed for other reasons.
> One such example is when ovsdb-clients require "leader-only" connections
> to clustered ovsdb-server databases.  Whenever the client determines
> that the remote is not a leader anymore, it decides to reconnect to a
> new remote from its list, searching for the new leader.  Using
> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
> not reset backoff even though the former leader is still likely in good
> shape.
> 
> Since 3c2d6274bcee ("raft: Transfer leadership before creating
> snapshots.") leadership changes inside the clustered database happen
> more often and therefore "leader-only" clients need to reconnect more
> often too.  Not resetting the backoff every time a leadership change
> happens will cause all reconnections to happen with the maximum backoff
> (8 seconds) resulting in significant latency.
> 
> This commit also updates the Python reconnect and IDL implementations
> and adds tests for force-reconnect and graceful-reconnect.
> 
> Reported-at: https://bugzilla.redhat.com/1977264
> Signed-off-by: Dumitru Ceara 

I only glanced over this, but my reaction is good.  Thank you for
adding tests and writing such a thorough rationale!
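The distinction being added can be sketched roughly as follows; only
reconnect_force_reconnect() is existing API, and the new function name and
body here are assumptions for illustration, not the actual patch:

/* Inside lib/reconnect.c, where the backoff field is visible. */
void
reconnect_graceful_reconnect(struct reconnect *fsm, long long int now)
{
    /* The remote is believed healthy (e.g. it merely lost leadership),
     * so drop the accumulated backoff before disconnecting instead of
     * waiting up to the maximum backoff (8 seconds by default) for the
     * next attempt. */
    fsm->backoff = 0;
    reconnect_force_reconnect(fsm, now);
}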
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 06/12] dpif-netdev: Add additional packet count parameter for study function

2021-06-29 Thread Amber, Kumar
Hi Eelco,

Replies are inline.

> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -288,7 +288,13 @@ An implementation can be selected manually by
> the following command ::
> >  Also user can select the study implementation which studies the
> > traffic for  a specific number of packets by applying all availbale
> > implementaions of  miniflow extract and than chooses the one with most
> > optimal result for that -traffic pattern.
> > +traffic pattern. User can also provide additonal parameter as packet
> > +count which is minimum packets which OVS must study before choosing
> > +optimal implementation, If no packet count is provided than default
> value is choosen.
> > +
> 
> Should we mention the default value?
> 
> 
> Also, thinking about configuring the study option, as there is no
> synchronization point between threads, do we need to mention that one
> PMD thread might still be running a previous round, and can now decide on
> earlier data?
> 
> Let say you do:
> 
>   ovs-appctl dpif-netdev/miniflow-parser-set study 3
> 
> 3 busy threads are done, and a 4th is still busy as it has only done 1
> packets. Now you do:
> 
>   ovs-appctl dpif-netdev/miniflow-parser-set study 1
> 
> And one thread will be done instantly, while the other might take a while…
> 

Will update the study section accordingly.

> 
> > +Study can be selected with packet count by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> >
> >  Miniflow Extract Validation
> >  ~~~
> > diff --git a/lib/dpif-netdev-extract-study.c
> > b/lib/dpif-netdev-extract-study.c index d063d040c..c48fb125e 100644
> > --- a/lib/dpif-netdev-extract-study.c
> > +++ b/lib/dpif-netdev-extract-study.c
> > @@ -55,6 +55,19 @@ get_study_stats(void)
> >  return stats;
> >  }
> >
> > +static uint32_t pkt_compare_count = 0;
> > +
> > +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> > +struct dpif_miniflow_extract_impl
> > +*opt) {
> > +if ((opt->extract_func == mfex_study_traffic) && (pkt_cmp_count != 0))
> {
> > +pkt_compare_count = pkt_cmp_count;
> > +return 0;
> > +}
> > +pkt_compare_count = MFEX_MAX_COUNT;
> > +return -EINVAL;
> > +}
> > +
> >  uint32_t
> >  mfex_study_traffic(struct dp_packet_batch *packets,
> > struct netdev_flow_key *keys, @@ -87,7 +100,7 @@
> > mfex_study_traffic(struct dp_packet_batch *packets,
> >
> >  /* Choose the best implementation after a minimum packets have been
> >   * processed. */
> > -if (stats->pkt_count >= MFEX_MAX_COUNT) {
> > +if (stats->pkt_count >= pkt_compare_count) {
> >  uint32_t best_func_index = MFEX_IMPL_START_IDX;
> >  uint32_t max_hits = 0;
> >  for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) { diff
> > --git a/lib/dpif-netdev-private-extract.h
> > b/lib/dpif-netdev-private-extract.h
> > index d8a284db7..0ec74bef9 100644
> > --- a/lib/dpif-netdev-private-extract.h
> > +++ b/lib/dpif-netdev-private-extract.h
> > @@ -127,5 +127,13 @@ dpif_miniflow_extract_get_default(void);
> >   * overridden at runtime. */
> >  void
> >  dpif_miniflow_extract_set_default(miniflow_extract_func func);
> > +/* Sets the packet count from user to the stats for use in
> > + * study function to match against the classified packets to choose
> > + * the optimal implementation.
> > + * On error, returns EINVAL.
> > + * On success, returns 0.
> > + */
> > +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> > +struct dpif_miniflow_extract_impl *opt);
> >
> >  #endif /* DPIF_NETDEV_AVX512_EXTRACT */ diff --git
> > a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 716e0debf..35c927d55
> > 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -1141,14 +1141,29 @@ dpif_miniflow_extract_impl_set(struct
> unixctl_conn *conn, int argc,
> >  return;
> >  }
> >  new_func = opt->extract_func;
> > -/* argv[2] is optional datapath instance. If no datapath name is
> provided.
> > +
> > +/* argv[2] is optional packet count, which user can provide along with
> > + * study function to set the minimum packet that must be matched in
> order
> > + * to choose the optimal function. */
> > +uint32_t pkt_cmp_count = 0;
> > +uint32_t study_ret;
> > +if (argc == 3) {
> > +char *err_str;
> > +pkt_cmp_count = strtoul(argv[2], _str, 10);
> 
> 
> Guess you might want to use the ovs str_to_uint() like functions to verify the
> input string is valid.
>

Thanks, that seems a fair idea; it will be taken into v5.
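For clarity, the suggested parsing would look roughly like this (a sketch only,
assuming the count arrives as argv[2]; the real v5 code may differ):

/* Validate the optional packet count instead of calling strtoul()
 * directly, and reject zero or malformed input. */
unsigned int value;
if (!str_to_uint(argv[2], 10, &value) || value == 0) {
    unixctl_command_reply_error(conn, "Invalid study packet count.");
    return;
}
uint32_t pkt_cmp_count = value;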
 
> > +study_ret = mfex_set_study_pkt_cnt(pkt_cmp_count, opt);
> > +} else {
> > +/* Default packet compare count when packets count not provided.
> */
> > +study_ret = mfex_set_study_pkt_cnt(0, opt);
> > +}
> > +
> > +/* argv[3] is optional datapath instance. If no datapath name is
> provided.
> > 

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> If a new database server added to the cluster, or if one of the
> database servers changed its IP address or port, then you need to
> update the list of remotes for the client.  For example, if a new
> OVN_Southbound database server is added, you need to update the
> ovn-remote for the ovn-controller.
> 
> However, in the current implementation, the ovsdb-cs module always
> closes the current connection and creates a new one.  This can lead
> to a storm of re-connections if all ovn-controllers will be updated
> simultaneously.  They can also start re-dowloading the database
> content, creating even more load on the database servers.
> 
> Correct this by saving an existing connection if it is still in the
> list of remotes after the update.
> 
> 'reconnect' module will report connection state updates, but that
> is OK since no real re-connection happened and we only updated the
> state of a new 'reconnect' instance.
> 
> If required, re-connection can be forced after the update of remotes
> with ovsdb_cs_force_reconnect().

I think one of the goals here was to keep the load balanced as servers
are added.  Maybe that's not a big deal, or maybe it would make sense to
flip a coin for each of the new servers and switch over to it with
probability 1/n where n is the number of servers.
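That idea fits in a few lines; the sketch below is illustrative only
(random_uint32() is the existing OVS helper, everything else is an assumption
about where it would hook in):

/* Keep the existing connection with probability old_n/new_n when the
 * remote set grows from old_n to new_n servers, so that across many
 * clients the load stays roughly balanced without every client
 * reconnecting at once. */
static bool
keep_current_connection(size_t old_n, size_t new_n)
{
    if (new_n <= old_n) {
        return true;
    }
    return (random_uint32() % new_n) < old_n;
}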
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] "Why perthread->mutex is needed?"

2021-06-29 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 09:49:39PM +0800, 贺鹏 wrote:
> I am investigating the OVS RCU code, and feel confused about the
> perthread->mutex. What is this mutex used for? It seems that in the code
> there is only code that inits and destroys the mutex, but there is
> no code that locks and unlocks it.

Thanks for spotting that.  I sent out a patch to remove it:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384582.html
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovs-rcu: Remove unneeded mutex from struct ovsrcu_perthread.

2021-06-29 Thread Ben Pfaff
It was not really used.

Signed-off-by: Ben Pfaff 
Reported-by: 贺鹏 
---
 lib/ovs-rcu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index cde1e925ba94..1866bd308822 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -47,7 +47,6 @@ struct ovsrcu_cbset {
 struct ovsrcu_perthread {
 struct ovs_list list_node;  /* In global list. */
 
-struct ovs_mutex mutex;
 uint64_t seqno;
 struct ovsrcu_cbset *cbset;
 char name[16];  /* This thread's name. */
@@ -84,7 +83,6 @@ ovsrcu_perthread_get(void)
 const char *name = get_subprogram_name();
 
 perthread = xmalloc(sizeof *perthread);
-ovs_mutex_init(&perthread->mutex);
 perthread->seqno = seq_read(global_seqno);
 perthread->cbset = NULL;
 ovs_strlcpy(perthread->name, name[0] ? name : "main",
@@ -406,7 +404,6 @@ ovsrcu_unregister__(struct ovsrcu_perthread *perthread)
 ovs_list_remove(&perthread->list_node);
 ovs_mutex_unlock(&ovsrcu_threads_mutex);
 
-ovs_mutex_destroy(&perthread->mutex);
 free(perthread);
 
 seq_change(global_seqno);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Fwd: "Why perthread->mutex is needed?"

2021-06-29 Thread 贺鹏
-- Forwarded message --
From: *贺鹏* 
Date: Tuesday, 29 June 2021
Subject: "Why perthread->mutex is needed?"
To: Sriharsha Basavapatna via dev , Ben Pfaff <
b...@ovn.org>


Hi,Ben,

I am investigating the OVS RCU code, and feel confused about the
perthread->mutex: what is this mutex used for? It seems the code
only initializes and destroys the mutex; there is no code that
locks and unlocks it.

Thanks.

-- 
hepeng



-- 
hepeng
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-06-29 Thread Van Haaren, Harry
> -Original Message-
> From: Eelco Chaudron 
> Sent: Tuesday, June 29, 2021 2:56 PM
> To: Amber, Kumar 
> Cc: Van Haaren, Harry ; d...@openvswitch.org;
> i.maxim...@ovn.org; Flavio Leitner 
> Subject: Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function
> pointer for miniflow extract
> 
> 
> 
> On 29 Jun 2021, at 13:59, Amber, Kumar wrote:
> 
> > Hi Eelco,
> >
> > Thanks a lot for the comments and my replies are inline.
> >
> 
> 
> 
> >>> +return;
> >>> +}
> >>> +
> >>> +/* Add all mfex functions to reply string. */
> >>> +struct ds reply = DS_EMPTY_INITIALIZER;
> >>> +ds_put_cstr(, "Available Optimized Miniflow Extracts:\n");
> >>> +for (uint32_t i = 0; i < count; i++) {
> >>> +ds_put_format(, "  %s (available: %s)\n",
> >>> +  mfex_impls[i].name, mfex_impls[i].available ?
> >>> +  "True" : "False");
> >>> +}
> >>> +unixctl_command_reply(conn, ds_cstr());
> >>> +ds_destroy();
> >>
> >> I think this command must output the currently configured values for all
> >> data paths, or else there is no easy way to see the current setting.
> >>
> >
> > We are planning to do a separate patch for implementing the same for DPIF,
> > MFEX adnd DPCLS.
> >
> 
> If you do, please do it ASAP, as I think this feature should not get in 
> without being
> able to see in the field what the actual configuration is.

Hi Eelco,
 
OK, it seems there is a lot of focus on visibility of the implementation
used here.
That's good and makes sense; let's focus on getting that improved.
 
So moving forward, how about the below output for each command?
(Note, I had a quick chat with Amber & Cian over IM here to get to the below!)
 
The mapping is not always very obvious, as e.g. DPCLS ports can be re-assigned 
between PMD threads.
(Note the implementation of DPCLS might be a bit tricky, as specialized 
subtable searches
aren't externally exposed. I'm confident we'll find a solution.)
 
DPIF and MFEX are enabled per-PMD thread, and are always consistent for all 
datapath threads.
 
Today's commands have very similar output, now with (name: value) data points 
added.
Example for DPIF:   (pmds: 15,16)  means pmd threads 15 and 16 are running that 
impl.
 
Thoughts on the below commands, and added info?  Regards, -Harry
 
 
$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
Available lookup functions (priority : name)
  0 : autovalidator (ports: none)
  1 : generic (ports: none)
  3 : avx512_gather (ports: 2) # number of DPCLS ports using this impl
 
$ ovs-appctl dpif-netdev/dpif-set-impl
Available DPIF impls:
  dpif_scalar (pmds: 15,16)# PMD thread ids using this DPIF impl
  dpif_avx512 (pmds: none)
 
$ ovs-appctl  dpif-netdev/miniflow-parser-get
Available Optimized Miniflow Extracts:
  autovalidator (available: True, pmds: none)
  disable (available: True, pmds: none)
  study (available: True, pmds: none)
  avx512_vbmi_ipv4_udp (available: True, pmds: 15,16) # PMD thread ids using this MFEX impl.
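For the per-impl PMD lists, something roughly like this could build the
"(pmds: ...)" suffix (sketch only; the per-PMD mfex pointer name is a
placeholder for whatever the series ends up using):

    struct ds pmds = DS_EMPTY_INITIALIZER;
    struct dp_netdev_pmd_thread *pmd;

    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
        if (pmd->mfex_func == mfex_impls[i].extract_func) {
            ds_put_format(&pmds, "%s%u", pmds.length ? "," : "", pmd->core_id);
        }
    }
    ds_put_format(&reply, "  %s (pmds: %s)\n", mfex_impls[i].name,
                  pmds.length ? ds_cstr(&pmds) : "none");
    ds_destroy(&pmds);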

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Amber, Kumar
Hi Flavio,

Replies inline.

> >
> > Guess the above needs to be atomic.
> >
> > Removed based on Flavio comments.
> 
> I asked to initialize that using an API and Eelco is asking to set it 
> atomically.
> The requests are complementary, right?
> 

Yes, true. Sorry for the confusion; we have refactored the code a bit to use an
atomic set and get along with the API wherever applicable. Since on any failure
here we want to fall back to scalar, we would not need the API to find the
default implementation.
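Roughly along these lines (sketch only, using the ovs-atomic helpers and the
dpif_miniflow_extract_{get,set}_default() API from patch 1; not the final
code):

    static ATOMIC(miniflow_extract_func) default_mfex_func;

    void
    dpif_miniflow_extract_set_default(miniflow_extract_func func)
    {
        atomic_store_relaxed(&default_mfex_func, func);
    }

    miniflow_extract_func
    dpif_miniflow_extract_get_default(void)
    {
        miniflow_extract_func func;

        atomic_read_relaxed(&default_mfex_func, &func);
        return func;
    }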
> >
> > + VLOG_ERR("failed to get miniflow extract function
> > + implementations\n");
> >
> > Capital F to be in sync with your other error messages?
> >
> > Removed based on Flavio comments.
> 
> Not sure if I got this. I mentioned that the '\n' is not needed at the end of 
> all
> VLOG_* calls. Eelco is asking to start with capital 'F'. So the requests are
> complementary, unless with the refactor the message went away.
> 
> Just make sure to follow the logging style convention in OVS.

Sorry for the confusion; I have fixed all the VLOGs to follow this convention.
> 
> fbl
> 
> 
> 
> >
> > + return 0;
> > + }
> > + ovs_assert(keys_size >= cnt);
> >
> > I don’t think we should assert here. Just return an error like above, so in
> production, we get notified, and this implementation gets disabled.
> >
> > Actually we do else one would most likely be overwriting the assigned
> array space for keys and will hit a Seg fault at some point.
> >
> > And hence we would like to know at the compile time if this is the case.
> >
> > + struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> > +
> > + /* Run scalar miniflow_extract to get default result. */
> > + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > + pkt_metadata_init(>md, in_port); miniflow_extract(packet,
> > + [i].mf);
> > +
> > + /* Store known good metadata to compare with optimized metadata. */
> > + good_l2_5_ofs[i] = packet->l2_5_ofs; good_l3_ofs[i] =
> > + packet->l3_ofs; good_l4_ofs[i] = packet->l4_ofs; good_l2_pad_size[i]
> > + = packet->l2_pad_size; }
> > +
> > + /* Iterate through each version of miniflow implementations. */ for
> > + (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) { if
> > + (!mfex_impls[j].available) { continue; }
> > +
> > + /* Reset keys and offsets before each implementation. */
> > + memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
> > + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> > + dp_packet_reset_offsets(packet); }
> > + /* Call optimized miniflow for each batch of packet. */ uint32_t
> > + hit_mask = mfex_impls[j].extract_func(packets, test_keys, keys_size,
> > + in_port, pmd_handle);
> > +
> > + /* Do a miniflow compare for bits, blocks and offsets for all the
> > + * classified packets in the hitmask marked by set bits. */ while
> > + (hit_mask) {
> > + /* Index for the set bit. */
> > + uint32_t i = __builtin_ctz(hit_mask);
> > + /* Set the index in hitmask to Zero. */ hit_mask &= (hit_mask - 1);
> > +
> > + uint32_t failed = 0;
> > +
> > + /* Check miniflow bits are equal. */ if ((keys[i].mf.map.bits[0] !=
> > + test_keys[i].mf.map.bits[0]) || (keys[i].mf.map.bits[1] !=
> > + test_keys[i].mf.map.bits[1])) { VLOG_ERR("Good 0x%llx 0x%llx\tTest
> > + 0x%llx 0x%llx\n", keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
> > + test_keys[i].mf.map.bits[0], test_keys[i].mf.map.bits[1]); failed =
> > + 1; }
> > +
> > + if (!miniflow_equal([i].mf, _keys[i].mf)) { uint32_t
> > + block_cnt = miniflow_n_values([i].mf); VLOG_ERR("Autovalidation
> > + blocks failed for %s pkt %d", mfex_impls[j].name, i); VLOG_ERR("
> > + Good hexdump:\n"); uint64_t *good_block_ptr = (uint64_t
> > + *)[i].buf; uint64_t *test_block_ptr = (uint64_t
> > + *)_keys[i].buf; for (uint32_t b = 0; b < block_cnt; b++) {
> > + VLOG_ERR(" %"PRIx64"\n", good_block_ptr[b]); } VLOG_ERR(" Test
> > + hexdump:\n"); for (uint32_t b = 0; b < block_cnt; b++) { VLOG_ERR("
> > + %"PRIx64"\n", test_block_ptr[b]); } failed = 1; }
> > +
> > + if ((packets->packets[i]->l2_pad_size != good_l2_pad_size[i]) ||
> > + (packets->packets[i]->l2_5_ofs != good_l2_5_ofs[i]) ||
> > + (packets->packets[i]->l3_ofs != good_l3_ofs[i]) ||
> > + (packets->packets[i]->l4_ofs != good_l4_ofs[i])) {
> > + VLOG_ERR("Autovalidation packet offsets failed for %s pkt %d",
> > + mfex_impls[j].name, i); VLOG_ERR(" Good offsets: l2_pad_size %u,
> > + l2_5_ofs : %u"
> > + " l3_ofs %u, l4_ofs %u\n",
> > + good_l2_pad_size[i], good_l2_5_ofs[i], good_l3_ofs[i],
> > + good_l4_ofs[i]); VLOG_ERR(" Test offsets: l2_pad_size %u, l2_5_ofs :
> > + %u"
> > + " l3_ofs %u, l4_ofs %u\n",
> > + packets->packets[i]->l2_pad_size,
> > + packets->packets[i]->l2_5_ofs,
> > + packets->packets[i]->l3_ofs,
> > + packets->packets[i]->l4_ofs);
> > + failed = 1;
> > + }
> > +
> > + if (failed) {
> >
> > Why stop now!? I think we should run all implementations, as others
> might need fixing too!
> >
> > We had the same model as above by you but with so many debug
> messages
> > flooding made it 

Re: [ovs-dev] [PATCH v3 0/2] add port-based ingress policing based packet-per-second rate-limiting

2021-06-29 Thread Marcelo Ricardo Leitner
On Wed, Jun 23, 2021 at 03:47:45PM +0200, Simon Horman wrote:
> On Wed, Jun 09, 2021 at 11:52:07AM +0200, Simon Horman wrote:
> > Hi,
> >
> > this short test adds support for add port-based ingress policing based
> > packet-per-second rate-limiting. This builds on existing support for
> > byte-per-second rate limiting.
> >
> > Changes since v2
> >
> > * Remove the for loop in function nl_msg_put_act_police()
> > * Remove unused enum definition for qos type
> > * Define 1 kpkts as 1000 packets rather than 1024 packets
> > * Update the description for the new item in ovsdb
> > * Fix some format warnings according robot's comments
> >
> > Changes between v1 and v2
> > * Correct typo: s/comsume/consume/
>
> Hi Marcelo,
>
> could I trouble you for a review of this series.
> I believe it addresses the issues that you raised in v2.

Hi Simon,

Yes, it does, thanks.

I'd like to run some tests and get more acquainted with rate limiting
on OVS before adding a Reviewed-by tag, but I couldn't do it so far
and now I'm not sure I can do it this week. Anyhow, let's not have the
merge blocked on this, unless you really want to. :-)
I probably can get to this next week, FWIW.

Thanks,
Marcelo

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-29 Thread Eelco Chaudron



On 29 Jun 2021, at 15:19, Vasu Dasari wrote:

> Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3
> side.  For L2 side, there is only fdb show command. This patch gives an option
> to add/del an fdb entry via ovs-appctl.
>
> CLI command looks like:
>
> To add:
> ovs-appctl fdb/add <bridge> <port> <vlan> <mac>
> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
>
> To del:
> ovs-appctl fdb/del <bridge> <vlan> <mac>
> ovs-appctl fdb/del br0 0 50:54:00:00:00:05
>
> Added two new APIs to provide convenient interface to add and delete 
> static-macs.
> bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t 
> in_port,
>struct eth_addr dl_src, int vlan);
> bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
>   struct eth_addr dl_src, int vlan);
>
> 1. Static entry should not age. To indicate that entry being programmed is a 
> static entry,
>'expires' field in 'struct mac_entry' will be set to a 
> MAC_ENTRY_AGE_STATIC_ENTRY. A
>check for this value is made while deleting mac entry as part of regular 
> aging process.
> 2. Another change to the mac-update logic: when a packet with the same dl_src as 
> that of a
>static-mac entry arrives on any port, the logic will not modify the 
> expires field.
> 3. While flushing fdb entries, made sure static ones are not evicted.
> 4. Updated "ovs-appctl fdb/stats-show br0" to display the number of static
> entries in the switch
>
> Added following tests:
>   ofproto-dpif - static-mac add/del/flush
>   ofproto-dpif - static-mac mac moves
>
> Signed-off-by: Vasu Dasari 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
> Tested-by: Eelco Chaudron 
> Acked-by: Eelco Chaudron 
> ---

Thanks for your patience in following this through!

Acked-by: Eelco Chaudron 

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-06-29 Thread Amber, Kumar
Hi Flavio,

Pls find my replies inline.

> -Original Message-
> From: Flavio Leitner 
> Sent: Tuesday, June 29, 2021 6:50 PM
> To: Amber, Kumar 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based
> optimized miniflow extract
> 
> 
> Hi,
> 
> On Thu, Jun 17, 2021 at 09:57:52PM +0530, Kumar Amber wrote:
> > From: Harry van Haaren 
> >
> > This commit adds AVX512 implementations of miniflow extract.
> > By using the 64 bytes available in an AVX512 register, it is possible
> > to convert a packet to a miniflow data-structure in a small quantity of
> > instructions.
> >
> > The implementation here probes for Ether()/IP()/UDP() traffic, and
> > builds the appropriate miniflow data-structure for packets that match
> > the probe.
> >
> > The implementation here is auto-validated by the miniflow extract
> > autovalidator, hence its correctness can be easily tested and
> > verified.
> >
> > Note that this commit is designed to easily allow addition of new
> > traffic profiles in a scalable way, without code duplication for each
> > traffic profile.
> >
> > Signed-off-by: Harry van Haaren 
> > ---
> >  lib/automake.mk   |   1 +
> >  lib/dpif-netdev-extract-avx512.c  | 416
> > ++  lib/dpif-netdev-private-extract.c |
> > 15 ++  lib/dpif-netdev-private-extract.h |  19 ++
> >  4 files changed, 451 insertions(+)
> >  create mode 100644 lib/dpif-netdev-extract-avx512.c
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk index
> > 3080bb04a..2b95d6f92 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
> > $(AM_CFLAGS)
> >  lib_libopenvswitchavx512_la_SOURCES = \
> > lib/dpif-netdev-lookup-avx512-gather.c \
> > +   lib/dpif-netdev-extract-avx512.c \
> > lib/dpif-netdev-avx512.c
> >  lib_libopenvswitchavx512_la_LDFLAGS = \
> > -static
> > diff --git a/lib/dpif-netdev-extract-avx512.c
> > b/lib/dpif-netdev-extract-avx512.c
> > new file mode 100644
> > index 0..1145ac8a9
> > --- /dev/null
> > +++ b/lib/dpif-netdev-extract-avx512.c
> > @@ -0,0 +1,416 @@
> > +/*
> > + * Copyright (c) 2021 Intel.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + * http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing,
> > +software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions
> > +and
> > + * limitations under the License.
> > + */
> 
> 
> Since this is very specific to AVX512, can we have a more verbose comment
> here explaining how it works? See the 'dpif, the DataPath InterFace.' in 
> dpif.h
> as an example.
> 

Sure, I will put a detailed description at the top of the file.
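Something like the following could be a starting point (wording is only a
suggestion, to be refined for v5):

    /* dpif-netdev-extract-avx512.c -- SIMD miniflow extraction.
     *
     * Each profile-specific implementation loads the first 64 bytes of the
     * packet into a 512-bit register, checks it against the pattern for that
     * traffic profile (e.g. Ether()/IP()/UDP()), and on a match shuffles the
     * relevant header fields straight into the miniflow block layout expected
     * by the datapath classifier.  Packets that do not match any enabled
     * profile are left to the scalar miniflow_extract() as before.
     */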
> 
> > +
> > +#ifdef __x86_64__
> > +/* Sparse cannot handle the AVX512 instructions. */ #if
> > +!defined(__CHECKER__)
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "flow.h"
> > +#include "dpdk.h"
> > +
> > +#include "dpif-netdev-private-dpcls.h"
> > +#include "dpif-netdev-private-extract.h"
> > +#include "dpif-netdev-private-flow.h"
> > +
> > +/* AVX512-BW level permutex2var_epi8 emulation. */ static inline
> > +__m512i
> > +__attribute__((target("avx512bw")))
> > +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> > +   __m512i v_data_0,
> > +   __m512i v_shuf_idxs,
> > +   __m512i v_data_1) {
> > +/* Manipulate shuffle indexes for u16 size. */
> > +__mmask64 k_mask_odd_lanes = 0x;
> > +/* clear away ODD lane bytes. Cannot be done above due to no u8 shift
> */
> > +__m512i v_shuf_idx_evn =
> _mm512_mask_blend_epi8(k_mask_odd_lanes,
> > +v_shuf_idxs, _mm512_setzero_si512());
> > +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> > +
> > +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> > +
> > +/* Shuffle each half at 16-bit width */
> > +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0,
> v_shuf_idx_evn,
> > +v_data_1);
> > +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0,
> v_shuf_idx_odd,
> > +v_data_1);
> > +
> > +/* Find if the shuffle index was odd, via mask and compare */
> > +uint16_t index_odd_mask = 0x1;
> > +const __m512i v_index_mask_u16 =
> > + _mm512_set1_epi16(index_odd_mask);
> > +
> > +/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask */
> > +__m512i v_idx_even_masked 

Re: [ovs-dev] [v4 09/12] dpdk: add additional CPU ISA detection strings

2021-06-29 Thread Eelco Chaudron



On 17 Jun 2021, at 18:27, Kumar Amber wrote:

> From: Harry van Haaren 
>
> This commit enables OVS to at runtime check for more detailed
> AVX512 capabilities, specifically Byte and Word (BW) extensions,
> and Vector Bit Manipulation Instructions (VBMI).
>
> These instructions will be used in the CPU ISA optimized
> implementations of traffic profile aware miniflow extract.
>
> Signed-off-by: Harry van Haaren 

Acked-by: Eelco Chaudron 

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Flavio Leitner
On Tue, Jun 29, 2021 at 03:50:22PM +0200, Eelco Chaudron wrote:
> 
> 
> On 28 Jun 2021, at 4:57, Flavio Leitner wrote:
> 
> > Hi,
> >
> >
> > On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
> >> Tests:
> >>   6: OVS-DPDK - MFEX Autovalidator
> >>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> >>
> >> Added a new directory to store the PCAP file used
> >> in the tests and a script to generate the fuzzy traffic
> >> type pcap to be used in fuzzy unit test.
> >
> >
> > I haven't tried this yet but am I right that these tests are
> > going to pass a pcap to send traffic in a busy loop for 5
> > seconds in the first case and 20 seconds in the second case?
> >
> > I see that when autovalidator is set OVS will crash if one
> > implementation returns a different value, so I wonder why
> > we need to run for that long.
> 
> I think we should remove the assert (already suggested by Harry),
> so it will not crash by accident if someone selects autovalidator
> in the field (and runs into an issue).
> Failure will then be detected by the ERROR log entries on shutdown.

That's true for the testsuite, but not in production as there is
nothing to disable that.

Perhaps if the autovalidator detects an issue, it should log an ERROR-level
message to report to the testsuite, disable the failing mode, and make
sure OVS is either in the default or in another functional mode.
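I.e. something like this (sketch only, not the actual patch; it reuses the
per-PMD function pointer from the series):

    if (failed) {
        VLOG_ERR("Autovalidator mismatch in %s; disabling it and falling "
                 "back to the scalar miniflow extract.", mfex_impls[j].name);
        /* Probably wants to be an atomic store, per the earlier comments. */
        pmd->miniflow_extract_opt = NULL;
        return 0;
    }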

> I’m wondering if there is another way than a simple delay, as these tend to 
> cause issues later on. Can we check packets processed or something?

Yeah, maybe we can pass all packets like 5x at least.

fbl


> 
> > It is storing a python tool in the pcap directory. I think the
> > fuzzy tool could be called 'mfex_fuzzy.py' and stay in tests/
> > with other similar testing tools.
> >
> > Also, I don't think the test environment sets OVS_DIR. The
> > 'tests/' is actually $srcdir, but I could be wrong here.
> >
> > BTW, scapy is not mandatory to build or test OVS, so if that
> > tool is not available, the test should be skipped and not fail.
> >
> > Thanks,
> > fbl


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Eelco Chaudron
Not sure how you replied, but it’s hard to see which comments are 
mine, and which are yours.


On 29 Jun 2021, at 14:27, Amber, Kumar wrote:


Hi Eelco,

Thanks Again for reviews , Pls find my replies inline.

From: Eelco Chaudron 
Sent: Tuesday, June 29, 2021 5:14 PM
To: Van Haaren, Harry ; Amber, Kumar 

Cc: d...@openvswitch.org; i.maxim...@ovn.org; Stokes, Ian 
; Flavio Leitner 
Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation 
function for miniflow extract



On 17 Jun 2021, at 18:27, Kumar Amber wrote:

This patch introduced the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber 
mailto:kumar.am...@intel.com>>
Co-authored-by: Harry van Haaren 
mailto:harry.van.haa...@intel.com>>
Signed-off-by: Harry van Haaren 
mailto:harry.van.haa...@intel.com>>

---
lib/dpif-netdev-private-extract.c | 141 ++
lib/dpif-netdev-private-extract.h | 15 
lib/dpif-netdev.c | 2 +-
3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c

index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

/* Implementations of available extract options. */
static struct dpif_miniflow_extract_impl mfex_impls[] = {
+ {
+ .probe = NULL,
+ .extract_func = dpif_miniflow_extract_autovalidator,
+ .name = "autovalidator",
+ },
{
.probe = NULL,
.extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)

*out_ptr = mfex_impls;
return ARRAY_SIZE(mfex_impls);
}
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+ struct netdev_flow_key *keys,
+ uint32_t keys_size, odp_port_t in_port,
+ void *pmd_handle)
+{
+ const size_t cnt = dp_packet_batch_size(packets);
+ uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+ struct dp_packet *packet;
+ struct dp_netdev_pmd_thread *pmd = pmd_handle;
+ struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+ int32_t mfunc_count = 
dpif_miniflow_extract_info_get(_funcs);

+ if (mfunc_count < 0) {

In theory 0 could not be returned, but just to cover the corner case 
can we change this to include zero.


The code has been adapted as per Flavio's comments, so this will not be a
concern.


+ pmd->miniflow_extract_opt = NULL;

Guess the above needs to be atomic.

Removed based on Flavio comments.

+ VLOG_ERR("failed to get miniflow extract function 
implementations\n");


Capital F to be in sync with your other error messages?

Removed based on Flavio comments.

+ return 0;
+ }
+ ovs_assert(keys_size >= cnt);



I don’t think we should assert here. Just return an error like 
above, so in production, we get notified, and this implementation gets 
disabled.


Actually, we do; otherwise one would most likely overwrite the assigned
array space for keys and hit a segfault at some point.


And hence we would like to know at compile time if this is the
case.



But this is not a compile time check, it will crash OVS. You could just 
do this:


if (keys_size < cnt) {
    pmd->miniflow_extract_opt = NULL;
    VLOG_ERR("Invalid key size supplied etc. etc.\n");
    return 0;
}

Or you could process up to key_size packets



+ struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+ /* Run scalar miniflow_extract to get default result. */
+ DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+ pkt_metadata_init(>md, in_port);
+ miniflow_extract(packet, [i].mf);
+
+ /* Store known good metadata to compare with optimized metadata. */
+ good_l2_5_ofs[i] = packet->l2_5_ofs;
+ good_l3_ofs[i] = packet->l3_ofs;
+ good_l4_ofs[i] = packet->l4_ofs;
+ good_l2_pad_size[i] = packet->l2_pad_size;
+ }
+
+ /* Iterate through each version of miniflow implementations. */
+ for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
+ if (!mfex_impls[j].available) {
+ continue;
+ }
+
+ /* Reset keys and offsets before each implementation. */
+ memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+ DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+ dp_packet_reset_offsets(packet);
+ }
+ /* Call optimized miniflow for each batch of packet. */
+ uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+ keys_size, in_port, pmd_handle);
+
+ /* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+ while (hit_mask) {
+ /* Index for the set bit. */
+ uint32_t i = __builtin_ctz(hit_mask);
+ /* Set the 

Re: [ovs-dev] [PATCH ovn v3 1/3] ovn-northd: Remove lflow_add_unique.

2021-06-29 Thread Dumitru Ceara
On 6/21/21 8:51 AM, Han Zhou wrote:
> This patch removes the workaround when adding multicast group related
> lflows, because the multicast group dependency problem is fixed in
> ovn-controller in the previous commit.
> 
> This patch also removes the UniqueFlow/AnnotatedFlow usage in northd
> DDlog implementation for the same reason.
> 
> Signed-off-by: Han Zhou 
> ---

Hi Han,

The changes look good to me.

Acked-by: Dumitru Ceara 

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Flavio Leitner


Hi,

On Tue, Jun 29, 2021 at 12:27:57PM +, Amber, Kumar wrote:
> Hi Eelco,
> 
> Thanks Again for reviews , Pls find my replies inline.
> 
> From: Eelco Chaudron 
> Sent: Tuesday, June 29, 2021 5:14 PM
> To: Van Haaren, Harry ; Amber, Kumar 
> 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org; Stokes, Ian 
> ; Flavio Leitner 
> Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function 
> for miniflow extract
> 
> 
> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> 
> This patch introduced the auto-validation function which
> allows users to compare the batch of packets obtained from
> different miniflow implementations against the linear
> miniflow extract and return a hitmask.
> 
> The autovalidator function can be triggered at runtime using the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> 
> Signed-off-by: Kumar Amber 
> mailto:kumar.am...@intel.com>>
> Co-authored-by: Harry van Haaren 
> mailto:harry.van.haa...@intel.com>>
> Signed-off-by: Harry van Haaren 
> mailto:harry.van.haa...@intel.com>>
> ---
> lib/dpif-netdev-private-extract.c | 141 ++
> lib/dpif-netdev-private-extract.h | 15 
> lib/dpif-netdev.c | 2 +-
> 3 files changed, 157 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev-private-extract.c 
> b/lib/dpif-netdev-private-extract.c
> index fcc56ef26..0741c19f9 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);
> 
> /* Implementations of available extract options. */
> static struct dpif_miniflow_extract_impl mfex_impls[] = {
> + {
> + .probe = NULL,
> + .extract_func = dpif_miniflow_extract_autovalidator,
> + .name = "autovalidator",
> + },
> {
> .probe = NULL,
> .extract_func = NULL,
> @@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
> dpif_miniflow_extract_impl **out_ptr)
> *out_ptr = mfex_impls;
> return ARRAY_SIZE(mfex_impls);
> }
> +
> +uint32_t
> +dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
> + struct netdev_flow_key *keys,
> + uint32_t keys_size, odp_port_t in_port,
> + void *pmd_handle)
> +{
> + const size_t cnt = dp_packet_batch_size(packets);
> + uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
> + uint16_t good_l3_ofs[NETDEV_MAX_BURST];
> + uint16_t good_l4_ofs[NETDEV_MAX_BURST];
> + uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
> + struct dp_packet *packet;
> + struct dp_netdev_pmd_thread *pmd = pmd_handle;
> + struct dpif_miniflow_extract_impl *miniflow_funcs;
> +
> + int32_t mfunc_count = dpif_miniflow_extract_info_get(_funcs);
> + if (mfunc_count < 0) {
> 
> In theory 0 could not be returned, but just to cover the corner case can we 
> change this to include zero.
> 
> The code has been adapted as per Flavio's comments, so this will not be a concern.
> 
> + pmd->miniflow_extract_opt = NULL;
> 
> Guess the above needs to be atomic.
> 
> Removed based on Flavio comments.

I asked to initialize that using an API and Eelco is asking to
set it atomically. The requests are complementary, right?

> 
> + VLOG_ERR("failed to get miniflow extract function implementations\n");
> 
> Capital F to be in sync with your other error messages?
> 
> Removed based on Flavio comments.

Not sure if I got this. I mentioned that the '\n' is not needed at
the end of all VLOG_* calls. Eelco is asking to start with capital
'F'. So the requests are complementary, unless with the refactor
the message went away.

Just make sure to follow the logging style convention in OVS.

fbl



> 
> + return 0;
> + }
> + ovs_assert(keys_size >= cnt);
> 
> I don’t think we should assert here. Just return an error like above, so in 
> production, we get notified, and this implementation gets disabled.
> 
> Actually we do else one would most likely be overwriting the assigned array 
> space for keys and will hit a Seg fault at some point.
> 
> And hence we would like to know at the compile time if this is the case.
> 
> + struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
> +
> + /* Run scalar miniflow_extract to get default result. */
> + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> + pkt_metadata_init(>md, in_port);
> + miniflow_extract(packet, [i].mf);
> +
> + /* Store known good metadata to compare with optimized metadata. */
> + good_l2_5_ofs[i] = packet->l2_5_ofs;
> + good_l3_ofs[i] = packet->l3_ofs;
> + good_l4_ofs[i] = packet->l4_ofs;
> + good_l2_pad_size[i] = packet->l2_pad_size;
> + }
> +
> + /* Iterate through each version of miniflow implementations. */
> + for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
> + if (!mfex_impls[j].available) {
> + continue;
> + }
> +
> + /* Reset keys and offsets before each implementation. */
> + memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
> + DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
> + dp_packet_reset_offsets(packet);
> + }
> + /* Call optimized miniflow for each batch of packet. */
> + uint32_t 

Re: [ovs-dev] [PATCH v3] flow: Read recirc depth and flow api enabled once per batch in miniflow_extract

2021-06-29 Thread Ilya Maximets
On 6/29/21 7:35 AM, Eli Britstein wrote:
> 
> On 6/28/2021 6:19 PM, Balazs Nemeth wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> The call to recirc_depth_get involves accessing a TLS value. So read
>> that once, and store it on the stack for re-use while processing the
>> batch. The same goes for reading netdev_is_flow_api_enabled(), a
>> non-inlined function.
> 
> A small further suggestion:
> 
> The config other_config:hw-offload is read only once upon init, so for 
> netdev_is_flow_api_enabled(), we can have a global static variable to be set 
> only at init (dpif_netdev_set_config) by the non-inline function.
> 
> Then, we can replace all other calls with this variable.

While it's required by documentation to restart OVS after
changing the 'hw-offload' knob, technically, this could
be done at runtime.  So, I'm not sure about this solution.
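If the cost of the call itself is the concern, the knob could instead be
cached in an atomic that dpif_netdev_set_config() refreshes, so a runtime
change would still be visible.  Rough sketch (helper names invented for
illustration, not actual OVS code):

    static atomic_bool hw_offload_enabled = ATOMIC_VAR_INIT(false);

    /* Called from dpif_netdev_set_config() whenever other_config changes. */
    static void
    hw_offload_update(const struct smap *other_config)
    {
        atomic_store_relaxed(&hw_offload_enabled,
                             smap_get_bool(other_config, "hw-offload", false));
    }

    static inline bool
    hw_offload_enabled_get(void)
    {
        bool enabled;

        atomic_read_relaxed(&hw_offload_enabled, &enabled);
        return enabled;
    }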

> 
> Other than that, LGTM.
> 
>>
>> Signed-off-by: Balazs Nemeth 
>> Acked-by: Gaetan Rivet 
>> Acked-by: Paolo Valerio 
>> ---
>>   lib/dpif-netdev.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index c5ab35d2a..bf2112ead 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -7165,6 +7165,8 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>   struct dp_packet *packet;
>>   const size_t cnt = dp_packet_batch_size(packets_);
>>   uint32_t cur_min = pmd->ctx.emc_insert_min;
>> +    const uint32_t recirc_depth = *recirc_depth_get();
>> +    const bool netdev_flow_api = netdev_is_flow_api_enabled();
>>   int i;
>>   uint16_t tcp_flags;
>>   bool smc_enable_db;
>> @@ -7196,7 +7198,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>   pkt_metadata_init(&packet->md, port_no);
>>   }
>>
>> -    if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
>> +    if (netdev_flow_api && recirc_depth == 0) {
>>   if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
>>   /* Packet restoration failed and it was dropped, do not
>>    * continue processing.
>> -- 
>> 2.31.1
>>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-06-29 Thread Eelco Chaudron



On 29 Jun 2021, at 13:59, Amber, Kumar wrote:

> Hi Eelco,
>
> Thanks a lot for the comments and my replies are inline.
>



>>> +return;
>>> +}
>>> +
>>> +/* Add all mfex functions to reply string. */
>>> +struct ds reply = DS_EMPTY_INITIALIZER;
>>> +ds_put_cstr(, "Available Optimized Miniflow Extracts:\n");
>>> +for (uint32_t i = 0; i < count; i++) {
>>> +ds_put_format(, "  %s (available: %s)\n",
>>> +  mfex_impls[i].name, mfex_impls[i].available ?
>>> +  "True" : "False");
>>> +}
>>> +unixctl_command_reply(conn, ds_cstr());
>>> +ds_destroy();
>>
>> I think this command must output the currently configured values for all
>> data paths, or else there is no easy way to see the current setting.
>>
>
> We are planning to do a separate patch for implementing the same for DPIF,
> MFEX adnd DPCLS.
>

If you do, please do it ASAP, as I think this feature should not get in without 
being able to see in the field what the actual configuration is.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 07/12] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-06-29 Thread Eelco Chaudron


On 28 Jun 2021, at 4:57, Flavio Leitner wrote:

> Hi,
>
>
> On Thu, Jun 17, 2021 at 09:57:49PM +0530, Kumar Amber wrote:
>> Tests:
>>   6: OVS-DPDK - MFEX Autovalidator
>>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>>
>> Added a new directory to store the PCAP file used
>> in the tests and a script to generate the fuzzy traffic
>> type pcap to be used in fuzzy unit test.
>
>
> I haven't tried this yet but am I right that these tests are
> going to pass a pcap to send traffic in a busy loop for 5
> seconds in the first case and 20 seconds in the second case?
>
> I see that when autovalidator is set OVS will crash if one
> implementation returns a different value, so I wonder why
> we need to run for that long.

I think we should remove the assert (already suggested by Harry),  so it will 
not crash by accident if someone selects autovalidator in the field (and runs 
into an issue).
Failure will then be detected by the ERROR log entries on shutdown.

I’m wondering if there is another way than a simple delay, as these tend to 
cause issues later on. Can we check packets processed or something?

> It is storing a python tool in the pcap directory. I think the
> fuzzy tool could be called 'mfex_fuzzy.py' and stay in tests/
> with other similar testing tools.
>
> Also, I don't think the test environment sets OVS_DIR. The
> 'tests/' is actually $srcdir, but I could be wrong here.
>
> BTW, scapy is not mandatory to build or test OVS, so if that
> tool is not available, the test should be skipped and not fail.
>
> Thanks,
> fbl
>
>
>>
>> Signed-off-by: Kumar Amber 
>> ---
>>  tests/automake.mk|   5 +
>>  tests/pcap/fuzzy.py  |  32 ++
>>  tests/pcap/mfex_test | Bin 0 -> 416 bytes
>>  tests/system-dpdk.at |  46 +++
>>  4 files changed, 83 insertions(+)
>>  create mode 100755 tests/pcap/fuzzy.py
>>  create mode 100644 tests/pcap/mfex_test
>>
>> diff --git a/tests/automake.mk b/tests/automake.mk
>> index 1a528aa39..532875971 100644
>> --- a/tests/automake.mk
>> +++ b/tests/automake.mk
>> @@ -142,6 +142,11 @@ $(srcdir)/tests/fuzz-regression-list.at: 
>> tests/automake.mk
>>  echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>>  done > $@.tmp && mv $@.tmp $@
>>
>> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
>> +MFEX_AUTOVALIDATOR_TESTS = \
>> +tests/pcap/mfex_test \
>> +tests/pcap/fuzzy.py
>> +
>>  OVSDB_CLUSTER_TESTSUITE_AT = \
>>  tests/ovsdb-cluster-testsuite.at \
>>  tests/ovsdb-execution.at \
>> diff --git a/tests/pcap/fuzzy.py b/tests/pcap/fuzzy.py
>> new file mode 100755
>> index 0..a8051ba2b
>> --- /dev/null
>> +++ b/tests/pcap/fuzzy.py
>> @@ -0,0 +1,32 @@
>> +#!/usr/bin/python3
>> +try:
>> +   from scapy.all import *
>> +except ModuleNotFoundError as err:
>> +   print(err + ": Scapy")
>> +import sys
>> +import os
>> +
>> +path = os.environ['OVS_DIR'] + "/tests/pcap/fuzzy"
>> +pktdump = PcapWriter(path, append=False, sync=True)
>> +
>> +for i in range(0, 2000):
>> +
>> +   # Generate random protocol bases, use a fuzz() over the combined packet 
>> for full fuzzing.
>> +   eth = Ether(src=RandMAC(), dst=RandMAC())
>> +   vlan = Dot1Q()
>> +   ipv4 = IP(src=RandIP(), dst=RandIP())
>> +   ipv6 = IPv6(src=RandIP6(), dst=RandIP6())
>> +   udp = UDP()
>> +   tcp = TCP()
>> +
>> +   # IPv4 packets with fuzzing
>> +   pktdump.write(fuzz(eth/ipv4/udp))
>> +   pktdump.write(fuzz(eth/ipv4/tcp))
>> +   pktdump.write(fuzz(eth/vlan/ipv4/udp))
>> +   pktdump.write(fuzz(eth/vlan/ipv4/tcp))
>> +
>> +# IPv6 packets with fuzzing
>> +   pktdump.write(fuzz(eth/ipv6/udp))
>> +   pktdump.write(fuzz(eth/ipv6/tcp))
>> +   pktdump.write(fuzz(eth/vlan/ipv6/udp))
>> +   pktdump.write(fuzz(eth/vlan/ipv6/tcp))
>> \ No newline at end of file
>> diff --git a/tests/pcap/mfex_test b/tests/pcap/mfex_test
>> new file mode 100644
>> index 
>> ..1aac67b8d643ecb016c758cba4cc32212a80f52a
>> GIT binary patch
>> literal 416
>> zcmca|c+)~A1{MYw`2U}Qff2}QK`M68ITRa|G@yFii5$Gfk6YL%z>@uY&}o|
>> z2s4N<1VH2&7y^V87$)XGOtD~MV$cFgfG~zBGGJ2#YtF$> xK>KST_NTIwYriok6N4Vm)gX-Q@c^{cp<7_5LgK^UuU{2>VS0RZ!RQ+EIW
>>
>> literal 0
>> HcmV?d1
>>
>> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
>> index 802895488..46eaea35a 100644
>> --- a/tests/system-dpdk.at
>> +++ b/tests/system-dpdk.at
>> @@ -232,3 +232,49 @@ OVS_VSWITCHD_STOP(["\@does not exist. The Open vSwitch 
>> kernel module is probably
>>  \@EAL: No free hugepages reported in hugepages-1048576kB@d"])
>>  AT_CLEANUP
>>  dnl 
>> --
>> +
>> +dnl 
>> --
>> +dnl Add standard DPDK PHY port
>> +AT_SETUP([OVS-DPDK - MFEX Autovalidator])
>> +AT_KEYWORDS([dpdk])
>> +
>> +OVS_DPDK_START()
>> +
>> +dnl Add userspace bridge and attach it to OVS
>> +AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
>> 

[ovs-dev] "Why perthread->mutex is needed?"

2021-06-29 Thread 贺鹏
Hi,Ben,

I am investigating the OVS RCU code, and feel confused about the
perthread->mutex: what is this mutex used for? It seems the code
only initializes and destroys the mutex; there is no code that
locks and unlocks it.

Thanks.

-- 
hepeng
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-06-29 Thread Flavio Leitner


Hi,

On Thu, Jun 17, 2021 at 09:57:52PM +0530, Kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small quantity of instructions.
> 
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
> 
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
> 
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
> 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-avx512.c  | 416 ++
>  lib/dpif-netdev-private-extract.c |  15 ++
>  lib/dpif-netdev-private-extract.h |  19 ++
>  4 files changed, 451 insertions(+)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3080bb04a..2b95d6f92 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
>   lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-extract-avx512.c \
>   lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
> diff --git a/lib/dpif-netdev-extract-avx512.c 
> b/lib/dpif-netdev-extract-avx512.c
> new file mode 100644
> index 0..1145ac8a9
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-avx512.c
> @@ -0,0 +1,416 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */


Since this is very specific to AVX512, can we have a more verbose
comment here explaining how it works? See the 'dpif, the DataPath
InterFace.' in dpif.h as an example.


> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "flow.h"
> +#include "dpdk.h"
> +
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-flow.h"
> +
> +/* AVX512-BW level permutex2var_epi8 emulation. */
> +static inline __m512i
> +__attribute__((target("avx512bw")))
> +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> +   __m512i v_data_0,
> +   __m512i v_shuf_idxs,
> +   __m512i v_data_1)
> +{
> +/* Manipulate shuffle indexes for u16 size. */
> +__mmask64 k_mask_odd_lanes = 0x;
> +/* clear away ODD lane bytes. Cannot be done above due to no u8 shift */
> +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
> +v_shuf_idxs, _mm512_setzero_si512());
> +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> +
> +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> +
> +/* Shuffle each half at 16-bit width */
> +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
> +v_data_1);
> +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
> +v_data_1);
> +
> +/* Find if the shuffle index was odd, via mask and compare */
> +uint16_t index_odd_mask = 0x1;
> +const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
> +
> +/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask */
> +__m512i v_idx_even_masked = _mm512_and_si512(v_shuf_idxs,
> + v_index_mask_u16);
> +__mmask32 evn_rotate_mask = _mm512_cmpeq_epi16_mask(v_idx_even_masked,
> +v_index_mask_u16);
> +
> +/* ODD lanes, find if u8 index was odd, result as u16 bitmask */
> +__m512i v_shuf_idx_srli8 = _mm512_srli_epi16(v_shuf_idxs, 8);
> +__m512i v_idx_odd_masked = _mm512_and_si512(v_shuf_idx_srli8,
> +v_index_mask_u16);
> +__mmask32 odd_rotate_mask = 

Re: [ovs-dev] [External] : Re: [PATCH ovn v5] ovn-northd.c: Add proxy ARP support to OVN

2021-06-29 Thread Brendan Doyle




On 29/06/2021 13:24, Numan Siddique wrote:

On Tue, Jun 29, 2021 at 7:48 AM Brendan Doyle  wrote:

Numan,

Did this version apply ? I'm guessing not. This was generated with git
mail. But I don't see
an entry in 
https://urldefense.com/v3/__https://patchwork.ozlabs.org/project/ovn/list/__;!!ACWV5N9M2RV99hQ!crHDbcylO2xdE8lL5OBvisqTciHbBxY2Viml-p_H_HuEtd6-KXvvdib_NWgdqCHWi3M$
for it. Please let me know if this has issue, if so I'll try generate a PR.


No.  This didn't apply either.   Since the patch was straightforward,
I just applied
the diff manually and applied the patch to the main branch.


Great thanks.

I'm not
sure how you
generated the patch.  I presume using git-format-patch.


Yes.

You can refer this if you haven't already -
https://urldefense.com/v3/__https://github.com/ovn-org/ovn/blob/master/Documentation/internals/contributing/submitting-patches.rst__;!!ACWV5N9M2RV99hQ!crHDbcylO2xdE8lL5OBvisqTciHbBxY2Viml-p_H_HuEtd6-KXvvdib_NWgdUiKKgR0$


Yes, I have, and as far as I know I followed all the instructions. I even
sent the patch via git mail to a colleague, who was able to apply the patch
to his fresh clone of master.



I did a few changes in the code and in the test before applying.


Ok, I'll have a look - thanks

The commit is missing the ddlog part unfortunately.  I tried to add
it, but I probably need some
help from Ben.

Sorry, I just don't know anything about ddlog.

The added test case fails for ddlog now.

Thanks
Numan


Thanks

Brendan


On 28/06/2021 12:16, Brendan Doyle wrote:

This patch provides the ability to configure proxy ARP IPs on a Logical
Switch Router port. The IPs are added as Options for router ports. This
provides a useful feature where traffic for a service must be sent to an
address in a logical network address space, but the service is provided
in a different network. For example an NFS service is provide to Logical
networks at an address in their Logical network space, but the NFS
server resides in a physical network. A Logical switch Router port can
be configured to respond to ARP requests sent to the service "Logical
address", the Logical Router/Gateway can then be configured to forward
the traffic to the underlay/physical network.

Signed-off-by: Brendan Doyle 
---
   northd/ovn-northd.c |  48 
   ovn-nb.xml  |   9 +
   tests/ovn.at| 103 

   3 files changed, 160 insertions(+)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index fcd6167..258b5db 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6969,6 +6969,8 @@ build_lswitch_arp_nd_responder_known_ips(struct ovn_port 
*op,
struct ds *match)
   {
   if (op->nbsp) {
+const char *arp_proxy;
+
   if (!strcmp(op->nbsp->type, "virtual")) {
   /* Handle
*  - GARPs for virtual ip which belongs to a logical port
@@ -7126,6 +7128,52 @@ build_lswitch_arp_nd_responder_known_ips(struct ovn_port 
*op,
   }
   }
   }
+
+/*
+ * Add responses for ARP proxies.
+ */
+arp_proxy = smap_get(&op->nbsp->options, "arp_proxy");
+
+if (arp_proxy && op->peer) {
+struct lport_addresses proxy_arp_addrs;
+int i = 0;
+
+if (extract_ip_addresses(arp_proxy, &proxy_arp_addrs)) {
+/*
+ * Match rule on all proxy ARP IPs.
+ */
+ds_clear(match);
+ds_put_cstr(match, "arp.op == 1 && (");
+
+for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
+if (i > 0) {
+ds_put_cstr(match, " || ");
+}
+ds_put_format(match, "arp.tpa == %s",
+proxy_arp_addrs.ipv4_addrs[i].addr_s);
+}
+
+ds_put_cstr(match, ")");
+destroy_lport_addresses(&proxy_arp_addrs);
+
+ds_clear(actions);
+ds_put_format(actions,
+"eth.dst = eth.src; "
+"eth.src = %s; "
+"arp.op = 2; /* ARP reply */ "
+"arp.tha = arp.sha; "
+"arp.sha = %s; "
+"arp.tpa <-> arp.spa; "
+"outport = inport; "
+"flags.loopback = 1; "
+"output;",
+op->peer->lrp_networks.ea_s,
+op->peer->lrp_networks.ea_s);
+
+ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
+50, ds_cstr(match), ds_cstr(actions), &op->nbsp->header_);
+}
+}
   }
   }

diff --git a/ovn-nb.xml b/ovn-nb.xml
index 406bc85..077a2d8 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -848,6 +848,15 @@
   
 
   
+
+
+  Optional. A list of IPv4 addresses that this

[ovs-dev] [PATCH v10] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-29 Thread Vasu Dasari
Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3
side.  For L2 side, there is only fdb show command. This patch gives an option
to add/del an fdb entry via ovs-appctl.

CLI command looks like:

To add:
ovs-appctl fdb/add <bridge> <port> <vlan> <mac>
ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05

To del:
ovs-appctl fdb/del <bridge> <vlan> <mac>
ovs-appctl fdb/del br0 0 50:54:00:00:00:05

Added two new APIs to provide convenient interface to add and delete 
static-macs.
bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t in_port,
   struct eth_addr dl_src, int vlan);
bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
  struct eth_addr dl_src, int vlan);

1. Static entry should not age. To indicate that entry being programmed is a 
static entry,
   'expires' field in 'struct mac_entry' will be set to a 
MAC_ENTRY_AGE_STATIC_ENTRY. A
   check for this value is made while deleting mac entry as part of regular 
aging process.
2. Another change to the mac-update logic: when a packet with the same dl_src as 
that of a
   static-mac entry arrives on any port, the logic will not modify the expires 
field.
3. While flushing fdb entries, made sure static ones are not evicted.
4. Updated "ovs-appctl fdb/stats-show br0" to display the number of static
   entries in the switch

Added following tests:
  ofproto-dpif - static-mac add/del/flush
  ofproto-dpif - static-mac mac moves

Signed-off-by: Vasu Dasari 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
Tested-by: Eelco Chaudron 
Acked-by: Eelco Chaudron 
---
v1:
 - Fixed 0-day robot warnings
v2:
 - Fix valgrind error in the modified code in mac_learning_insert() where a 
read is
   is performed on e->expires which is not initialized
v3:
 - Addressed code review comments
 - Added more documentation
 - Fixed mac_entry_age() and is_mac_learning_update_needed() to have common
   understanding of return values when mac_entry is a static one.
 - Added NEWS item
v4:
 - Addressed code review comments
 - Static entries will not be purged when fdb/flush is performed.
 - Static entries will not be overwritten when packet with same dl_src arrives 
on
   any port of switch
 - Provided bit more detail while doing fdb/add, to indicate if static-mac is
   overriding already present entry
 - Separated test cases for a bit more clarity
 v5:
 - Addressed code review comments
 - Added new total_static counter to count number of static entries.
 - Removed mac_entry_set_idle_time()
 - Added mac_learning_add_static_entry() and mac_learning_del_static_entry()
 - Modified APIs xlate_add_static_mac_entry() and 
xlate_delete_static_mac_entry()
   return 0 on success else a failure code
 v6:
 - Fixed a probable bug with Eelco's code review comments in
   is_mac_learning_update_needed()
 v7:
 - Added a ovs-vswitchd.8 man page entry for fdb add/del commands
 v8:
 - Updaed with code review comments from Eelco.
 - Renamed total_static to static_entries
 - Added coverage counter mac_learning_static_none_move
 - Fixed a possible bug with static_entries getting cleared via
   fdb/stats-clear command
 - Initialize static_entries in mac_learning_create()
 - Modified fdb/del command by removing option to specify port-name
 - Breakup ofproto_unixctl_fdb_update into ofproto_unixctl_fdb_add
   and ofproto_unixctl_fdb_delete
 - Updated test "static-mac add/del/flush" to have interleaved mac
   entries before fdb/flush
 - Updated test "static-mac mac move" to check for newly added
   coverage counter mac_learning_static_none_move
v9:
 - Updated source code comments and addressed code review comments
 v10:
 - Simplified error code paths in ofproto_unixctl_fdb_{add,delete}
   functions
---
 NEWS |   4 +
 lib/mac-learning.c   | 155 +++
 lib/mac-learning.h   |  17 
 ofproto/ofproto-dpif-xlate.c |  48 +--
 ofproto/ofproto-dpif-xlate.h |   5 ++
 ofproto/ofproto-dpif.c   | 111 -
 tests/ofproto-dpif.at|  99 ++
 vswitchd/ovs-vswitchd.8.in   |   5 ++
 8 files changed, 415 insertions(+), 29 deletions(-)

diff --git a/NEWS b/NEWS
index f02f07cdf..909e88c6d 100644
--- a/NEWS
+++ b/NEWS
@@ -25,6 +25,10 @@ Post-v2.15.0
- ovsdb-tool:
  * New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
+   - ovs-appctl:
+ * Added ability to add and delete static mac entries using:
+   'ovs-appctl fdb/add <bridge> <port> <vlan> <mac>'
+   'ovs-appctl fdb/del <bridge> <vlan> <mac>'
 
 
 v2.15.0 - 15 Feb 2021
diff --git a/lib/mac-learning.c b/lib/mac-learning.c
index 3d5293d3b..dd3f46a8b 100644
--- a/lib/mac-learning.c
+++ b/lib/mac-learning.c
@@ -34,13 +34,25 @@ COVERAGE_DEFINE(mac_learning_learned);
 COVERAGE_DEFINE(mac_learning_expired);
 

Re: [ovs-dev] [v4 06/12] dpif-netdev: Add additional packet count parameter for study function

2021-06-29 Thread Eelco Chaudron
See some additional comments below.

//Eelco


On 17 Jun 2021, at 18:27, Kumar Amber wrote:

> This commit introduces an additional command line parameter
> for the mfex study function. If the user provides an additional packet count,
> it is used in study as the minimum number of packets which must be processed,
> else a default value is chosen.
>
> $ OVS_DIR/utilities/ovs-appctl dpif-netdev/miniflow-parser-set study 500
>
> Signed-off-by: Kumar Amber 
> ---
>  Documentation/topics/dpdk/bridge.rst |  8 ++-
>  lib/dpif-netdev-extract-study.c  | 15 +++-
>  lib/dpif-netdev-private-extract.h|  8 +++
>  lib/dpif-netdev.c| 34 +++-
>  4 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index 1c78adc75..e7e91289a 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -288,7 +288,13 @@ An implementation can be selected manually by the 
> following command ::
>  Also user can select the study implementation which studies the traffic for
>  a specific number of packets by applying all availbale implementaions of
>  miniflow extract and than chooses the one with most optimal result for that
> -traffic pattern.
> +traffic pattern. The user can also provide an additional parameter, a packet
> +count, which is the minimum number of packets which OVS must study before
> +choosing the optimal implementation. If no packet count is provided, a default
> +value is chosen.
> +

Should we mention the default value?


Also, thinking about configuring the study option, as there is no 
synchronization point between threads, do we need to mention that one PMD 
thread might still be running a previous round, and can now decide on earlier 
data?

Let's say you do:

  ovs-appctl dpif-netdev/miniflow-parser-set study 3

3 busy threads are done, and a 4th is still busy as it has only done 1 
packets. Now you do:

  ovs-appctl dpif-netdev/miniflow-parser-set study 1

And one thread will be done instantly, while the other might take a while…
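One way around that (just a sketch; field and function names are made up)
would be to snapshot the requested count into the per-thread study stats at
the moment study is selected, instead of reading a global when comparing:

    struct study_stats {
        uint32_t pkt_count;
        uint32_t pkt_target;    /* Requested count, captured at selection. */
        /* ... existing per-implementation hit counters ... */
    };

    static void
    mfex_study_select(struct study_stats *stats, uint32_t requested_cnt)
    {
        memset(stats, 0, sizeof *stats);
        stats->pkt_target = requested_cnt ? requested_cnt : MFEX_MAX_COUNT;
    }

and then have mfex_study_traffic() compare stats->pkt_count against
stats->pkt_target.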


> +Study can be selected with packet count by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
>
>  Miniflow Extract Validation
>  ~~~
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> index d063d040c..c48fb125e 100644
> --- a/lib/dpif-netdev-extract-study.c
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -55,6 +55,19 @@ get_study_stats(void)
>  return stats;
>  }
>
> +static uint32_t pkt_compare_count = 0;
> +
> +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> +struct dpif_miniflow_extract_impl *opt)
> +{
> +if ((opt->extract_func == mfex_study_traffic) && (pkt_cmp_count != 0)) {
> +pkt_compare_count = pkt_cmp_count;
> +return 0;
> +}
> +pkt_compare_count = MFEX_MAX_COUNT;
> +return -EINVAL;
> +}
> +
>  uint32_t
>  mfex_study_traffic(struct dp_packet_batch *packets,
> struct netdev_flow_key *keys,
> @@ -87,7 +100,7 @@ mfex_study_traffic(struct dp_packet_batch *packets,
>
>  /* Choose the best implementation after a minimum packets have been
>   * processed. */
> -if (stats->pkt_count >= MFEX_MAX_COUNT) {
> +if (stats->pkt_count >= pkt_compare_count) {
>  uint32_t best_func_index = MFEX_IMPL_START_IDX;
>  uint32_t max_hits = 0;
>  for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> diff --git a/lib/dpif-netdev-private-extract.h 
> b/lib/dpif-netdev-private-extract.h
> index d8a284db7..0ec74bef9 100644
> --- a/lib/dpif-netdev-private-extract.h
> +++ b/lib/dpif-netdev-private-extract.h
> @@ -127,5 +127,13 @@ dpif_miniflow_extract_get_default(void);
>   * overridden at runtime. */
>  void
>  dpif_miniflow_extract_set_default(miniflow_extract_func func);
> +/* Sets the packet count from user to the stats for use in
> + * study function to match against the classified packets to choose
> + * the optimal implementation.
> + * On error, returns EINVAL.
> + * On success, returns 0.
> + */
> +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> +struct dpif_miniflow_extract_impl *opt);
>
>  #endif /* DPIF_NETDEV_AVX512_EXTRACT */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 716e0debf..35c927d55 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1141,14 +1141,29 @@ dpif_miniflow_extract_impl_set(struct unixctl_conn 
> *conn, int argc,
>  return;
>  }
>  new_func = opt->extract_func;
> -/* argv[2] is optional datapath instance. If no datapath name is 
> provided.
> +
> +/* argv[2] is optional packet count, which user can provide along with
> + * study function to set the minimum packet that must be matched in order
> + * to choose the optimal function. */
> +uint32_t pkt_cmp_count = 0;
> +uint32_t 

Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

2021-06-29 Thread Eelco Chaudron
More comments below. FYI I’m only reviewing right now, no testing.

//Eelco


On 17 Jun 2021, at 18:27, Kumar Amber wrote:

> The study function runs all the available implementations
> of miniflow_extract, chooses the one whose hitmask has the
> most hits, and sets the mfex to that function.
>
> Study can be run at runtime using the following command:
>
> $ ovs-appctl dpif-netdev/miniflow-parser-set study
>
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-study.c   | 119 ++
>  lib/dpif-netdev-private-extract.c |   5 ++
>  lib/dpif-netdev-private-extract.h |  14 +++-
>  4 files changed, 138 insertions(+), 1 deletion(-)
>  create mode 100644 lib/dpif-netdev-extract-study.c
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 6657b9ae5..3080bb04a 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -114,6 +114,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev.c \
>   lib/dpif-netdev.h \
>   lib/dpif-netdev-private-dfc.c \
> + lib/dpif-netdev-extract-study.c \
>   lib/dpif-netdev-private-dfc.h \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-dpif.c \
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> new file mode 100644
> index 0..d063d040c
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-thread.h"
> +#include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> +
> +/* Max size of packets to be compared. */
> +#define MFEX_MAX_COUNT (128)
> +
> +/* This value is the threshold for the amount of packets that
> + * must hit on the optimized miniflow extract before it will be
> + * accepted and used in the datapath after the study phase. */
> +#define MFEX_MIN_HIT_COUNT_FOR_USE (MFEX_MAX_COUNT / 2)
> +
> +/* Struct to hold miniflow study stats. */
> +struct study_stats {
> +uint32_t pkt_count;
> +uint32_t impl_hitcount[MFEX_IMPLS_MAX_SIZE];
> +};
> +
> +/* Define per thread data to hold the study stats. */
> +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
> +
> +/* Allocate per thread PMD pointer space for study_stats. */
> +static inline struct study_stats *
> +get_study_stats(void)
> +{
> +struct study_stats *stats = study_stats_get();
> +if (OVS_UNLIKELY(!stats)) {
> +   stats = xzalloc(sizeof *stats);
> +   study_stats_set_unsafe(stats);
> +}
> +return stats;
> +}
> +

Just got a mind-meld with the code, and realized that the function might be 
different per PMD thread due to this auto mode (and autovalidator mode in the 
previous patch).

This makes it even clearer that we need a way to see the currently selected
mode, and not just per datapath, but per PMD per datapath!

Do we also need a way to set this per PMD?
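
Something along these lines would make that visible; the output format and
implementation names below are purely illustrative, not part of the patch:

    $ ovs-appctl dpif-netdev/miniflow-parser-get
    pmd thread numa_id 0 core_id 2: miniflow implementation: study
    pmd thread numa_id 0 core_id 3: miniflow implementation: scalar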

> +uint32_t
> +mfex_study_traffic(struct dp_packet_batch *packets,
> +   struct netdev_flow_key *keys,
> +   uint32_t keys_size, odp_port_t in_port,
> +   void *pmd_handle)
> +{
> +uint32_t hitmask = 0;
> +uint32_t mask = 0;
> +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +uint32_t impl_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
> +struct study_stats *stats = get_study_stats();
> +
> +/* Run traffic optimized miniflow_extract to collect the hitmask
> + * to be compared after certain packets have been hit to choose
> + * the best miniflow_extract version for that traffic. */
> +for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> +if (miniflow_funcs[i].available) {
> +hitmask = miniflow_funcs[i].extract_func(packets, keys, 
> keys_size,
> + in_port, pmd_handle);
> +stats->impl_hitcount[i] += count_1bits(hitmask);
> +
> +/* If traffic is not classified than we dont overwrite the keys
> + * array in minfiflow implementations so its safe to create a
> + * mask for all those 

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Amber, Kumar
Hi Eelco,

Thanks again for the reviews. Please find my replies inline.

From: Eelco Chaudron 
Sent: Tuesday, June 29, 2021 5:14 PM
To: Van Haaren, Harry ; Amber, Kumar 

Cc: d...@openvswitch.org; i.maxim...@ovn.org; Stokes, Ian 
; Flavio Leitner 
Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for 
miniflow extract


On 17 Jun 2021, at 18:27, Kumar Amber wrote:

This patch introduces the auto-validation function, which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and returns a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber mailto:kumar.am...@intel.com>>
Co-authored-by: Harry van Haaren 
mailto:harry.van.haa...@intel.com>>
Signed-off-by: Harry van Haaren 
mailto:harry.van.haa...@intel.com>>
---
lib/dpif-netdev-private-extract.c | 141 ++
lib/dpif-netdev-private-extract.h | 15 
lib/dpif-netdev.c | 2 +-
3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

/* Implementations of available extract options. */
static struct dpif_miniflow_extract_impl mfex_impls[] = {
+ {
+ .probe = NULL,
+ .extract_func = dpif_miniflow_extract_autovalidator,
+ .name = "autovalidator",
+ },
{
.probe = NULL,
.extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)
*out_ptr = mfex_impls;
return ARRAY_SIZE(mfex_impls);
}
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+ struct netdev_flow_key *keys,
+ uint32_t keys_size, odp_port_t in_port,
+ void *pmd_handle)
+{
+ const size_t cnt = dp_packet_batch_size(packets);
+ uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+ struct dp_packet *packet;
+ struct dp_netdev_pmd_thread *pmd = pmd_handle;
+ struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+ int32_t mfunc_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
+ if (mfunc_count < 0) {

In theory 0 cannot be returned, but just to cover the corner case, can we
change this to include zero?

The  code has been adapted as per Flavio comments so will not be a concern.

+ pmd->miniflow_extract_opt = NULL;

Guess the above needs to be atomic.

Removed based on Flavio comments.

+ VLOG_ERR("failed to get miniflow extract function implementations\n");

Capital F to be in sync with your other error messages?

Removed based on Flavio comments.

+ return 0;
+ }
+ ovs_assert(keys_size >= cnt);

I don’t think we should assert here. Just return an error like above, so in 
production, we get notified, and this implementation gets disabled.

Actually we do, else one would most likely overwrite the assigned array
space for keys and hit a seg fault at some point.

And hence we would like to know at the compile time if this is the case.
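
For reference, the non-asserting variant being discussed would look roughly
like this (a sketch only, not the patch code):

    if (keys_size < cnt) {
        VLOG_ERR("Autovalidator: keys array too small (%u < %u), "
                 "skipping validation for this batch.",
                 (unsigned) keys_size, (unsigned) cnt);
        return 0;
    }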

+ struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+ /* Run scalar miniflow_extract to get default result. */
+ DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+ pkt_metadata_init(&packet->md, in_port);
+ miniflow_extract(packet, &keys[i].mf);
+
+ /* Store known good metadata to compare with optimized metadata. */
+ good_l2_5_ofs[i] = packet->l2_5_ofs;
+ good_l3_ofs[i] = packet->l3_ofs;
+ good_l4_ofs[i] = packet->l4_ofs;
+ good_l2_pad_size[i] = packet->l2_pad_size;
+ }
+
+ /* Iterate through each version of miniflow implementations. */
+ for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); j++) {
+ if (!mfex_impls[j].available) {
+ continue;
+ }
+
+ /* Reset keys and offsets before each implementation. */
+ memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+ DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+ dp_packet_reset_offsets(packet);
+ }
+ /* Call optimized miniflow for each batch of packet. */
+ uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+ keys_size, in_port, pmd_handle);
+
+ /* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+ while (hit_mask) {
+ /* Index for the set bit. */
+ uint32_t i = __builtin_ctz(hit_mask);
+ /* Set the index in hitmask to Zero. */
+ hit_mask &= (hit_mask - 1);
+
+ uint32_t failed = 0;
+
+ /* Check miniflow bits are equal. */
+ if ((keys[i].mf.map.bits[0] != test_keys[i].mf.map.bits[0]) ||
+ (keys[i].mf.map.bits[1] != test_keys[i].mf.map.bits[1])) {
+ VLOG_ERR("Good 0x%llx 0x%llx\tTest 0x%llx 0x%llx\n",
+ keys[i].mf.map.bits[0], keys[i].mf.map.bits[1],
+ test_keys[i].mf.map.bits[0],
+ test_keys[i].mf.map.bits[1]);
+ failed = 

Re: [ovs-dev] [PATCH ovn v5] ovn-northd.c: Add proxy ARP support to OVN

2021-06-29 Thread Numan Siddique
On Tue, Jun 29, 2021 at 7:48 AM Brendan Doyle  wrote:
>
> Numan,
>
> Did this version apply ? I'm guessing not. This was generated with git
> mail. But I don't see
> an entry in https://patchwork.ozlabs.org/project/ovn/list/
> for it. Please let me know if this has issue, if so I'll try generate a PR.


No.  This didn't apply either.  Since the patch was straightforward,
I just applied the diff manually to the main branch.  I'm not sure
how you generated the patch.  I presume using git-format-patch.

You can refer this if you haven't already -
https://github.com/ovn-org/ovn/blob/master/Documentation/internals/contributing/submitting-patches.rst

I did a few changes in the code and in the test before applying.

The commit is missing the ddlog part unfortunately.  I tried to add
it, but I probably need some
help from Ben.

The added test case fails for ddlog now.

Thanks
Numan

>
> Thanks
>
> Brendan
>
>
> On 28/06/2021 12:16, Brendan Doyle wrote:
> > This patch provides the ability to configure proxy ARP IPs on a Logical
> > Switch Router port. The IPs are added as Options for router ports. This
> > provides a useful feature where traffic for a service must be sent to an
> > address in a logical network address space, but the service is provided
> > in a different network. For example an NFS service is provided to Logical
> > networks at an address in their Logical network space, but the NFS
> > server resides in a physical network. A Logical switch Router port can
> > be configured to respond to ARP requests sent to the service "Logical
> > address", the Logical Router/Gateway can then be configured to forward
> > the traffic to the underlay/physical network.
> >
> > Signed-off-by: Brendan Doyle 
> > ---
> >   northd/ovn-northd.c |  48 
> >   ovn-nb.xml  |   9 +
> >   tests/ovn.at| 103 
> > 
> >   3 files changed, 160 insertions(+)
> >
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index fcd6167..258b5db 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -6969,6 +6969,8 @@ build_lswitch_arp_nd_responder_known_ips(struct 
> > ovn_port *op,
> >struct ds *match)
> >   {
> >   if (op->nbsp) {
> > +const char *arp_proxy;
> > +
> >   if (!strcmp(op->nbsp->type, "virtual")) {
> >   /* Handle
> >*  - GARPs for virtual ip which belongs to a logical port
> > @@ -7126,6 +7128,52 @@ build_lswitch_arp_nd_responder_known_ips(struct 
> > ovn_port *op,
> >   }
> >   }
> >   }
> > +
> > +/*
> > + * Add responses for ARP proxies.
> > + */
> > +arp_proxy = smap_get(&op->nbsp->options, "arp_proxy");
> > +
> > +if (arp_proxy && op->peer) {
> > +struct lport_addresses proxy_arp_addrs;
> > +int i = 0;
> > +
> > +if (extract_ip_addresses(arp_proxy, &proxy_arp_addrs)) {
> > +/*
> > + * Match rule on all proxy ARP IPs.
> > + */
> > +ds_clear(match);
> > +ds_put_cstr(match, "arp.op == 1 && (");
> > +
> > +for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
> > +if (i > 0) {
> > +ds_put_cstr(match, " || ");
> > +}
> > +ds_put_format(match, "arp.tpa == %s",
> > +proxy_arp_addrs.ipv4_addrs[i].addr_s);
> > +}
> > +
> > +ds_put_cstr(match, ")");
> > +destroy_lport_addresses(&proxy_arp_addrs);
> > +
> > +ds_clear(actions);
> > +ds_put_format(actions,
> > +"eth.dst = eth.src; "
> > +"eth.src = %s; "
> > +"arp.op = 2; /* ARP reply */ "
> > +"arp.tha = arp.sha; "
> > +"arp.sha = %s; "
> > +"arp.tpa <-> arp.spa; "
> > +"outport = inport; "
> > +"flags.loopback = 1; "
> > +"output;",
> > +op->peer->lrp_networks.ea_s,
> > +op->peer->lrp_networks.ea_s);
> > +
> > +ovn_lflow_add_with_hint(lflows, op->od, 
> > S_SWITCH_IN_ARP_ND_RSP,
> > +50, ds_cstr(match), ds_cstr(actions), 
> > &op->nbsp->header_);
> > +}
> > +}
> >   }
> >   }
> >
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index 406bc85..077a2d8 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -848,6 +848,15 @@
> >   
> > 
> >   
> > +
> > +
> > +  Optional. A list of IPv4 addresses that this
> > +  logical switch router port will reply to ARP requests for.
> > +  Example: 169.254.239.254 169.254.239.2. The
> > +  's logical 

Re: [ovs-dev] [PATCH 4/4] dpif-netdev: Allow cross-NUMA polling on selected ports

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings , I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Jan Scheurich , Rudra Surya Bhaskara 
Rao 
Lines checked: 241, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/4] dpif-netdev: pmd-rxq-affinity with optional PMD isolation

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings , I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Jan Scheurich , Rudra Surya Bhaskara 
Rao 
Lines checked: 219, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/4] dpif-netdev: Least-loaded scheduling algorithm for rxqs

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings , I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Jan Scheurich , Rudra Surya Bhaskara 
Rao 
Lines checked: 465, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/4] dpif-netdev: Refactor rxq auto-lb dry-run code.

2021-06-29 Thread 0-day Robot
Bleep bloop.  Greetings , I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Jan Scheurich , Rudra Surya Bhaskara 
Rao 
ERROR: Inappropriate bracing around statement
#447 FILE: lib/dpif-netdev.c:5712:
if (improvement < dp->pmd_alb.rebalance_improve_thresh)

Lines checked: 460, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5 ovn 3/4] ovn-northd: Add CoPP policies for flows that punt packets to ovn-controller.

2021-06-29 Thread Mark Gray
On 23/06/2021 12:05, Lorenzo Bianconi wrote:
> From: Dumitru Ceara 
> 
> Change the ovn-northd implementation to set the new 'controller_meter'
> field for flows that need to punt packets to ovn-controller.
> 
> Protocol packets for which CoPP is enforced when sending packets to
> ovn-controller (if configured):
> - ARP
> - ND_NS
> - ND_NA
> - ND_RA
> - DNS
> - IGMP
> - packets that require ARP resolution before forwarding
> - packets that require ND_NS before forwarding
> - packets that need to be replied to with ICMP Errors
> - packets that need to be replied to with TCP RST
> - packets that need to be replied to with DHCP_OPTS
> - packets that trigger a SCTP abort action
> - controller_events
> - BFD
> 
> Co-authored-by: Lorenzo Bianconi 
> Signed-off-by: Lorenzo Bianconi 
> Signed-off-by: Dumitru Ceara 
> ---


Acked-by: Mark D. Gray 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-06-29 Thread Amber, Kumar
Hi Eelco,

Thanks a lot for the comments and my replies are inline.

> -Original Message-
> From: Eelco Chaudron 
> Sent: Tuesday, June 29, 2021 2:59 PM
> To: Amber, Kumar ; Van Haaren, Harry
> 
> Cc: d...@openvswitch.org; i.maxim...@ovn.org; Flavio Leitner
> 
> Subject: Re: [ovs-dev] [v4 01/12] dpif-netdev: Add command line and
> function pointer for miniflow extract
> 
> Hi Kumar/Harry,
> 
> Find my comments inline.
> 
> //Eelco
> 
> 
> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> 
> > This patch introduces the mfex function pointers which allow the user
> > to switch between different miniflow extract implementations which are
> > provided by OVS based on the CPU's ISA optimizations.
> >
> > The user can query the miniflow extract variants available
> > for that CPU with the following command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-get
> >
> > Similarly, a user can set the miniflow implementation with the
> > following command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-set name
> >
> > This allows more performance and flexibility for the user to choose
> > the miniflow implementation according to their needs.
> >
> > Signed-off-by: Kumar Amber 
> > Co-authored-by: Harry van Haaren 
> > Signed-off-by: Harry van Haaren 
> > ---
> >  lib/automake.mk   |   2 +
> >  lib/dpif-netdev-avx512.c  |  32 ++--
> >  lib/dpif-netdev-private-extract.c |  86 
> > lib/dpif-netdev-private-extract.h |  94 ++
> >  lib/dpif-netdev-private-thread.h  |   4 +
> >  lib/dpif-netdev.c | 126 +-
> >  6 files changed, 337 insertions(+), 7 deletions(-)  create mode
> > 100644 lib/dpif-netdev-private-extract.c  create mode 100644
> > lib/dpif-netdev-private-extract.h
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk index
> > 49f42c2a3..6657b9ae5 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpif-netdev-private-dpcls.h \
> > lib/dpif-netdev-private-dpif.c \
> > lib/dpif-netdev-private-dpif.h \
> > +   lib/dpif-netdev-private-extract.c \
> > +   lib/dpif-netdev-private-extract.h \
> > lib/dpif-netdev-private-flow.h \
> > lib/dpif-netdev-private-hwol.h \
> > lib/dpif-netdev-private-thread.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index
> > f9b199637..bb99b23ff 100644
> > --- a/lib/dpif-netdev-avx512.c
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -148,6 +148,15 @@ dp_netdev_input_outer_avx512(struct
> dp_netdev_pmd_thread *pmd,
> >   * // do all processing (HWOL->MFEX->EMC->SMC)
> >   * }
> >   */
> > +
> > +/* Do a batch minfilow extract into keys. */
> > +uint32_t mf_mask = 0;
> > +if (pmd->miniflow_extract_opt) {
> 
> This will need some atomic get/use, or else we will crash here on change.

Changed to following in v5.
miniflow_extract_func mfex_func;
atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
if (mfex_func) {
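
Presumably the writer side pairs this with a relaxed store when the user
selects an implementation; a sketch, with the exact call site assumed:

    /* Sketch only: publish the selected implementation to one PMD thread. */
    atomic_store_relaxed(&pmd->miniflow_extract_opt, new_func);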

> 
> > +mf_mask = pmd->miniflow_extract_opt(packets, keys,
> > +batch_size, in_port,
> > +(void *) pmd);
> 
> Don’t think we should cast to void here, but I guess the callback should take
> struct dp_netdev_pmd_thread?
> 

Fixed in v5 already.
> 
> > +}
> > +/* Perform first packet interation */
> >  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
> >  uint32_t iter = lookup_pkts_bitmask;
> >  while (iter) {
> > @@ -159,6 +168,12 @@ dp_netdev_input_outer_avx512(struct
> dp_netdev_pmd_thread *pmd,
> >  pkt_metadata_init(&packet->md, in_port);
> >
> >  struct dp_netdev_flow *f = NULL;
> > +struct netdev_flow_key *key = &keys[i];
> > +
> > +/* Check the minfiflow mask to see if the packet was correctly
> > +* classifed by vector mfex else do a scalar miniflow extract
> > +* for that packet. */
> > +uint32_t mfex_hit = (mf_mask & (1 << i));
> >
> >  /* Check for partial hardware offload mark. */
> >  uint32_t mark;
> > @@ -166,7 +181,13 @@ dp_netdev_input_outer_avx512(struct
> dp_netdev_pmd_thread *pmd,
> >  f = mark_to_flow_find(pmd, mark);
> >  if (f) {
> >  rules[i] = &f->cr;
> > -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> > +/* If AVX512 MFEX already classified the packet, use it. */
> > +if (mfex_hit) {
> > +pkt_meta[i].tcp_flags = 
> > miniflow_get_tcp_flags(&key->mf);
> > +} else {
> > +pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> > +}
> > +
> >  pkt_meta[i].bytes = dp_packet_size(packet);
> >  phwol_hits++;
> >  hwol_emc_smc_hitmask |= (1 << i); @@ -174,11 +195,12
> > @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 

Re: [ovs-dev] Fwd: Openvswitch patch doubts:conntrack: Fix missed 'conn' lookup checks.

2021-06-29 Thread Ilya Maximets
On 6/29/21 8:51 AM, user wrote:
> Hi Gaetan,
>Thanks for your resolve. It is indeed possible for tunneled packets to be 
> allocated to different CPUs by RSS, in which case a check before adding 
> connTrack is necessary. 
>And I didn't make it clear that the two packets belong to the same 
> conntrack. What I mean is that the two packets are both from the direction of 
> origin with ct_state of +new(and both of them are not tunnneled pkts). They 
> have the same header and try to generate the same conntrack. Such packets are 
> unlikely to be distributed by RSS to different CPUs. So can I say that in 
> certain situations, such as the header of pkts in the same direction of a 
> conntrack to be created are the same, we can skip the check before inserting 
> a 'conn' entry? Is there any reason other than CPU contention for us to do 
> this?

I guess, there is no other reason to lock something, unless you're
protecting yourself from a race.  But I don't think that we can
do anything here, because we don't and can't know if other CPU is
processing the same packet right now.  The problem is that OpenFlow
pipeline can completely change almost all the packet headers before
pushing it to conntrack and in this case it doesn't matter what
was the original RSS hash or what was the original packet at all.
The only thing that matters is what are the packet headers now and
we can't be sure that other CPU doesn't have exactly the same packet,
because it may be executing the same header updates before pushing the
packet to conntrack.

We also should remember that buggy application in a guest may put
packets from the same flow to different Tx queues in a guest VM,
and there is no RSS here to re-direct these packets, so they will
be, likely, processed on different CPU cores.

Best regards, Ilya Maximets.

> 
> Thank you,
> Avis
> 
>> 下面是被转发的邮件:
>>
>> 发件人: Gaëtan Rivet 
>> 主题: 回复:[ovs-dev] Openvswitch patch doubts:conntrack: Fix missed 'conn' 
>> lookup checks.
>> 日期: 2021年6月28日 GMT+8 下午5:05:23
>> 收件人: user 
>>
>> On Mon, Jun 28, 2021, at 05:45, user wrote:
>>> Hi Ben,
>>> I think this situation may not happen,  because if there are two 
>>> pkts are going to create the same conntrack, their headers will be 
>>> roughly the same, the rss of the hardware will assign the packets to 
>>> the same cpu, so there is no chance for two threads to try to insert 
>>> the same conntrack at the same time, so I am confused about the reason 
>>> for doing this and what are the missing cases.
>>>
>>> Avis
>>> ___
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
>> Hello Avis,
>>
>> It would be easier to follow if you replied to the previous thread instead 
>> of sending
>> a separate email.
>>
>>> their headers will be roughly the same, the rss of the hardware will assign
>>> the packets to the same cpu
>>
>> No, the headers won't be roughly the same. Either the 5-tuple is the 
>> symmetrical inverse
>> for regular CT, or mostly different for NAT or in case of tunneling (RSS is 
>> done
>> on the outer packet). Even with a clean symmetrical inverse, the RSS will be 
>> completely
>> different unless symmetric RSS is used (needs explicit conf on some 
>> hardware, implicit
>> support or no support at all on some others).
>>
>> With symmetric RSS (assuming the hardware exposes a capability to detect 
>> support, which
>> it does not currently), NAT or tunneling will then break symmetry. For NAT 
>> you can tweak
>> the tuple generation such that the resulting RSS will fall on the proper 
>> CPU. For tunneling
>> however I don't see how you could tweak anything, as we don't control how 
>> the outer packet
>> is created. In VXLAN for example, some sender will modulate the UDP src port 
>> to get 16 bits
>> of entropy, but as the middlebox we cannot influence it.
>>
>> Without symmetry, we have a potential race between cores to process a 
>> connection,
>> so locks are needed for insertion or state update.
>>
>> -- 
>> Gaetan Rivet
>>
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

2021-06-29 Thread Van Haaren, Harry
> -Original Message-
> From: Ilya Maximets 
> Sent: Monday, June 28, 2021 3:33 PM
> To: Van Haaren, Harry ; Ilya Maximets
> ; Sriharsha Basavapatna
> 
> Cc: Eli Britstein ; ovs dev ; Ivan 
> Malov
> ; Majd Dibbiny ; Stokes, Ian
> ; Ferriter, Cian ; Ben Pfaff
> ; Balazs Nemeth 
> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> 
> On 6/25/21 7:28 PM, Van Haaren, Harry wrote:
> >> -Original Message-
> >> From: dev  On Behalf Of Ilya Maximets
> >> Sent: Friday, June 25, 2021 4:26 PM
> >> To: Sriharsha Basavapatna ; Ilya
> Maximets
> >> 
> >> Cc: Eli Britstein ; ovs dev ; Ivan
> Malov
> >> ; Majd Dibbiny 
> >> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> >
> > 
> >
>  That looks good to me.  So, I guess, Harsha, we're waiting for
>  your review/tests here.
> >>>
> >>> Thanks Ilya and Eli, looks good to me; I've also tested it and it works 
> >>> fine.
> >>> -Harsha
> >>
> >> Thanks, everyone.  Applied to master.
> >
> > Hi Ilya and OVS Community,
> >
> > There are open questions around this patchset, why has it been merged?
> >
> > Earlier today, new concerns were raised by Cian around the negative 
> > performance
> impact of these code changes:
> > - https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384445.html
> >
> > Both you (Ilya) and Eli responded, and I was following the conversation.
> > Various code changes were suggested, and some may seem like they might
> > work; Eli mentioned some solutions might not work due to the hardware:
> > I was processing both your comments and input, and planning a technical 
> > reply
> later today.
> > - suggestions: https://mail.openvswitch.org/pipermail/ovs-dev/2021-
> June/384446.html
> > - concerns around hw: https://mail.openvswitch.org/pipermail/ovs-dev/2021-
> June/384464.html
> 
> Concerns not really about the hardware, but the API itself
> that should be clarified a little bit to avoid confusion and
> avoid incorrect changes like the one I suggested.
> But this is a small enhancement that could be done on top.
> 
> >
> > Keep in mind that there are open performance issues to be worked out, that 
> > have
> not been resolved at this point in the conversation.
> 
> The performance issue that can be worked out will be worked out
> in a separate patch, a v1 of which we have already had on the mailing
> list for some time, so it didn't make sense to re-validate
> the whole series again due to this one pretty obvious change.
> 
> > There is no agreement on solutions, nor an agreement to ignore the 
> > performance
> degradation, or to try resolve this degradation later.
> 
> Particular part of the packet restoration call seems hard
> to avoid in a long term (I don't see a good solution for that),
> but the short term solution might be implemented on top.
> The part with multiple reads of recirc_id and checking if
> offloading is enabled has a fix already (that needs a v2, but
> anyway).
> 
> >
> > That these patches have been merged is inappropriate:
> > 1) Not enough time given for responses (11 am concerns raised, 5pm merged
> without resolution? (Irish timezone))
> 
> I responded with suggestions and arguments against solutions
> suggested in the report, Eli responded with rejection of one
> of my suggestions.  And it seems clear (for me) that
> there is no good solution for this part at the moment.
> Part of the performance could be won back, but the rest
> seems to be inevitable.  As a short-term solution we can
> guard the netdev_hw_miss_packet_recover() with experimental
> API ifdef, but it will strike back anyway in the future.
> 
> > 2) Open question not addressed/resolved, resulting in a 6% known negative
> performance impact being merged.
> 
> I don't think it wasn't addressed.

Was code merged that resulted in a known regression of 6%?  Yes. Facts are 
facts.
I don't care for arguing over exactly what "addressed" means in this context.


> > 3) Suggestions provided were not reviewed technically in detail (no 
> > technical
> collaboration or code-changes/patches reviewed)
> 
> Patches were heavily reviewed/tested by at least 4 different
> parties including 2 test rounds from Intel engineers that,
> I believe, included testing of partial offloading.  And that
> bothers me the most.  If I can not trust performance test
> reports, I'm not sure performance can be a gating factor here.
> 
> >
> > I feel that the OVS process of allowing time for community review and
> collaboration was not adhered to in this instance.
> > As a result, code was merged that is known to cause performance degradation.
> >
> > Therefore, this email is a request to revert these patches as they are not 
> > currently
> fit for inclusion in my opinion.
> >
> > As next steps, I can propose the following:
> > 1) Revert the patches from master branch
> > 2) Continue technical discussion on how to avoid negative performance impact
> > 3) Review solutions, allowing time for responses and replies
> > 4) Merge 

Re: [ovs-dev] [PATCH ovn v5] ovn-northd.c: Add proxy ARP support to OVN

2021-06-29 Thread Brendan Doyle

Numan,

Did this version apply? I'm guessing not. This was generated with git
mail. But I don't see an entry in https://patchwork.ozlabs.org/project/ovn/list/
for it. Please let me know if this has issues; if so, I'll try to generate a PR.

Thanks

Brendan


On 28/06/2021 12:16, Brendan Doyle wrote:

This patch provides the ability to configure proxy ARP IPs on a Logical
Switch Router port. The IPs are added as Options for router ports. This
provides a useful feature where traffic for a service must be sent to an
address in a logical network address space, but the service is provided
in a different network. For example an NFS service is provided to Logical
networks at an address in their Logical network space, but the NFS
server resides in a physical network. A Logical switch Router port can
be configured to respond to ARP requests sent to the service "Logical
address", the Logical Router/Gateway can then be configured to forward
the traffic to the underlay/physical network.
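
For illustration, the option can be set on the router-type switch port used
in the test below with the generic ovn-nbctl set command (port name and
addresses as used in this patch):

    ovn-nbctl set Logical_Switch_Port rp-ls1 \
        options:arp_proxy="169.254.239.254 169.254.239.2"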

Signed-off-by: Brendan Doyle 
---
  northd/ovn-northd.c |  48 
  ovn-nb.xml  |   9 +
  tests/ovn.at| 103 
  3 files changed, 160 insertions(+)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index fcd6167..258b5db 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6969,6 +6969,8 @@ build_lswitch_arp_nd_responder_known_ips(struct ovn_port 
*op,
   struct ds *match)
  {
  if (op->nbsp) {
+const char *arp_proxy;
+
  if (!strcmp(op->nbsp->type, "virtual")) {
  /* Handle
   *  - GARPs for virtual ip which belongs to a logical port
@@ -7126,6 +7128,52 @@ build_lswitch_arp_nd_responder_known_ips(struct ovn_port 
*op,
  }
  }
  }
+
+/*
+ * Add responses for ARP proxies.
+ */
+arp_proxy = smap_get(&op->nbsp->options, "arp_proxy");
+
+if (arp_proxy && op->peer) {
+struct lport_addresses proxy_arp_addrs;
+int i = 0;
+
+if (extract_ip_addresses(arp_proxy, &proxy_arp_addrs)) {
+/*
+ * Match rule on all proxy ARP IPs.
+ */
+ds_clear(match);
+ds_put_cstr(match, "arp.op == 1 && (");
+
+for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
+if (i > 0) {
+ds_put_cstr(match, " || ");
+}
+ds_put_format(match, "arp.tpa == %s",
+proxy_arp_addrs.ipv4_addrs[i].addr_s);
+}
+
+ds_put_cstr(match, ")");
+destroy_lport_addresses(&proxy_arp_addrs);
+
+ds_clear(actions);
+ds_put_format(actions,
+"eth.dst = eth.src; "
+"eth.src = %s; "
+"arp.op = 2; /* ARP reply */ "
+"arp.tha = arp.sha; "
+"arp.sha = %s; "
+"arp.tpa <-> arp.spa; "
+"outport = inport; "
+"flags.loopback = 1; "
+"output;",
+op->peer->lrp_networks.ea_s,
+op->peer->lrp_networks.ea_s);
+
+ovn_lflow_add_with_hint(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP,
+50, ds_cstr(match), ds_cstr(actions), &op->nbsp->header_);
+}
+}
  }
  }
  
diff --git a/ovn-nb.xml b/ovn-nb.xml

index 406bc85..077a2d8 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -848,6 +848,15 @@
  

  
+
+
+  Optional. A list of IPv4 addresses that this
+  logical switch router port will reply to ARP requests for.
+  Example: 169.254.239.254 169.254.239.2. The
+  's logical router should
+  have a route to forward packets sent to configured proxy ARP IPs to
+  an appropriate destination.
+

  


diff --git a/tests/ovn.at b/tests/ovn.at
index 5926350..1e0065d 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26899,3 +26899,106 @@ AT_CHECK([ovs-ofctl dump-flows br-int 
"nw_src=10.0.0.0/24" | \
  OVN_CLEANUP([hv1])
  AT_CLEANUP
  ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR])
+AT_KEYWORDS([proxy-arp])
+ovn_start
+
+# Logical network:
+# One LR - lr1 has switch ls1 (192.16.1.0/24) connected to it,
+# and one HV with IP 192.16.1.6.
+
+ovn-nbctl lr-add lr1
+ovn-nbctl ls-add ls1
+
+# Connect ls1 to lr1
+ovn-nbctl lrp-add lr1 ls1 00:00:00:01:02:f1 192.16.1.1/24
+ovn-nbctl lsp-add ls1 rp-ls1 -- set Logical_Switch_Port rp-ls1 \
+type=router options:router-port=ls1 addresses=\"00:00:00:01:02:f1\"
+
+# Create logical port ls1-lp1 in ls1
+ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses ls1-lp1 "00:00:00:01:02:03 192.16.1.6"
+
+
+# Create one hypervisor and create OVS 

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-06-29 Thread Dumitru Ceara
On 6/29/21 12:56 PM, Ilya Maximets wrote:
> If a new database server is added to the cluster, or if one of the
> database servers changed its IP address or port, then you need to
> update the list of remotes for the client.  For example, if a new
> OVN_Southbound database server is added, you need to update the
> ovn-remote for the ovn-controller.
> 
> However, in the current implementation, the ovsdb-cs module always
> closes the current connection and creates a new one.  This can lead
> to a storm of re-connections if all ovn-controllers will be updated
> simultaneously.  They can also start re-downloading the database
> content, creating even more load on the database servers.
> 
> Correct this by saving an existing connection if it is still in the
> list of remotes after the update.
> 
> 'reconnect' module will report connection state updates, but that
> is OK since no real re-connection happened and we only updated the
> state of a new 'reconnect' instance.
> 
> If required, re-connection can be forced after the update of remotes
> with ovsdb_cs_force_reconnect().
> 
> Signed-off-by: Ilya Maximets 
> ---

I tried it out and it works fine; the code looks OK to me:

Acked-by: Dumitru Ceara 

Thanks,
Dumitru
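
For reference, the kind of remotes update this change targets: after adding a
third OVN_Southbound server to a cluster (addresses illustrative), updating
the list now keeps the existing connection alive as long as its server is
still listed:

    ovs-vsctl set Open_vSwitch . \
        external_ids:ovn-remote="tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642"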

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Eelco Chaudron



On 17 Jun 2021, at 18:27, Kumar Amber wrote:


This patch introduces the auto-validation function, which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and returns a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 lib/dpif-netdev-private-extract.c | 141 
++

 lib/dpif-netdev-private-extract.h |  15 
 lib/dpif-netdev.c |   2 +-
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c

index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

 /* Implementations of available extract options. */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
+   {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator",
+},
 {
 .probe = NULL,
 .extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)

 *out_ptr = mfex_impls;
 return ARRAY_SIZE(mfex_impls);
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t 
in_port,

+void *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+int32_t mfunc_count = 
dpif_miniflow_extract_info_get(&miniflow_funcs);

+if (mfunc_count < 0) {


In theory 0 cannot be returned, but just to cover the corner case, can
we change this to include zero?



+pmd->miniflow_extract_opt = NULL;


Guess the above needs to be atomic.

+VLOG_ERR("failed to get miniflow extract function 
implementations\n");


Capital F to be in sync with your other error messages?


+return 0;
+}
+ovs_assert(keys_size >= cnt);


I don’t think we should assert here. Just return an error like above, 
so in production, we get notified, and this implementation gets 
disabled.



+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(&packet->md, in_port);
+miniflow_extract(packet, &keys[i].mf);
+
+/* Store known good metadata to compare with optimized 
metadata. */

+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < ARRAY_SIZE(mfex_impls); 
j++) {

+if (!mfex_impls[j].available) {
+continue;
+}
+
+/* Reset keys and offsets before each implementation. */
+memset(test_keys, 0, keys_size * sizeof(struct 
netdev_flow_key));

+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+dp_packet_reset_offsets(packet);
+}
+/* Call optimized miniflow for each batch of packet. */
+uint32_t hit_mask = mfex_impls[j].extract_func(packets, 
test_keys,
+keys_size, in_port, 
pmd_handle);

+
+/* Do a miniflow compare for bits, blocks and offsets for all 
the

+ * classified packets in the hitmask marked by set bits. */
+while (hit_mask) {
+/* Index for the set bit. */
+uint32_t i = __builtin_ctz(hit_mask);
+/* Set the index in hitmask to Zero. */
+hit_mask &= (hit_mask - 1);
+
+uint32_t failed = 0;
+
+/* Check miniflow bits are equal. */
+if ((keys[i].mf.map.bits[0] != 
test_keys[i].mf.map.bits[0]) ||
+(keys[i].mf.map.bits[1] != 
test_keys[i].mf.map.bits[1])) {

+VLOG_ERR("Good 0x%llx 0x%llx\tTest 0x%llx 0x%llx\n",
+ keys[i].mf.map.bits[0], 
keys[i].mf.map.bits[1],

+ test_keys[i].mf.map.bits[0],
+ test_keys[i].mf.map.bits[1]);
+failed = 1;
+}
+
+if (!miniflow_equal(&keys[i].mf, &test_keys[i].mf)) {
+uint32_t 

[ovs-dev] [PATCH 4/4] dpif-netdev: Allow cross-NUMA polling on selected ports

2021-06-29 Thread anurag2k
From: Anurag Agarwal 

Today dpif-netdev considers PMD threads on a non-local NUMA node for
automatic assignment of the rxqs of a port only if there are no local,
non-isolated PMDs.

On typical servers with both physical ports on one NUMA node, this often
leaves the PMDs on the other NUMA node under-utilized, wasting CPU
resources. The alternative, to manually pin the rxqs to PMDs on remote
NUMA nodes, also has drawbacks as it limits OVS' ability to auto
load-balance the rxqs.

This patch introduces a new interface configuration option to allow
ports to be automatically polled by PMDs on any NUMA node:

ovs-vsctl set interface  other_config:cross-numa-polling=true

If this option is not present or set to false, legacy behaviour applies.

Signed-off-by: Anurag Agarwal 
Signed-off-by: Jan Scheurich 
Signed-off-by: Rudra Surya Bhaskara Rao 
---
 Documentation/topics/dpdk/pmd.rst | 28 ++--
 lib/dpif-netdev.c | 35 +--
 tests/pmd.at  | 30 ++
 vswitchd/vswitch.xml  | 20 
 4 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index d63750e..abe1cda 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -78,8 +78,27 @@ To show port/Rx queue assignment::
 
 $ ovs-appctl dpif-netdev/pmd-rxq-show
 
-Rx queues may be manually pinned to cores. This will change the default Rx
-queue assignment to PMD threads::
+Normally, Rx queues are assigned to PMD threads automatically.  By default
+OVS only assigns Rx queues to PMD threads executing on the same NUMA
+node in order to avoid unnecessary latency for accessing packet buffers
+across the NUMA boundary.  Typically this overhead is higher for vhostuser
+ports than for physical ports due to the packet copy that is done for all
+rx packets.
+
+On NUMA servers with physical ports only on one NUMA node, the NUMA-local
+polling policy can lead to an under-utilization of the PMD threads on the
+remote NUMA node.  For the overall OVS performance it may in such cases be
+beneficial to utilize the spare capacity and allow polling of a physical
+port's rxqs across NUMA nodes despite the overhead involved.
+The policy can be set per port with the following configuration option::
+
+$ ovs-vsctl set Interface  \
+other_config:cross-numa-polling=true|false
+
+The default value is false.
+
+Rx queues may also be manually pinned to cores. This will change the default
+Rx queue assignment to PMD threads::
 
 $ ovs-vsctl set Interface  \
 other_config:pmd-rxq-affinity=
@@ -194,6 +213,11 @@ or can be triggered by using::
Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
stats was not available.
 
+.. versionchanged:: 2.15.0
+
+   Added the interface parameter ``other_config:cross-numa-polling`` and the
+   ``no-isol`` option for ``pmd-rxq-affinity``.
+
 Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
 ---
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7d9078f..6b9a151 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -478,6 +478,7 @@ struct dp_netdev_port {
 bool emc_enabled;   /* If true EMC will be used. */
 char *type; /* Port type as requested by user. */
 char *rxq_affinity_list;/* Requested affinity of rx queues. */
+bool cross_numa_polling;/* If true cross polling will be enabled */
 };
 
 /* Contained by struct dp_netdev_flow's 'stats' member.  */
@@ -4548,6 +4549,7 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t 
port_no,
 int error = 0;
 const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
 bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
+bool cross_numa_polling = smap_get_bool(cfg, "cross-numa-polling", false);
 
 ovs_mutex_lock(&dp->port_mutex);
 error = get_port_by_number(dp, port_no, &port);
@@ -4555,6 +4557,11 @@ dpif_netdev_port_set_config(struct dpif *dpif, 
odp_port_t port_no,
 goto unlock;
 }
 
+if (cross_numa_polling != port->cross_numa_polling) {
+port->cross_numa_polling = cross_numa_polling;
+dp_netdev_request_reconfigure(dp);
+}
+
 if (emc_enabled != port->emc_enabled) {
 struct dp_netdev_pmd_thread *pmd;
 struct ds ds = DS_EMPTY_INITIALIZER;
@@ -5173,8 +5180,8 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
 struct dp_netdev_port *port;
 struct dp_netdev_rxq ** rxqs = NULL;
 struct rr_numa_list rr;
-struct rr_numa *numa = NULL;
-struct rr_numa *non_local_numa = NULL;
+struct rr_numa *local_numa = NULL;
+struct rr_numa *next_numa = NULL;
 int n_rxqs = 0;
 int numa_id;
 bool assign_cyc = dp->pmd_rxq_assign_cyc;
@@ -5214,12 +5221,20 @@ 

[ovs-dev] [PATCH 3/4] dpif-netdev: pmd-rxq-affinity with optional PMD isolation

2021-06-29 Thread anurag2k
From: Anurag Agarwal 

In some scenarios it is beneficial for DPDK datapath performance to pin
rx queues to specific PMDs, for example to allow cross-NUMA polling
when both physical ports are on one NUMA node but the PMD configuration
is symmetric.

Today such rxq pinning unconditionally makes these PMDs isolated, i.e.
they are no longer available for polling unpinned rx queues and hence
limit the ability of the load-based rxq distribution logic to use spare
capacity on these isolated PMDs for unpinned rx queues. This typically
leads to a sub-optimal load balance over the available PMDs.

The overall OVS-DPDK performance can be improved by not isolating PMDs
with pinned rxqs and let OVS decide on the optimally balanced
distribution of rxqs autonomously.

This patch introduces a new option in the pmd-rxq-affinity configuration
parameter to skip the isolation of the target PMD threads:

ovs-vsctl set interface   \
other_config : pmd-rxq-affinity = rxq1:cpu1,rxq2:cpu2,...[,no-isol]

Without the no-isol option, pinning isolates the target PMDs as before.
With the no-isol option, the target PMDs remain non-isolated.

Note: A single rx queue of any one port that is pinned without the
no-isol option is enough to isolate a PMD.
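
With the option, a core that has a pinned queue remains available for
unpinned queues, and dpif-netdev/pmd-rxq-show reports it as non-isolated;
the output would look roughly like this (values illustrative):

    $ ovs-appctl dpif-netdev/pmd-rxq-show
    pmd thread numa_id 0 core_id 3:
      isolated : false
      port: dpdk-p0    queue-id:  0 (enabled)   pmd usage: 40 %
      port: vhost0     queue-id:  1 (enabled)   pmd usage: 15 %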

Signed-off-by: Anurag Agarwal 
Signed-off-by: Jan Scheurich 
Signed-off-by: Rudra Surya Bhaskara Rao 
---
 Documentation/topics/dpdk/pmd.rst | 18 ++
 lib/dpif-netdev.c | 19 ---
 tests/pmd.at  | 25 +
 vswitchd/vswitch.xml  | 11 ++-
 4 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index e481e79..d63750e 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -101,8 +101,18 @@ like so:
 - Queue #2 not pinned
 - Queue #3 pinned to core 8
 
-PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
-means that this thread will only poll the *pinned* Rx queues.
+By default PMD threads on cores where Rx queues are *pinned* will become
+*isolated*. This means that these threads will only poll the *pinned* Rx queues.
+If this isolation of PMD threads is not wanted, it can be skipped by adding
+the ``no-isol`` option to the , e.g.
+
+$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
+other_config:pmd-rxq-affinity="0:3,1:7,3:8,no-isol"
+
+.. note::
+
+   A single Rx queue pinned to a CPU core without the ``no-isol`` option
+   suffices to isolate the PMD thread.
 
 .. warning::
 
@@ -111,8 +121,8 @@ means that this thread will only poll the *pinned* Rx 
queues.
 is not in ``pmd-cpu-mask``), the RX queue will not be polled
by any PMD thread.
 
-If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
-(cores) automatically.
+If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to
+non-isolated PMDs (cores) automatically.
 
 The algorithm used to automatically assign Rxqs to PMDs can be set by::
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1c458b2..7d9078f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -453,6 +453,8 @@ struct dp_netdev_rxq {
 unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
 struct dp_netdev_pmd_thread *pmd;  /* pmd thread that polls this queue. */
 bool is_vhost; /* Is rxq of a vhost port. */
+bool isolate;  /* Isolate the core to which this queue
+  is pinned.*/
 
 /* Counters of cycles spent successfully polling and processing pkts. */
 atomic_ullong cycles[RXQ_N_CYCLES];
@@ -4447,7 +4449,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 
 /* Parses affinity list and returns result in 'core_ids'. */
 static int
-parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq)
+parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq,
+bool *isolate)
 {
 unsigned i;
 char *list, *copy, *key, *value;
@@ -4466,6 +4469,11 @@ parse_affinity_list(const char *affinity_list, unsigned 
*core_ids, int n_rxq)
 while (ofputil_parse_key_value(&list, &key, &value)) {
 int rxq_id, core_id;
 
+if (strcmp(key, "no-isol") == 0) {
+*isolate = false;
+continue;
+}
+
 if (!str_to_int(key, 0, _id) || rxq_id < 0
 || !str_to_int(value, 0, _id) || core_id < 0) {
 error = EINVAL;
@@ -4488,15 +4496,19 @@ dpif_netdev_port_set_rxq_affinity(struct dp_netdev_port 
*port,
 {
 unsigned *core_ids, i;
 int error = 0;
+bool isolate = true;
 
 core_ids = xmalloc(port->n_rxq * sizeof *core_ids);
-if (parse_affinity_list(affinity_list, core_ids, port->n_rxq)) {
+if (parse_affinity_list(affinity_list, core_ids, port->n_rxq, &isolate)) {
 error = EINVAL;
 

[ovs-dev] [PATCH 2/4] dpif-netdev: Least-loaded scheduling algorithm for rxqs

2021-06-29 Thread anurag2k
From: Anurag Agarwal 

The current algorithm for balancing unpinned rxqs over non-isolated pmds
assigns the rxqs in descending load order to pmds in a "zig-zag" fashion
(e.g. A,B,C,C,B,A,A,B,...). This simple heuristic relies on the pmds
being initially "empty" and produces optimal results only if the rxq loads
are not too disparate (e.g. 100,10,10,10,10,10,10).

The first pre-requisite will no longer be fulfilled when the isolation
of pmds with pinned rxqs is made optional in a subsequent patch.

To prepare for this change we introduce a least-loaded scheduling
algorithm. During rxq scheduling, we keep track of the number of assigned
rxqs and their processing cycles per non-isolated pmd and maintain the
array of pmds per numa node sorted according to their load. We still
assign the rxqs in descending load order, but always assign to the least
loaded pmd. This deals with the case of pmds with a-prioy load due to
pinned rxs and, as additional benefit, handles disparate rxq loads better.

If rxq processing cycles are not used for rxq scheduling, the estimated
pmd load is based on the number of assigned rxqs. In this case, the
least-loaded scheduling algorithm effectively results in a round-robin
assignment of rxqs to pmds so that the legacy code for round-robin
assignment could be removed.
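
The core of the least-loaded assignment described above, as a rough sketch
(not the patch code; resort_pmds_by_load() is a hypothetical helper, the
other names are the ones used in the patch):

    /* rxqs[] is sorted by descending measured cycles; pmds[] holds the
     * non-isolated PMDs of the rxq's NUMA node, sorted by ascending load. */
    for (int i = 0; i < n_rxqs; i++) {
        struct dp_netdev_pmd_thread *pmd = pmds[0];    /* least loaded */

        rxqs[i]->pmd = pmd;
        pmd->ll_n_rxq++;
        pmd->ll_cycles += dp_netdev_rxq_get_cycles(rxqs[i],
                                                   RXQ_CYCLES_PROC_HIST);
        resort_pmds_by_load(pmds, n_pmds);             /* hypothetical */
    }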

Signed-off-by: Anurag Agarwal 
Signed-off-by: Jan Scheurich 
Signed-off-by: Rudra Surya Bhaskara Rao 
---
 lib/dpif-netdev.c | 267 +-
 tests/pmd.at  |   4 +-
 2 files changed, 166 insertions(+), 105 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index fd192db..1c458b2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -800,6 +800,10 @@ struct dp_netdev_pmd_thread {
 
 /* Next time when PMD should try RCU quiescing. */
 long long next_rcu_quiesce;
+
+/* Number and cycles of assigned rxqs during rxq scheduling. */
+int ll_n_rxq;
+uint64_t ll_cycles;
 };
 
 /* Interface to netdev-based datapath. */
@@ -4905,6 +4909,62 @@ port_reconfigure(struct dp_netdev_port *port)
 return 0;
 }
 
+/* Sort PMDs in ascending order of load. If processing cycles are not used for
+ * rxq scheduling, the ll_cycles are zero. In that case the count of
+ * assigned rxqs is taken as PMD load, resulting in round-robin scheduling. */
+static int
+compare_pmd_load(const void *a, const void *b)
+{
+struct dp_netdev_pmd_thread *pmda;
+struct dp_netdev_pmd_thread *pmdb;
+
+pmda = *(struct dp_netdev_pmd_thread **) a;
+pmdb = *(struct dp_netdev_pmd_thread **) b;
+
+if (pmda->ll_cycles != pmdb->ll_cycles) {
+return (pmda->ll_cycles > pmdb->ll_cycles) ? 1 : -1;
+} else if (pmda->ll_n_rxq != pmdb->ll_n_rxq) {
+return (pmda->ll_n_rxq > pmdb->ll_n_rxq) ? 1 : -1;
+} else {
+/* Cycles and rxqs are the same so tiebreak on core ID.
+ * Tiebreaking (as opposed to return 0) ensures consistent
+ * sort results across multiple OS's. */
+return (pmda->core_id > pmdb->core_id) ? 1 : -1;
+}
+}
+
+/* Sort Rx Queues in descending order of processing cycles they
+ * are consuming. */
+static int
+compare_rxq_cycles(const void *a, const void *b)
+{
+struct dp_netdev_rxq *qa;
+struct dp_netdev_rxq *qb;
+uint64_t cycles_qa, cycles_qb;
+
+qa = *(struct dp_netdev_rxq **) a;
+qb = *(struct dp_netdev_rxq **) b;
+
+cycles_qa = dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_HIST);
+cycles_qb = dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_HIST);
+
+if (cycles_qa != cycles_qb) {
+return (cycles_qa < cycles_qb) ? 1 : -1;
+} else {
+/* Cycles are the same so tiebreak on port/queue id.
+ * Tiebreaking (as opposed to return 0) ensures consistent
+ * sort results across multiple OS's. */
+uint32_t port_qa = odp_to_u32(qa->port->port_no);
+uint32_t port_qb = odp_to_u32(qb->port->port_no);
+if (port_qa != port_qb) {
+return (port_qa > port_qb) ? 1 : -1;
+} else {
+return (netdev_rxq_get_queue_id(qa->rx)
+- netdev_rxq_get_queue_id(qb->rx));
+}
+}
+}
+
 struct rr_numa_list {
 struct hmap numas;  /* Contains 'struct rr_numa' */
 };
@@ -4917,9 +4977,6 @@ struct rr_numa {
 /* Non isolated pmds on numa node 'numa_id' */
 struct dp_netdev_pmd_thread **pmds;
 int n_pmds;
-
-int cur_index;
-bool idx_inc;
 };
 
 static struct rr_numa *
@@ -4976,49 +5033,12 @@ rr_numa_list_populate(struct dp_netdev *dp, struct 
rr_numa_list *rr)
 numa->n_pmds++;
 numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
 numa->pmds[numa->n_pmds - 1] = pmd;
-/* At least one pmd so initialise curr_idx and idx_inc. */
-numa->cur_index = 0;
-numa->idx_inc = true;
 }
-}
-
-/*
- * Returns the next pmd from the numa node.
- *
- * If 'updown' is 'true' it will alternate between selecting the next pmd in
- * 

[ovs-dev] [PATCH 1/4] dpif-netdev: Refactor rxq auto-lb dry-run code.

2021-06-29 Thread anurag2k
From: Anurag Agarwal 

The current functions performing a dry-run of the allocation of
non-pinned rxqs to PMDs during rxq auto load-balancing duplicate most
of the code of the rxq_scheduling() function used during actual rxq
reconfiguration.

This is difficult to maintain and there are actually cases today where
the dry-run behaves differently to rxq_scheduling() which can cause the
auto-lb function to fail.

This patch refactors the pmd_rebalance_dry_run() function to rely on
the rxq_scheduling() function to perform the dry-run and only implement
the comparison of current and predicted PMD load on top. The resulting
code is not only shorter but also easier to understand.

It makes use of the fact that the "pmd" member of struct dp_netdev_rxq,
which is set as part of rxq_scheduling(), is only used under the
protection of dp->port_mutex and it is safe to temporarily change it
during the dry-run. Before the end of the dry-run the original values
are restored.
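
A rough sketch of that idea (illustrative only; pmd_load_variance() is a
hypothetical helper standing in for the actual load comparison, and the real
pmd_rebalance_dry_run() differs in detail):

static bool
dry_run_improves_balance(struct dp_netdev *dp, struct dp_netdev_rxq **rxqs,
                         int n_rxqs)
    OVS_REQUIRES(dp->port_mutex)
{
    struct dp_netdev_pmd_thread **saved;
    uint64_t current_var, predicted_var;

    /* Variance of the per-pmd load under the current assignment. */
    current_var = pmd_load_variance(dp);

    /* Remember the current owners before the dry run overwrites them. */
    saved = xmalloc(n_rxqs * sizeof *saved);
    for (int i = 0; i < n_rxqs; i++) {
        saved[i] = rxqs[i]->pmd;
    }

    rxq_scheduling(dp, false, true);    /* dry_run == true: no logging. */
    predicted_var = pmd_load_variance(dp);

    /* Restore the original assignment before returning. */
    for (int i = 0; i < n_rxqs; i++) {
        rxqs[i]->pmd = saved[i];
    }
    free(saved);

    return predicted_var < current_var;
}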

Signed-off-by: Anurag Agarwal 
Signed-off-by: Jan Scheurich 
Signed-off-by: Rudra Surya Bhaskara Rao 
---
 lib/dpif-netdev.c | 317 +-
 1 file changed, 123 insertions(+), 194 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c5ab35d..fd192db 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4922,12 +4922,6 @@ struct rr_numa {
 bool idx_inc;
 };
 
-static size_t
-rr_numa_list_count(struct rr_numa_list *rr)
-{
-return hmap_count(&rr->numas);
-}
-
 static struct rr_numa *
 rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
 {
@@ -5075,9 +5069,11 @@ compare_rxq_cycles(const void *a, const void *b)
  * pmds to unpinned queues.
  *
  * The function doesn't touch the pmd threads, it just stores the assignment
- * in the 'pmd' member of each rxq. */
+ * in the 'pmd' member of each rxq. Skip logging in the case of an auto
+ * load-balancing dry_run. */
 static void
-rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
+rxq_scheduling(struct dp_netdev *dp, bool pinned, bool dry_run)
+OVS_REQUIRES(dp->port_mutex)
 {
 struct dp_netdev_port *port;
 struct rr_numa_list rr;
@@ -5152,38 +5148,44 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) 
OVS_REQUIRES(dp->port_mutex)
Round robin on the NUMA nodes that do have pmds. */
 non_local_numa = rr_numa_list_next(&rr, non_local_numa);
 if (!non_local_numa) {
-VLOG_ERR("There is no available (non-isolated) pmd "
- "thread for port \'%s\' queue %d. This queue "
- "will not be polled. Is pmd-cpu-mask set to "
- "zero? Or are all PMDs isolated to other "
- "queues?", netdev_rxq_get_name(rxqs[i]->rx),
- netdev_rxq_get_queue_id(rxqs[i]->rx));
+if (!dry_run) {
+VLOG_ERR("There is no available (non-isolated) pmd "
+ "thread for port \'%s\' queue %d. This queue "
+ "will not be polled. Is pmd-cpu-mask set to "
+ "zero? Or are all PMDs isolated to other "
+ "queues?", netdev_rxq_get_name(rxqs[i]->rx),
+ netdev_rxq_get_queue_id(rxqs[i]->rx));
+}
 continue;
 }
 rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa, assign_cyc);
-VLOG_WARN("There's no available (non-isolated) pmd thread "
-  "on numa node %d. Queue %d on port \'%s\' will "
-  "be assigned to the pmd on core %d "
-  "(numa node %d). Expect reduced performance.",
-  numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
-  netdev_rxq_get_name(rxqs[i]->rx),
-  rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
+if (!dry_run) {
+VLOG_WARN("There's no available (non-isolated) pmd thread "
+  "on numa node %d. Queue %d on port \'%s\' will "
+  "be assigned to the pmd on core %d "
+  "(numa node %d). Expect reduced performance.",
+  numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
+  netdev_rxq_get_name(rxqs[i]->rx),
+  rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
+}
 } else {
 rxqs[i]->pmd = rr_numa_get_pmd(numa, assign_cyc);
-if (assign_cyc) {
-VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
-  "rx queue %d "
-  "(measured processing cycles %"PRIu64").",
-  rxqs[i]->pmd->core_id, numa_id,
-  netdev_rxq_get_name(rxqs[i]->rx),
-  netdev_rxq_get_queue_id(rxqs[i]->rx),
-

[ovs-dev] [PATCH 0/4] dpif-netdev: rxq auto-lb improvements

2021-06-29 Thread anurag2k
From: Anurag Agarwal 

=   Disclaimer  ==
This patch set was prepared and verified downstream in early 2021. A very 
similar set of patches with auto load balance enhancements has recently been
submitted by Red Hat:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/383618.html. 

We acknowledge the large overlap between both patch sets and submit our set
of patches to underline the importance of these improvements and offer our own
implementation as an alternative proposal. We are very much open to discussing
the best way forward in the OVS-DPDK community to merge all improvements
together in one patch set.
===

The rxq auto-load balancing in dpif-netdev is an important function
for maximizing the total throughput of the OVS-DPDK datapath in cloud
deployments with unknown and highly variable load on a dynamically
changing set of vhostuser ports on top of a static configuration of
physical ports and PMD threads.

We have found in live tests that the efficacy of auto-lb can be
further improved by giving OVS more freedom in distributing rxqs,
e.g. to assign phy rxqs to NUMA-remote PMDs (cross-NUMA polling) or
vhostuser rxqs to PMDs with pinned phy rxqs.

The following two interface configuration knobs have been introduced
to relax the constraints on the rxq_scheduling() function.

ovs-vsctl set interface   \
other_config:pmd-rxq-affinity=rxq1:cpu1,rxq2:cpu2,...[,no-isol]

ovs-vsctl set interface  other_config:cross-numa-polling=true
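
As an illustrative example only (the interface name "dpdk0" and the rxq/core
numbers are made up, and both options require this patch set), the two knobs
can be combined on a single port:

ovs-vsctl set interface dpdk0 \
    other_config:pmd-rxq-affinity="0:3,1:7,no-isol" \
    other_config:cross-numa-polling=true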

The auto-lb dry-run code is refactored to rely on the rxq_scheduling()
function in order to get rid of duplicated code and inconsistencies.

Finally a new least-loaded scheduling algorithm is implemented to assign
RxQs to PMDs based on the PMD load to achieve equally balanced traffic
load to PMDs.

Anurag Agarwal (4):
  dpif-netdev: Refactor rxq auto-lb dry-run code.
  dpif-netdev: Least-loaded scheduling algorithm for rxqs
  dpif-netdev: pmd-rxq-affinity with optional PMD isolation
  dpif-netdev: Allow cross-NUMA polling on selected ports

 Documentation/topics/dpdk/pmd.rst |  46 ++-
 lib/dpif-netdev.c | 612 --
 tests/pmd.at  |  59 +++-
 vswitchd/vswitch.xml  |  31 +-
 4 files changed, 442 insertions(+), 306 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] reconnect: Add graceful reconnect.

2021-06-29 Thread Dumitru Ceara
Until now clients that needed to reconnect immediately could only use
reconnect_force_reconnect().  However, reconnect_force_reconnect()
doesn't reset the backoff for connections that were alive long enough
(more than backoff seconds).

Moreover, the reconnect library cannot determine the exact reason why a
client wishes to initiate a reconnection.  In most cases reconnection
happens because of a fatal error when communicating with the remote,
e.g., in the ovsdb-cs layer, when invalid messages are received from
ovsdb-server.  In such cases it makes sense to not reset the backoff
because the remote seems to be unhealthy.

There are however cases when reconnection is needed for other reasons.
One such example is when ovsdb-clients require "leader-only" connections
to clustered ovsdb-server databases.  Whenever the client determines
that the remote is not a leader anymore, it decides to reconnect to a
new remote from its list, searching for the new leader.  Using
jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
not reset backoff even though the former leader is still likely in good
shape.

Since 3c2d6274bcee ("raft: Transfer leadership before creating
snapshots.") leadership changes inside the clustered database happen
more often and therefore "leader-only" clients need to reconnect more
often too.  Not resetting the backoff every time a leadership change
happens will cause all reconnections to happen with the maximum backoff
(8 seconds) resulting in significant latency.
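
As a simplified illustration of how a "leader-only" client could use the new
call alongside the existing one (the decision logic below is made up for the
example; the real integration is in lib/ovsdb-cs.c further down):

#include <stdbool.h>
#include "jsonrpc.h"

static void
maybe_reconnect(struct jsonrpc_session *session,
                bool protocol_error, bool lost_leadership)
{
    if (protocol_error) {
        /* The remote looks unhealthy: reconnect, but keep backing off. */
        jsonrpc_session_force_reconnect(session);
    } else if (lost_leadership) {
        /* The remote is healthy, we just need a different server:
         * reconnect without carrying over the accumulated backoff. */
        jsonrpc_session_graceful_reconnect(session);
    }
}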

This commit also updates the Python reconnect and IDL implementations
and adds tests for force-reconnect and graceful-reconnect.

Reported-at: https://bugzilla.redhat.com/1977264
Signed-off-by: Dumitru Ceara 
---
 lib/jsonrpc.c   |   7 +
 lib/jsonrpc.h   |   1 +
 lib/ovsdb-cs.c  |  32 ++--
 lib/reconnect.c |  36 -
 lib/reconnect.h |   1 +
 python/ovs/db/idl.py|   8 +-
 python/ovs/jsonrpc.py   |   3 +
 python/ovs/reconnect.py |  34 +++-
 tests/reconnect.at  | 344 
 tests/test-reconnect.c  |   7 +
 tests/test-reconnect.py |   5 +
 11 files changed, 454 insertions(+), 24 deletions(-)

diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c
index 926cbcb86e..66b7f9ef4c 100644
--- a/lib/jsonrpc.c
+++ b/lib/jsonrpc.c
@@ -1267,6 +1267,13 @@ jsonrpc_session_force_reconnect(struct jsonrpc_session 
*s)
 reconnect_force_reconnect(s->reconnect, time_msec());
 }
 
+/* Makes 's' gracefully drop its connection (if any) and reconnect. */
+void
+jsonrpc_session_graceful_reconnect(struct jsonrpc_session *s)
+{
+reconnect_graceful_reconnect(s->reconnect, time_msec());
+}
+
 /* Sets 'max_backoff' as the maximum time, in milliseconds, to wait after a
  * connection attempt fails before attempting to connect again. */
 void
diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h
index d75d66b863..3a34424e22 100644
--- a/lib/jsonrpc.h
+++ b/lib/jsonrpc.h
@@ -136,6 +136,7 @@ void jsonrpc_session_get_reconnect_stats(const struct 
jsonrpc_session *,
 
 void jsonrpc_session_enable_reconnect(struct jsonrpc_session *);
 void jsonrpc_session_force_reconnect(struct jsonrpc_session *);
+void jsonrpc_session_graceful_reconnect(struct jsonrpc_session *);
 
 void jsonrpc_session_set_max_backoff(struct jsonrpc_session *,
  int max_backoff);
diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index 911b71dd4f..6f16d29f6d 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -230,8 +230,10 @@ static void ovsdb_cs_transition_at(struct ovsdb_cs *, enum 
ovsdb_cs_state,
 #define ovsdb_cs_transition(CS, STATE) \
 ovsdb_cs_transition_at(CS, STATE, OVS_SOURCE_LOCATOR)
 
-static void ovsdb_cs_retry_at(struct ovsdb_cs *, const char *where);
-#define ovsdb_cs_retry(CS) ovsdb_cs_retry_at(CS, OVS_SOURCE_LOCATOR)
+static void ovsdb_cs_retry_at(struct ovsdb_cs *, bool graceful,
+  const char *where);
+#define ovsdb_cs_retry(CS, GRACEFUL) \
+ovsdb_cs_retry_at((CS), (GRACEFUL), OVS_SOURCE_LOCATOR)
 
 static struct vlog_rate_limit syntax_rl = VLOG_RATE_LIMIT_INIT(1, 5);
 
@@ -400,9 +402,21 @@ ovsdb_cs_send_request(struct ovsdb_cs *cs, struct 
jsonrpc_msg *request)
 }
 
 static void
-ovsdb_cs_retry_at(struct ovsdb_cs *cs, const char *where)
+ovsdb_cs_reconnect(struct ovsdb_cs *cs, bool graceful)
 {
-ovsdb_cs_force_reconnect(cs);
+if (cs->session) {
+if (graceful) {
+jsonrpc_session_graceful_reconnect(cs->session);
+} else {
+jsonrpc_session_force_reconnect(cs->session);
+}
+}
+}
+
+static void
+ovsdb_cs_retry_at(struct ovsdb_cs *cs, bool graceful, const char *where)
+{
+ovsdb_cs_reconnect(cs, graceful);
 ovsdb_cs_transition_at(cs, CS_S_RETRY, where);
 }
 
@@ -438,7 +452,7 @@ ovsdb_cs_process_response(struct ovsdb_cs *cs, struct 
jsonrpc_msg *msg)
  ovsdb_cs_state_to_string(cs->state),
  s);
 free(s);
-

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Eelco Chaudron



On 29 Jun 2021, at 13:05, Van Haaren, Harry wrote:


Hi Eelco,

Would you describe the actual test being run below?

I'm having a hard time figuring out what the actual datapath packet 
flow is. It seems strange
that MFEX optimizations are affected by flow-count, that doesn't 
really logically make sense.

Hence, some more understanding on what the test setup is may help.



This is using the standard PVP scenario with ovs_perf as explained here:

[ovs_perf](https://github.com/chaudron/ovs_perf#automated-open-vswitch-pvp-testing)


To remove complexity & noise from the setup: does running a simple 
Phy-to-Phy test with L2 bridging
cause any perf degradation? If so, please describe that exact setup 
and I'll try to reproduce/replicate results here.


I’ll try to do some more testing later this week, and get back.


Regards, -Harry

PS: Apologies for top post/html email, is my mail client acting 
strange, or was this already a html email on list?
Changing it back to plain-text causes loss of all > previous reply 
indentation…


From: Eelco Chaudron 
Sent: Friday, June 25, 2021 2:00 PM
To: Amber, Kumar 
Cc: d...@openvswitch.org; i.maxim...@ovn.org; Van Haaren, Harry 
; Flavio Leitner ; 
Stokes, Ian 
Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation 
function for miniflow extract



Hi Kumar,

I plan to review this patch set, but I need to go over the dpif AVX512 
set first to get a better understanding.


However, I did run some performance tests on old hardware (as I do not 
have an AVX512 system) and noticed some degradation (and improvements). 
This was a single run for both scenarios, with the following results 
(based on ovs_perf), on a single Intel(R) Xeon(R) CPU E5-2690 v4 @ 
2.60GHz:


   Number of flows 64  128 256 512 768 1024 1514


Delta
10 1.48% 1.72% 1.59% -0.12% 0.44% 6.99% 7.31%
1000 1.06% -0.73% -1.46% -1.42% -2.54% -0.20% -0.98%
1 -0.93% -1.62% -0.32% -1.50% -0.30% -0.56% 0.19%
10 0.39% -0.05% -0.60% -0.51% -0.90% 1.24% -1.10%

Master
10 4767168 4601575 4382381 4127125 3594158 2932787 2400479
100 3804956 3612716 3547054 3127117 2950324 2615856 2133892
1000 3251959 3257535 2985693 2869970 2549086 2286262 1979985
1 2671946 2624808 2536575 2412845 2190386 1952359 1699142

Patch
10 4838691 4682131 4453022 4122100 3609915 3153228 2589748
100 3845585 3586650 3496167 3083467 2877265 2610640 2113108
1000 3221894 3205732 2976203 2827620 2541349 2273468 1983794
1 2682461 2623585 2521419 2400627 2170751 1976909 1680607

Zero loss for master 5.8% (3,452,306pps) vs on Patch 5.7% 
(3,392,783pps).


Did you guys do any tests like this? I think it would be good not only 
to know the improvement but also the degradation of existing systems 
without AVX512.


I see Ian is currently reviewing the v4 and was wondering if you plan 
to send the v5 soon. If so, I will hold off a bit and review the v5 
rather than the v4, and verify it’s not something Ian already mentioned.


Cheers,

Eelco

On 17 Jun 2021, at 18:27, Kumar Amber wrote:

This patch introduces the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
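
Conceptually, the autovalidator runs every optimized implementation on the
same batch and checks the result against the scalar extract. A rough sketch
of that idea (scalar_extract(), impls[], n_impls and keys_match() are
illustrative names, not the patch's API; the actual code is in the diff
below):

static uint32_t
autovalidate_batch(struct dp_packet_batch *packets,
                   struct netdev_flow_key *ref_keys,
                   struct netdev_flow_key *test_keys,
                   odp_port_t in_port, void *pmd_handle)
{
    /* The scalar (linear) miniflow extract is treated as ground truth. */
    uint32_t ref_hits = scalar_extract(packets, ref_keys, NETDEV_MAX_BURST,
                                       in_port, pmd_handle);

    for (int i = 0; i < n_impls; i++) {
        uint32_t hits = impls[i].extract_func(packets, test_keys,
                                              NETDEV_MAX_BURST, in_port,
                                              pmd_handle);
        if (hits != ref_hits
            || !keys_match(ref_keys, test_keys,
                           dp_packet_batch_size(packets))) {
            VLOG_ERR("miniflow extract '%s' disagrees with the scalar "
                     "extract", impls[i].name);
        }
    }
    return ref_hits;
}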

Signed-off-by: Kumar Amber <kumar.am...@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haa...@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>

---
lib/dpif-netdev-private-extract.c | 141 ++
lib/dpif-netdev-private-extract.h | 15 
lib/dpif-netdev.c | 2 +-
3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c

index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

/* Implementations of available extract options. */
static struct dpif_miniflow_extract_impl mfex_impls[] = {
+ {
+ .probe = NULL,
+ .extract_func = dpif_miniflow_extract_autovalidator,
+ .name = "autovalidator",
+ },
{
.probe = NULL,
.extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)

*out_ptr = mfex_impls;
return ARRAY_SIZE(mfex_impls);
}
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+ struct netdev_flow_key *keys,
+ uint32_t keys_size, odp_port_t in_port,
+ void *pmd_handle)
+{
+ const size_t cnt = dp_packet_batch_size(packets);
+ uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+ struct dp_packet *packet;
+ struct dp_netdev_pmd_thread *pmd = pmd_handle;
+ struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+ 

Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-29 Thread Van Haaren, Harry
Hi Eelco,

Would you describe the actual test being run below?

I'm having a hard time figuring out what the actual datapath packet flow is. It 
seems strange
that MFEX optimizations are affected by flow-count, that doesn't really 
logically make sense.
Hence, some more understanding on what the test setup is may help.

To remove complexity & noise from the setup: does running a simple Phy-to-Phy 
test with L2 bridging
cause any perf degradation? If so, please describe that exact setup and I'll 
try to reproduce/replicate results here.

Regards, -Harry

PS: Apologies for top post/html email, is my mail client acting strange, or was 
this already a html email on list?
Changing it back to plain-text causes loss of all > previous reply indentation…

From: Eelco Chaudron 
Sent: Friday, June 25, 2021 2:00 PM
To: Amber, Kumar 
Cc: d...@openvswitch.org; i.maxim...@ovn.org; Van Haaren, Harry 
; Flavio Leitner ; Stokes, Ian 

Subject: Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for 
miniflow extract


Hi Kumar,

I plan to review this patch set, but I need to go over the dpif AVX512 set 
first to get a better understanding.

However, I did run some performance tests on old hardware (as I do not have an 
AVX512 system) and noticed some degradation (and improvements). This was a 
single run for both scenarios, with the following results (based on ovs_perf), 
on a single Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz:

   Number of flows 64  128 256 512 768 1024 1514

Delta
10 1.48% 1.72% 1.59% -0.12% 0.44% 6.99% 7.31%
1000 1.06% -0.73% -1.46% -1.42% -2.54% -0.20% -0.98%
1 -0.93% -1.62% -0.32% -1.50% -0.30% -0.56% 0.19%
10 0.39% -0.05% -0.60% -0.51% -0.90% 1.24% -1.10%

Master
10 4767168 4601575 4382381 4127125 3594158 2932787 2400479
100 3804956 3612716 3547054 3127117 2950324 2615856 2133892
1000 3251959 3257535 2985693 2869970 2549086 2286262 1979985
1 2671946 2624808 2536575 2412845 2190386 1952359 1699142

Patch
10 4838691 4682131 4453022 4122100 3609915 3153228 2589748
100 3845585 3586650 3496167 3083467 2877265 2610640 2113108
1000 3221894 3205732 2976203 2827620 2541349 2273468 1983794
1 2682461 2623585 2521419 2400627 2170751 1976909 1680607

Zero loss for master 5.8% (3,452,306pps) vs on Patch 5.7% (3,392,783pps).

Did you guys do any tests like this? I think it would be good not only to know 
the improvement but also the degradation of existing systems without AVX512.

I see Ian is currently reviewing the v4 and was wondering if you plan to send 
the v5 soon. If so, I will hold off a bit and review the v5 rather than the v4, 
and verify it’s not something Ian already mentioned.

Cheers,

Eelco

On 17 Jun 2021, at 18:27, Kumar Amber wrote:

This patch introduces the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber <kumar.am...@intel.com>
Co-authored-by: Harry van Haaren <harry.van.haa...@intel.com>
Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
---
lib/dpif-netdev-private-extract.c | 141 ++
lib/dpif-netdev-private-extract.h | 15 
lib/dpif-netdev.c | 2 +-
3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

/* Implementations of available extract options. */
static struct dpif_miniflow_extract_impl mfex_impls[] = {
+ {
+ .probe = NULL,
+ .extract_func = dpif_miniflow_extract_autovalidator,
+ .name = "autovalidator",
+ },
{
.probe = NULL,
.extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)
*out_ptr = mfex_impls;
return ARRAY_SIZE(mfex_impls);
}
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+ struct netdev_flow_key *keys,
+ uint32_t keys_size, odp_port_t in_port,
+ void *pmd_handle)
+{
+ const size_t cnt = dp_packet_batch_size(packets);
+ uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+ uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+ struct dp_packet *packet;
+ struct dp_netdev_pmd_thread *pmd = pmd_handle;
+ struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+ int32_t mfunc_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
+ if (mfunc_count < 0) {
+ pmd->miniflow_extract_opt = NULL;
+ VLOG_ERR("failed to get miniflow extract function implementations\n");
+ return 0;
+ }
+ ovs_assert(keys_size >= cnt);
+ struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+ /* 

  1   2   >