[ovs-dev] [PATCH ovn] Set release date for 21.06.0.

2021-06-25 Thread numans
From: Numan Siddique 

And also prepare for 21.6.90.  This was missed out when v21.06.0 was
released.

Signed-off-by: Numan Siddique 
---
 NEWS | 6 +-
 configure.ac | 2 +-
 debian/changelog | 8 +++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 0da7d8f97c..e779892085 100644
--- a/NEWS
+++ b/NEWS
@@ -1,4 +1,8 @@
-OVN v21.06.0 - 11 May 2021
+Post-v21.06.0
+-
+
+
+OVN v21.06.0 - 18 Jun 2021
 -
   - ovn-northd-ddlog: New implementation of northd, based on DDlog.  This
 implementation is incremental, meaning that it only recalculates what is
diff --git a/configure.ac b/configure.ac
index 53034388a3..df0b982952 100644
--- a/configure.ac
+++ b/configure.ac
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 AC_PREREQ(2.63)
-AC_INIT(ovn, 21.06.0, b...@openvswitch.org)
+AC_INIT(ovn, 21.06.90, b...@openvswitch.org)
 AC_CONFIG_MACRO_DIR([m4])
 AC_CONFIG_AUX_DIR([build-aux])
 AC_CONFIG_HEADERS([config.h])
diff --git a/debian/changelog b/debian/changelog
index 1667407305..81aaed3079 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,14 @@
+ovn (21.06.90-1) unstable; urgency=low
+
+   * New upstream version
+
+ -- OVN team   Fri, 18 Jun 2021 13:21:08 -0400
+
 ovn (21.06.0-1) unstable; urgency=low
 
* New upstream version
 
- -- OVN team   Fri, 11 May 2021 12:00:00 -0500
+ -- OVN team   Fri, 18 Jun 2021 13:21:08 -0400
 
 ovn (21.03.0-1) unstable; urgency=low
 
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH RFC ovn 0/1] RFC: Logical flow generation in ovn-controller

2021-06-25 Thread numans
From: Numan Siddique 

This is a an RFC patch to move the logical flow generation from
ovn-northd to ovn-controller.

The same patch can be found here : 
https://github.com/numansiddique/ovn/tree/lflows_in_controller_internal_rfc_v1

This patch doesn't move all the generation to ovn-controller.
ovn-northd still does the flow generation for
  - ACLs/Port groups.
  - DHCP options
  - Multicast groups
  - And few others.

Other than ACLs and Port groups it would be possible to move
flow generation for the above mentioned things.  But before doing
all that, it is worth to evaluate if the proposed RFC makes sense.

The main motivation for this RFC effort is to
  - Address scale issues seen.  For large scale deployments
ovn-northd takes lot of CPU for the computation (ovn-northd-ddlog
should help here) and memory and so does Southbound ovsdb-servers.

  - Having a very huge southbound DB and logical flows affects the
raft consenses and it requires increasing the raft election timers.

  - Logical flows contributes majorly to the overall south bound DB.

This RFC demonstrates that it is possible for each ovn-controller
to generate logical flows.

These are some of the findings with my general and scale testing.

Below are the test findings with a huge pre-existing Northbound database with
datapath groups enabled with a size of 13 M
  - Southbound DB size is:
 * with ovn-northd-master - 35 M
 * with ovn-northd-proposed-rfc - 12M

  - Number of logical flows:
 * with ovn-northd-master - 78581
 * with ovn-northd-proposed-rfc - 7933

  - RSS Memory consumption of
* ovn-northd-master -   441368 KiB
* ovn-northd-proposed-rfc - 115540 KiB

* ovn-controller-master -   1267716 KiB
* ovn-controller-proposed-rfc - 915876 KiB

* SB ovsdb-server-with-ovn-master -   612296 KiB
* SB ovsdb-server-with-proposed-rfc-ovn - 134680 KiB

With the scale testing of 500 fake multinode nodes and each node
creating having a few port bindings claimed,  the end result is
almost similar.  No signifcant improvements seen with the proposed
RFC patch.  The results are identical.

I think more scale testing needs to be done to determine if
the CPU usage and memory usage reduction in the ovn-northd and
ovsdb-servers will have a major impact or not.  Testing with
a real Kubernetes/Openstack deployments would help.

Few Observations
  -  It is possible to move the flow generation to each ovn-controller.

  -  Each ovn-controller only generates the logical flows if required
 i.e if the datapath is in the 'local_datapaths'.

  -  This RFC patch do complicate ovn-controller code which has already
 many complicated bits.

  -  I was expecting the scale test results to improve and the
 end-to-end time of a pod/VM creation would be quicker. But that is
 not the case, which is a disappointment.

Submitting this RFC patch to get a feed back and have conversation
if it is worth the effort.

Numan Siddique (1):
  RFC: Logical flow generation in ovn-controller

 controller/automake.mk  |4 +-
 controller/binding.c|  379 +---
 controller/binding.h|   13 -
 controller/lflow-generate.c |  179 ++
 controller/lflow-generate.h |   49 +
 controller/lflow.c  |  421 +++--
 controller/lflow.h  |8 +
 controller/lport.c  |   16 -
 controller/lport.h  |4 -
 controller/ovn-controller.c |  674 ++-
 controller/ovn-controller.h |   34 +-
 controller/patch.c  |1 +
 controller/physical.c   |   58 +-
 controller/pinctrl.c|   18 +-
 lib/automake.mk |6 +-
 lib/lb.c|   27 +
 lib/lb.h|2 +
 lib/ldata.c |  895 +
 lib/ldata.h |  251 +++
 lib/lflow.c | 3514 +++
 lib/lflow.h |  333 
 lib/ovn-util.c  |   83 +
 lib/ovn-util.h  |   32 +
 northd/ovn-northd.c | 3359 +++--
 ovn-sb.ovsschema|   16 +-
 ovn-sb.xml  |   16 +
 utilities/ovn-dbctl.c   |7 +-
 utilities/ovn-dbctl.h   |3 +-
 utilities/ovn-sbctl.c   |  256 +++
 29 files changed, 6980 insertions(+), 3678 deletions(-)
 create mode 100644 controller/lflow-generate.c
 create mode 100644 controller/lflow-generate.h
 create mode 100644 lib/ldata.c
 create mode 100644 lib/ldata.h
 create mode 100644 lib/lflow.c
 create mode 100644 lib/lflow.h

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath-windows: Specify external include paths

2021-06-25 Thread Guru Shetty
Alin,
 Does it make sense to apply this patch to stable branches too. I see that
2.15 fails with a similar error.

Thanks,
Guru

On Thu, 17 Jun 2021 at 02:31, Alin-Gabriel Serdean  wrote:

> On Wed, 2021-06-16 at 18:06 +0300, Alin-Gabriel Serdean wrote:
> > On Tue, 2021-06-15 at 18:06 +0200, Ilya Maximets wrote:
> > > On 6/15/21 3:43 PM, Alin Gabriel Serdean wrote:
> > > > VStudio 16.10 adds usermode includes before including the driver
> > > > kit ones.
> > > >
> > > > Bug tracked at:
> > > >
> https://developercommunity.visualstudio.com/t/error-lnk2019-unresolved-external-symbol-stdio-com/1434674
> > > >
> > > > Fixes appveyor build reported by forcing external includes.
> > >
> > > Thanks, Alin.  I know nothing about the windows build process, but
> > > I
> > > see
> > > that this patch fixes the issue with the current AppVeyor CI,
> > > therefore:
> > >
> > > Acked-by: Ilya Maximets 
> >
> > Thank you!
> >
> > > Out of curiosity, is this change backward compatible?  I mean,
> > > is it possible to build on older platform (older VS) with this
> > > change?
> >
> > It should be.
> > Usually we do not need to force the order of include directories. For
> > kernel projects it should default to the kernel includes.
> > I test with the last two versions of VS (2019, 2017).
> >
> > We should add a build matrix for different versions of VS images to
> > appveyor / GHA so we could be sure.
> > I'll try to update the appveyor side.
> >
> > FWIW a new version of VS was launched yesterday (
> >
> https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes#visual-studio-2019-version-1610-releases
> > ), I will try to compile
> > without the patch to see if they hotfixed the issue.
>
> It did not. Applying the patch.
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix port group I-P when they contain non-vif ports.

2021-06-25 Thread Numan Siddique
On Fri, Jun 25, 2021 at 2:53 PM Han Zhou  wrote:
>
> On Fri, Jun 25, 2021 at 4:50 AM Dumitru Ceara  wrote:
> >
> > It's valid that port_groups contain non-vif ports, they can actually
> > contain any type of logical_switch_port.
> >
> > Also, there's no need to allocate a new sset containing the local ports'
> > names every time the I-P engine processes a change, we can maintain a
> > sset and incrementally update it when port bindings are added/removed.
> >
> Thanks Dumitru for the fix and thanks Numan for the review.
>
> I battled with myself when deciding to allocate a new sset for the local
> ports' names at the I-P main iteration level. I did this because the
> current data structures maintaining the local bindings were already very
> complex, and the sset was not to maintain any extra information but just
> redundant information (for efficiency). So I decided to abstract this part
> as a helper function so that I don't add more complexity in the binding
> data structure, and other I-P engine nodes doesn't need to understand the
> internals of how the bindings are maintained in the bindings module.
> Regarding the cost, the local binding data should be small, and the sset
> allocation is at the main loop level, so really nothing to worry about the
> cost.
>
> However, I didn't think about the non-VIF use case of port-group, and the
> local_binding doesn't maintain non-VIF bindings, so came the bug. This
> patch fixes it by maintaining a sset that includes all types of lport
> names. It looks correct to me, but I have some comments:
>
> 1) The structures in bindings module is really complex and I spent a lot of
> time to understand it before, but when I am reviewing this patch I had to
> spent some time again to understand it. There are fields in binding_ctx
> look quite similar, and the comments don't tell exactly the difference:
>
> - struct local_binding_data *lbinding_data;
> - struct sset *local_lports;
> - struct sset *local_lport_ids;
>

I agree with the complexity and the naming confusion.

I think local_lports and local_lport_ids have been maintained in the
binding code
since a long time and lbinding_data was added recently.

I think there is a lot of redundant data which can be unified.


> According to the code (and also the bug the patch is trying to fix),
> lbinding_data is supposed to maintain VIFs only.

I agree.  "lbinding_data" is supposed to maintain local vif binding information.

> local_lports maintains all types, but it maintains *potentially* local VIFs
> as well (meaning the VIF may not be bound locally yet). I was thinking if I
> could use local_lports directly. I think it would work, but just not
> accurate enough (maybe it doesn't really matter).


> The local_lport_ids may look straightforward, which maintains generated id
> keys for local_lports, but the functions update_local_lport_ids() and
> remove_local_lport_ids() are not only updating the local_lport_ids, but
> also tracking information of lbinding_data, which is really confusing.
>
> 2) Now for this patch, the intention is to include non-VIF bindings, but it
> adds a sset to maintain all types of lports in "lbinding_data", which was
> supposed to maintain VIF bindings only. I think it is not the right place
> to maintain this sset. And the
> update_local_lport_ids()/remove_local_lport_ids() are not the right place
> to update them either.
>
> So here are my suggestions:
>
> 1) Clarify a little more about the role of each of the above fields in
> binding_ctx with some comments.

These comments would be super helpful.  But I think it is outside the
scope of this bug fix patch.  It's better if it's a separate patch.

> 2) Can we use local_lports directly, instead of maintaining a new sset? If
> we can't (I am not sure yet), can we generate it on-the-fly, just updating
> the "binding_collect_local_binding_lports" by adding non-VIF lports from
> "local_lports"? I really don't think the cost makes any difference overall.
> If none of the above work, can we maintain it as a separate field at
> binding_ctx level instead of in lbinding_data (with proper comment telling
> the difference from local_lports)?

I think local_lports can be used.  The side effect would be that we
will be allocating
the zone id even for patch ports.

Thanks
Numan

>
> (+1 for Numan's comment regarding test case)
>
> I hope this makes sense. Thanks again for the fix!
> Han
>
> > Reported-at: https://github.com/ovn-org/ovn/pull/61#issuecomment-865094163
> > Reported-by: Antonio Ojea 
> > Fixes: 0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow
> explosion problem.")
> > Signed-off-by: Dumitru Ceara 
> > ---
> >  controller/binding.c| 25 +
> >  controller/binding.h| 11 ++-
> >  controller/ovn-controller.c | 24 +++-
> >  3 files changed, 14 insertions(+), 46 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 

Re: [ovs-dev] [External] : [PATCH ovn v3] ovn-northd.c: Add proxy ARP support to OVN

2021-06-25 Thread Numan Siddique
On Thu, Jun 24, 2021 at 5:14 AM Brendan Doyle  wrote:
>
> Hi,
>
> Just wondering how to move this along. It's been in the queue a while now.
> I've made the requested changes, this is v3 of the patch.
>
> The product I'm working on requires this feature and If I can't get it
> upstream
> I'll need to look at other options.

Hi Brendan,

I just can't apply your patch cleanly.  The same issue I faced with
your previous version.
Normally when patch is submitted, the ovsrobot applies the patch on
top of the master
and runs the tests.  It generally creates a branch with the patch series number.

In your case, the series number is 249668
(https://patchwork.ozlabs.org/project/ovn/list/?series=249668)
and the branch is here -
https://github.com/ovsrobot/ovn/commits/series_249668 and I don't see
your patch applied cleanly.  So I'm pretty sure something is wrong
with your patch format.

Either you can resend the patch or send a github PR.

Thanks
Numan

>
>
> Thanks
>
> Brendan
>
> On 18/06/2021 14:53, Brendan Doyle wrote:
> > From 634fd88b726700b30cb76203ca45f1e9c041368a Mon Sep 17 00:00:00 2001
> > From: Brendan Doyle 
> > Date: Fri, 28 May 2021 10:01:17 -0700
> > Subject: [PATCH ovn] ovn-northd.c: Add proxy ARP support to OVN
> >
> > This patch provides the ability to configure proxy ARP IPs on a Logical
> > Switch Router port. The IPs are added as Options for router ports. This
> > provides a useful feature where traffic for a service must be sent to an
> > address in a logical network address space, but the service is provided
> > in a different network. For example an NFS service is provide to Logical
> > networks at an address in their Logical network space, but the NFS
> > server resides in a physical network. A Logical switch Router port can
> > be configured to respond to ARP requests sent to the service "Logical
> > address", the Logical Router/Gateway can then be configured to forward
> > the traffic to the underlay/physical network.
> >
> > Signed-off-by: Brendan Doyle 
> > ---
> >  northd/ovn-northd.c |  46 +++
> >  ovn-nb.xml  |   9 +
> >  tests/ovn.at| 103
> > 
> >  3 files changed, 158 insertions(+)
> >
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 0e5092a..2bc6f28 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -6943,6 +6943,7 @@ build_lswitch_arp_nd_responder_known_ips(struct
> > ovn_port *op,
> >   struct ds *match)
> >  {
> >  if (op->nbsp) {
> > +const char *arp_proxy;
> >  if (!strcmp(op->nbsp->type, "virtual")) {
> >  /* Handle
> >   *  - GARPs for virtual ip which belongs to a logical port
> > @@ -7096,6 +7097,51 @@ build_lswitch_arp_nd_responder_known_ips(struct
> > ovn_port *op,
> >  }
> >  }
> >  }
> > +
> > +/*
> > + * Add responses for ARP proxies.
> > + */
> > +arp_proxy = smap_get(>nbsp->options,"arp_proxy");
> > +if (arp_proxy && op->peer) {
> > +struct lport_addresses proxy_arp_addrs;
> > +int i = 0;
> > +
> > +if (extract_ip_addresses(arp_proxy, _arp_addrs)) {
> > +/*
> > + * Match rule on all proxy ARP IPs.
> > + */
> > +ds_clear(match);
> > +ds_put_cstr(match, "arp.op == 1 && (");
> > +
> > +for (i = 0; i < proxy_arp_addrs.n_ipv4_addrs; i++) {
> > +if (i > 0) {
> > +ds_put_cstr(match, " || ");
> > +}
> > +ds_put_format(match, "arp.tpa == %s",
> > +proxy_arp_addrs.ipv4_addrs[i].addr_s);
> > +}
> > +
> > +ds_put_cstr(match, ")");
> > +destroy_lport_addresses(_arp_addrs);
> > +
> > +ds_clear(actions);
> > +ds_put_format(actions,
> > +"eth.dst = eth.src; "
> > +"eth.src = %s; "
> > +"arp.op = 2; /* ARP reply */ "
> > +"arp.tha = arp.sha; "
> > +"arp.sha = %s; "
> > +"arp.tpa <-> arp.spa; "
> > +"outport = inport; "
> > +"flags.loopback = 1; "
> > +"output;",
> > +op->peer->lrp_networks.ea_s,
> > +op->peer->lrp_networks.ea_s);
> > +
> > +ovn_lflow_add_with_hint(lflows, op->od,
> > S_SWITCH_IN_ARP_ND_RSP,
> > +50, ds_cstr(match), ds_cstr(actions),
> > >nbsp->header_);
> > +}
> > +}
> >  }
> >  }
> >
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index 02fd216..e4a8114 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -848,6 +848,15 @@
> >  
> >
> >  
> > +
> > + 

Re: [ovs-dev] [PATCH v9] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-25 Thread Vasu Dasari
Hi Eelco. 

Thanks for the review. Will update with your suggestions. 

I am out of town. Will do this next week. 

Thanks
Vasu

> On Jun 25, 2021, at 8:55 AM, Eelco Chaudron  wrote:
> 
> 
> This one looks good, however, I would make some small changes.
> If you do, you can add my acked by to the new revision.
> 
> //Eelco
> 
> On 24 Jun 2021, at 17:57, Vasu Dasari wrote:
> 
> Currently there is an option to add/flush/show ARP/ND neighbor. This covers 
> L3 
> side. For L2 side, there is only fdb show command. This patch gives an option 
> to add/del an fdb entry via ovs-appctl.
> 
> CLI command looks like:
> 
> To add: 
> ovs-appctl fdb/add 
> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
> 
> To del: 
> ovs-appctl fdb/del
> ovs-appctl fdb/del br0 0 50:54:00:00:00:05
> 
> Added two new APIs to provide convenient interface to add and delete 
> static-macs. 
> bool xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t 
> in_port, 
> struct eth_addr dl_src, int vlan); 
> bool xlate_delete_static_mac_entry(const struct ofproto_dpif *, 
> struct eth_addr dl_src, int vlan);
> 
> 1. Static entry should not age. To indicate that entry being programmed is a 
> static entry, 
> 'expires' field in 'struct mac_entry' will be set to a 
> MAC_ENTRY_AGE_STATIC_ENTRY. A 
> check for this value is made while deleting mac entry as part of regular 
> aging process. 
> 2. Another change to of mac-update logic, when a packet with same dl_src as 
> that of a
> static-mac entry arrives on any port, the logic will not modify the expires 
> field. 
> 3. While flushing fdb entries, made sure static ones are not evicted. 
> 4. Updated "ovs-appctl fdb/stats-show br0" to display numberof static entries 
> in switch
> 
> Added following tests: 
> ofproto-dpif - static-mac add/del/flush 
> ofproto-dpif - static-mac mac moves
> 
> Signed-off-by: Vasu Dasari  
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752 
> Tested-by: Eelco Chaudron  
> --- 
> v1: 
> - Fixed 0-day robot warnings 
> v2: 
> - Fix valgrind error in the modified code in mac_learning_insert() where a 
> read is 
> is performed on e->expires which is not initialized 
> v3: 
> - Addressed code review comments 
> - Added more documentation 
> - Fixed mac_entry_age() and is_mac_learning_update_needed() to have common 
> understanding of return values when mac_entry is a static one. 
> - Added NEWS item 
> v4: 
> - Addressed code review comments 
> - Static entries will not be purged when fdb/flush is performed. 
> - Static entries will not be overwritten when packet with same dl_src arrives 
> on 
> any port of switch 
> - Provided bit more detail while doing fdb/add, to indicate if static-mac is 
> overriding already present entry 
> - Separated test cases for a bit more clarity 
> v5: 
> - Addressed code review comments 
> - Added new total_static counter to count number of static entries. 
> - Removed mac_entry_set_idle_time() 
> - Added mac_learning_add_static_entry() and mac_learning_del_static_entry() 
> - Modified APIs xlate_add_static_mac_entry() and 
> xlate_delete_static_mac_entry() 
> return 0 on success else a failure code 
> v6: 
> - Fixed a probable bug with Eelco's code review comments in 
> is_mac_learning_update_needed() 
> v7: 
> - Added a ovs-vswitchd.8 man page entry for fdb add/del commands 
> v8: 
> - Updaed with code review comments from Eelco. 
> - Renamed total_static to static_entries 
> - Added coverage counter mac_learning_static_none_move 
> - Fixed a possible bug with static_entries getting cleared via 
> fdb/stats-clear command 
> - Initialize static_entries in mac_learning_create() 
> - Modified fdb/del command by removing option to specify port-name 
> - Breakup ofproto_unixctl_fdb_update into ofproto_unixctl_fdb_add 
> and ofproto_unixctl_fdb_delete 
> - Updated test "static-mac add/del/flush" to have interleaved mac 
> entries before fdb/flush 
> - Updated test "static-mac mac move" to check for newly added 
> coverage counter mac_learning_static_none_move 
> v9: 
> - Updated source code comments and addressed code review comments 
> --- 
> NEWS | 4 + 
> lib/mac-learning.c | 155 +++ 
> lib/mac-learning.h | 17  
> ofproto/ofproto-dpif-xlate.c | 48 +-- 
> ofproto/ofproto-dpif-xlate.h | 5 ++ 
> ofproto/ofproto-dpif.c | 117 +- 
> tests/ofproto-dpif.at | 99 ++ 
> vswitchd/ovs-vswitchd.8.in | 5 ++ 
> 8 files changed, 421 insertions(+), 29 deletions(-)
> 
> diff --git a/NEWS b/NEWS 
> index db3faf4cc..12fb40054 100644 
> --- a/NEWS 
> +++ b/NEWS 
> @@ -20,6 +20,10 @@ Post-v2.15.0 
> - ovsdb-tool: 
> * New option '--election-timer' to the 'create-cluster' command to set the 
> leader election timer during cluster creation. 
> + - ovs-appctl: 
> + * Added ability to add and delete static mac entries using: 
> + 'ovs-appctl fdb/add

Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix port group I-P when they contain non-vif ports.

2021-06-25 Thread Han Zhou
On Fri, Jun 25, 2021 at 4:50 AM Dumitru Ceara  wrote:
>
> It's valid that port_groups contain non-vif ports, they can actually
> contain any type of logical_switch_port.
>
> Also, there's no need to allocate a new sset containing the local ports'
> names every time the I-P engine processes a change, we can maintain a
> sset and incrementally update it when port bindings are added/removed.
>
Thanks Dumitru for the fix and thanks Numan for the review.

I battled with myself when deciding to allocate a new sset for the local
ports' names at the I-P main iteration level. I did this because the
current data structures maintaining the local bindings were already very
complex, and the sset was not to maintain any extra information but just
redundant information (for efficiency). So I decided to abstract this part
as a helper function so that I don't add more complexity in the binding
data structure, and other I-P engine nodes doesn't need to understand the
internals of how the bindings are maintained in the bindings module.
Regarding the cost, the local binding data should be small, and the sset
allocation is at the main loop level, so really nothing to worry about the
cost.

However, I didn't think about the non-VIF use case of port-group, and the
local_binding doesn't maintain non-VIF bindings, so came the bug. This
patch fixes it by maintaining a sset that includes all types of lport
names. It looks correct to me, but I have some comments:

1) The structures in bindings module is really complex and I spent a lot of
time to understand it before, but when I am reviewing this patch I had to
spent some time again to understand it. There are fields in binding_ctx
look quite similar, and the comments don't tell exactly the difference:

- struct local_binding_data *lbinding_data;
- struct sset *local_lports;
- struct sset *local_lport_ids;

According to the code (and also the bug the patch is trying to fix),
lbinding_data is supposed to maintain VIFs only.
local_lports maintains all types, but it maintains *potentially* local VIFs
as well (meaning the VIF may not be bound locally yet). I was thinking if I
could use local_lports directly. I think it would work, but just not
accurate enough (maybe it doesn't really matter).
The local_lport_ids may look straightforward, which maintains generated id
keys for local_lports, but the functions update_local_lport_ids() and
remove_local_lport_ids() are not only updating the local_lport_ids, but
also tracking information of lbinding_data, which is really confusing.

2) Now for this patch, the intention is to include non-VIF bindings, but it
adds a sset to maintain all types of lports in "lbinding_data", which was
supposed to maintain VIF bindings only. I think it is not the right place
to maintain this sset. And the
update_local_lport_ids()/remove_local_lport_ids() are not the right place
to update them either.

So here are my suggestions:

1) Clarify a little more about the role of each of the above fields in
binding_ctx with some comments.
2) Can we use local_lports directly, instead of maintaining a new sset? If
we can't (I am not sure yet), can we generate it on-the-fly, just updating
the "binding_collect_local_binding_lports" by adding non-VIF lports from
"local_lports"? I really don't think the cost makes any difference overall.
If none of the above work, can we maintain it as a separate field at
binding_ctx level instead of in lbinding_data (with proper comment telling
the difference from local_lports)?

(+1 for Numan's comment regarding test case)

I hope this makes sense. Thanks again for the fix!
Han

> Reported-at: https://github.com/ovn-org/ovn/pull/61#issuecomment-865094163
> Reported-by: Antonio Ojea 
> Fixes: 0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow
explosion problem.")
> Signed-off-by: Dumitru Ceara 
> ---
>  controller/binding.c| 25 +
>  controller/binding.h| 11 ++-
>  controller/ovn-controller.c | 24 +++-
>  3 files changed, 14 insertions(+), 46 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 7fde0fdbb..1f6188e0d 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -549,6 +549,7 @@ update_local_lport_ids(const struct
sbrec_port_binding *pb,
>  tracked_binding_datapath_lport_add(pb,
b_ctx->tracked_dp_bindings);
>  }
>  }
> +sset_add(_ctx->lbinding_data->port_bindings, pb->logical_port);
>  }
>
>  /* Remove a port binding id from the set of local lport IDs. Also track
if
> @@ -569,6 +570,8 @@ remove_local_lport_ids(const struct
sbrec_port_binding *pb,
>  tracked_binding_datapath_lport_add(pb,
b_ctx->tracked_dp_bindings);
>  }
>  }
> +sset_find_and_delete(_ctx->lbinding_data->port_bindings,
> + pb->logical_port);
>  }
>
>  /* Corresponds to each Port_Binding.type. */
> @@ -683,6 +686,7 @@ local_binding_data_init(struct 

Re: [ovs-dev] [PATCH ovn v2 2/2] ovn-macros.at: Enable northd parallelization

2021-06-25 Thread Numan Siddique
On Tue, Jun 22, 2021 at 7:09 PM Fabrizio D'Angelo  wrote:
>
> - Add NORTHD_USE_PARALLELIZATION variable to enable it in ovn_start.
>
> - Add NORTHD_DUMMY_NUMA variable for passing --dummy-numa flag when
>   tests are run in a low CPU system.
>
> Signed-off-by: Fabrizio D'Angelo 
> Acked-by: Mark Michelson 

Thanks Fabrizio.  I applied both the patches to the main branch.

Numan

> ---
>  tests/ovn-macros.at | 40 +++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> index cd02b6986..b5a01b2b5 100644
> --- a/tests/ovn-macros.at
> +++ b/tests/ovn-macros.at
> @@ -170,6 +170,10 @@ ovn_start_northd() {
>  ovn-northd-ddlog) northd_args="$northd_args 
> --ddlog-record=${AZ:+$AZ/}northd$suffix/replay.dat -v" ;;
>  esac
>
> +if test X$NORTHD_DUMMY_NUMA = Xyes; then
> +northd_args="$northd_args --dummy-numa=\"0,0,1,1\""
> +fi
> +
>  local name=${d_prefix}northd${suffix}
>  echo "${prefix}starting $name"
>  test -d "$ovs_base/$name" || mkdir "$ovs_base/$name"
> @@ -184,7 +188,7 @@ ovn_start_northd() {
>  # ovn-sbctl and ovn-nbctl use them by default, and starts ovn-northd running
>  # against them.
>  #
> -# Normally this starts an active northd and a backup norhtd.  The following
> +# Normally this starts an active northd and a backup northd.  The following
>  # options are accepted to adjust that:
>  #   --backup-northd=noneDon't start a backup northd.
>  #   --backup-northd=paused  Start the backup northd in the paused state.
> @@ -246,6 +250,10 @@ ovn_start () {
>  if test X$NORTHD_USE_DP_GROUPS = Xyes; then
>  ovn-nbctl set NB_Global . options:use_logical_dp_groups=true
>  fi
> +
> +if test X$NORTHD_USE_PARALLELIZATION = Xyes; then
> +ovn-nbctl set NB_Global . options:use_parallel_build=true
> +fi
>  }
>
>  # Interconnection networks.
> @@ -558,3 +566,33 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>[m4_foreach([NORTHD_TYPE], [ovn-northd, ovn-northd-ddlog],
>   [m4_foreach([NORTHD_USE_DP_GROUPS], [no], [$1
>  ])])])
> +
> +# Test parallelization with dp groups enabled and disabled
> +m4_define([OVN_NORTHD_PARALLELIZATION_DUMMY], [
> +m4_pushdef([NORTHD_TYPE], [ovn_northd])
> +m4_pushdef(NORTHD_DUMMY_NUMA, [yes])
> +[m4_foreach([NORTHD_USE_DP_GROUPS], [yes, no],
> +[[NORTHD_USE_PARALLELIZATION], [yes]
> +])]])
> +
> +m4_define([OVN_NORTHD_PARALLELIZATION_NO_DUMMY], [
> +m4_pushdef([NORTHD_TYPE], [ovn_northd])
> +m4_pushdef(NORTHD_DUMMY_NUMA, [no])
> +[m4_foreach([NORTHD_USE_DP_GROUPS], [yes, no],
> +[[NORTHD_USE_PARALLELIZATION], [yes]
> +])]])
> +
> +# Use --dummy-numa if system has low cores
> +m4_define([HOST_HAS_LOW_CORES], [
> +if test $(nproc) -lt 4; then
> +:
> +$1
> +else
> +:
> +$2
> +fi
> +])
> +
> +m4_define([NORTHD_PARALLELIZATION], [
> +HOST_HAS_LOW_CORES([OVN_NORTHD_PARALLELIZATION_DUMMY], 
> [OVN_NORTHD_PARALLELIZATION_NO_DUMMY])
> +])
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Openvswitch patch doubts:conntrack: Fix missed 'conn' lookup checks.

2021-06-25 Thread Ben Pfaff
On Fri, Jun 25, 2021 at 11:36:12AM +0800, user wrote:
> Hi Darrell,
>   I'm currently doing some tests on OVS-DPDK and trying to optimize it, 
> in the process I found your patch(28274f7 conntrack: Fix missed 'conn' lookup 
> checks.) made OVS check for the presence of the 'conn' entry before inserting 
> it:
> 
> @@ -1158,8 +1165,11 @@ process_one(struct conntrack *ct, struct dp_packet 
> *pkt,
> ovs_rwlock_unlock(>resources_lock);
> 
> ovs_mutex_lock(>ct_lock);
> -conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
> -  helper, alg_exp, ct_alg_ctl);
> +hash = conn_key_hash(>key, ct->hash_basis);
> +if (!conn_key_lookup(ct, >key, hash, now, NULL, NULL)) {
> +conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
> +  helper, alg_exp, ct_alg_ctl);
> +}
> ovs_mutex_unlock(>ct_lock);
> }
> 
>   I'm very confused about the reason we should do this, you mentioned 
> that there were some cases that were missed, so fix them, could you please 
> explain what scenarios we need to do this?

Without looking for the details, I would guess that there is a race
between one thread creating a given connection entry and another thread
doing it, and that checking for the entry with the lock held fixes the
race.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

2021-06-25 Thread Van Haaren, Harry
> -Original Message-
> From: dev  On Behalf Of Ilya Maximets
> Sent: Friday, June 25, 2021 4:26 PM
> To: Sriharsha Basavapatna ; Ilya Maximets
> 
> Cc: Eli Britstein ; ovs dev ; Ivan 
> Malov
> ; Majd Dibbiny 
> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload



> >> That looks good to me.  So, I guess, Harsha, we're waiting for
> >> your review/tests here.
> >
> > Thanks Ilya and Eli, looks good to me; I've also tested it and it works 
> > fine.
> > -Harsha
> 
> Thanks, everyone.  Applied to master.

Hi Ilya and OVS Community,

There are open questions around this patchset, why has it been merged?

Earlier today, new concerns were raised by Cian around the negative performance 
impact of these code changes:
- https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384445.html

Both you (Ilya) and Eli responded, and I was following the conversation. 
Various code changes were suggested,
and some may seem like they might work, Eli mentioned some solutions might not 
work due to the hardware:
I was processing both your comments and input, and planning a technical reply 
later today.
- suggestions: 
https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384446.html
- concerns around hw: 
https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384464.html

Keep in mind that there are open performance issues to be worked out, that have 
not been resolved at this point in the conversation.
There is no agreement on solutions, nor an agreement to ignore the performance 
degradation, or to try resolve this degradation later.

That these patches have been merged is inappropriate:
1) Not enough time given for responses (11 am concerns raised, 5pm merged 
without resolution? (Irish timezone))
2) Open question not addressed/resolved, resulting in a 6% known negative 
performance impact being merged.
3) Suggestions provided were not reviewed technically in detail (no technical 
collaboration or code-changes/patches reviewed)

I feel that the OVS process of allowing time for community review and 
collaboration was not adhered to in this instance.
As a result, code was merged that is known to cause performance degradation.

Therefore, this email is a request to revert these patches as they are not 
currently fit for inclusion in my opinion.

As next steps, I can propose the following:
1) Revert the patches from master branch
2) Continue technical discussion on how to avoid negative performance impact
3) Review solutions, allowing time for responses and replies
4) Merge a future revision of this patchset at a later date


> Best regards, Ilya Maximets.

Regards, -Harry

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] checkpatch: Ignore macro definitions of FOR_EACH

2021-06-25 Thread Aaron Conole
When defining a FOR_EACH macro, checkpatch freaks out and generates a
control block whitespace error.  Create an exception so that it doesn't
generate errors for this case.

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/373509.html
Reported-by: Toshiaki Makita 
Signed-off-by: Aaron Conole 
---
 tests/checkpatch.at |  5 +
 utilities/checkpatch.py | 10 +++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/tests/checkpatch.at b/tests/checkpatch.at
index a51e46e7ae..0718acd995 100755
--- a/tests/checkpatch.at
+++ b/tests/checkpatch.at
@@ -284,6 +284,11 @@ try_checkpatch \
 + for (init; condition; increment) { \\
 "
 
+try_checkpatch \
+   "COMMON_PATCH_HEADER
++#define SOME_FOR_EACH(a, b, c) /* Foo. */
+   "
+
 AT_CLEANUP
 
 
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index ac14da29b1..699fb4b027 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -157,6 +157,8 @@ __regex_trailing_whitespace = re.compile(r'[^\S]+$')
 __regex_single_line_feed = re.compile(r'^\f$')
 __regex_for_if_missing_whitespace = re.compile(r' +(%s)[\(]'
% __parenthesized_constructs)
+__regex_hash_define_for_each = re.compile(
+r'#define [_A-Z]+FOR_*EACH[_A-Z0-9]*\(')
 __regex_for_if_too_much_whitespace = re.compile(r' +(%s)  +[\(]'
 % __parenthesized_constructs)
 __regex_for_if_parens_whitespace = \
@@ -245,9 +247,11 @@ def if_and_for_whitespace_checks(line):
 """
 if skip_block_whitespace_check:
 return True
-if (__regex_for_if_missing_whitespace.search(line) is not None or
-__regex_for_if_too_much_whitespace.search(line) is not None or
-__regex_for_if_parens_whitespace.search(line)):
+if (__regex_for_if_missing_whitespace.search(line) is not None and
+__regex_hash_define_for_each.search(line) is None):
+return False
+if (__regex_for_if_too_much_whitespace.search(line) is not None or
+__regex_for_if_parens_whitespace.search(line)):
 return False
 return True
 
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] ovs-lib: pass optional --election-timer arg to ovsdb-tool

2021-06-25 Thread Ben Pfaff
On Fri, Jun 25, 2021 at 11:23:16AM -0500, Dan Williams wrote:
> Signed-off-by: Dan Williams 

Thanks, applied to master.  I added this note to the commit message:

This doesn't have any current users within the OVS repository.  OVN will
use it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] ovs-lib: pass optional --election-timer arg to ovsdb-tool

2021-06-25 Thread Dan Williams
Signed-off-by: Dan Williams 
---
v4: fix quoting of election_timer_arg
v3: fix line wrapping
v2: put --election-timer arg before create-cluster

 utilities/ovs-lib.in | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index ab38ece458b7b..3eda01d3c1f33 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -495,15 +495,21 @@ create_cluster () {
 DB_FILE="$1"
 DB_SCHEMA="$2"
 LOCAL_ADDR="$3"
+ELECTION_TIMER_MS="$4"
+
+election_timer_arg=
+if [ -n "$ELECTION_TIMER_MS" ]; then
+  election_timer_arg="--election-timer=$ELECTION_TIMER_MS"
+fi
 
 if test ! -e "$DB_FILE"; then
-action "Creating cluster database $DB_FILE" ovsdb_tool create-cluster 
"$DB_FILE" "$DB_SCHEMA" "$LOCAL_ADDR"
+action "Creating cluster database $DB_FILE" ovsdb_tool 
$election_timer_arg create-cluster "$DB_FILE" "$DB_SCHEMA" "$LOCAL_ADDR"
 elif ovsdb_tool db-is-standalone "$DB_FILE"; then
 # Convert standalone database to clustered.
 backup_db || return 1
 rm -f "$DB_FILE"
 action "Creating cluster database $DB_FILE from existing one" \
-   ovsdb_tool create-cluster "$DB_FILE" "$backup" "$LOCAL_ADDR"
+   ovsdb_tool $election_timer_arg create-cluster "$DB_FILE" 
"$backup" "$LOCAL_ADDR"
 fi
 }
 
-- 
2.31.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ovs-lib: pass optional --election-timer arg to ovsdb-tool

2021-06-25 Thread Dan Williams
On Thu, 2021-06-24 at 14:58 -0700, Ben Pfaff wrote:
> On Wed, Jun 09, 2021 at 02:11:42PM -0500, Dan Williams wrote:
> > Signed-off-by: Dan Williams 
> > ---
> > v3: fix line wrapping
> > v2: put --election-timer arg before create-cluster per Ilya
> > 
> >  utilities/ovs-lib.in | 10 --
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> > index ab38ece458b7b..61a062fa992da 100644
> > --- a/utilities/ovs-lib.in
> > +++ b/utilities/ovs-lib.in
> > @@ -495,15 +495,21 @@ create_cluster () {
> >  DB_FILE="$1"
> >  DB_SCHEMA="$2"
> >  LOCAL_ADDR="$3"
> > +    ELECTION_TIMER_MS="$4"
> > +
> > +    election_timer_arg=
> > +    if [ -n "$ELECTION_TIMER_MS" ]; then
> > +  election_timer_arg="--election-timer=$ELECTION_TIMER_MS"
> > +    fi
> >  
> >  if test ! -e "$DB_FILE"; then
> > -    action "Creating cluster database $DB_FILE" ovsdb_tool
> > create-cluster "$DB_FILE" "$DB_SCHEMA" "$LOCAL_ADDR"
> > +    action "Creating cluster database $DB_FILE" ovsdb_tool
> > "$election_timer_arg" create-cluster "$DB_FILE" "$DB_SCHEMA"
> > "$LOCAL_ADDR"
> 
> Won't this give ovsdb-tool indigestion by passing a "" argument if
> there's no election timer?  I think that $election_timer_arg needs to
> be
> unquoted.
> 
> Similar issue below.

Good point; will fix and double-check all permutations.

Thanks,
Dan

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] is vxlan well supported by ovn ?

2021-06-25 Thread Ihar Hrachyshka
On Fri, Jun 25, 2021 at 5:43 AM Krzysztof Klimonda
 wrote:
>
> Hi,
>
> Is this a limitation for a number of logical switches and logical ports that 
> are part of networks that use vxlan (for example, by utilizing vtep 
> interfaces) or for a total number of LSs and LSPs in a deployment?

It's global because tun_id ranges are global and network type-independent.

Ihar

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Openvswitch patch doubts:conntrack: Fix missed 'conn' lookup checks.

2021-06-25 Thread user
Hi Darrell,
  I'm currently doing some tests on OVS-DPDK and trying to optimize it, in 
the process I found your patch(28274f7 conntrack: Fix missed 'conn' lookup 
checks.) made OVS check for the presence of the 'conn' entry before inserting 
it:

@@ -1158,8 +1165,11 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
ovs_rwlock_unlock(>resources_lock);

ovs_mutex_lock(>ct_lock);
-conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
-  helper, alg_exp, ct_alg_ctl);
+hash = conn_key_hash(>key, ct->hash_basis);
+if (!conn_key_lookup(ct, >key, hash, now, NULL, NULL)) {
+conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
+  helper, alg_exp, ct_alg_ctl);
+}
ovs_mutex_unlock(>ct_lock);
}

  I'm very confused about the reason we should do this, you mentioned that 
there were some cases that were missed, so fix them, could you please explain 
what scenarios we need to do this?

Thank You,
Avis
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Path MTU discovery on GRE interfaces

2021-06-25 Thread Matthias May via dev
On 24/06/2021 05:51, Jesse Gross wrote:
> On Wed, Jun 23, 2021 at 10:06 AM Ben Pfaff  > wrote:
> 
> [updating Jesse's email address]
> 
> On Wed, Jun 23, 2021 at 04:48:29PM +0200, Matthias May via dev wrote:
> > I'm currently fighting with issues where TCP/UDP frames that are larger 
> than the MTU of a GRE tunnel are dropped.
> > I'm aware of the whys and how to work around the issue, but while 
> looking for solutions i stumbled over the fact that:
> > * [1] added PMTUD support to OVS
> > * [2] disabled/removed with v1.9.0 respectively v1.10.0 the feature
> >
> > Even after some significant time looking through the history i haven't 
> found a reason why this was removed, just
> that it
> > was removed.
> >
> > I started some preliminary work to add PMTUD support to OVS (again), 
> but the fact that it was removed 8 years ago
> seems
> > to me like a red flag to not do it (again).
> >
> > Could someone fluent with the OVS history from 8 years ago shed some 
> light on why PMTUD support was dropped?
> > Any pointers to a thread on this topic?
> 
> It was a layering violation.  This caused problems like, for example,
> not having a good IP address to send the "frag needed" message from.
> 
> 
> In terms of the history, I believe what happened is that PMTUD support was 
> added before the kernel module was
> upstreamed. When we later submitted the code upstream, we knew that it would 
> not fly due to the layering violations so
> support was removed before submitting.
> 
> However, as Dan mentioned, I believe that check_pkt_len can be used to 
> implement essentially the same behavior and it is
> upstream as it is more generic. It should still only be used in the context 
> of an L3 operation to avoid introducing the
> same layering issues though.


Thank you for your input.
I haven't done anything with check_pkt_len yet, but this seems promising.

Currently i simply ignore the DF bit and force fragmentation on the tunnel 
between the two sites.
After all, the proper solution is to set the MTU correctly on all involved 
devices.
--> I already don't have an issue for proper devices.

This is to work around some "industrial" devices that simply don't have the 
option to reduce the MTU, don't implement
DHCP option 26, and where the performance hit by doing fragmentation is too 
high.
IMO these devices are broken already. Being forced to have to work with them 
means that breaking L2/L3 layering to make
them work a bit better are probably the least of the issue(s).

BR
Matthias

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix port group I-P when they contain non-vif ports.

2021-06-25 Thread Dumitru Ceara
On 6/25/21 4:36 PM, Numan Siddique wrote:
> On Fri, Jun 25, 2021 at 7:51 AM Dumitru Ceara  wrote:
>>
>> It's valid that port_groups contain non-vif ports, they can actually
>> contain any type of logical_switch_port.
>>
>> Also, there's no need to allocate a new sset containing the local ports'
>> names every time the I-P engine processes a change, we can maintain a
>> sset and incrementally update it when port bindings are added/removed.
>>
>> Reported-at: https://github.com/ovn-org/ovn/pull/61#issuecomment-865094163
>> Reported-by: Antonio Ojea 
>> Fixes: 0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow 
>> explosion problem.")
>> Signed-off-by: Dumitru Ceara 
> 
> Hi Dumitru,

Hi Numan,

Thanks for the review!

> 
> Thanks for the fix.  I think it would be great to have a test case to
> exercise the scenario.

Definitely, I'll add one in v2.

> 
> I've one small nit comment.  Otherwise the patch looks good to me.

I'll take care of the comment too in v2.

> 
> Thanks
> Numan
> 

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

2021-06-25 Thread Ilya Maximets
On 6/24/21 8:18 AM, Sriharsha Basavapatna wrote:
> On Thu, Jun 24, 2021 at 12:23 AM Ilya Maximets  wrote:
>>
>> On 6/23/21 5:52 PM, Eli Britstein wrote:
>>> VXLAN decap in OVS-DPDK configuration consists of two flows:
>>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
>>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>>>
>>> F1 is a classification flow. It has outer headers matches and it
>>> classifies the packet as a VXLAN packet, and using tnl_pop action the
>>> packet continues processing in F2.
>>> F2 is a flow that has matches on tunnel metadata as well as on the inner
>>> packet headers (as any other flow).
>>>
>>> In order to fully offload VXLAN decap path, both F1 and F2 should be
>>> offloaded. As there are more than one flow in HW, it is possible that
>>> F1 is done by HW but F2 is not. Packet is received by SW, and should be
>>> processed starting from F2 as F1 was already done by HW.
>>> Rte_flows are applicable only on physical port IDs. Keeping the original
>>> physical in port on which the packet is received on enables applying
>>> vport flows (e.g. F2) on that physical port.
>>>
>>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
>>> for tunnel offloads.
>>>
>>> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
>>> first. In OVS it is not.
>>> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>>> Meanwhile, tests were done with a workaround for it [2].
>>>
>>> v2-v1:
>>> - Tracking original in_port, and applying vport on that physical port 
>>> instead of all PFs.
>>> v3-v2:
>>> - Traversing ports using a new API instead of flow_dump.
>>> - Refactor packet state recover logic, with bug fix for error pop_header.
>>> - One ref count for netdev in non-tunnel case.
>>> - Rename variables, comments, rebase.
>>> v4-v3:
>>> - Extract orig_in_port from physdev for flow modify.
>>> - Miss handling fixes.
>>> v5-v4:
>>> - Drop refactor offload rule creation commit.
>>> - Comment about setting in_port in restore.
>>> - Refactor vports flow offload commit.
>>> v6-v5:
>>> - Fixed duplicate netdev ref bug.
>>> v7-v6:
>>> - Adopting Ilya's diff, with a minor fix in set_error stub.
>>> - Fixed abort (remove OVS_NOT_REACHED()) with tunnels other than vxlan
>>>   ("netdev-offload-dpdk: Support tunnel pop action.").
>>
>> Thanks!  I see the only difference (beside the set_error fix) with what
>> I have locally is following:
>>
>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>> index 363f32f71..6bd5b6c9f 100644
>> --- a/lib/netdev-offload-dpdk.c
>> +++ b/lib/netdev-offload-dpdk.c
>> @@ -835,7 +835,9 @@ vport_to_rte_tunnel(struct netdev *vport,
>>netdev_dpdk_get_port_id(netdev));
>>  }
>>  } else {
>> -OVS_NOT_REACHED();
>> +VLOG_DBG_RL(, "vport type '%s' is not supported",
>> +netdev_get_type(vport));
>> +return -1;
>>  }
>>
>>  return 0;
>> ---
>>
>> That looks good to me.  So, I guess, Harsha, we're waiting for
>> your review/tests here.
> 
> Thanks Ilya and Eli, looks good to me; I've also tested it and it works fine.
> -Harsha

Thanks, everyone.  Applied to master.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

2021-06-25 Thread Ilya Maximets
On 6/25/21 3:56 PM, Eli Britstein wrote:
> 
> On 6/25/2021 3:09 PM, Balazs Nemeth wrote:
>> *External email: Use caution opening links or attachments*
>>
>>
>> Ilya,
>>
>> Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
>> is called once per batch. I don't expect any issue with that.

Thanks, Balazs!  The series is applied now, so you can work on updated patch.

> 
> Thanks Balazs.
> 
>>
>> Regards,
>> Balazs
>>
>> On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets > > wrote:
>>
>> On 6/25/21 1:04 PM, Ferriter, Cian wrote:
>> > Hi Eli,
>> >
>> > I have some concerns about how we plug the packet state recover logic 
>> into dfc_processing() here. My concerns are inline below.
>> >
>> > I'm concerned that we are hurting performance in the partial HWOL 
>> case, as this patchset is introducing new direct (non-inline) function calls 
>> per packet to the software datapath. We can reduce performance impact in 
>> this area, see specific suggestions below.
>> >
>> > Thanks,
>> > Cian
>> >
>> >> -Original Message-
>> >> From: Eli Britstein mailto:el...@nvidia.com>>
>> >> Sent: Wednesday 23 June 2021 16:53
>> >> To: d...@openvswitch.org ; Ilya Maximets 
>> mailto:i.maxim...@ovn.org>>
>> >> Cc: Gaetan Rivet mailto:gaet...@nvidia.com>>; 
>> Majd Dibbiny mailto:m...@nvidia.com>>; Sriharsha 
>> Basavapatna
>> >> > >; Hemal Shah 
>> mailto:hemal.s...@broadcom.com>>; Ivan Malov
>> >> mailto:ivan.ma...@oktetlabs.ru>>; Ferriter, 
>> Cian mailto:cian.ferri...@intel.com>>; Eli 
>> Britstein mailto:el...@nvidia.com>>;
>> >> Finn, Emma mailto:emma.f...@intel.com>>; 
>> Kovacevic, Marko > >
>> >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state 
>> recover logic.
>> >>
>> >> Recover the packet if it was partially processed by the HW. Fallback 
>> to
>> >> lookup flow by mark association.
>> >>
>> >> Signed-off-by: Eli Britstein > >
>> >> Reviewed-by: Gaetan Rivet > >
>> >> Acked-by: Sriharsha Basavapatna > >
>> >> Tested-by: Emma Finn > >
>> >> Tested-by: Marko Kovacevic > >
>> >> Signed-off-by: Ilya Maximets > >
>> >> ---
>> >>  lib/dpif-netdev.c | 45 +
>> >>  1 file changed, 41 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> >> index 8fa7eb6d4..d5b7d5b73 100644
>> >> --- a/lib/dpif-netdev.c
>> >> +++ b/lib/dpif-netdev.c
>> >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>> >>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>> >>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>> >>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>> >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>> >>
>> >>  /* Protects against changes to 'dp_netdevs'. */
>> >>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>> >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread 
>> *pmd,
>> >>      pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, 
>> n_smc_hit);
>> >>  }
>> >>
>> >> +static struct tx_port * pmd_send_port_cache_lookup(
>> >> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>> >> +
>> >> +static inline int
>> >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> >> +                  odp_port_t port_no,
>> >> +                  struct dp_packet *packet,
>> >> +                  struct dp_netdev_flow **flow)
>> >> +{
>> >> +    struct tx_port *p;
>> >> +    uint32_t mark;
>> >> +
>> >
>> >
>> > Start of full HWOL recovery block
>> >
>> >> +    /* Restore the packet if HW processing was terminated before 
>> completion. */
>> >> +    p = pmd_send_port_cache_lookup(pmd, port_no);
>> >> +    if (OVS_LIKELY(p)) {
>> >> +        int err = netdev_hw_miss_packet_recover(p->port->netdev, 
>> packet);
>> >> +
>> >> +        if (err && err != EOPNOTSUPP) {
>> >> +            COVERAGE_INC(datapath_drop_hw_miss_recover);
>> >> +            return -1;
>> >> +        }
>> >> +    }
>> >
>> > End of full HWOL recovery block
>> >
>> > IIUC, the above is only relevant to full HWOL where the packet is 
>> partially processed by the HW. In a partial HWOL testcase, we see a 
>> performance drop as a result of the above code. The performance after the 
>> patchset is applied is 0.94x what the performance was before.
>>
> General speaking, adding new code in the datapath is probable to have some 
> degradation 

Re: [ovs-dev] Path MTU discovery on GRE interfaces

2021-06-25 Thread Matthias May via dev
On 23/06/2021 19:06, Ben Pfaff wrote:
> [updating Jesse's email address]
> 
> On Wed, Jun 23, 2021 at 04:48:29PM +0200, Matthias May via dev wrote:
>> I'm currently fighting with issues where TCP/UDP frames that are larger than 
>> the MTU of a GRE tunnel are dropped.
>> I'm aware of the whys and how to work around the issue, but while looking 
>> for solutions i stumbled over the fact that:
>> * [1] added PMTUD support to OVS
>> * [2] disabled/removed with v1.9.0 respectively v1.10.0 the feature
>>
>> Even after some significant time looking through the history i haven't found 
>> a reason why this was removed, just that it
>> was removed.
>>
>> I started some preliminary work to add PMTUD support to OVS (again), but the 
>> fact that it was removed 8 years ago seems
>> to me like a red flag to not do it (again).
>>
>> Could someone fluent with the OVS history from 8 years ago shed some light 
>> on why PMTUD support was dropped?
>> Any pointers to a thread on this topic?
> 
> It was a layering violation.  This caused problems like, for example,
> not having a good IP address to send the "frag needed" message from.
> 
> Jesse may remember more.
> 

OK i guessed as much.
I was thinking about what address to use when there is none available.
Maye the least breaking thing would be to just use as source of the ICMP 
unreachable the destination to which the frames
are addressed.

BR
Matthias

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.

2021-06-25 Thread Eelco Chaudron



On 24 Jun 2021, at 13:53, Ferriter, Cian wrote:

> Hi Flavio,
>
> Thanks for the review. My responses are inline.
>
> Cian
>



>>>
>>> +static void
>>> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc,
>>> + const char *argv[], void *aux OVS_UNUSED)
>>> +{
>>> +/* This function requires just one parameter, the DPIF name.
>>> + * A second optional parameter can identify the datapath instance.
>>> + */
>>> +const char *dpif_name = argv[1];
>>> +
>>> +static const char *error_description[2] = {
>>> +"Unknown DPIF implementation",
>>> +"CPU doesn't support the required instruction for",
>>> +};
>>> +
>>> +dp_netdev_input_func new_func;
>>> +int32_t err = dp_netdev_impl_get_by_name(dpif_name, _func);
>>> +if (err) {
>>> +struct ds reply = DS_EMPTY_INITIALIZER;
>>> +ds_put_format(, "DPIF implementation not available: %s 
>>> %s.\n",
>>> +  error_description[ (err == -ENOTSUP) ], dpif_name);
>>> +const char *reply_str = ds_cstr();
>>> +unixctl_command_reply(conn, reply_str);
>>> +VLOG_INFO("%s", reply_str);
>>> +ds_destroy();
>>> +return;
>>> +}
>>> +
>>> +/* argv[2] is optional datapath instance. If no datapath name is 
>>> provided
>>> + * and only one datapath exists, the one existing datapath is reprobed.
>>> + */
>>> +ovs_mutex_lock(_netdev_mutex);
>>> +struct dp_netdev *dp = NULL;
>>> +
>>> +if (argc == 3) {
>>> +dp = shash_find_data(_netdevs, argv[2]);
>>> +} else if (shash_count(_netdevs) == 1) {
>>> +dp = shash_first(_netdevs)->data;
>>> +}
>>> +
>>> +if (!dp) {
>>> +ovs_mutex_unlock(_netdev_mutex);
>>> +unixctl_command_reply_error(conn,
>>> +"please specify an existing datapath");
>>> +return;
>>> +}
>>> +
>>> +/* Get PMD threads list. */
>>> +size_t n;
>>> +struct dp_netdev_pmd_thread **pmd_list;
>>> +sorted_poll_thread_list(dp, _list, );
>>> +
>>> +for (size_t i = 0; i < n; i++) {
>>> +struct dp_netdev_pmd_thread *pmd = pmd_list[i];
>>> +if (pmd->core_id == NON_PMD_CORE_ID) {
>>> +continue;
>>> +}
>>> +
>>> +/* Set PMD threads DPIF implementation to requested one. */
>>> +pmd->netdev_input_func = *new_func;
>>> +};
>>> +
>>> +ovs_mutex_unlock(_netdev_mutex);
>>> +
>>> +/* Set the default implementation for PMD threads created in the 
>>> future. */
>>> +dp_netdev_impl_set_default(*new_func);
>>
>> I checked other patches and it seems this interface could be simplified
>> and would fix the set_default() above to be more robust.
>> What do you think of:
>>
>>1) lock dp_netdev_mutex
>>2) Check if the DP argument is valid.
>>3) Use a new dp_netdev_impl_set_default_by_name()
>>   That fails if the name is not available or set the default input
>>   hiding the function pointer from outside.
>>4) Loop on each pmd doing the same as in dp_netdev_configure_pmd():
>>   pmd->netdev_input_func = dp_netdev_impl_get_default();
>>5) unlock dp_netdev_mutex
>>
>> It will hold the lock a bit more time but we don't expect to have
>> several inputs and no frequent input changes, so we should be fine.
>>
>
> Good idea. Hiding the function pointer from here is a nice improvement. I'll 
> rework it to do this.


Do we also assume that setting the function pointer happens atomically on all 
supported architectures? I would assume this requires an atomic set?

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC] Documentation: minimize travis as our testing infrastructure

2021-06-25 Thread Aaron Conole
Recently, Travis-CI has retired the travis-ci.org service.  At the
moment, it is read-only.  In the future, it may disappear completely.

Travis-CI might still be useful for individual developers testing small
patches, but can no longer fulfill the role of project-wide continuous
integration service.  Remove the badge from the front page, and note the
other CI services that OVS supports.  Rather than completely drop all
references to travis, simply emphasize using GitHub Actions as the preferred
CI solution.

Signed-off-by: Aaron Conole 
Refs: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384040.html
---
NOTE: I waffled on completely removing references to Travis, and dropping
  the .travis.yml file completely.  I'm not opposed, but maybe it should
  just be done with.

 .../contributing/submitting-patches.rst   |  8 +--
 Documentation/topics/testing.rst  | 68 ---
 README.rst|  2 -
 3 files changed, 33 insertions(+), 45 deletions(-)

diff --git a/Documentation/internals/contributing/submitting-patches.rst 
b/Documentation/internals/contributing/submitting-patches.rst
index 4a6780371d..a6a10f8d89 100644
--- a/Documentation/internals/contributing/submitting-patches.rst
+++ b/Documentation/internals/contributing/submitting-patches.rst
@@ -68,10 +68,10 @@ Testing is also important:
   feature.  A bug fix patch should preferably add a test that would
   fail if the bug recurs.
 
-If you are using GitHub, then you may utilize the travis-ci.org and the GitHub
-Actions CI build systems.  They will run some of the above tests automatically
-when you push changes to your repository.  See the "Continuous Integration with
-Travis-CI" in :doc:`/topics/testing` for details on how to set it up.
+If you are using GitHub, then you may utilize the the GitHub Actions CI build
+systems.  They will run some of the above tests automatically when you push
+changes to your repository.  See the "Continuous Integration" section in
+:doc:`/topics/testing` for details on continuous integration.
 
 Email Subject
 -
diff --git a/Documentation/topics/testing.rst b/Documentation/topics/testing.rst
index 951fe9e851..9894a3f7c5 100644
--- a/Documentation/topics/testing.rst
+++ b/Documentation/topics/testing.rst
@@ -416,45 +416,35 @@ You should invoke scan-view to view analysis results. The 
last line of output
 from ``clang-analyze`` will list the command (containing results directory)
 that you should invoke to view the results on a browser.
 
-Continuous Integration with Travis CI
--
-
-A .travis.yml file is provided to automatically build Open vSwitch with various
-build configurations and run the testsuite using Travis CI. Builds will be
-performed with gcc, sparse and clang with the -Werror compiler flag included,
-therefore the build will fail if a new warning has been introduced.
-
-The CI build is triggered via git push (regardless of the specific branch) or
-pull request against any Open vSwitch GitHub repository that is linked to
-travis-ci.
-
-Instructions to setup travis-ci for your GitHub repository:
-
-1. Go to https://travis-ci.org/ and sign in using your GitHub ID.
-2. Go to the "Repositories" tab and enable the ovs repository. You may disable
-   builds for pushes or pull requests.
-3. In order to avoid forks sending build failures to the upstream mailing list,
-   the notification email recipient is encrypted. If you want to receive email
-   notification for build failures, replace the encrypted string:
-
-   1. Install the travis-ci CLI (Requires ruby >=2.0): gem install travis
-   2. In your Open vSwitch repository: travis encrypt myl...@mydomain.org
-   3. Add/replace the notifications section in .travis.yml and fill in the
-  secure string as returned by travis encrypt::
-
-  notifications:
-email:
-  recipients:
-- secure: "."
-
-  .. note::
-You may remove/omit the notifications section to fall back to default
-notification behaviour which is to send an email directly to the author and
-committer of the failing commit. Note that the email is only sent if the
-author/committer have commit rights for the particular GitHub repository.
-
-4. Pushing a commit to the repository which breaks the build or the
-   testsuite will now trigger a email sent to myl...@mydomain.org
+Continuous Integration
+--
+
+The Open vSwitch project can make use of multiple public CI services to help
+developers ensure their patches don't introduce additional regressions.  Each
+service requires different forms of configuration, and for the supported
+services the configuration file(s) to automatically build Open vSwitch with
+various build configurations and run the testsuites is/are available. For
+example, the GitHub Actions builds will be performed with gcc, sparse and clang
+with the -Werror compiler flag 

Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix port group I-P when they contain non-vif ports.

2021-06-25 Thread Numan Siddique
On Fri, Jun 25, 2021 at 7:51 AM Dumitru Ceara  wrote:
>
> It's valid that port_groups contain non-vif ports, they can actually
> contain any type of logical_switch_port.
>
> Also, there's no need to allocate a new sset containing the local ports'
> names every time the I-P engine processes a change, we can maintain a
> sset and incrementally update it when port bindings are added/removed.
>
> Reported-at: https://github.com/ovn-org/ovn/pull/61#issuecomment-865094163
> Reported-by: Antonio Ojea 
> Fixes: 0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow 
> explosion problem.")
> Signed-off-by: Dumitru Ceara 

Hi Dumitru,

Thanks for the fix.  I think it would be great to have a test case to
exercise the scenario.

I've one small nit comment.  Otherwise the patch looks good to me.

Thanks
Numan

> ---
>  controller/binding.c| 25 +
>  controller/binding.h| 11 ++-
>  controller/ovn-controller.c | 24 +++-
>  3 files changed, 14 insertions(+), 46 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 7fde0fdbb..1f6188e0d 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -549,6 +549,7 @@ update_local_lport_ids(const struct sbrec_port_binding 
> *pb,
>  tracked_binding_datapath_lport_add(pb, 
> b_ctx->tracked_dp_bindings);
>  }
>  }
> +sset_add(_ctx->lbinding_data->port_bindings, pb->logical_port);
>  }
>
>  /* Remove a port binding id from the set of local lport IDs. Also track if
> @@ -569,6 +570,8 @@ remove_local_lport_ids(const struct sbrec_port_binding 
> *pb,
>  tracked_binding_datapath_lport_add(pb, 
> b_ctx->tracked_dp_bindings);
>  }
>  }
> +sset_find_and_delete(_ctx->lbinding_data->port_bindings,
> + pb->logical_port);
>  }
>
>  /* Corresponds to each Port_Binding.type. */
> @@ -683,6 +686,7 @@ local_binding_data_init(struct local_binding_data 
> *lbinding_data)
>  {
>  shash_init(_data->bindings);
>  shash_init(_data->lports);
> +sset_init(_data->port_bindings);
>  }
>
>  void
> @@ -702,6 +706,7 @@ local_binding_data_destroy(struct local_binding_data 
> *lbinding_data)
>  shash_delete(_data->bindings, node);
>  }
>
> +sset_destroy(_data->port_bindings);
>  shash_destroy(_data->lports);
>  shash_destroy(_data->bindings);
>  }
> @@ -2926,23 +2931,3 @@ cleanup:
>
>  return b_lport;
>  }
> -
> -struct sset *
> -binding_collect_local_binding_lports(struct local_binding_data 
> *lbinding_data)
> -{
> -struct sset *lports = xzalloc(sizeof *lports);
> -sset_init(lports);
> -struct shash_node *shash_node;
> -SHASH_FOR_EACH (shash_node, _data->lports) {
> -struct binding_lport *b_lport = shash_node->data;
> -sset_add(lports, b_lport->name);
> -}
> -return lports;
> -}
> -
> -void
> -binding_destroy_local_binding_lports(struct sset *lports)
> -{
> -sset_destroy(lports);
> -free(lports);
> -}
> diff --git a/controller/binding.h b/controller/binding.h
> index 8f3289476..7a35faa0d 100644
> --- a/controller/binding.h
> +++ b/controller/binding.h
> @@ -22,6 +22,7 @@
>  #include "openvswitch/hmap.h"
>  #include "openvswitch/uuid.h"
>  #include "openvswitch/list.h"
> +#include "sset.h"
>
>  struct hmap;
>  struct ovsdb_idl;
> @@ -93,6 +94,7 @@ struct binding_ctx_out {
>  struct local_binding_data {
>  struct shash bindings;
>  struct shash lports;
> +struct sset port_bindings;

I'd suggest changing the name of this variable to "lport_names".  Although
sset would indicate that it represents a collection of port binding
names,  "lport_names"
would be clear in my opinion.


>  };
>
>  void local_binding_data_init(struct local_binding_data *);
> @@ -133,13 +135,4 @@ bool binding_handle_port_binding_changes(struct 
> binding_ctx_in *,
>  void binding_tracked_dp_destroy(struct hmap *tracked_datapaths);
>
>  void binding_dump_local_bindings(struct local_binding_data *, struct ds *);
> -
> -/* Generates a sset of lport names from local_binding_data.
> - * Note: the caller is responsible for destroying and freeing the returned
> - * sset, by calling binding_detroy_local_binding_lports(). */
> -struct sset *binding_collect_local_binding_lports(struct local_binding_data 
> *);
> -
> -/* Destroy and free the lports sset returned by
> - * binding_collect_local_binding_lports(). */
> -void binding_destroy_local_binding_lports(struct sset *lports);
>  #endif /* controller/binding.h */
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 3968ef059..5675b97dd 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1635,11 +1635,8 @@ en_port_groups_run(struct engine_node *node, void 
> *data)
>  struct ed_type_runtime_data *rt_data =
>  engine_get_input_data("runtime_data", node);
>
> -struct sset *local_b_lports = binding_collect_local_binding_lports(
> -

Re: [ovs-dev] [PATCH v7 1/4] conntrack: handle already natted packets

2021-06-25 Thread Paolo Valerio
Dumitru Ceara  writes:

> On 6/25/21 2:01 PM, Paolo Valerio wrote:
>> Dumitru Ceara  writes:
>> 
>>> On 6/21/21 12:06 PM, Paolo Valerio wrote:
 when a packet gets dnatted and then recirculated, it could be possible
 that it matches another rule that performs another nat action.
 The kernel datapath handles this situation turning to a no-op the
 second nat action, so natting only once the packet.  In the userspace
 datapath instead, when the ct action gets executed, an initial lookup
 of the translated packet fails to retrieve the connection related to
 the packet, leading to the creation of a new entry in ct for the src
 nat action with a subsequent failure of the connection establishment.

 with the following flows:

 table=0,priority=30,in_port=1,ip,nw_dst=192.168.2.100,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
 table=0,priority=20,in_port=2,ip,actions=ct(nat,table=1)
 table=0,priority=10,ip,actions=resubmit(,2)
 table=0,priority=10,arp,actions=NORMAL
 table=0,priority=0,actions=drop
 table=1,priority=5,ip,actions=ct(commit,nat(src=10.1.1.240),table=2)
 table=2,in_port=ovs-l0,actions=2
 table=2,in_port=ovs-r0,actions=1

 establishing a connection from 10.1.1.1 to 192.168.2.100 the outcome is:

 tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.240,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
 tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)

 with this patch applied the outcome is:

 tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)

 The patch performs, for already natted packets, a lookup of the
 reverse key in order to retrieve the related entry, it also adds a
 test case that besides testing the scenario ensures that the other ct
 actions are executed.

 Reported-by: Dumitru Ceara 
 Signed-off-by: Paolo Valerio 
 ---
>>>
>>> Hi Paolo,
>>>
>>> Thanks for the patch!  I tested it and it works fine for OVN.  I have a
>>> few comments/questions below.
>>>
>> 
>> Thanks for the test and for the review.
>> 
  lib/conntrack.c |   30 +-
  tests/system-traffic.at |   35 +++
  2 files changed, 64 insertions(+), 1 deletion(-)

 diff --git a/lib/conntrack.c b/lib/conntrack.c
 index 99198a601..7e8b16a3e 100644
 --- a/lib/conntrack.c
 +++ b/lib/conntrack.c
 @@ -1281,6 +1281,33 @@ process_one_fast(uint16_t zone, const uint32_t 
 *setmark,
  }
  }
  
 +static void
 +initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx,
 + long long now, bool natted)
>>>
>>> Nit: indentation.
>>>
>> 
>> ACK
>> 
 +{
 +bool found;
 +
 +if (natted) {
 +/* if the packet has been already natted (e.g. a previous
 + * action took place), retrieve it performing a lookup of its
 + * reverse key. */
 +conn_key_reverse(>key);
 +}
 +
 +found = conn_key_lookup(ct, >key, ctx->hash,
 +now, >conn, >reply);
 +if (natted) {
 +if (OVS_LIKELY(found)) {
 +ctx->reply = !ctx->reply;
 +ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key;
 +ctx->hash = conn_key_hash(>key, ct->hash_basis);
 +} else {
 +/* in case of failure restore the initial key. */
 +conn_key_reverse(>key);
>>>
>>> Can the lookup actually fail?  I mean, if the packet was natted, there
>>> must have been a connection on which it got natted.  Anyway, I think we
>>> should probably also increment a coverage counter.  I guess dropping
>>> such packets would be hard, right?
>>>
>> 
>> I agree, it should not fail. If I'm not missing something, if it
>> happens, it should be because there's been a problem somewhere else
>> (e.g. a polluted ct_state value or more in general an unexpected
>> scenario). For this reason, I think it's better not to drop it or even
>> set it as invalid.
>
> I'm not sure, won't this create horrible to debug bugs when packets get
> forwarded in an unexpected way?  Setting it as invalid isn't good enough
> in my opinion because there might be flows later in the pipeline that
> perform actions (other than drop) on packets with ct_state +inv.
>
> The problem I have (because I don't know the conntrack code) is that I
> see no easy way to drop the packet.
>

I see your point and I understand it.
AFAICS, it doesn't seem to be an easy way to do it.

>> 
>> Yes, the coverage counter gives more meaning to the else branch.
>> Alternatively, we could probably even remove it. I would leave the NULL
>> 

Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

2021-06-25 Thread Eli Britstein


On 6/25/2021 3:09 PM, Balazs Nemeth wrote:

*External email: Use caution opening links or attachments*


Ilya,

Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
is called once per batch. I don't expect any issue with that.


Thanks Balazs.



Regards,
Balazs

On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets > wrote:


On 6/25/21 1:04 PM, Ferriter, Cian wrote:
> Hi Eli,
>
> I have some concerns about how we plug the packet state recover
logic into dfc_processing() here. My concerns are inline below.
>
> I'm concerned that we are hurting performance in the partial
HWOL case, as this patchset is introducing new direct (non-inline)
function calls per packet to the software datapath. We can reduce
performance impact in this area, see specific suggestions below.
>
> Thanks,
> Cian
>
>> -Original Message-
>> From: Eli Britstein mailto:el...@nvidia.com>>
>> Sent: Wednesday 23 June 2021 16:53
>> To: d...@openvswitch.org ; Ilya
Maximets mailto:i.maxim...@ovn.org>>
>> Cc: Gaetan Rivet mailto:gaet...@nvidia.com>>; Majd Dibbiny mailto:m...@nvidia.com>>; Sriharsha Basavapatna
>> mailto:sriharsha.basavapa...@broadcom.com>>; Hemal Shah
mailto:hemal.s...@broadcom.com>>; Ivan Malov
>> mailto:ivan.ma...@oktetlabs.ru>>;
Ferriter, Cian mailto:cian.ferri...@intel.com>>; Eli Britstein mailto:el...@nvidia.com>>;
>> Finn, Emma mailto:emma.f...@intel.com>>;
Kovacevic, Marko mailto:marko.kovace...@intel.com>>
>> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state
recover logic.
>>
>> Recover the packet if it was partially processed by the HW.
Fallback to
>> lookup flow by mark association.
>>
>> Signed-off-by: Eli Britstein mailto:el...@nvidia.com>>
>> Reviewed-by: Gaetan Rivet mailto:gaet...@nvidia.com>>
>> Acked-by: Sriharsha Basavapatna
mailto:sriharsha.basavapa...@broadcom.com>>
>> Tested-by: Emma Finn mailto:emma.f...@intel.com>>
>> Tested-by: Marko Kovacevic mailto:marko.kovace...@intel.com>>
>> Signed-off-by: Ilya Maximets mailto:i.maxim...@ovn.org>>
>> ---
>>  lib/dpif-netdev.c | 45
+
>>  1 file changed, 41 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 8fa7eb6d4..d5b7d5b73 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>>
>>  /* Protects against changes to 'dp_netdevs'. */
>>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct
dp_netdev_pmd_thread *pmd,
>> pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT,
n_smc_hit);
>>  }
>>
>> +static struct tx_port * pmd_send_port_cache_lookup(
>> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>> +
>> +static inline int
>> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> +                  odp_port_t port_no,
>> +                  struct dp_packet *packet,
>> +                  struct dp_netdev_flow **flow)
>> +{
>> +    struct tx_port *p;
>> +    uint32_t mark;
>> +
>
>
> Start of full HWOL recovery block
>
>> +    /* Restore the packet if HW processing was terminated
before completion. */
>> +    p = pmd_send_port_cache_lookup(pmd, port_no);
>> +    if (OVS_LIKELY(p)) {
>> +        int err =
netdev_hw_miss_packet_recover(p->port->netdev, packet);
>> +
>> +        if (err && err != EOPNOTSUPP) {
>> + COVERAGE_INC(datapath_drop_hw_miss_recover);
>> +            return -1;
>> +        }
>> +    }
>
> End of full HWOL recovery block
>
> IIUC, the above is only relevant to full HWOL where the packet
is partially processed by the HW. In a partial HWOL testcase, we
see a performance drop as a result of the above code. The
performance after the patchset is applied is 0.94x what the
performance was before.

General speaking, adding new code in the datapath is probable to have 
some degradation affect, that cannot be avoided completely.


I think performance optimizations for the partial offloads (or to SW 
datapath in general, even w/o offloads), can be done on top, like the 
one suggested by Balazs above, on top of it.




While reviewing the patch set I noticed this part too, but
this code was tested twice by Intel engineers, so I figured
that it doesn't hurt performance of partial offloading.

In general, it should 

Re: [ovs-dev] [PATCH v2 9/9] docs: Add documentation for ovsdb relay mode.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> Main documentation for the service model and tutorial with the use case
> and configuration examples.
> 
> Signed-off-by: Ilya Maximets 
> ---

I left a few minor comments below.  With them addressed:

Acked-by: Dumitru Ceara 

Thanks!

>  Documentation/automake.mk|   1 +
>  Documentation/ref/ovsdb.7.rst|  62 --
>  Documentation/topics/index.rst   |   1 +
>  Documentation/topics/ovsdb-relay.rst | 124 +++
>  NEWS |   3 +
>  ovsdb/ovsdb-server.1.in  |  27 +++---
>  6 files changed, 200 insertions(+), 18 deletions(-)
>  create mode 100644 Documentation/topics/ovsdb-relay.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index bc30f94c5..213d9c867 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>   Documentation/topics/networking-namespaces.rst \
>   Documentation/topics/openflow.rst \
>   Documentation/topics/ovs-extensions.rst \
> + Documentation/topics/ovsdb-relay.rst \
>   Documentation/topics/ovsdb-replication.rst \
>   Documentation/topics/porting.rst \
>   Documentation/topics/record-replay.rst \
> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> index e4f1bf766..a5b8a9c33 100644
> --- a/Documentation/ref/ovsdb.7.rst
> +++ b/Documentation/ref/ovsdb.7.rst
> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, 
> respectively.
>  Service Models
>  ==
>  
> -OVSDB supports three service models for databases: **standalone**,
> -**active-backup**, and **clustered**.  The service models provide different
> -compromises among consistency, availability, and partition tolerance.  They
> -also differ in the number of servers required and in terms of performance.  
> The
> -standalone and active-backup database service models share one on-disk 
> format,
> -and clustered databases use a different format, but the OVSDB programs work
> -with both formats.  ``ovsdb(5)`` documents these file formats.
> +OVSDB supports four service models for databases: **standalone**,
> +**active-backup**, **relay** and **clustered**.  The service models provide
> +different compromises among consistency, availability, and partition 
> tolerance.
> +They also differ in the number of servers required and in terms of 
> performance.
> +The standalone and active-backup database service models share one on-disk
> +format, and clustered databases use a different format, but the OVSDB 
> programs
> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
> +databases has no on-disk storage.

s/has/have

>  
>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>  any particular service model.
> @@ -406,6 +407,50 @@ following consequences:
>that the client previously read.  The OVSDB client library in Open vSwitch
>uses this feature to avoid servers with stale data.
>  
> +Relay Service Model
> +---
> +
> +A **relay** database is a way to scale out read-mostly access to the
> +existing database working in any service model including relay.
> +
> +Relay database creates and maintains an OVSDB connection with other OVSDB
> +server.  It uses this connection to maintain in-memory copy of the remote
> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
> +database content changes on relay source in the real time.
> +
> +The purpose of relay server is to scale out the number of database clients.
> +Read-only transactions and monitor requests are fully handled by the relay
> +server itself.  For the transactions that requests database modifications,

s/requests/request

> +relay works as a proxy between the client and the relay source, i.e. it
> +forwards transactions and replies between them.
> +
> +Compared to a clustered and active-backup models, relay service model 
> provides

s/Compared to a/Compared to the

> +read and write access to the database similarly to a clustered database (and
> +even more scalable), but with generally insignificant performance overhead of

Joke: citation needed

> +an active-backup model.  At the same time it doesn't increase availability 
> that
> +needs to be covered by the service model of the relay source.
> +
> +Relay database has no on-disk storage and therefore cannot be converted to
> +any other service model.
> +
> +If there is already a database started in any service model, to start a relay
> +database server use ``ovsdb-server relay::``, where
> + is the database name as specified in the schema of the database
> +that existing server runs, and  is an OVSDB connection 
> method
> +(see `Connection Methods`_ below) that connects to the existing database
> +server.   could contain a comma-separated list of 
> connection
> +methods, e.g. to connect to any server of the clustered database.
> 

Re: [ovs-dev] [PATCH v2 7/9] ovsdb: relay: Reflect connection status in _Server database.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> It might be important for clients to know that relay lost connection
> with the relay remote, so they could re-connect to other relay.
> 
> Signed-off-by: Ilya Maximets 
> ---

[...]

>  
> +#define RELAY_MAX_RECONNECTION_MS 3

30 seconds of relay "incorrectly" reporting that it is connected to the
source seems quite long.  Also, should we make this configurable?

The rest looks good to me, thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 8/9] ovsdb: Make clients aware of relay service model.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> Clients needs to re-connect from the relay that has no connection
> with the database source.  Also, relay acts similarly to the follower
> from a clustered model from the consistency point of view, so it's not
> suitable for leader-only connections.
> 
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 6/9] ovsdb: relay: Add support for transaction forwarding.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> Current version of ovsdb relay allows to scale out read-only
> access to the primary database.  However, many clients are not
> read-only but read-mostly.  For example, ovn-controller.
> 
> In order to scale out database access for this case ovsdb-server
> need to process transactions that are not read-only.  Relay is not
> allowed to do that, i.e. not allowed to modify the database, but it
> can act like a proxy and forward transactions that includes database
> modifications to the primary server and forward replies back to a
> client.  At the same time it may serve read-only transactions and
> monitor requests by itself greatly reducing the load on primary
> server.
> 
> This configuration will slightly increase transaction latency, but
> it's not very important for read-mostly use cases.
> 
> Implementation details:
> With this change instead of creating a trigger to commit the
> transaction, ovsdb-server will create a trigger for transaction
> forwarding.  Later, ovsdb_relay_run() will send all new transactions
> to the relay source.  Once transaction reply received from the
> relay source, ovsdb-relay module will update the state of the
> transaction forwarding with the reply.  After that, trigger_run()
> will complete the trigger and jsonrpc_server_run() will send the
> reply back to the client.  Since transaction reply from the relay
> source will be received after all the updates, client will receive
> all the updates before receiving the transaction reply as it is in
> a normal scenario with other database models.
> 
> Signed-off-by: Ilya Maximets 
> ---

I have a tiny nit below, otherwise:

Acked-by: Dumitru Ceara 

[...]

> diff --git a/ovsdb/relay.c b/ovsdb/relay.c
> index 5f423a0b9..ef689c649 100644
> --- a/ovsdb/relay.c
> +++ b/ovsdb/relay.c
> @@ -32,6 +32,7 @@
>  #include "row.h"
>  #include "table.h"
>  #include "transaction.h"
> +#include "transaction-forward.h"
>  #include "util.h"
>  
>  VLOG_DEFINE_THIS_MODULE(relay);
> @@ -298,6 +299,7 @@ ovsdb_relay_run(void)
>  struct relay_ctx *ctx = node->data;
>  struct ovs_list events;
>  
> +ovsdb_txn_forward_run(ctx->db, ctx->cs);
>  ovsdb_cs_run(ctx->cs, );
>  
>  struct ovsdb_cs_event *event;
> @@ -309,7 +311,9 @@ ovsdb_relay_run(void)
>  
>  switch (event->type) {
>  case OVSDB_CS_EVENT_TYPE_RECONNECT:
> -/* Nothing to do. */
> +/* Cancelling all the transactions that was already sent but

Nit: s/was/were/

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/9] ovsdb: New ovsdb 'relay' service model.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> New database service model 'relay' that is needed to scale out
> read-mostly database access, e.g. ovn-controller connections to
> OVN_Southbound.
> 
> In this service model ovsdb-server connects to existing OVSDB
> server and maintains in-memory copy of the database.  It serves
> read-only transactions and monitor requests by its own, but
> forwards write transactions to the relay source.
> 
> Key differences from the active-backup replication:
> - support for "write" transactions (next commit).
> - no on-disk storage. (probably, faster operation)
> - support for multiple remotes (connect to the clustered db).
> - doesn't try to keep connection as long as possible, but
>   faster reconnects to other remotes to avoid missing updates.
> - No need to know the complete database schema beforehand,
>   only the schema name.
> - can be used along with other standalone and clustered databases
>   by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server to read-only mode)
> - supports modern version of monitors (monitor_cond_since),
>   because based on ovsdb-cs.
> - could be chained, i.e. multiple relays could be connected
>   one to another in a row or in a tree-like form.
> - doesn't increase availability.
> - cannot be converted to other service models or become a main
>   active server.
> 
> Signed-off-by: Ilya Maximets 
> ---

I have some very nitpicky comments below (and an unrelated bug report),
nevertheless:

Acked-by: Dumitru Ceara 

>  ovsdb/_server.ovsschema |   7 +-
>  ovsdb/_server.xml   |  16 +-
>  ovsdb/automake.mk   |   2 +
>  ovsdb/execution.c   |   5 +
>  ovsdb/ovsdb-server.c|  97 
>  ovsdb/ovsdb.c   |   2 +
>  ovsdb/ovsdb.h   |   3 +
>  ovsdb/relay.c   | 339 
>  ovsdb/relay.h   |  34 
>  9 files changed, 464 insertions(+), 41 deletions(-)
>  create mode 100644 ovsdb/relay.c
>  create mode 100644 ovsdb/relay.h
> 
> diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema
> index a867e5cbf..e3d9d893b 100644
> --- a/ovsdb/_server.ovsschema
> +++ b/ovsdb/_server.ovsschema
> @@ -1,13 +1,14 @@
>  {"name": "_Server",
> - "version": "1.1.0",
> - "cksum": "3236486585 698",
> + "version": "1.2.0",
> + "cksum": "3009684573 744",
>   "tables": {
> "Database": {
>   "columns": {
> "name": {"type": "string"},
> "model": {
>   "type": {"key": {"type": "string",
> -  "enum": ["set", ["standalone", "clustered"]]}}},
> +  "enum": ["set",
> + ["standalone", "clustered", 
> "relay"]]}}},
> "connected": {"type": "boolean"},
> "leader": {"type": "boolean"},
> "schema": {
> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
> index 70cd22db7..414be6715 100644
> --- a/ovsdb/_server.xml
> +++ b/ovsdb/_server.xml
> @@ -60,12 +60,14 @@
>  
>  
>The storage model: standalone for a standalone or
> -  active-backup database, clustered for a clustered 
> database.
> +  active-backup database, clustered for a clustered 
> database,
> +  relay for a relay database.
>  
>  
>  
>The database schema, as a JSON string.  In the case of a clustered
> -  database, this is empty until it finishes joining its cluster.
> +  database, this is empty until it finishes joining its cluster.  In the
> +  case of a relay database - until it connects to the relay source.
>  
>  
>  
> @@ -85,20 +87,20 @@
>  
>
>  True if the database is the leader in its cluster.  For a standalone 
> or
> -active-backup database, this is always true.
> +active-backup database, this is always true.  Always false for relay.
>
>  
>
>  The cluster ID for this database, which is the same for all of the
> -servers that host this particular clustered database.  For a 
> standalone
> -or active-backup database, this is empty.
> +servers that host this particular clustered database.  For a
> +standalone, active-backup or relay database, this is empty.
>
>  
>
>  The server ID for this database, different for each server that 
> hosts a
>  particular clustered database.  A server that hosts more than one
>  clustered database will have a different sid in each 
> one.
> -For a standalone or active-backup database, this is empty.
> +For a standalone, active-backup or relay database, this is empty.
>
>  
>
> @@ -112,7 +114,7 @@
>  
>  
>  
> -  For a standalone or active-backup database, this is empty.
> +  For a standalone, active-backup or relay database, this is empty.
>  
>
>  
> diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
> index 446d6c136..05c8ebbdf 100644
> --- a/ovsdb/automake.mk
> 

Re: [ovs-dev] [PATCH v2 4/9] ovsdb: row: Add support for xor-based row updates.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> This will be used to apply update3 type updates to ovsdb tables
> while processing updates for future ovsdb 'relay' service model.
> 
> 'ovsdb_datum_apply_diff' is allowed to fail, so adding support
> to return this error.
> 
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/9] ovsdb: storage: Allow setting the name for the unbacked storage.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> ovsdb_create() requires schema or storage to be nonnull, but in
> practice it requires to have schema name or a storage name to
> use it as a database name.  Only clustered storage has a name.
> This means that only clustered database can be created without
> schema,  Changing that by allowing unbacked storage to have a
> name.  This way we can create database with unbacked storage
> without schema.  Will be used in next commits to create database
> for ovsdb 'relay' service model.
> 
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 3/9] ovsdb: table: Expose functions to execute operations on ovsdb tables.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> These functions will be used later for ovsdb 'relay' service model, so
> moving them to a common code.
> 
> Warnings translated to ovsdb errors, caller in replication.c only
> printed inconsistency warnings, but mostly ignored them.  Implementing
> the same logic by checking the error tag.
> 
> Also ovsdb_execute_insert() previously printed incorrect warning about
> duplicate row while it was a syntax error in json.  Fixing that by
> actually checking for the duplicate and reporting correct ovsdb error.
> 
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Dumitru Ceara 

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/9] jsonrpc-server: Wake up jsonrpc session if there are completed triggers.

2021-06-25 Thread Dumitru Ceara
On 6/12/21 4:00 AM, Ilya Maximets wrote:
> If there are completed triggers, jsonrpc server should wake up and
> update clients with the new data, but there is no such condition
> in ovsdb_jsonrpc_session_wait().  For some reason this doesn't result
> in any processing delays in current code, probably because there are
> always some other types of events in this case that could wake ovsdb
> server up.  But it will become a problem in upcoming ovsdb 'relay'
> service model because triggers could be completed from a different
> place, i.e. after receiving transaction reply from the relay source.
> 
> Fix that by waking up ovsdb-server in case there are completed triggers
> that needs to be handled.
> 
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Dumitru Ceara 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-06-25 Thread Dumitru Ceara
On 6/12/21 3:59 AM, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that changes some database tables.
> 
> Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. some OpenStack agents are monitoring updates.
> At a high scale and with a big size of the database ovsdb-server
> spends too much time processing monitor updates and it's required
> to move this load somewhere else.  This patch-set aims to introduce
> required functionality to scale out read-mostly connections by
> introducing a new OVSDB 'relay' service model .
> 
> In this new service model ovsdb-server connects to existing OVSDB
> server and maintains in-memory copy of the database.  It serves
> read-only transactions and monitor requests by its own, but forwards
> write transactions to the relay source.
> 
> Key differences from the active-backup replication:
> - support for "write" transactions.
> - no on-disk storage. (probably, faster operation)
> - support for multiple remotes (connect to the clustered db).
> - doesn't try to keep connection as long as possible, but
>   faster reconnects to other remotes to avoid missing updates.
> - No need to know the complete database schema beforehand,
>   only the schema name.
> - can be used along with other standalone and clustered databases
>   by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server to read-only mode)
> - supports modern version of monitors (monitor_cond_since),
>   because based on ovsdb-cs.
> - could be chained, i.e. multiple relays could be connected
>   one to another in a row or in a tree-like form.
> 
> Bringing all above functionality to the existing active-backup
> replication doesn't look right as it will make it less reliable
> for the actual backup use case, and this also would be much
> harder from the implementation point of view, because current
> replication code is not based on ovsdb-cs or idl and all the required
> features would be likely duplicated or replication would be fully
> re-written on top of ovsdb-cs with severe modifications of the former.
> 
> Relay is somewhere in the middle between active-backup replication and
> the clustered model taking a lot from both, therefore is hard to
> implement on top of any of them.
> 
> To run ovsdb-server in relay mode, user need to simply run:
> 
>   ovsdb-server --remote=punix:db.sock relay::
> 
> e.g.
> 
>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
> 
> More details and examples in the documentation in the last patch
> of the series.
> 
> I actually tried to implement transaction forwarding on top of
> active-backup replication in v1 of this seies, but it required
> a lot of tricky changes, including schema format changes in order
> to bring required information to the end clients, so I decided
> to fully rewrite the functionality in v2 with a different approach.
> 
> Future work:
> - Add support for transaction history (it could be just inherited
>   from the transaction ids received from the relay source).  This
>   will allow clients to utilize monitor_cond_since while working
>   with relay.

Hi Ilya,

I acked most of the patches in the series (except 7/9 which I think
might need a rather straightforward change) and I saw Mark also left
some comments.

I wonder though if the lack of monitor_cond_since will be a show stopper
for deploying this in production?  Or do you expect reconnects to happen
less often do to the multi-tier nature of new deployments?

I guess we need some scale test data with this deployed to have a better
idea.

In any case, very nice work!

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v4 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-06-25 Thread Eelco Chaudron

Hi Kumar,

I plan to review this patch set, but I need to go over the dpif AVX512 
set first to get a better understanding.


However, I did run some performance tests on old hardware (as I do not 
have an AVX512 system) and notice some degradation (and improvements). 
This was a single run for both scenarios, with the following results 
(based on ovs_perf), on a single Intel(R) Xeon(R) CPU E5-2690 v4 @ 
2.60GHz:


   Number of flows 64  128 256 512 768 1024
1514

Delta
   10  1.48%1.72%   1.59%  -0.12%   0.44%   6.99%   
7.31%
   10001.06%   -0.73%  -1.46%  -1.42%  -2.54%  -0.20%  
-0.98%
   1  -0.93%   -1.62%  -0.32%  -1.50%  -0.30%  -0.56%   
0.19%
   10  0.39%   -0.05%  -0.60%  -0.51%  -0.90%   1.24%  
-1.10%



Master
   10  4767168 4601575 4382381 4127125 3594158 2932787 
2400479
   100 3804956 3612716 3547054 3127117 2950324 2615856 
2133892
   10003251959 3257535 2985693 2869970 2549086 2286262 
1979985
   1   2671946 2624808 2536575 2412845 2190386 1952359 
1699142



Patch
   10  4838691 4682131 4453022 4122100 3609915 3153228 
2589748
   100 3845585 3586650 3496167 3083467 2877265 2610640 
2113108
   10003221894 3205732 2976203 2827620 2541349 2273468 
1983794
   1   2682461 2623585 2521419 2400627 2170751 1976909 
1680607


Zero loss for master 5.8% (3,452,306pps) vs on Patch 5.7% 
(3,392,783pps).



Did you guys do any tests like this? I think it would be good not only 
to know the improvement but also the degradation of existing systems 
without AVX512.



I see Ian is currently reviewing the v4 and was wondering if you plan to 
send the v5 soon, if so, I hold off a bit, and do the v5 rather than 
doing the v4 and verify it’s not something Ian mentioned.


Cheers,


Eelco


On 17 Jun 2021, at 18:27, Kumar Amber wrote:


This patch introduced the auto-validation function which
allows users to compare the batch of packets obtained from
different miniflow implementations against the linear
miniflow extract and return a hitmask.

The autovaidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
---
 lib/dpif-netdev-private-extract.c | 141 
++

 lib/dpif-netdev-private-extract.h |  15 
 lib/dpif-netdev.c |   2 +-
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c

index fcc56ef26..0741c19f9 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -32,6 +32,11 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev_extract);

 /* Implementations of available extract options. */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
+   {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator",
+},
 {
 .probe = NULL,
 .extract_func = NULL,
@@ -84,3 +89,139 @@ dpif_miniflow_extract_info_get(struct 
dpif_miniflow_extract_impl **out_ptr)

 *out_ptr = mfex_impls;
 return ARRAY_SIZE(mfex_impls);
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t 
in_port,

+void *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+
+int32_t mfunc_count = 
dpif_miniflow_extract_info_get(_funcs);

+if (mfunc_count < 0) {
+pmd->miniflow_extract_opt = NULL;
+VLOG_ERR("failed to get miniflow extract function 
implementations\n");

+return 0;
+}
+ovs_assert(keys_size >= cnt);
+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(>md, in_port);
+miniflow_extract(packet, [i].mf);
+
+/* Store known good metadata to compare with optimized 
metadata. */

+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < 

Re: [ovs-dev] [PATCH v9] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-06-25 Thread Eelco Chaudron

This one looks good, however, I would make some small changes.
If you do, you can add my acked by to the new revision.

//Eelco


On 24 Jun 2021, at 17:57, Vasu Dasari wrote:

Currently there is an option to add/flush/show ARP/ND neighbor. This 
covers L3
side.  For L2 side, there is only fdb show command. This patch gives 
an option

to add/del an fdb entry via ovs-appctl.

CLI command looks like:

To add:
ovs-appctl fdb/add
ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05

To del:
ovs-appctl fdb/del   
ovs-appctl fdb/del br0 0 50:54:00:00:00:05

Added two new APIs to provide convenient interface to add and delete 
static-macs.
bool xlate_add_static_mac_entry(const struct ofproto_dpif *, 
ofp_port_t in_port,

   struct eth_addr dl_src, int vlan);
bool xlate_delete_static_mac_entry(const struct ofproto_dpif *,
  struct eth_addr dl_src, int vlan);

1. Static entry should not age. To indicate that entry being 
programmed is a static entry,
   'expires' field in 'struct mac_entry' will be set to a 
MAC_ENTRY_AGE_STATIC_ENTRY. A
   check for this value is made while deleting mac entry as part of 
regular aging process.
2. Another change to of mac-update logic, when a packet with same 
dl_src as that of a
   static-mac entry arrives on any port, the logic will not modify the 
expires field.

3. While flushing fdb entries, made sure static ones are not evicted.
4. Updated "ovs-appctl fdb/stats-show br0" to display numberof static 
entries in switch


Added following tests:
  ofproto-dpif - static-mac add/del/flush
  ofproto-dpif - static-mac mac moves

Signed-off-by: Vasu Dasari 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
Tested-by: Eelco Chaudron 
---
v1:
 - Fixed 0-day robot warnings
v2:
 - Fix valgrind error in the modified code in mac_learning_insert() 
where a read is

   is performed on e->expires which is not initialized
v3:
 - Addressed code review comments
 - Added more documentation
 - Fixed mac_entry_age() and is_mac_learning_update_needed() to have 
common

   understanding of return values when mac_entry is a static one.
 - Added NEWS item
v4:
 - Addressed code review comments
 - Static entries will not be purged when fdb/flush is performed.
 - Static entries will not be overwritten when packet with same dl_src 
arrives on

   any port of switch
 - Provided bit more detail while doing fdb/add, to indicate if 
static-mac is

   overriding already present entry
 - Separated test cases for a bit more clarity
 v5:
 - Addressed code review comments
 - Added new total_static counter to count number of static entries.
 - Removed mac_entry_set_idle_time()
 - Added mac_learning_add_static_entry() and 
mac_learning_del_static_entry()
 - Modified APIs xlate_add_static_mac_entry() and 
xlate_delete_static_mac_entry()

   return 0 on success else a failure code
 v6:
 - Fixed a probable bug with Eelco's code review comments in
   is_mac_learning_update_needed()
 v7:
 - Added a ovs-vswitchd.8 man page entry for fdb add/del commands
 v8:
 - Updaed with code review comments from Eelco.
 - Renamed total_static to static_entries
 - Added coverage counter mac_learning_static_none_move
 - Fixed a possible bug with static_entries getting cleared via
   fdb/stats-clear command
 - Initialize static_entries in mac_learning_create()
 - Modified fdb/del command by removing option to specify port-name
 - Breakup ofproto_unixctl_fdb_update into ofproto_unixctl_fdb_add
   and ofproto_unixctl_fdb_delete
 - Updated test "static-mac add/del/flush" to have interleaved mac
   entries before fdb/flush
 - Updated test "static-mac mac move" to check for newly added
   coverage counter mac_learning_static_none_move
v9:
 - Updated source code comments and addressed code review comments
---
 NEWS |   4 +
 lib/mac-learning.c   | 155 
+++

 lib/mac-learning.h   |  17 
 ofproto/ofproto-dpif-xlate.c |  48 +--
 ofproto/ofproto-dpif-xlate.h |   5 ++
 ofproto/ofproto-dpif.c   | 117 +-
 tests/ofproto-dpif.at|  99 ++
 vswitchd/ovs-vswitchd.8.in   |   5 ++
 8 files changed, 421 insertions(+), 29 deletions(-)

diff --git a/NEWS b/NEWS
index db3faf4cc..12fb40054 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,10 @@ Post-v2.15.0
- ovsdb-tool:
  * New option '--election-timer' to the 'create-cluster' command 
to set the

leader election timer during cluster creation.
+   - ovs-appctl:
+ * Added ability to add and delete static mac entries using:
+   'ovs-appctl fdb/add'
+   'ovs-appctl fdb/del   '


 v2.15.0 - 15 Feb 2021
diff --git a/lib/mac-learning.c b/lib/mac-learning.c
index 3d5293d3b..dd3f46a8b 100644
--- a/lib/mac-learning.c
+++ b/lib/mac-learning.c
@@ -34,13 +34,25 @@ 

Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

2021-06-25 Thread Balazs Nemeth
Ilya,

Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
is called once per batch. I don't expect any issue with that.

Regards,
Balazs

On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets  wrote:

> On 6/25/21 1:04 PM, Ferriter, Cian wrote:
> > Hi Eli,
> >
> > I have some concerns about how we plug the packet state recover logic
> into dfc_processing() here. My concerns are inline below.
> >
> > I'm concerned that we are hurting performance in the partial HWOL case,
> as this patchset is introducing new direct (non-inline) function calls per
> packet to the software datapath. We can reduce performance impact in this
> area, see specific suggestions below.
> >
> > Thanks,
> > Cian
> >
> >> -Original Message-
> >> From: Eli Britstein 
> >> Sent: Wednesday 23 June 2021 16:53
> >> To: d...@openvswitch.org; Ilya Maximets 
> >> Cc: Gaetan Rivet ; Majd Dibbiny ;
> Sriharsha Basavapatna
> >> ; Hemal Shah <
> hemal.s...@broadcom.com>; Ivan Malov
> >> ; Ferriter, Cian ;
> Eli Britstein ;
> >> Finn, Emma ; Kovacevic, Marko <
> marko.kovace...@intel.com>
> >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover
> logic.
> >>
> >> Recover the packet if it was partially processed by the HW. Fallback to
> >> lookup flow by mark association.
> >>
> >> Signed-off-by: Eli Britstein 
> >> Reviewed-by: Gaetan Rivet 
> >> Acked-by: Sriharsha Basavapatna 
> >> Tested-by: Emma Finn 
> >> Tested-by: Marko Kovacevic 
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>  lib/dpif-netdev.c | 45 +
> >>  1 file changed, 41 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 8fa7eb6d4..d5b7d5b73 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
> >>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
> >>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
> >>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
> >>
> >>  /* Protects against changes to 'dp_netdevs'. */
> >>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread
> *pmd,
> >>  pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT,
> n_smc_hit);
> >>  }
> >>
> >> +static struct tx_port * pmd_send_port_cache_lookup(
> >> +const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> >> +
> >> +static inline int
> >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> >> +  odp_port_t port_no,
> >> +  struct dp_packet *packet,
> >> +  struct dp_netdev_flow **flow)
> >> +{
> >> +struct tx_port *p;
> >> +uint32_t mark;
> >> +
> >
> >
> > Start of full HWOL recovery block
> >
> >> +/* Restore the packet if HW processing was terminated before
> completion. */
> >> +p = pmd_send_port_cache_lookup(pmd, port_no);
> >> +if (OVS_LIKELY(p)) {
> >> +int err = netdev_hw_miss_packet_recover(p->port->netdev,
> packet);
> >> +
> >> +if (err && err != EOPNOTSUPP) {
> >> +COVERAGE_INC(datapath_drop_hw_miss_recover);
> >> +return -1;
> >> +}
> >> +}
> >
> > End of full HWOL recovery block
> >
> > IIUC, the above is only relevant to full HWOL where the packet is
> partially processed by the HW. In a partial HWOL testcase, we see a
> performance drop as a result of the above code. The performance after the
> patchset is applied is 0.94x what the performance was before.
>
> While reviewing the patch set I noticed this part too, but
> this code was tested twice by Intel engineers, so I figured
> that it doesn't hurt performance of partial offloading.
>
> In general, it should be easy to re-order partial and full
> offloading checks like this (didn't test):
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c5ab35d2a..36a5976f2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread
> *pmd,
>  struct tx_port *p;
>  uint32_t mark;
>
> +/* If no mark, no flow to find. */
> +if (dp_packet_has_flow_mark(packet, )) {
> +*flow = mark_to_flow_find(pmd, mark);
> +return 0;
> +}
> +
> +*flow = NULL;
> +
>  /* Restore the packet if HW processing was terminated before
> completion. */
>  p = pmd_send_port_cache_lookup(pmd, port_no);
>  if (OVS_LIKELY(p)) {
> @@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread
> *pmd,
>  return -1;
>  }
>  }
> -
> -/* If no mark, no flow to find. */
> -if (!dp_packet_has_flow_mark(packet, )) {
> -*flow = NULL;
> -return 0;
> -}
> -
> -*flow = mark_to_flow_find(pmd, mark);
>  return 0;
>  }
> ---
>
> If everyone agrees, I can do this change.
>
> >
> > We should branch over 

Re: [ovs-dev] [PATCH v7 1/4] conntrack: handle already natted packets

2021-06-25 Thread Dumitru Ceara
On 6/25/21 2:01 PM, Paolo Valerio wrote:
> Dumitru Ceara  writes:
> 
>> On 6/21/21 12:06 PM, Paolo Valerio wrote:
>>> when a packet gets dnatted and then recirculated, it could be possible
>>> that it matches another rule that performs another nat action.
>>> The kernel datapath handles this situation turning to a no-op the
>>> second nat action, so natting only once the packet.  In the userspace
>>> datapath instead, when the ct action gets executed, an initial lookup
>>> of the translated packet fails to retrieve the connection related to
>>> the packet, leading to the creation of a new entry in ct for the src
>>> nat action with a subsequent failure of the connection establishment.
>>>
>>> with the following flows:
>>>
>>> table=0,priority=30,in_port=1,ip,nw_dst=192.168.2.100,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
>>> table=0,priority=20,in_port=2,ip,actions=ct(nat,table=1)
>>> table=0,priority=10,ip,actions=resubmit(,2)
>>> table=0,priority=10,arp,actions=NORMAL
>>> table=0,priority=0,actions=drop
>>> table=1,priority=5,ip,actions=ct(commit,nat(src=10.1.1.240),table=2)
>>> table=2,in_port=ovs-l0,actions=2
>>> table=2,in_port=ovs-r0,actions=1
>>>
>>> establishing a connection from 10.1.1.1 to 192.168.2.100 the outcome is:
>>>
>>> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.240,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
>>>
>>> with this patch applied the outcome is:
>>>
>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
>>>
>>> The patch performs, for already natted packets, a lookup of the
>>> reverse key in order to retrieve the related entry, it also adds a
>>> test case that besides testing the scenario ensures that the other ct
>>> actions are executed.
>>>
>>> Reported-by: Dumitru Ceara 
>>> Signed-off-by: Paolo Valerio 
>>> ---
>>
>> Hi Paolo,
>>
>> Thanks for the patch!  I tested it and it works fine for OVN.  I have a
>> few comments/questions below.
>>
> 
> Thanks for the test and for the review.
> 
>>>  lib/conntrack.c |   30 +-
>>>  tests/system-traffic.at |   35 +++
>>>  2 files changed, 64 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>>> index 99198a601..7e8b16a3e 100644
>>> --- a/lib/conntrack.c
>>> +++ b/lib/conntrack.c
>>> @@ -1281,6 +1281,33 @@ process_one_fast(uint16_t zone, const uint32_t 
>>> *setmark,
>>>  }
>>>  }
>>>  
>>> +static void
>>> +initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx,
>>> + long long now, bool natted)
>>
>> Nit: indentation.
>>
> 
> ACK
> 
>>> +{
>>> +bool found;
>>> +
>>> +if (natted) {
>>> +/* if the packet has been already natted (e.g. a previous
>>> + * action took place), retrieve it performing a lookup of its
>>> + * reverse key. */
>>> +conn_key_reverse(>key);
>>> +}
>>> +
>>> +found = conn_key_lookup(ct, >key, ctx->hash,
>>> +now, >conn, >reply);
>>> +if (natted) {
>>> +if (OVS_LIKELY(found)) {
>>> +ctx->reply = !ctx->reply;
>>> +ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key;
>>> +ctx->hash = conn_key_hash(>key, ct->hash_basis);
>>> +} else {
>>> +/* in case of failure restore the initial key. */
>>> +conn_key_reverse(>key);
>>
>> Can the lookup actually fail?  I mean, if the packet was natted, there
>> must have been a connection on which it got natted.  Anyway, I think we
>> should probably also increment a coverage counter.  I guess dropping
>> such packets would be hard, right?
>>
> 
> I agree, it should not fail. If I'm not missing something, if it
> happens, it should be because there's been a problem somewhere else
> (e.g. a polluted ct_state value or more in general an unexpected
> scenario). For this reason, I think it's better not to drop it or even
> set it as invalid.

I'm not sure, won't this create horrible to debug bugs when packets get
forwarded in an unexpected way?  Setting it as invalid isn't good enough
in my opinion because there might be flows later in the pipeline that
perform actions (other than drop) on packets with ct_state +inv.

The problem I have (because I don't know the conntrack code) is that I
see no easy way to drop the packet.

> 
> Yes, the coverage counter gives more meaning to the else branch.
> Alternatively, we could probably even remove it. I would leave the NULL
> check or equivalent.
> 
> I have no strong preference.
> WDYT?
> 

I would prefer a coverage counter, at least to have an indication of a
problem "somewhere" in conntrack.

Regards,
Dumitru

___

Re: [ovs-dev] [PATCH v7 1/4] conntrack: handle already natted packets

2021-06-25 Thread Paolo Valerio
Dumitru Ceara  writes:

> On 6/21/21 12:06 PM, Paolo Valerio wrote:
>> when a packet gets dnatted and then recirculated, it could be possible
>> that it matches another rule that performs another nat action.
>> The kernel datapath handles this situation turning to a no-op the
>> second nat action, so natting only once the packet.  In the userspace
>> datapath instead, when the ct action gets executed, an initial lookup
>> of the translated packet fails to retrieve the connection related to
>> the packet, leading to the creation of a new entry in ct for the src
>> nat action with a subsequent failure of the connection establishment.
>> 
>> with the following flows:
>> 
>> table=0,priority=30,in_port=1,ip,nw_dst=192.168.2.100,actions=ct(commit,nat(dst=10.1.1.2:80),table=1)
>> table=0,priority=20,in_port=2,ip,actions=ct(nat,table=1)
>> table=0,priority=10,ip,actions=resubmit(,2)
>> table=0,priority=10,arp,actions=NORMAL
>> table=0,priority=0,actions=drop
>> table=1,priority=5,ip,actions=ct(commit,nat(src=10.1.1.240),table=2)
>> table=2,in_port=ovs-l0,actions=2
>> table=2,in_port=ovs-r0,actions=1
>> 
>> establishing a connection from 10.1.1.1 to 192.168.2.100 the outcome is:
>> 
>> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.240,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
>> 
>> with this patch applied the outcome is:
>> 
>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=4000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=4000),protoinfo=(state=ESTABLISHED)
>> 
>> The patch performs, for already natted packets, a lookup of the
>> reverse key in order to retrieve the related entry, it also adds a
>> test case that besides testing the scenario ensures that the other ct
>> actions are executed.
>> 
>> Reported-by: Dumitru Ceara 
>> Signed-off-by: Paolo Valerio 
>> ---
>
> Hi Paolo,
>
> Thanks for the patch!  I tested it and it works fine for OVN.  I have a
> few comments/questions below.
>

Thanks for the test and for the review.

>>  lib/conntrack.c |   30 +-
>>  tests/system-traffic.at |   35 +++
>>  2 files changed, 64 insertions(+), 1 deletion(-)
>> 
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 99198a601..7e8b16a3e 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -1281,6 +1281,33 @@ process_one_fast(uint16_t zone, const uint32_t 
>> *setmark,
>>  }
>>  }
>>  
>> +static void
>> +initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx,
>> + long long now, bool natted)
>
> Nit: indentation.
>

ACK

>> +{
>> +bool found;
>> +
>> +if (natted) {
>> +/* if the packet has been already natted (e.g. a previous
>> + * action took place), retrieve it performing a lookup of its
>> + * reverse key. */
>> +conn_key_reverse(>key);
>> +}
>> +
>> +found = conn_key_lookup(ct, >key, ctx->hash,
>> +now, >conn, >reply);
>> +if (natted) {
>> +if (OVS_LIKELY(found)) {
>> +ctx->reply = !ctx->reply;
>> +ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key;
>> +ctx->hash = conn_key_hash(>key, ct->hash_basis);
>> +} else {
>> +/* in case of failure restore the initial key. */
>> +conn_key_reverse(>key);
>
> Can the lookup actually fail?  I mean, if the packet was natted, there
> must have been a connection on which it got natted.  Anyway, I think we
> should probably also increment a coverage counter.  I guess dropping
> such packets would be hard, right?
>

I agree, it should not fail. If I'm not missing something, if it
happens, it should be because there's been a problem somewhere else
(e.g. a polluted ct_state value or more in general an unexpected
scenario). For this reason, I think it's better not to drop it or even
set it as invalid.

Yes, the coverage counter gives more meaning to the else branch.
Alternatively, we could probably even remove it. I would leave the NULL
check or equivalent.

I have no strong preference.
WDYT?

> Thanks,
> Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix port group I-P when they contain non-vif ports.

2021-06-25 Thread Dumitru Ceara
On 6/25/21 1:50 PM, Dumitru Ceara wrote:
> It's valid that port_groups contain non-vif ports, they can actually
> contain any type of logical_switch_port.
> 
> Also, there's no need to allocate a new sset containing the local ports'
> names every time the I-P engine processes a change, we can maintain a
> sset and incrementally update it when port bindings are added/removed.
> 
> Reported-at: https://github.com/ovn-org/ovn/pull/61#issuecomment-865094163
> Reported-by: Antonio Ojea 
> Fixes: 0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow 
> explosion problem.")
> Signed-off-by: Dumitru Ceara 
> ---

Hi Han,

It would be great if you could have a look at this patch, commit
0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow explosion
problem.") breaks ACL use cases in ovn-kubernetes.

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovn-controller: Fix port group I-P when they contain non-vif ports.

2021-06-25 Thread Dumitru Ceara
It's valid that port_groups contain non-vif ports, they can actually
contain any type of logical_switch_port.

Also, there's no need to allocate a new sset containing the local ports'
names every time the I-P engine processes a change, we can maintain a
sset and incrementally update it when port bindings are added/removed.

Reported-at: https://github.com/ovn-org/ovn/pull/61#issuecomment-865094163
Reported-by: Antonio Ojea 
Fixes: 0cfeba6b55e3 ("ovn-controller: Fix port group conjunction flow explosion 
problem.")
Signed-off-by: Dumitru Ceara 
---
 controller/binding.c| 25 +
 controller/binding.h| 11 ++-
 controller/ovn-controller.c | 24 +++-
 3 files changed, 14 insertions(+), 46 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 7fde0fdbb..1f6188e0d 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -549,6 +549,7 @@ update_local_lport_ids(const struct sbrec_port_binding *pb,
 tracked_binding_datapath_lport_add(pb, b_ctx->tracked_dp_bindings);
 }
 }
+sset_add(_ctx->lbinding_data->port_bindings, pb->logical_port);
 }
 
 /* Remove a port binding id from the set of local lport IDs. Also track if
@@ -569,6 +570,8 @@ remove_local_lport_ids(const struct sbrec_port_binding *pb,
 tracked_binding_datapath_lport_add(pb, b_ctx->tracked_dp_bindings);
 }
 }
+sset_find_and_delete(_ctx->lbinding_data->port_bindings,
+ pb->logical_port);
 }
 
 /* Corresponds to each Port_Binding.type. */
@@ -683,6 +686,7 @@ local_binding_data_init(struct local_binding_data 
*lbinding_data)
 {
 shash_init(_data->bindings);
 shash_init(_data->lports);
+sset_init(_data->port_bindings);
 }
 
 void
@@ -702,6 +706,7 @@ local_binding_data_destroy(struct local_binding_data 
*lbinding_data)
 shash_delete(_data->bindings, node);
 }
 
+sset_destroy(_data->port_bindings);
 shash_destroy(_data->lports);
 shash_destroy(_data->bindings);
 }
@@ -2926,23 +2931,3 @@ cleanup:
 
 return b_lport;
 }
-
-struct sset *
-binding_collect_local_binding_lports(struct local_binding_data *lbinding_data)
-{
-struct sset *lports = xzalloc(sizeof *lports);
-sset_init(lports);
-struct shash_node *shash_node;
-SHASH_FOR_EACH (shash_node, _data->lports) {
-struct binding_lport *b_lport = shash_node->data;
-sset_add(lports, b_lport->name);
-}
-return lports;
-}
-
-void
-binding_destroy_local_binding_lports(struct sset *lports)
-{
-sset_destroy(lports);
-free(lports);
-}
diff --git a/controller/binding.h b/controller/binding.h
index 8f3289476..7a35faa0d 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -22,6 +22,7 @@
 #include "openvswitch/hmap.h"
 #include "openvswitch/uuid.h"
 #include "openvswitch/list.h"
+#include "sset.h"
 
 struct hmap;
 struct ovsdb_idl;
@@ -93,6 +94,7 @@ struct binding_ctx_out {
 struct local_binding_data {
 struct shash bindings;
 struct shash lports;
+struct sset port_bindings;
 };
 
 void local_binding_data_init(struct local_binding_data *);
@@ -133,13 +135,4 @@ bool binding_handle_port_binding_changes(struct 
binding_ctx_in *,
 void binding_tracked_dp_destroy(struct hmap *tracked_datapaths);
 
 void binding_dump_local_bindings(struct local_binding_data *, struct ds *);
-
-/* Generates a sset of lport names from local_binding_data.
- * Note: the caller is responsible for destroying and freeing the returned
- * sset, by calling binding_detroy_local_binding_lports(). */
-struct sset *binding_collect_local_binding_lports(struct local_binding_data *);
-
-/* Destroy and free the lports sset returned by
- * binding_collect_local_binding_lports(). */
-void binding_destroy_local_binding_lports(struct sset *lports);
 #endif /* controller/binding.h */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 3968ef059..5675b97dd 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1635,11 +1635,8 @@ en_port_groups_run(struct engine_node *node, void *data)
 struct ed_type_runtime_data *rt_data =
 engine_get_input_data("runtime_data", node);
 
-struct sset *local_b_lports = binding_collect_local_binding_lports(
-_data->lbinding_data);
-port_groups_init(pg_table, local_b_lports, >port_group_ssets,
- >port_groups_cs_local);
-binding_destroy_local_binding_lports(local_b_lports);
+port_groups_init(pg_table, _data->lbinding_data.port_bindings,
+ >port_group_ssets, >port_groups_cs_local);
 
 engine_set_node_state(node, EN_UPDATED);
 }
@@ -1656,12 +1653,9 @@ port_groups_sb_port_group_handler(struct engine_node 
*node, void *data)
 struct ed_type_runtime_data *rt_data =
 engine_get_input_data("runtime_data", node);
 
-struct sset *local_b_lports = binding_collect_local_binding_lports(
-_data->lbinding_data);
-

Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

2021-06-25 Thread Ilya Maximets
On 6/25/21 1:04 PM, Ferriter, Cian wrote:
> Hi Eli,
> 
> I have some concerns about how we plug the packet state recover logic into 
> dfc_processing() here. My concerns are inline below.
> 
> I'm concerned that we are hurting performance in the partial HWOL case, as 
> this patchset is introducing new direct (non-inline) function calls per 
> packet to the software datapath. We can reduce performance impact in this 
> area, see specific suggestions below.
> 
> Thanks,
> Cian
> 
>> -Original Message-
>> From: Eli Britstein 
>> Sent: Wednesday 23 June 2021 16:53
>> To: d...@openvswitch.org; Ilya Maximets 
>> Cc: Gaetan Rivet ; Majd Dibbiny ; 
>> Sriharsha Basavapatna
>> ; Hemal Shah ; 
>> Ivan Malov
>> ; Ferriter, Cian ; Eli 
>> Britstein ;
>> Finn, Emma ; Kovacevic, Marko 
>> 
>> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover 
>> logic.
>>
>> Recover the packet if it was partially processed by the HW. Fallback to
>> lookup flow by mark association.
>>
>> Signed-off-by: Eli Britstein 
>> Reviewed-by: Gaetan Rivet 
>> Acked-by: Sriharsha Basavapatna 
>> Tested-by: Emma Finn 
>> Tested-by: Marko Kovacevic 
>> Signed-off-by: Ilya Maximets 
>> ---
>>  lib/dpif-netdev.c | 45 +
>>  1 file changed, 41 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 8fa7eb6d4..d5b7d5b73 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>>
>>  /* Protects against changes to 'dp_netdevs'. */
>>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>>  pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>>  }
>>
>> +static struct tx_port * pmd_send_port_cache_lookup(
>> +const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>> +
>> +static inline int
>> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> +  odp_port_t port_no,
>> +  struct dp_packet *packet,
>> +  struct dp_netdev_flow **flow)
>> +{
>> +struct tx_port *p;
>> +uint32_t mark;
>> +
> 
> 
> Start of full HWOL recovery block
> 
>> +/* Restore the packet if HW processing was terminated before 
>> completion. */
>> +p = pmd_send_port_cache_lookup(pmd, port_no);
>> +if (OVS_LIKELY(p)) {
>> +int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
>> +
>> +if (err && err != EOPNOTSUPP) {
>> +COVERAGE_INC(datapath_drop_hw_miss_recover);
>> +return -1;
>> +}
>> +}
> 
> End of full HWOL recovery block
> 
> IIUC, the above is only relevant to full HWOL where the packet is partially 
> processed by the HW. In a partial HWOL testcase, we see a performance drop as 
> a result of the above code. The performance after the patchset is applied is 
> 0.94x what the performance was before.

While reviewing the patch set I noticed this part too, but
this code was tested twice by Intel engineers, so I figured
that it doesn't hurt performance of partial offloading.

In general, it should be easy to re-order partial and full
offloading checks like this (didn't test):

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c5ab35d2a..36a5976f2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
 struct tx_port *p;
 uint32_t mark;
 
+/* If no mark, no flow to find. */
+if (dp_packet_has_flow_mark(packet, )) {
+*flow = mark_to_flow_find(pmd, mark);
+return 0;
+}
+
+*flow = NULL;
+
 /* Restore the packet if HW processing was terminated before completion. */
 p = pmd_send_port_cache_lookup(pmd, port_no);
 if (OVS_LIKELY(p)) {
@@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
 return -1;
 }
 }
-
-/* If no mark, no flow to find. */
-if (!dp_packet_has_flow_mark(packet, )) {
-*flow = NULL;
-return 0;
-}
-
-*flow = mark_to_flow_find(pmd, mark);
 return 0;
 }
---

If everyone agrees, I can do this change.

> 
> We should branch over this code in the partial HWOL scenario, where we don't 
> need to get the call to pmd_send_port_cache_lookup() and 
> netdev_hw_miss_packet_recover(). We want this branch to be transparent to the 
> user. Since both full and partial HWOL falls under the 
> other_config:hw-offload=true switch, we might need a configure time check NIC 
> capabilities solution or something similar to branch over full HWOL packet 
> recovery code. Does this make sense?

Compile time check of capabilities doesn't 

Re: [ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

2021-06-25 Thread Ferriter, Cian
Hi Eli,

I have some concerns about how we plug the packet state recover logic into 
dfc_processing() here. My concerns are inline below.

I'm concerned that we are hurting performance in the partial HWOL case, as this 
patchset is introducing new direct (non-inline) function calls per packet to 
the software datapath. We can reduce performance impact in this area, see 
specific suggestions below.

Thanks,
Cian

> -Original Message-
> From: Eli Britstein 
> Sent: Wednesday 23 June 2021 16:53
> To: d...@openvswitch.org; Ilya Maximets 
> Cc: Gaetan Rivet ; Majd Dibbiny ; 
> Sriharsha Basavapatna
> ; Hemal Shah ; 
> Ivan Malov
> ; Ferriter, Cian ; Eli 
> Britstein ;
> Finn, Emma ; Kovacevic, Marko 
> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
> 
> Recover the packet if it was partially processed by the HW. Fallback to
> lookup flow by mark association.
> 
> Signed-off-by: Eli Britstein 
> Reviewed-by: Gaetan Rivet 
> Acked-by: Sriharsha Basavapatna 
> Tested-by: Emma Finn 
> Tested-by: Marko Kovacevic 
> Signed-off-by: Ilya Maximets 
> ---
>  lib/dpif-netdev.c | 45 +
>  1 file changed, 41 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 8fa7eb6d4..d5b7d5b73 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
> 
>  /* Protects against changes to 'dp_netdevs'. */
>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>  pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>  }
> 
> +static struct tx_port * pmd_send_port_cache_lookup(
> +const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> +
> +static inline int
> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> +  odp_port_t port_no,
> +  struct dp_packet *packet,
> +  struct dp_netdev_flow **flow)
> +{
> +struct tx_port *p;
> +uint32_t mark;
> +


Start of full HWOL recovery block

> +/* Restore the packet if HW processing was terminated before completion. 
> */
> +p = pmd_send_port_cache_lookup(pmd, port_no);
> +if (OVS_LIKELY(p)) {
> +int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
> +
> +if (err && err != EOPNOTSUPP) {
> +COVERAGE_INC(datapath_drop_hw_miss_recover);
> +return -1;
> +}
> +}

End of full HWOL recovery block

IIUC, the above is only relevant to full HWOL where the packet is partially 
processed by the HW. In a partial HWOL testcase, we see a performance drop as a 
result of the above code. The performance after the patchset is applied is 
0.94x what the performance was before.

We should branch over this code in the partial HWOL scenario, where we don't 
need to get the call to pmd_send_port_cache_lookup() and 
netdev_hw_miss_packet_recover(). We want this branch to be transparent to the 
user. Since both full and partial HWOL falls under the 
other_config:hw-offload=true switch, we might need a configure time check NIC 
capabilities solution or something similar to branch over full HWOL packet 
recovery code. Does this make sense?


perf top showing cycles spent per function in my partial HWOL scenario. We can 
see netdev_hw_miss_packet_recover(), 
netdev_offload_dpdk_hw_miss_packet_recover() and netdev_is_flow_api_enabled() 
taking cycles:
28.79%  pmd-c01/id:8  ovs-vswitchd[.] dp_netdev_input__
13.72%  pmd-c01/id:8  ovs-vswitchd[.] parse_tcp_flags
11.07%  pmd-c01/id:8  ovs-vswitchd[.] i40e_recv_pkts_vec_avx2
10.94%  pmd-c01/id:8  ovs-vswitchd[.] i40e_xmit_fixed_burst_vec_avx2
 7.48%  pmd-c01/id:8  ovs-vswitchd[.] cmap_find
 4.94%  pmd-c01/id:8  ovs-vswitchd[.] netdev_hw_miss_packet_recover
 4.77%  pmd-c01/id:8  ovs-vswitchd[.] dp_execute_output_action
 3.81%  pmd-c01/id:8  ovs-vswitchd[.] 
dp_netdev_pmd_flush_output_on_port
 3.40%  pmd-c01/id:8  ovs-vswitchd[.] netdev_send
 2.49%  pmd-c01/id:8  ovs-vswitchd[.] netdev_dpdk_eth_send
 1.16%  pmd-c01/id:8  ovs-vswitchd[.] netdev_dpdk_rxq_recv
 0.90%  pmd-c01/id:8  ovs-vswitchd[.] pmd_perf_end_iteration
 0.75%  pmd-c01/id:8  ovs-vswitchd[.] dp_netdev_process_rxq_port
 0.68%  pmd-c01/id:8  ovs-vswitchd[.] netdev_is_flow_api_enabled
 0.55%  pmd-c01/id:8  ovs-vswitchd[.] 
netdev_offload_dpdk_hw_miss_packet_recover

> +
> +/* If no mark, no flow to find. */
> +if (!dp_packet_has_flow_mark(packet, )) {
> +*flow = NULL;
> +return 0;

Re: [ovs-dev] [PATCH v7 1/1] ovs-numa: Support non-contiguous numa nodes and offline CPU cores

2021-06-25 Thread Kevin Traynor
On 22/06/2021 19:53, David Wilder wrote:
> This change removes the assumption that numa nodes and cores are numbered
> contiguously in linux.  This change is required to support some Power
> systems.
> 
> A check has been added to verify that cores are online,
> offline cores result in non-contiguously numbered cores.
> 
> Dpdk EAL option generation is updated to work with non-contiguous numa nodes.
> These options can be seen in the ovs-vswitchd.log.  For example:
> a system containing only numa nodes 0 and 8 will generate the following:
> 
> EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \
>  --socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0
> 
> Tests for pmd and dpif-netdev have been updated to validate non-contiguous
> numbered nodes.
> 
> Signed-off-by: David Wilder 
> ---
>  lib/dpdk.c   | 57 +++--
>  lib/ovs-numa.c   | 51 
>  lib/ovs-numa.h   |  2 ++
>  tests/dpif-netdev.at |  2 +-
>  tests/pmd.at | 61 
>  5 files changed, 142 insertions(+), 31 deletions(-)
> 
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 2eaaa569c..238d0fffb 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -130,22 +130,63 @@ construct_dpdk_options(const struct smap 
> *ovs_other_config, struct svec *args)
>  }
>  }
>  
> +static int
> +compare_numa_node_list(const void *a_, const void *b_)
> +{
> +int a = *(const int *) a_;
> +int b = *(const int *) b_;
> +
> +if (a < b) {
> +return -1;
> +}
> +if (a > b) {
> +return 1;
> +}
> +return 0;
> +}
> +
>  static char *
>  construct_dpdk_socket_mem(void)
>  {
>  const char *def_value = "1024";
> -int numa, numa_nodes = ovs_numa_get_n_numas();
> +struct ovs_numa_dump *dump;
> +const struct ovs_numa_info_numa *node;
> +int k = 0, last_node = 0, n_numa_nodes, *numa_node_list;

I probably would have kept k and n_numa_nodes as size_t as they are used
for handling around the hmap_count() return, but realistically it's not
going to make any difference. i'm not sure it's worth another re-spin
unless you have to do it anyway for other comments.

Acked-by: Kevin Traynor 

>  struct ds dpdk_socket_mem = DS_EMPTY_INITIALIZER;
>  
> -if (numa_nodes == 0 || numa_nodes == OVS_NUMA_UNSPEC) {
> -numa_nodes = 1;
> -}
> +/* Build a list of all numa nodes with at least one core */
> +dump = ovs_numa_dump_n_cores_per_numa(1);
> +n_numa_nodes = hmap_count(>numas);
> +numa_node_list = xcalloc(n_numa_nodes, sizeof *numa_node_list);
>  
> -ds_put_cstr(_socket_mem, def_value);
> -for (numa = 1; numa < numa_nodes; ++numa) {
> -ds_put_format(_socket_mem, ",%s", def_value);
> +FOR_EACH_NUMA_ON_DUMP(node, dump) {
> +if (k >= n_numa_nodes) {
> +break;
> +}
> +numa_node_list[k++] = node->numa_id;
>  }
> -
> +qsort(numa_node_list, k, sizeof *numa_node_list, compare_numa_node_list);
> +
> +for (int i = 0; i < n_numa_nodes; i++) {
> +while (numa_node_list[i] > last_node &&
> +   numa_node_list[i] != OVS_NUMA_UNSPEC &&
> +   numa_node_list[i] <= MAX_NUMA_NODES){
> +if (last_node == 0) {
> +ds_put_format(_socket_mem, "%s", "0");
> +} else {
> +ds_put_format(_socket_mem, ",%s", "0");
> +}
> +last_node++;
> +}
> +if (numa_node_list[i] == 0) {
> +ds_put_format(_socket_mem, "%s", def_value);
> +} else {
> +ds_put_format(_socket_mem, ",%s", def_value);
> +}
> +last_node++;
> +}
> +free(numa_node_list);
> +ovs_numa_dump_destroy(dump);
>  return ds_cstr(_socket_mem);
>  }
>  
> diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
> index 6d0a68522..b825ecbdd 100644
> --- a/lib/ovs-numa.c
> +++ b/lib/ovs-numa.c
> @@ -42,21 +42,22 @@ VLOG_DEFINE_THIS_MODULE(ovs_numa);
>   * This module stores the affinity information of numa nodes and cpu cores.
>   * It also provides functions to bookkeep the pin of threads on cpu cores.
>   *
> - * It is assumed that the numa node ids and cpu core ids all start from 0 and
> - * range continuously.  So, for example, if 'ovs_numa_get_n_cores()' returns 
> N,
> - * user can assume core ids from 0 to N-1 are all valid and there is a
> - * 'struct cpu_core' for each id.
> + * It is assumed that the numa node ids and cpu core ids all start from 0.
> + * There is no guarantee that node and cpu ids are numbered consecutively
> + * (this is a change from earlier version of the code). So, for example,
> + * if two nodes exist with ids 0 and 8, 'ovs_numa_get_n_nodes()' will
> + * return 2, no assumption of node numbering should be made.
>   *
>   * NOTE, this module should only be used by the main thread.
>   *
> - * NOTE, the assumption above will fail when cpu hotplug is used.  

Re: [ovs-dev] is vxlan well supported by ovn ?

2021-06-25 Thread Krzysztof Klimonda
Hi,

Is this a limitation for a number of logical switches and logical ports that 
are part of networks that use vxlan (for example, by utilizing vtep interfaces) 
or for a total number of LSs and LSPs in a deployment?

Best Regards
Krzysztof

On Thu, Jun 24, 2021, at 18:32, Ihar Hrachyshka wrote:
> It is supported but with a number of limitations. Specifically, the
> number of switches, and ports per switch, is limited to 2^11 when
> VXLAN is used in a cluster. This is due to design limitations as
> described in e.g.:
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/369201.html
> 
> Ihar
> 
> On Thu, Jun 24, 2021 at 11:58 AM ovsluck  wrote:
> >
> > Hi, Pettit and all:
> >
> >vxlan is almost data-center's standard protocol ,can be  offloaded  
> > to hardware.
> >
> >  is vxlan  well supported by ovn ? and what plans for the future?
> >
> >
> >
> >
> > thanks.
> >
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif implementation.

2021-06-25 Thread Ferriter, Cian
Hi Flavio,

Just one correction I noticed while implementing your feedback. I've replied 
inline.

Thanks,
Cian

> -Original Message-
> From: Ferriter, Cian
> Sent: Thursday 24 June 2021 12:54
> To: Flavio Leitner 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: RE: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif 
> implementation.
> 
> Hi Flavio,
> 
> Thanks for the review. My responses are inline.
> 
> Cian
> 
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Wednesday 23 June 2021 19:18
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v13 05/12] dpif-netdev: Add command to switch dpif 
> > implementation.
> >
> > On Thu, Jun 17, 2021 at 05:18:18PM +0100, Cian Ferriter wrote:
> > > From: Harry van Haaren 
> > >



> > > +
> > > +if (!avx512f_available || !bmi2_available) {
> > > +return -ENOTSUP;
> >
> > Also the callers of the probe function only cares about true or
> > false, so there is no need to return errno here, then the new
> > include can also be removed.
> >
> 
> I'll simplify the return and remove the errno include.
> 

This return -ENOTSUP is used in dpif_netdev_impl_set() to select the correct 
error message to show the user on error.

static const char *error_description[2] = {
"Unknown DPIF implementation",
"CPU doesn't support the required instruction for",
};

I'm going to leave in. I hope that makes sense.

> > > +}
> > > +
> > > +return 0;
> > > +}
> > > +
> > >  int32_t
> > >  dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> > >   struct dp_packet_batch *packets,


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] howto? L3 support in HW VTEP

2021-06-25 Thread Vladislav Odintsov
Hi Numan,

did you have a chance to look at db dump?

Regards,
Vladislav Odintsov

> On 18 Jun 2021, at 23:34, Vladislav Odintsov  wrote:
> 
> Sorry, there was a typo.
> Sure, VM has IP 192.168.1.2/24 and host 192.168.2.2/24.
> 
> [root@ovn-1 ~]# ovn-nbctl show
> switch 8e1e2828-67b2-432b-8b2e-088f312dab5d (switch1)
> port switch1-uplink
> type: router
> router-port: switch1-gw
> port switch1-vm
> addresses: ["00:00:00:00:00:01 192.168.1.2"]
> switch 42127732-2b67-483e-9768-777eae4d9cbe (switch2)
> port switch2-uplink
> type: router
> router-port: switch2-gw
> port switch2-vtep
> type: vtep
> addresses: ["unknown"]
> router b11e7ebe-0dde-4a9f-b006-37c177f38bae (router1)
> port switch1-gw
> mac: "00:00:00:00:00:f1"
> networks: ["192.168.1.1/24"]
> port switch2-gw
> mac: "00:00:00:00:00:f2"  # <-- this mac address I created as a 
> Ucast_Macs_Remote record in VTEP DB.
> networks: ["192.168.2.1/24”]  #  When physical host sends traffic to 
> its chassis, routing between sw1 and sw2 works.
> gateway chassis: [9721a9e9-73ef-4a9a-8a50-afa84811c6ef]
> 
> 
> OVN NB DB is in attachment.
> 
> 
> 
> Regards,
> Vladislav Odintsov
> 
>> On 18 Jun 2021, at 21:27, Numan Siddique > > wrote:
>> 
>> On Fri, Jun 18, 2021 at 7:42 AM Vladislav Odintsov > > wrote:
>>> 
>>> Hi all,
>>> 
>>> I’m trying to implement support for L3 routing between OVN and HW VTEP 
>>> devices.
>>> In my setup I use Cumulus Linux-managed Mellanox SN2000 switches.
>>> Current L2 functionality in this setup works well: ovn-controller-vtep and a
>>> small python service on the switch (which installs necessary mcast_macs 
>>> entries
>>> in switch fdb, since Cumulus Linux vtep support is limited to service_node
>>> replication mode).
>>> 
>>> My logical topology for L3 setup:
>>> 
>>> 2 logical_switches connected to same logical_router:
>>> Net1: 192.168.1.0/24, gw ip (lrouter): 192.168.1.1, VM 192.168.1.2
>>> Net2: 192.168.2.0/24, gw ip (lrouter): 192.168.2.1, Physical host 
>>> 192.168.2.2
>>> 
>>> Net1 has attached logical_switch_port with type vtep. In Net2 there is a VM
>>> (192.168.2.2/24), which needs ip connectivity to physical host 
>>> (192.168.1.2/24)
>>> connected to HW VTEP Mellanox switch over vtep lport from Net1.
>> 
>> I'm a little confused.  Above you said the physical host has IP
>> 192.168.2.2 but you also
>> mentioned there is a VM (192.168.2.2/24).
>> 
>> So in Net2 is there a logical switch port with IP 192.168.2.2 (and the
>> corresponding VIF)
>> and it wants to ping to a physical host 192.168.1.2 ?
>> 
>> Maybe you can share the ovn north db to better understand the problem ?
>> 
>> Thanks
>> Numan
>> 
>> 
>>> 
>>> For Net1’s LRP (192.168.1.1) I’ve created chassis_redirect port_binding to 
>>> some
>>> Chassis and patched controller-step code so that such CR LRP’s MAC is also
>>> added to Ucast_Macs_Remote vtep table.
>>> Traffic from ovn-host to vtep (192.168.1.2 to 192.168.2.2) passes well, but 
>>> in
>>> reverse direction physical server sends to all ovn-hosts ARP request Who has
>>> 192.168.2.1? tell 192.168.2.2. But no answer. If I manually configure arp on
>>> this physical server, connectivity between VM and physical host works well!
>>> 
>>> Now I’m stuck with arp resolution for LRP from VTEP lport from OVN side. Can
>>> somebody give an idea how to make ovn-controllers answer such ARP request?
>>> 
>>> Thanks.
>>> 
>>> Regards,
>>> Vladislav Odintsov
>>> 
>>> ___
>>> dev mailing list
>>> d...@openvswitch.org 
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
>>> 
>> ___
>> dev mailing list
>> d...@openvswitch.org 
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
>> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev