Re: [ovs-dev] [RFC ovn 0/2] Basic eBPF/XDP support in OVN.

2022-06-07 Thread Han Zhou
On Mon, May 30, 2022 at 5:46 PM  wrote:
>
> From: Numan Siddique 
>
> XDP program - ovn_xdp.c added in this RFC patch  series implements basic
port
> security and drops any packet if the port security check fails.
> There are still few TODOs in the port security checks. Like
>   - Make ovn xdp configurable.
>   - Removing the ingress Openflow rules from table 73 and 74 if ovn
xdp
> is enabled.
>   - Add IPv6 support.
>   - Enhance the port security xdp program for ARP/IPv6 ND checks.
>
> This patch adds a basic XDP support in OVN and in future we can
> leverage eBPF/XDP features.
>
> I'm not sure how much value this RFC patch adds to make use of eBPF/XDP
> just for port security.  Submitting as RFC to get some feedback and
> start some conversation on eBPF/XDP in OVN.
>
Hi Numan,

This is really cool. It demonstrates how OVN could leverage eBPF/XDP.

On the other hand, for the port-security feature in XDP, I keep thinking
about the scenarios and it is still not very clear to me. One advantage I
can think of is to prevent DOS attacks from VM/Pod when invalid IP/MAC are
used, XDP may perform better and drop packets with lower CPU cost
(comparing with OVS kernel datapath). However, I am also wondering why
would a attacker use invalid IP/MAC for DOS attacks? Do you have some more
thoughts about the use cases? And do you have any performance results
comparing with the current OVS implementation?

Another question is, would it work with smart NIC HW-offload, where VF
representer ports are added to OVS on the smart NIC? I guess XDP doesn't
support representer port, right?

Thanks,
Han

> In order to attach and detach xdp programs,  libxdp [1] and libbpf is
used.
>
> To test it out locally, please install libxdp-devel and libbpf-devel
> and the compile OVN first and then compile ovn_xdp by running "make
> bpf".  Copy ovn_xdp.o to either /usr/share/ovn/ or /usr/local/share/ovn/
>
>
> Numan Siddique (2):
>   RFC: Add basic xdp/eBPF support in OVN.
>   RFC: ovn-controller: Attach XDP progs to the VIFs of the logical
> ports.
>
>  Makefile.am |   6 +-
>  bpf/.gitignore  |   5 +
>  bpf/automake.mk |  23 +++
>  bpf/ovn_xdp.c   | 156 +++
>  configure.ac|   2 +
>  controller/automake.mk  |   4 +-
>  controller/binding.c|  45 +++--
>  controller/binding.h|   7 +
>  controller/ovn-controller.c |  79 +++-
>  controller/xdp.c| 389 
>  controller/xdp.h|  41 
>  m4/ovn.m4   |  20 ++
>  tests/automake.mk   |   1 +
>  13 files changed, 753 insertions(+), 25 deletions(-)
>  create mode 100644 bpf/.gitignore
>  create mode 100644 bpf/automake.mk
>  create mode 100644 bpf/ovn_xdp.c
>  create mode 100644 controller/xdp.c
>  create mode 100644 controller/xdp.h
>
> --
> 2.35.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC ovn] nb: Remove possibility of disabling logical datapath groups.

2022-06-07 Thread Han Zhou
On Tue, Jun 7, 2022 at 8:20 AM Dumitru Ceara  wrote:
>
> In large scale scenarios this option hugely reduces the size of the
> Southbound database positively affecting end to end performance.  In
> such scenarios there's no real reason to ever disable datapath groups.
>
> In lower scale scenarios any potential overhead due to logical datapath
> groups is, very likely, negligible.
>
> Aside from potential scalability concerns, the
> NB.NB_Global.options:use_logical_dp_group knob was kept until now to
> ensure that in case of a bug in the logical datapath groups code a CMS
> may turn it off and fall back to the mode in which logical flows are not
> grouped together.  As far as I know, this has never happened until now.
>
> Moreover, datpath_groups are enabled by default since v21.09.0 (4 stable
> releases ago), via 90daa7ce18dc ("northd: Enable logical dp groups by
> default.").
>
> From a testing perspective removing this knob will halve the CI matrix.
> This is desirable, especially in the context of more tests being added,
> e.g.:
>
>
https://patchwork.ozlabs.org/project/ovn/patch/20220531093318.2270409-1-mh...@redhat.com/
>
> Signed-off-by: Dumitru Ceara 

Thanks Dumitru! I haven't reviewed the patch in detail, but the idea sounds
good to me.

Han

> ---
>  NEWS|  1 +
>  northd/northd.c | 77 +++--
>  ovn-nb.xml  | 19 +++---
>  tests/ovn-macros.at | 23 ++--
>  tests/ovn-northd.at | 92 -
>  tests/ovs-macros.at |  4 +-
>  6 files changed, 41 insertions(+), 175 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index e015ae8e7..20b4c5d91 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -4,6 +4,7 @@ Post v22.06.0
>  "ovn-encap-df_default" to enable or disable tunnel DF flag.
>- Add option "localnet_learn_fdb" to LSP that will allow localnet
>  ports to learn MAC addresses and store them in FDB table.
> +  - Removed possibility of disabling logical datapath groups.
>
>  OVN v22.06.0 - XX XXX 
>  --
> diff --git a/northd/northd.c b/northd/northd.c
> index 0207f6ce1..a97a321cd 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -4813,10 +4813,6 @@ ovn_lflow_equal(const struct ovn_lflow *a, const
struct ovn_datapath *od,
>  && nullable_string_is_equal(a->ctrl_meter, ctrl_meter));
>  }
>
> -/* If this option is 'true' northd will combine logical flows that
differ by
> - * logical datapath only by creating a datapath group. */
> -static bool use_logical_dp_groups = false;
> -
>  enum {
>  STATE_NULL,   /* parallelization is off */
>  STATE_INIT_HASH_SIZES,/* parallelization is on; hashes sizing
needed */
> @@ -4841,8 +4837,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct
ovn_datapath *od,
>  lflow->ctrl_meter = ctrl_meter;
>  lflow->dpg = NULL;
>  lflow->where = where;
> -if ((parallelization_state != STATE_NULL)
> -&& use_logical_dp_groups) {
> +if (parallelization_state != STATE_NULL) {
>  ovs_mutex_init(&lflow->odg_lock);
>  }
>  }
> @@ -4852,7 +4847,7 @@ ovn_dp_group_add_with_reference(struct ovn_lflow
*lflow_ref,
>  struct ovn_datapath *od)
>  OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
> -if (!use_logical_dp_groups || !lflow_ref) {
> +if (!lflow_ref) {
>  return false;
>  }
>
> @@ -4931,13 +4926,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct
ovn_datapath *od,
>  struct ovn_lflow *old_lflow;
>  struct ovn_lflow *lflow;
>
> -if (use_logical_dp_groups) {
> -old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority,
match,
> -   actions, ctrl_meter, hash);
> -if (old_lflow) {
> -ovn_dp_group_add_with_reference(old_lflow, od);
> -return old_lflow;
> -}
> +old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> +   actions, ctrl_meter, hash);
> +if (old_lflow) {
> +ovn_dp_group_add_with_reference(old_lflow, od);
> +return old_lflow;
>  }
>
>  lflow = xmalloc(sizeof *lflow);
> @@ -4993,8 +4986,7 @@ ovn_lflow_add_at_with_hash(struct hmap *lflow_map,
struct ovn_datapath *od,
>  struct ovn_lflow *lflow;
>
>  ovs_assert(ovn_stage_to_datapath_type(stage) ==
ovn_datapath_get_type(od));
> -if (use_logical_dp_groups
> -&& (parallelization_state == STATE_USE_PARALLELIZATION)) {
> +if (parallelization_state == STATE_USE_PARALLELIZATION) {
>  lflow = do_ovn_lflow_add_pd(lflow_map, od, hash, stage, priority,
>  match, actions, io_port, stage_hint,
where,
>  ctrl_meter);
> @@ -5080,8 +5072,7 @@ static void
>  ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
>  {
>  if (lflow) {
> -if ((parallelization_state != STATE_NULL)
> - 

[ovs-dev] [PATCH net-next] net: rename reference+tracking helpers

2022-06-07 Thread Jakub Kicinski
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
 dev_hold_track()-> netdev_hold()
 dev_put_track() -> netdev_put()
 dev_replace_track() -> netdev_ref_replace()

Signed-off-by: Jakub Kicinski 
---
CC: dsah...@kernel.org
CC: steffen.klass...@secunet.com
CC: jreu...@yaina.de
CC: ra...@blackwall.org
CC: j...@resnulli.us
CC: kgr...@linux.ibm.com
CC: ivec...@redhat.com
CC: jma...@redhat.com
CC: ying@windriver.com
CC: lucien@gmail.com
CC: a...@arndb.de
CC: yajun.d...@linux.dev
CC: aten...@kernel.org
CC: richardsonn...@google.com
CC: hkallwe...@gmail.com
CC: linux-h...@vger.kernel.org
CC: d...@openvswitch.org
CC: linux-s...@vger.kernel.org
CC: tipc-discuss...@lists.sourceforge.net
---
 drivers/net/eql.c  |  4 ++--
 drivers/net/macsec.c   |  4 ++--
 drivers/net/macvlan.c  |  4 ++--
 drivers/net/netconsole.c   |  2 +-
 drivers/net/vrf.c  |  8 
 include/linux/netdevice.h  | 24 
 include/net/xfrm.h |  2 +-
 net/8021q/vlan_dev.c   |  4 ++--
 net/ax25/af_ax25.c |  7 ---
 net/ax25/ax25_dev.c|  6 +++---
 net/bridge/br_if.c | 10 +-
 net/core/dev.c | 10 +-
 net/core/dev_ioctl.c   |  4 ++--
 net/core/drop_monitor.c|  5 +++--
 net/core/dst.c |  8 
 net/core/failover.c|  4 ++--
 net/core/link_watch.c  |  2 +-
 net/core/neighbour.c   | 18 +-
 net/core/net-sysfs.c   |  8 
 net/core/netpoll.c |  2 +-
 net/core/pktgen.c  |  6 +++---
 net/ethtool/ioctl.c|  4 ++--
 net/ethtool/netlink.c  |  6 +++---
 net/ethtool/netlink.h  |  2 +-
 net/ipv4/devinet.c |  4 ++--
 net/ipv4/fib_semantics.c   | 11 ++-
 net/ipv4/ipmr.c|  2 +-
 net/ipv4/route.c   |  7 +++
 net/ipv4/xfrm4_policy.c|  2 +-
 net/ipv6/addrconf.c|  4 ++--
 net/ipv6/addrconf_core.c   |  2 +-
 net/ipv6/ip6_gre.c |  8 
 net/ipv6/ip6_tunnel.c  |  4 ++--
 net/ipv6/ip6_vti.c |  4 ++--
 net/ipv6/ip6mr.c   |  2 +-
 net/ipv6/route.c   | 10 +-
 net/ipv6/sit.c |  4 ++--
 net/ipv6/xfrm6_policy.c|  4 ++--
 net/llc/af_llc.c   |  2 +-
 net/openvswitch/vport-netdev.c |  6 +++---
 net/packet/af_packet.c | 12 ++--
 net/sched/act_mirred.c |  6 +++---
 net/sched/sch_api.c|  2 +-
 net/sched/sch_generic.c| 11 ++-
 net/smc/smc_pnet.c |  7 ---
 net/switchdev/switchdev.c  |  4 ++--
 net/tipc/bearer.c  |  4 ++--
 net/xfrm/xfrm_device.c |  2 +-
 48 files changed, 141 insertions(+), 137 deletions(-)

diff --git a/drivers/net/eql.c b/drivers/net/eql.c
index 557ca8ff9dec..ca3e4700a813 100644
--- a/drivers/net/eql.c
+++ b/drivers/net/eql.c
@@ -225,7 +225,7 @@ static void eql_kill_one_slave(slave_queue_t *queue, 
slave_t *slave)
list_del(&slave->list);
queue->num_slaves--;
slave->dev->flags &= ~IFF_SLAVE;
-   dev_put_track(slave->dev, &slave->dev_tracker);
+   netdev_put(slave->dev, &slave->dev_tracker);
kfree(slave);
 }
 
@@ -399,7 +399,7 @@ static int __eql_insert_slave(slave_queue_t *queue, slave_t 
*slave)
if (duplicate_slave)
eql_kill_one_slave(queue, duplicate_slave);
 
-   dev_hold_track(slave->dev, &slave->dev_tracker, GFP_ATOMIC);
+   netdev_hold(slave->dev, &slave->dev_tracker, GFP_ATOMIC);
list_add(&slave->list, &queue->all_slaves);
queue->num_slaves++;
slave->dev->flags |= IFF_SLAVE;
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 817577e713d7..815738c0e067 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3462,7 +3462,7 @@ static int macsec_dev_init(struct net_device *dev)
memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len);
 
/* Get macsec's reference to real_dev */
-   dev_hold_track(real_dev, &macsec->dev_tracker, GFP_KERNEL);
+   netdev_hold(real_dev, &macsec->dev_tracker, GFP_KERNEL);
 
return 0;
 }
@@ -3710,7 +3710,7 @@ static void macsec_free_netdev(struct net_device *dev)
free_percpu(macsec->secy.tx_sc.stats);
 
/* Get rid of the macsec's reference to real_dev */
-   dev_put_track(macsec->real_dev, &macsec->dev_tracker);
+   netdev_put(macsec->real_dev, &macsec->dev_tracker);
 }
 
 static void macsec_setup(struct net_device *dev)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index eff75beb1395..5b46a6526fc6 100644
--- a/drivers/net/macvlan.c
++

Re: [ovs-dev] [PATCH v4 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl

2022-06-07 Thread 0-day Robot
Bleep bloop.  Greetings Terry Wilson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 1/2] Handle re-used pids in pidfile_is_running

2022-06-07 Thread 0-day Robot
Bleep bloop.  Greetings Terry Wilson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless (utilities/ovn-ctl).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 Handle re-used pids in pidfile_is_running
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl

2022-06-07 Thread Terry Wilson
When checking if ovsdb-server is running, ensure that the binary
we are going to run matches the one actually running with the the
pid that was in our pidfile.

Signed-off-by: Terry Wilson 
---
 utilities/ovn-ctl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
index 14d37a3d6..e2f05915b 100755
--- a/utilities/ovn-ctl
+++ b/utilities/ovn-ctl
@@ -200,7 +200,7 @@ start_ovsdb__() {
 ovn_install_dir "$ovn_etcdir"
 
 # Check and eventually start ovsdb-server for DB
-if pidfile_is_running $db_pid_file; then
+if pidfile_is_running $db_pid_file ovsdb-server; then
 return
 fi
 
-- 
2.34.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 1/2] Handle re-used pids in pidfile_is_running

2022-06-07 Thread Terry Wilson
Since pids can be re-used, it is necessary to check that the
process that is running with a pid matches the one that we expect.

This adds the ability to optionally pass a 'binary' argument to
pidfile_is_running, and if it is passed to match the binary against
/proc/$pid/exe.

Signed-off-by: Terry Wilson 
---
 utilities/ovn-ctl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
index d733aa42d..14d37a3d6 100755
--- a/utilities/ovn-ctl
+++ b/utilities/ovn-ctl
@@ -42,7 +42,8 @@ ovn_ic_db_conf_file="$ovn_etcdir/ovn-ic-db-params.conf"
 
 pidfile_is_running () {
 pidfile=$1
-test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && 
pid_exists "$pid"
+cmd=$2
+test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && 
pid_exists "$pid" && [ -z $cmd -o pid_comm_check "$cmd" "$pid" ]
 } >/dev/null 2>&1
 
 stop_nb_ovsdb() {
-- 
2.34.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v13 ovn] Implement RARP activation strategy for ports

2022-06-07 Thread Ihar Hrachyshka
On Mon, Jun 6, 2022 at 3:48 AM Han Zhou  wrote:
>
>
>
> On Fri, Jun 3, 2022 at 1:20 PM Ihar Hrachyshka  wrote:
> >
> > When options:activation-strategy is set to "rarp" for LSP, when used in
> > combination with multiple chassis names listed in
> > options:requested-chassis, additional chassis will install special flows
> > that would block all ingress and egress traffic for the port until a
> > special activation event happens.
> >
> > For "rarp" strategy, an observation of a RARP packet sent from the port
> > on the additional chassis is such an event. When it occurs, a special
> > flow passes control to a controller() action handler that eventually
> > removes the installed blocking flows and also marks the port as
> > options:additional-chassis-activated in southbound db.
> >
> > This feature is useful in live migration scenarios where it's not
> > advisable to unlock the destination port location prematurily to avoid
> > duplicate packets originating from the port.
> >
> > Signed-off-by: Ihar Hrachyshka 
> > ---
> > v13: use resubmit() action to reinject RARP into pipeline.
> > v13: lock / unlock pinctrl_mutex in functions invoked from main thread.
> > v13: db_is_port_activated->lport_is_activated_by_activation_strategy.
>
> Thanks Ihar for the revision. Sorry that I didn't follow up in time after v6. 
> Please see some of my quick comments for this version regarding I-P:
>
> > ---
> >  NEWS|   2 +
> >  controller/lport.c  |  22 +++
> >  controller/lport.h  |   3 +
> >  controller/ovn-controller.c |  87 +
> >  controller/physical.c   |  94 ++
> >  controller/pinctrl.c| 145 +-
> >  controller/pinctrl.h|  13 ++
> >  include/ovn/actions.h   |   3 +
> >  northd/northd.c |  10 +
> >  northd/ovn-northd.c |   5 +-
> >  ovn-nb.xml  |  11 ++
> >  ovn-sb.xml  |  15 ++
> >  tests/ovn.at| 365 
> >  13 files changed, 772 insertions(+), 3 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 2ee283a56..7c54670ed 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -29,6 +29,8 @@ OVN v22.06.0 - XX XXX 
> >- Added support for setting the Next server IP in the DHCP header
> >  using the private DHCP option - 253 in native OVN DHCPv4 responder.
> >- Support list of chassis for 
> > Logical_Switch_Port:options:requested-chassis.
> > +  - Support Logical_Switch_Port:options:activation-strategy for live 
> > migration
> > +scenarios.
> >
> >  OVN v22.03.0 - 11 Mar 2022
> >  --
> > diff --git a/controller/lport.c b/controller/lport.c
> > index bf55d83f2..add7e91aa 100644
> > --- a/controller/lport.c
> > +++ b/controller/lport.c
> > @@ -197,3 +197,25 @@ get_peer_lport(const struct sbrec_port_binding *pb,
> >  peer_name);
> >  return (peer && peer->datapath) ? peer : NULL;
> >  }
> > +
> > +bool
> > +lport_is_activated_by_activation_strategy(const struct sbrec_port_binding 
> > *pb,
> > +  const struct sbrec_chassis 
> > *chassis)
> > +{
> > +const char *activated_chassis = smap_get(&pb->options,
> > + 
> > "additional-chassis-activated");
> > +if (activated_chassis) {
> > +char *save_ptr;
> > +char *tokstr = xstrdup(activated_chassis);
> > +for (const char *chassis_name = strtok_r(tokstr, ",", &save_ptr);
> > + chassis_name != NULL;
> > + chassis_name = strtok_r(NULL, ",", &save_ptr)) {
> > +if (!strcmp(chassis_name, chassis->name)) {
> > +free(tokstr);
> > +return true;
> > +}
> > +}
> > +free(tokstr);
> > +}
> > +return false;
> > +}
> > diff --git a/controller/lport.h b/controller/lport.h
> > index 115881655..644c67255 100644
> > --- a/controller/lport.h
> > +++ b/controller/lport.h
> > @@ -70,4 +70,7 @@ const struct sbrec_port_binding *lport_get_peer(
> >  const struct sbrec_port_binding *lport_get_l3gw_peer(
> >  const struct sbrec_port_binding *,
> >  struct ovsdb_idl_index *sbrec_port_binding_by_name);
> > +bool
> > +lport_is_activated_by_activation_strategy(const struct sbrec_port_binding 
> > *pb,
> > +  const struct sbrec_chassis 
> > *chassis);
> >  #endif /* controller/lport.h */
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index b597c0e37..a37dfcb78 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -1047,6 +1047,50 @@ en_ofctrl_is_connected_run(struct engine_node *node, 
> > void *data)
> >  engine_set_node_state(node, EN_UNCHANGED);
> >  }
> >
> > +struct ed_type_activated_ports {
> > +struct ovs_list *activated_ports;
> > +};
> > +
> > +static void *
> > +en_activated_ports_init(struct engine_node *nod

[ovs-dev] [PATCH ovn v14] Implement RARP activation strategy for ports

2022-06-07 Thread Ihar Hrachyshka
When options:activation-strategy is set to "rarp" for LSP, when used in
combination with multiple chassis names listed in
options:requested-chassis, additional chassis will install special flows
that would block all ingress and egress traffic for the port until a
special activation event happens.

For "rarp" strategy, an observation of a RARP packet sent from the port
on the additional chassis is such an event. When it occurs, a special
flow passes control to a controller() action handler that eventually
removes the installed blocking flows and also marks the port as
options:additional-chassis-activated in southbound db.

This feature is useful in live migration scenarios where it's not
advisable to unlock the destination port location prematurily to avoid
duplicate packets originating from the port.

Signed-off-by: Ihar Hrachyshka 
---
 NEWS|   2 +
 controller/lport.c  |  22 +++
 controller/lport.h  |   3 +
 controller/ovn-controller.c |  97 ++
 controller/physical.c   |  94 ++
 controller/pinctrl.c| 155 ++-
 controller/pinctrl.h|  13 ++
 include/ovn/actions.h   |   3 +
 northd/northd.c |  10 +
 northd/ovn-northd.c |   5 +-
 ovn-nb.xml  |  11 ++
 ovn-sb.xml  |  15 ++
 tests/ovn.at| 365 
 13 files changed, 792 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 2ee283a56..7c54670ed 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,8 @@ OVN v22.06.0 - XX XXX 
   - Added support for setting the Next server IP in the DHCP header
 using the private DHCP option - 253 in native OVN DHCPv4 responder.
   - Support list of chassis for Logical_Switch_Port:options:requested-chassis.
+  - Support Logical_Switch_Port:options:activation-strategy for live migration
+scenarios.
 
 OVN v22.03.0 - 11 Mar 2022
 --
diff --git a/controller/lport.c b/controller/lport.c
index bf55d83f2..add7e91aa 100644
--- a/controller/lport.c
+++ b/controller/lport.c
@@ -197,3 +197,25 @@ get_peer_lport(const struct sbrec_port_binding *pb,
 peer_name);
 return (peer && peer->datapath) ? peer : NULL;
 }
+
+bool
+lport_is_activated_by_activation_strategy(const struct sbrec_port_binding *pb,
+  const struct sbrec_chassis *chassis)
+{
+const char *activated_chassis = smap_get(&pb->options,
+ "additional-chassis-activated");
+if (activated_chassis) {
+char *save_ptr;
+char *tokstr = xstrdup(activated_chassis);
+for (const char *chassis_name = strtok_r(tokstr, ",", &save_ptr);
+ chassis_name != NULL;
+ chassis_name = strtok_r(NULL, ",", &save_ptr)) {
+if (!strcmp(chassis_name, chassis->name)) {
+free(tokstr);
+return true;
+}
+}
+free(tokstr);
+}
+return false;
+}
diff --git a/controller/lport.h b/controller/lport.h
index 115881655..644c67255 100644
--- a/controller/lport.h
+++ b/controller/lport.h
@@ -70,4 +70,7 @@ const struct sbrec_port_binding *lport_get_peer(
 const struct sbrec_port_binding *lport_get_l3gw_peer(
 const struct sbrec_port_binding *,
 struct ovsdb_idl_index *sbrec_port_binding_by_name);
+bool
+lport_is_activated_by_activation_strategy(const struct sbrec_port_binding *pb,
+  const struct sbrec_chassis *chassis);
 #endif /* controller/lport.h */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 2793c8687..78f58e312 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1417,6 +1417,100 @@ en_runtime_data_run(struct engine_node *node, void 
*data)
 engine_set_node_state(node, EN_UPDATED);
 }
 
+struct ed_type_activated_ports {
+struct ovs_list *activated_ports;
+};
+
+static void *
+en_activated_ports_init(struct engine_node *node OVS_UNUSED,
+struct engine_arg *arg OVS_UNUSED)
+{
+struct ed_type_activated_ports *data = xzalloc(sizeof *data);
+data->activated_ports = NULL;
+return data;
+}
+
+static void
+en_activated_ports_cleanup(void *data_)
+{
+struct ed_type_activated_ports *data = data_;
+
+struct activated_port *pp;
+if (!data->activated_ports) {
+return;
+}
+
+LIST_FOR_EACH_POP (pp, list, data->activated_ports) {
+free(pp);
+}
+free(data->activated_ports);
+data->activated_ports = NULL;
+}
+
+static void
+en_activated_ports_clear_tracked_data(void *data)
+{
+en_activated_ports_cleanup(data);
+}
+
+static void
+en_activated_ports_run(struct engine_node *node, void *data_)
+{
+struct ed_type_activated_ports *data = data_;
+enum engine_node_state state = EN_UNCHANGED;
+
+if (!data->activated_ports) {
+get_activated_ports(&data->activated

Re: [ovs-dev] [ovs-discuss] Commit 355fef6f2 seems to break connectivity in my setup

2022-06-07 Thread Frode Nordahl
On Tue, Jun 7, 2022 at 12:16 AM Ilya Maximets  wrote:
>
> On 5/31/22 23:48, Ilya Maximets wrote:
> > On 5/31/22 21:15, Frode Nordahl wrote:
> >> On Mon, May 30, 2022 at 5:25 PM Frode Nordahl
>
> 
>
> >> I've pushed the first part of the fix here:
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/394450.html
> >
> > Thanks!  I saw that and I tend to think that it is correct.
> > I'll try to test it and apply in the next couple of days.
> >
> > One question about the test above: which entity actually adds
> > the ct_state to the packet or at which moment that happens?
> > I see it, but I'm not sure I fully understand that.  Looks
> > like I'm missing smething obvious.
>
> I found what is going on there - kernel simply tracks everything
> if not told to do so, so ICMP packets create the ct entry and
> subsequent packets re-use it, so icmp replies have +trk set while
> entering OVS.

Great, my hunch was that something along these lines was happening as
well, I have to admit the test case was found by locating something
closest to the real life use case and it proved to work as a good test
for this condition.

> 
>
> Let's have some summary of the issues discovered here so far,
> including a few new issues:
>
> 1. ct states set externally are not tracked correctly by OVS.
>Fix: https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/394450.html
>Status:  LGTM, will apply soon.
>This fixes the problem originally reported by Liam, IIUC.  Right?

Yes.

> 2. Kernel ct() actions are trying to re-use the cached connection
>after the tuple changes.
>This ends up to be the OVN hairpin issue as reported here:
>  https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
>
>Proposed Fix:
>  
> https://lore.kernel.org/netdev/20220606221140.488984-1-i.maxim...@ovn.org/T/#u
>
>Status: Needs review.

I can confirm that the proposed fix resolves the OVN hairpin issue. It
also looks simple enough to be backportable all the way to where we
would need it (kernel 5.4.0). I'll have a look at giving this wider
exposure in an internal CI environment as a canary for any unintended
consequences if that would be helpful.

> 3. ct_clear is not a reversible action, because it required
>to pass the packet through conntrack again in order to restore
>the conntrack state.
>
>Fix: To move the CT_CLEAR to a group of irreversible actions
> in the reversible_actions() in ofproto/ofproto-dpif-xlate.c.
>
>Status: An easy fix that does nothing useful without fixes for
>later issues, because we clear the ct_state before
>the patch_port_output() processing and optimizing out
>the CT_CLEAR action.
>
>
> 4. (new!) reversible_actions() optimization is logically incorrect.
>The reason is that reversible_actions() doesn't look further
>than a list of actions of the first OF rule in the second bridge.
>If the list of action contains resubmit, there can still be
>irreversible actions and packet will be irreversibly modified by
>the patch port processing without cloning.
>
>Simple reproducer:
>
> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> index dbb3b6dda..be30ad5bf 100644
> --- a/tests/ofproto-dpif.at
> +++ b/tests/ofproto-dpif.at
> @@ -8736,7 +8736,8 @@ OVS_VSWITCHD_START(
>  AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br1 'meter=1 pktps stats 
> bands=type=drop rate=2'])
>  AT_CHECK([ovs-ofctl del-flows br0])
>  AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 in_port=local,ip,actions=2,1])
> -AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 
> in_port=1,ip,actions=meter:1,3])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 
> "in_port=1,ip,actions=resubmit(,7)"])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 
> table=7,in_port=1,ip,actions=meter:1,3])
>
>  AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
> 'in_port(100),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=10.1.1.22,dst=10.0.0.3,proto=6,tos=0,ttl=64,frag=no),tcp(src=53295,dst=8080)'],
>  [0], [stdout])
>  AT_CHECK([tail -1 stdout], [0],
> ---
>
>Status: unclear.
>One idea is to reverse the logic and check datapath actions for
>being reversible when going up in recursion instead of trying to
>predict all the future actions while going down.
>
>Ideas are welcome.
>Hammerhead approach: Mark 'resubmit' as irreversible action.  But that
>will effectively mean that we're always cloning on patch port output,
>which is not great, see the issue #8.
>
>
> 5. Packets after the ct() in a non-forked pipeline must be untracked.
>Status: unclear.
>
>Fix for the issue #2 will cover issues for already tracked packets,
>but if the packet is supposed to be untracked, but it is tracked,
>then we must emit the ct_clear action from the userspace.
>
>The most viable approach, I guess, is the previously proposed
>'pending_ct_clear' fix.  Alternative is to always emit ct_clear
> 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix internal CT state for non-recirc traffic.

2022-06-07 Thread Frode Nordahl
On Tue, Jun 7, 2022 at 10:52 PM Ilya Maximets  wrote:
>
> On 5/30/22 21:07, Frode Nordahl wrote:
> > In some circumstances a flow may get its ct_state set without
> > conscious intervention by the OVS user space code.
> >
> > Commit 355fef6f2ccbc optimizes out uneccessary ct_clear actions
> > based on an internal struct xlate_ctx->conntracked state flag.
> >
> > Before this commit the xlate_ctx->conntracked state flag would
> > be initialized to 'false' and only set during thawing for
> > recirculation.
> >
> > This patch checks the flow ct_state for the non-recirc case and
> > sets the internal conntracked state appropriately.  A system
> > traffic test is also added to avoid regression.
> >
> > Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear 
> > datapath actions.")
> > Signed-off-by: Frode Nordahl 
> > ---
> >  ofproto/ofproto-dpif-xlate.c |  6 +
> >  tests/system-traffic.at  | 46 
> >  2 files changed, 52 insertions(+)
>
> Thanks!  Applied and backported down to 2.13.
>
> Best regards, Ilya Maximets.

Thank you for the merges!

-- 
Frode Nordahl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl

2022-06-07 Thread 0-day Robot
Bleep bloop.  Greetings Terry Wilson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] Handle re-used pids in pidfile_is_running

2022-06-07 Thread 0-day Robot
Bleep bloop.  Greetings Terry Wilson, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless (utilities/ovn-ctl).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 Handle re-used pids in pidfile_is_running
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 1/2] Handle re-used pids in pidfile_is_running

2022-06-07 Thread Terry Wilson
Since pids can be re-used, it is necessary to check that the
process that is running with a pid matches the one that we expect.

This adds the ability to optionally pass a 'binary' argument to
pidfile_is_running, and if it is passed to match the binary against
/proc/$pid/exe.

Signed-off-by: Terry Wilson 
---
 utilities/ovn-ctl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
index d733aa42d..14d37a3d6 100755
--- a/utilities/ovn-ctl
+++ b/utilities/ovn-ctl
@@ -42,7 +42,8 @@ ovn_ic_db_conf_file="$ovn_etcdir/ovn-ic-db-params.conf"
 
 pidfile_is_running () {
 pidfile=$1
-test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && 
pid_exists "$pid"
+cmd=$2
+test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && 
pid_exists "$pid" && [ -z $cmd -o pid_comm_check "$cmd" "$pid" ]
 } >/dev/null 2>&1
 
 stop_nb_ovsdb() {
-- 
2.34.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl

2022-06-07 Thread Terry Wilson
When checking if ovsdb-server is running, ensure that the binary
we are going to run matches the one actually running with the the
pid that was in our pidfile.

Signed-off-by: Terry Wilson 
---
 utilities/ovn-ctl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
index 14d37a3d6..e2f05915b 100755
--- a/utilities/ovn-ctl
+++ b/utilities/ovn-ctl
@@ -200,7 +200,7 @@ start_ovsdb__() {
 ovn_install_dir "$ovn_etcdir"
 
 # Check and eventually start ovsdb-server for DB
-if pidfile_is_running $db_pid_file; then
+if pidfile_is_running $db_pid_file ovsdb-server; then
 return
 fi
 
-- 
2.34.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] netdev-offload-dpdk: use flow transfer proxy mechanism

2022-06-07 Thread Ivan Malov

Hi Eli,

On Wed, 1 Jun 2022, Eli Britstein wrote:


- Missing proper handling of the testpmd syntax logging. It changes the used port 
according to "transfer", but the log still uses netdev_dpdk_get_port_id().


Thanks for noticing. I will see to it in the next version.


- The usage of the "proxy" port for rte-flow implies that this proxy port is attached to 
OVS, otherwise it is not "started" and creation of flows will fail.


That's the way it is. If there is no proxy for a given port, then the
original port value will be used for managing flows. For vendors that
don't need the proxy, this will work. For others, it won't. That's OK.




-Original Message-
From: Ivan Malov 
Sent: Monday, May 30, 2022 5:16 PM
To: d...@openvswitch.org
Cc: Andrew Rybchenko ; Ilya Maximets
; Ori Kam ; Eli Britstein
; NBU-Contact-Thomas Monjalon (EXTERNAL)
; Stephen Hemminger
; David Marchand
; Gaetan Rivet ; Maxime
Coquelin 
Subject: [PATCH 3/3] netdev-offload-dpdk: use flow transfer proxy
mechanism

External email: Use caution opening links or attachments


Manage "transfer" flows via the corresponding mechanism.
Doing so requires that the traffic source be specified explicitly, via the
corresponding pattern item.

Signed-off-by: Ivan Malov 
Acked-by: Andrew Rybchenko 
---
lib/netdev-dpdk.c | 73 ---
lib/netdev-dpdk.h |  2 +-
lib/netdev-offload-dpdk.c | 43 ++-
3 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
45e5d26d2..d0bf4613a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -420,6 +420,7 @@ enum dpdk_hw_ol_features {

struct netdev_dpdk {
PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
+dpdk_port_t flow_transfer_proxy_port_id;
dpdk_port_t port_id;

/* If true, device was attached by rte_eth_dev_attach(). */ @@ -1115,6
+1116,23 @@ dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev)
  DPDK_PORT_ID_FMT, dev->port_id);
}
}
+
+static void
+dpdk_eth_dev_init_flow_transfer_proxy(struct netdev_dpdk *dev) {
+int ret;
+
+ret = rte_flow_pick_transfer_proxy(dev->port_id,
+   &dev->flow_transfer_proxy_port_id, 
NULL);
+if (ret == 0)
+return;
+
+/*
+ * The PMD does not indicate the proxy port.
+ * It is OK to assume the proxy is unneeded.
+ */
+dev->flow_transfer_proxy_port_id = dev->port_id; }
#endif /* ALLOW_EXPERIMENTAL_API */

static int
@@ -1141,6 +1159,19 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 * Request delivery of such metadata.
 */
dpdk_eth_dev_init_rx_metadata(dev);
+
+/*
+ * Managing "transfer" flows requires that the user communicate them
+ * via a port which has the privilege to control the embedded switch.
+ * For some vendors, all ports in a given switching domain have
+ * this privilege. For other vendors, it's only one port.
+ *
+ * Get the proxy port ID and remember it for later use.
+ */
+dpdk_eth_dev_init_flow_transfer_proxy(dev);
+#else /* ! ALLOW_EXPERIMENTAL_API */
+/* It is OK to assume the proxy is unneeded. */
+dev->flow_transfer_proxy_port_id = dev->port_id;
#endif /* ALLOW_EXPERIMENTAL_API */

rte_eth_dev_info_get(dev->port_id, &info);
@@ -5214,13 +5245,15 @@ out:

int
netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
- struct rte_flow *rte_flow,
+ bool transfer, struct rte_flow *rte_flow,
 struct rte_flow_error *error)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
int ret;

-ret = rte_flow_destroy(dev->port_id, rte_flow, error);
+ret = rte_flow_destroy(transfer ?
+   dev->flow_transfer_proxy_port_id : dev->port_id,
+   rte_flow, error);
return ret;
}

@@ -5234,7 +5267,19 @@ netdev_dpdk_rte_flow_create(struct netdev
*netdev,
struct rte_flow *flow;
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);

-flow = rte_flow_create(dev->port_id, attr, items, actions, error);
+#ifdef ALLOW_EXPERIMENTAL_API
+if (!attr->transfer) {
+/*
+ * The 1st item in any pattern is a traffic source one.
+ * It is unnecessary in the case of non-transfer rules.
+ */
+++(items);
+}
+#endif /* ALLOW_EXPERIMENTAL_API */
+
+flow = rte_flow_create(attr->transfer ?
+   dev->flow_transfer_proxy_port_id : dev->port_id,
+   attr, items, actions, error);
return flow;
}

@@ -5262,7 +5307,8 @@ netdev_dpdk_rte_flow_query_count(struct netdev
*netdev,
}

dev = netdev_dpdk_cast(netdev);
-ret = rte_flow_query(dev->port_id, rte_flow, actions, query, error);
+ret = rte_flow_query(dev->flow_transfer_proxy_port_id, rte_flow,
+ actions, query, error);
return ret;
}

@@ -5284,8 +5330,8 @@ 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix internal CT state for non-recirc traffic.

2022-06-07 Thread Ilya Maximets
On 5/30/22 21:07, Frode Nordahl wrote:
> In some circumstances a flow may get its ct_state set without
> conscious intervention by the OVS user space code.
> 
> Commit 355fef6f2ccbc optimizes out uneccessary ct_clear actions
> based on an internal struct xlate_ctx->conntracked state flag.
> 
> Before this commit the xlate_ctx->conntracked state flag would
> be initialized to 'false' and only set during thawing for
> recirculation.
> 
> This patch checks the flow ct_state for the non-recirc case and
> sets the internal conntracked state appropriately.  A system
> traffic test is also added to avoid regression.
> 
> Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath 
> actions.")
> Signed-off-by: Frode Nordahl 
> ---
>  ofproto/ofproto-dpif-xlate.c |  6 +
>  tests/system-traffic.at  | 46 
>  2 files changed, 52 insertions(+)

Thanks!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] classifier: adjust segment boundary to execute prerequisite processing

2022-06-07 Thread Ilya Maximets
On 5/24/22 23:35, Aaron Conole wrote:
> During flow processing, the flow wildcards are checked as a series of
> stages, and these stages are intended to carry dependencies in a single
> direction.  But when the neighbor discovery processing, for example, was
> executed there is an incorrect dependency chain - we need fields from
> stage 4 to determine whether we need fields from stage 3.
> 
> We can build a set of flow rules to demonstrate this:
>   table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1)
>   table=0,priority=0 actions=NORMAL
>   table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2)
>   table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2)
>   table=1,priority=0 actions=NORMAL
>   
> table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10:de:ad:be:ef:10
>  actions=NORMAL
>   table=2,priority=100,tcp actions=NORMAL
>   table=2,priority=100,icmp6 actions=NORMAL
>   table=2,priority=0 actions=NORMAL
> 
> With this set of flows, any IPv6 packet that executes through this pipeline
> will have the corresponding nd_sll field flagged as required match for
> classification even if that field doesn't make sense in such a context
> (for example, TCP packets).  When the corresponding flow is installed into
> the kernel datapath, this field is not reflected when the revalidator
> executes the dump stage (see net/openvswitch/flow_netlink.c for more details).
> 
> During the sweep stage, revalidator will compare the dumped WC with a
> generated WC - these will mismatch because the generated WC will match on
> the Neighbor Discovery fields, while the datapath WC will not match on
> these fields.  We will then invalidate the flow and as a side effect
> force an upcall.
> 
> By redefining the boundary, we shift these fields to the l4 subtable, and
> cause masks to be generated matching just the requisite fields.  The list
> of fields being shifted:
> 
> struct in6_addr nd_target;
> struct eth_addr arp_sha;
> struct eth_addr arp_tha;
> ovs_be16 tcp_flags;
> ovs_be16 pad2;
> struct ovs_key_nsh nsh;
> 
> A standout field would be tcp_flags moving from l3 subtable matches to
> the l4 subtable matches.  This reverts a partial performance optimization
> in the case of stateless firewalling.  The tcp_flags field might have
> been a good candidate to retain in the l3 segment, but it got overloaded
> with ICMPv6 ND matching, and therefore we can't preserve this kind of
> optimization.
> 
> Two other approaches were considered - moving the nd_target field alone
> and collapsing the l3/l4 segments into a single subtable for matching.
> Moving any field individually introduces ABI mismatch, and doesn't
> completely address the problems with other neighbor discovery related
> fields (such as nd_sll/nd_tll).  Collapsing the two subtables creates
> an issue with datapath flow explosion, since the l3 and l4 fields will
> be unwildcarded together (this can be seen with some of the existing
> classifier tests).
> 
> A simple test is added to showcase the behavior.
> 
> Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
> Reported-by: Numan Siddique 
> Suggested-by: Ilya Maximets 
> Signed-off-by: Aaron Conole 
> Acked-by: Eelco Chaudron 
> Acked-by: Cian Ferriter 
> Tested-by: Numan Siddique 
> ---
> v1->v2: Added missing OVS_VSWITCHD_STOP and adjust whitespace on test case

This fixes an important bug that is really hard to track down on real
deployments.  Testing so far didn't show any significant issues with
this approach, so I applied the patch and backported down to 2.13.

Thanks!

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-tcpdump: Fix error when stopping ovs-tcpdump.

2022-06-07 Thread Ilya Maximets
On 6/3/22 17:51, Mike Pattrick wrote:
> On Fri, May 20, 2022 at 5:55 AM Han Ding  wrote:
>>
>>
>> Sometimes we need to dump packets on more than two interfaces in a bridge
>> at the same time. Then when we stop dumping in order, ovs-tcpdump print
>> traceback and fail to delete mirror interface for some interface.
>>
>> For example:
>> br-int has two interface tap1 and br-int. We use ovs-tcpdump dump tap1 first
>> and dump br-int next. Then stopping tap1 ovs-tcpdump first, and stopping
>> br-int second. When we stop ovs-tcpdump for br-int, the screen show the error
>> like this:
>> __main__.OVSDBException: Unable to delete Mirror m_br-int
>>
>> Signed-off-by: Han Ding 
> 
> I can confirm that this issue does occur, and this patch fixes it.
> 
> Acked-by: Mike Pattrick 
> 
> Small nit but instead of closing and reopening the OVSDB object, maybe
> we should just atexit.register the call to ovsdb.close_idl.
> 
> Cheers,
> M

Thanks, Han and Mike!

The fully correct solution, I guess, is to periodically call idl.run()
to update the current database representation, but it's not clear, how
to implement that.  And if we're not calling idl.run() server will
disconnect us at some point anyway because of inactivity.  So, closing
the connection before starting tcpdump does make sense.

Applied to master and branch 2.17.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] conntrack: remove the IP iterations in nat_get_unique_l4

2022-06-07 Thread Ilya Maximets
On 5/5/22 16:00, we...@chinatelecom.cn wrote:
> From: wenxu 
> 
> Removing the IP iterations, and just picking the IP address
> with the hash base on the least-used src-ip/dst-ip/proto triple.
> 
> Signed-off-by: wenxu 
> Acked-by: Paolo Valerio 
> ---

Thanks, wenxu and Paolo!  I applied both patches.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/3] netdev-dpdk: negotiate delivery of per-packet Rx metadata

2022-06-07 Thread Ivan Malov

Hi Eli,

Thanks for reviewing the series.

On Wed, 1 Jun 2022, Eli Britstein wrote:


"TUNNEL_ID" is a bad name, but that's how dpdk called it.
There was a discussion about having this knowledge in OVS so we can avoid 
calling rte_flow_get_restore_info(). How else it is used?


As far as I know, avoiding rte_flow_get_restore_info() demands that
tunnel ID be retrieved from mbufs themselves, using a dynamic field.
That proposal does not discard the need to deliver tunnel data from
the NIC to the PMD, and the patch that we discuss is still required.




-Original Message-
From: Ivan Malov 
Sent: Monday, May 30, 2022 5:16 PM
To: d...@openvswitch.org
Cc: Andrew Rybchenko ; Ilya Maximets
; Ori Kam ; Eli Britstein
; NBU-Contact-Thomas Monjalon (EXTERNAL)
; Stephen Hemminger
; David Marchand
; Gaetan Rivet ; Maxime
Coquelin 
Subject: [PATCH 1/3] netdev-dpdk: negotiate delivery of per-packet Rx
metadata

External email: Use caution opening links or attachments


This may be required by some PMDs in offload scenarios.

Signed-off-by: Ivan Malov 
Acked-by: Andrew Rybchenko 
---
lib/netdev-dpdk.c | 44 
1 file changed, 44 insertions(+)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f9535bfb4..45e5d26d2
100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1085,6 +1085,38 @@ dpdk_eth_flow_ctrl_setup(struct netdev_dpdk
*dev) OVS_REQUIRES(dev->mutex)
}
}

+#ifdef ALLOW_EXPERIMENTAL_API
+static void
+dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev) {
+uint64_t rx_metadata = 0;
+int ret;
+
+/* For the fallback offload (non-"transfer" rules) */
+rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK;
+/* For the full offload ("transfer" rules) */
+rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID;
+
+ret = rte_eth_rx_metadata_negotiate(dev->port_id, &rx_metadata);
+if (ret == 0) {
+if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) {
+VLOG_DBG("The NIC will not provide per-packet USER_MARK on port
"
+ DPDK_PORT_ID_FMT, dev->port_id);
+}
+if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) {
+VLOG_DBG("The NIC will not provide per-packet TUNNEL_ID on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+}
+} else if (ret == -ENOTSUP) {
+VLOG_DBG("Rx metadata negotiate procedure is not supported on port
"
+ DPDK_PORT_ID_FMT, dev->port_id);
+} else {
+VLOG_WARN("Cannot negotiate Rx metadata on port "
+  DPDK_PORT_ID_FMT, dev->port_id);
+}
+}
+#endif /* ALLOW_EXPERIMENTAL_API */
+
static int
dpdk_eth_dev_init(struct netdev_dpdk *dev)
OVS_REQUIRES(dev->mutex)
@@ -1099,6 +1131,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 RTE_ETH_RX_OFFLOAD_TCP_CKSUM |
 RTE_ETH_RX_OFFLOAD_IPV4_CKSUM;

+#ifdef ALLOW_EXPERIMENTAL_API
+/*
+ * Full tunnel offload requires that tunnel ID metadata be
+ * delivered with "miss" packets from the hardware to the
+ * PMD. The same goes for megaflow mark metadata which is
+ * used in MARK + RSS offload scenario.
+ *
+ * Request delivery of such metadata.
+ */
+dpdk_eth_dev_init_rx_metadata(dev);
+#endif /* ALLOW_EXPERIMENTAL_API */
+
rte_eth_dev_info_get(dev->port_id, &info);

if (strstr(info.driver_name, "vf") != NULL) {
--
2.30.2




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH RFC ovn] nb: Remove possibility of disabling logical datapath groups.

2022-06-07 Thread Dumitru Ceara
In large scale scenarios this option hugely reduces the size of the
Southbound database positively affecting end to end performance.  In
such scenarios there's no real reason to ever disable datapath groups.

In lower scale scenarios any potential overhead due to logical datapath
groups is, very likely, negligible.

Aside from potential scalability concerns, the
NB.NB_Global.options:use_logical_dp_group knob was kept until now to
ensure that in case of a bug in the logical datapath groups code a CMS
may turn it off and fall back to the mode in which logical flows are not
grouped together.  As far as I know, this has never happened until now.

Moreover, datpath_groups are enabled by default since v21.09.0 (4 stable
releases ago), via 90daa7ce18dc ("northd: Enable logical dp groups by
default.").

>From a testing perspective removing this knob will halve the CI matrix.
This is desirable, especially in the context of more tests being added,
e.g.:

https://patchwork.ozlabs.org/project/ovn/patch/20220531093318.2270409-1-mh...@redhat.com/

Signed-off-by: Dumitru Ceara 
---
 NEWS|  1 +
 northd/northd.c | 77 +++--
 ovn-nb.xml  | 19 +++---
 tests/ovn-macros.at | 23 ++--
 tests/ovn-northd.at | 92 -
 tests/ovs-macros.at |  4 +-
 6 files changed, 41 insertions(+), 175 deletions(-)

diff --git a/NEWS b/NEWS
index e015ae8e7..20b4c5d91 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,7 @@ Post v22.06.0
 "ovn-encap-df_default" to enable or disable tunnel DF flag.
   - Add option "localnet_learn_fdb" to LSP that will allow localnet
 ports to learn MAC addresses and store them in FDB table.
+  - Removed possibility of disabling logical datapath groups.
 
 OVN v22.06.0 - XX XXX 
 --
diff --git a/northd/northd.c b/northd/northd.c
index 0207f6ce1..a97a321cd 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -4813,10 +4813,6 @@ ovn_lflow_equal(const struct ovn_lflow *a, const struct 
ovn_datapath *od,
 && nullable_string_is_equal(a->ctrl_meter, ctrl_meter));
 }
 
-/* If this option is 'true' northd will combine logical flows that differ by
- * logical datapath only by creating a datapath group. */
-static bool use_logical_dp_groups = false;
-
 enum {
 STATE_NULL,   /* parallelization is off */
 STATE_INIT_HASH_SIZES,/* parallelization is on; hashes sizing needed */
@@ -4841,8 +4837,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct 
ovn_datapath *od,
 lflow->ctrl_meter = ctrl_meter;
 lflow->dpg = NULL;
 lflow->where = where;
-if ((parallelization_state != STATE_NULL)
-&& use_logical_dp_groups) {
+if (parallelization_state != STATE_NULL) {
 ovs_mutex_init(&lflow->odg_lock);
 }
 }
@@ -4852,7 +4847,7 @@ ovn_dp_group_add_with_reference(struct ovn_lflow 
*lflow_ref,
 struct ovn_datapath *od)
 OVS_NO_THREAD_SAFETY_ANALYSIS
 {
-if (!use_logical_dp_groups || !lflow_ref) {
+if (!lflow_ref) {
 return false;
 }
 
@@ -4931,13 +4926,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct 
ovn_datapath *od,
 struct ovn_lflow *old_lflow;
 struct ovn_lflow *lflow;
 
-if (use_logical_dp_groups) {
-old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
-   actions, ctrl_meter, hash);
-if (old_lflow) {
-ovn_dp_group_add_with_reference(old_lflow, od);
-return old_lflow;
-}
+old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
+   actions, ctrl_meter, hash);
+if (old_lflow) {
+ovn_dp_group_add_with_reference(old_lflow, od);
+return old_lflow;
 }
 
 lflow = xmalloc(sizeof *lflow);
@@ -4993,8 +4986,7 @@ ovn_lflow_add_at_with_hash(struct hmap *lflow_map, struct 
ovn_datapath *od,
 struct ovn_lflow *lflow;
 
 ovs_assert(ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
-if (use_logical_dp_groups
-&& (parallelization_state == STATE_USE_PARALLELIZATION)) {
+if (parallelization_state == STATE_USE_PARALLELIZATION) {
 lflow = do_ovn_lflow_add_pd(lflow_map, od, hash, stage, priority,
 match, actions, io_port, stage_hint, where,
 ctrl_meter);
@@ -5080,8 +5072,7 @@ static void
 ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow)
 {
 if (lflow) {
-if ((parallelization_state != STATE_NULL)
-&& use_logical_dp_groups) {
+if (parallelization_state != STATE_NULL) {
 ovs_mutex_destroy(&lflow->odg_lock);
 }
 if (lflows) {
@@ -13883,24 +13874,15 @@ build_lswitch_and_lrouter_flows(const struct hmap 
*datapaths,
 int index;
 
 lsiv = xcalloc(sizeof(*lsiv), build_lflows_pool->size);
-i

Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only for last frags upon modify.

2022-06-07 Thread Salem Sol via dev
> -Original Message-
> From: Ilya Maximets 
> Sent: Tuesday, June 7, 2022 1:14 PM
> To: Salem Sol ; d...@openvswitch.org
> Cc: Eli Britstein ; Michael Pattrick ;
> i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only
> for last frags upon modify.
> 
> External email: Use caution opening links or attachments
> 
> 
> On 6/6/22 12:12, Salem Sol via dev wrote:
> > In case of modifying an IPv6 packet src/dst address the checksum
> > should be recalculated only for the last frag. Currently it's done for
> > all frags, leading to incorrect reassembled packet checksum.
> > Fix it by adding a new flag to recalculate the checksum only for last frag.
> 
> Hi.  Thanks for working on this!
> 
> There is definitely a problem with checksum calculation, but I don't
> understand the logic of the fix.  We're talking about re-calculation of the L4
> checksum, but the *last* fragment should not contain the L4 header.  So,
> we're just corrupting the data of the last fragment.  If we want to 
> re-calculate
> the checksum, we should do that for the *first* fragment instead, because
> it's the only fragment that will likely contain the L4 header.  Or am I 
> missing
> something?
> 
> BTW, it looks like we have the same issue for ipv4 packets.  We should
> update checksum only for the first fragment.  This can be a separate fix
> though.
> 
> Some more comments inline.
> 
> Best regards, Ilya Maximets.
> 
Thanks for the catch, you're right, my bad, in my testing I used only one 
extension header and the original patchset seemed to resolve the issue.
New implementation can be found in V3.
> >
> > Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action")
> > Signed-off-by: Salem Sol 
> > Acked-by: Mike Pattrick 
> > ---
> >  lib/packets.c   | 20 ++--
> >  tests/system-traffic.at | 35 +++
> >  2 files changed, 53 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/packets.c b/lib/packets.c index d0fba8176..b7aef41a5
> > 100644
> > --- a/lib/packets.c
> > +++ b/lib/packets.c
> > @@ -1148,6 +1148,18 @@ packet_set_ipv4_addr(struct dp_packet
> *packet,
> >  put_16aligned_be32(addr, new_addr);  }
> >
> > +static bool
> > +packet_is_last_ipv6_frag(struct dp_packet *packet) {
> > +const struct ovs_16aligned_ip6_frag *frag_hdr;
> > +uint8_t *data = dp_packet_l3(packet);
> > +
> > +data += sizeof (struct ovs_16aligned_ip6_hdr);
> > +frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data);
> > +return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) &&
> > +   !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG); }
> > +
> >  /* Returns true, if packet contains at least one routing header where
> >   * segements_left > 0.
> >   *
> > @@ -1334,17 +1346,21 @@ packet_set_ipv6(struct dp_packet *packet,
> > const struct in6_addr *src,  {
> >  struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
> >  uint8_t proto = 0;
> > +bool recalc_csum;
> >  bool rh_present;
> >
> >  rh_present = packet_rh_present(packet, &proto);
> > +recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ?
> > +packet_is_last_ipv6_frag(packet) : true;
> 
> The fragmentation header is usually the last extension header, not the first.
> This check will work for the simple case with only one extension header, but
> it will not work in the general case.  We should search for the fragmentation
> header the same way as we look for the routing header in
> packet_rh_present().
> We should, probably, modify the packet_rh_present() function to also say if
> the packet is fragmented / is it the first fragment.
> 
Thanks for the comment, I followed your suggestion and changed 
packet_rh_present() to update if it's the first frag in V3.
> >
> >  if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) {
> > -packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true);
> > +packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32,
> > + src, recalc_csum);
> >  }
> >
> >  if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) {
> >  packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst,
> > - !rh_present);
> > + !rh_present && recalc_csum);
> >  }
> >
> >  packet_set_ipv6_tc(&nh->ip6_flow, key_tc); diff --git
> > a/tests/system-traffic.at b/tests/system-traffic.at index
> > 239105e89..16ba42d84 100644
> > --- a/tests/system-traffic.at
> > +++ b/tests/system-traffic.at
> > @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i
> > 0.3 -w 2 fc00:1::2 | FORMAT_PI  OVS_TRAFFIC_VSWITCHD_STOP
> AT_CLEANUP
> >
> > +AT_SETUP([datapath - ping6 between two ports with header modify])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
> > +ADD_VETH(p1, at_n

[ovs-dev] [PATCH v3 1/1] packets: Re-calculate IPv6 checksum only for first frag upon modify.

2022-06-07 Thread Salem Sol via dev
In case of modifying an IPv6 packet src/dst address the L4 checksum should be
recalculated only for the first frag. Currently it's done for all frags,
leading to incorrect reassembled packet checksum.
Fix it by adding a new flag to recalculate the checksum only for last frag.

Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action")
Signed-off-by: Salem Sol 
---
 lib/packets.c   | 12 
 tests/system-traffic.at | 35 +++
 2 files changed, 43 insertions(+), 4 deletions(-)

Revision history:
- v3 -> v2: Changed the checksum calculation to be on the first frag
- v2 -> v1: Removed redundant line - Acked-by: Mike Pattrick 
.

Link to passing github actions can be found in [1].

[1]: https://github.com/salemsol/ovs/actions/runs/2455292688

diff --git a/lib/packets.c b/lib/packets.c
index d0fba8176..8f5976bbd 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1153,7 +1153,7 @@ packet_set_ipv4_addr(struct dp_packet *packet,
  *
  * This function assumes that L3 and L4 offsets are set in the packet. */
 static bool
-packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr)
+packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr, bool *first_frag)
 {
 const struct ovs_16aligned_ip6_hdr *nh;
 size_t len;
@@ -1203,6 +1203,8 @@ packet_rh_present(struct dp_packet *packet, uint8_t 
*nexthdr)
 const struct ovs_16aligned_ip6_frag *frag_hdr
 = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data);
 
+*first_frag = !(frag_hdr->ip6f_offlg & IP6F_OFF_MASK) &&
+   (frag_hdr->ip6f_offlg & IP6F_MORE_FRAG);
 *nexthdr = frag_hdr->ip6f_nxt;
 len = sizeof *frag_hdr;
 } else if (*nexthdr == IPPROTO_ROUTING) {
@@ -1333,18 +1335,20 @@ packet_set_ipv6(struct dp_packet *packet, const struct 
in6_addr *src,
 uint8_t key_hl)
 {
 struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
+bool recalc_csum = true;
 uint8_t proto = 0;
 bool rh_present;
 
-rh_present = packet_rh_present(packet, &proto);
+rh_present = packet_rh_present(packet, &proto, &recalc_csum);
 
 if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) {
-packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true);
+packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32,
+ src, recalc_csum);
 }
 
 if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) {
 packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst,
- !rh_present);
+ !rh_present && recalc_csum);
 }
 
 packet_set_ipv6_tc(&nh->ip6_flow, key_tc);
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 239105e89..588e1b2e2 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 
fc00:1::2 | FORMAT_PI
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([datapath - ping6 between two ports with header modify])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54)
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 dev 
p0])
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::2 lladdr e4:11:22:33:44:54 dev 
p0])
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::1 lladdr e4:11:22:33:44:55 dev 
p0])
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0])
+AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 
in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_field:fc00::2-\>ipv6_dst,ovs-p1])
+AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 
in_port=ovs-p1,ipv6,ipv6_dst=fc00::1,ipv6_src=fc00::2,actions=set_field:fc00::3-\>ipv6_src,ovs-p0])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], 
[0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00::3 | 
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00::3 | 
FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([datapath - ping over bond])
 OVS_TRAFFIC_VSWITCHD_START()
 
-- 
2.21.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 ovn] northd: add the capability to inherit logical routers lbs on logical switches

2022-06-07 Thread Numan Siddique
On Mon, Jun 6, 2022 at 6:44 PM Lorenzo Bianconi
 wrote:
>
> Add the capability to automatically deploy a load-balancer on each
> logical-switch connected to a logical router where the load-balancer
> has been installed by the CMS. This patch allow to overcome the
> distributed gw router scenario limitation where a load-balancer must be
> installed on each datapath to properly reach the load-balancer.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2043543
> Signed-off-by: Lorenzo Bianconi 

Thanks for the rebase.

Acked-by: Numan Siddique 

Numan

> ---
> Changes since v2:
> - rebase on top of ovn master
>
> Changes since v1:
> - rebase on top of ovn master
> - add NEWS entry
> - improve selftests
> ---
>  NEWS|  5 +++
>  northd/northd.c | 56 +
>  northd/ovn-northd.8.xml |  8 +
>  tests/ovn-northd.at | 80 +
>  4 files changed, 149 insertions(+)
>
> diff --git a/NEWS b/NEWS
> index e015ae8e7..8b4f91553 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -4,6 +4,11 @@ Post v22.06.0
>  "ovn-encap-df_default" to enable or disable tunnel DF flag.
>- Add option "localnet_learn_fdb" to LSP that will allow localnet
>  ports to learn MAC addresses and store them in FDB table.
> +  - northd: introduce the capability to automatically deploy a load-balancer
> +on each logical-switch connected to a logical router where the
> +load-balancer has been installed by the CMS. In order to enable the
> +feature the CMS has to set install_ls_lb_from_router to true in option
> +column of NB_Global table.
>
>  OVN v22.06.0 - XX XXX 
>  --
> diff --git a/northd/northd.c b/northd/northd.c
> index 0207f6ce1..a95a5148e 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -63,6 +63,8 @@ static bool lflow_hash_lock_initialized = false;
>
>  static bool check_lsp_is_up;
>
> +static bool install_ls_lb_from_router;
> +
>  /* MAC allocated for service monitor usage. Just one mac is allocated
>   * for this purpose and ovn-controller's on each chassis will make use
>   * of this mac when sending out the packets to monitor the services
> @@ -4140,6 +4142,55 @@ build_lrouter_lbs_reachable_ips(struct hmap 
> *datapaths, struct hmap *lbs)
>  }
>  }
>
> +static void
> +build_lswitch_lbs_from_lrouter(struct hmap *datapaths, struct hmap *lbs)
> +{
> +if (!install_ls_lb_from_router) {
> +return;
> +}
> +
> +struct ovn_datapath *od;
> +HMAP_FOR_EACH (od, key_node, datapaths) {
> +if (!od->nbs) {
> +continue;
> +}
> +
> +struct ovn_port *op;
> +LIST_FOR_EACH (op, dp_node, &od->port_list) {
> +if (!lsp_is_router(op->nbsp)) {
> +continue;
> +}
> +if (!op->peer) {
> +continue;
> +}
> +
> +struct ovn_datapath *peer_od = op->peer->od;
> +for (size_t i = 0; i < peer_od->nbr->n_load_balancer; i++) {
> +bool installed = false;
> +const struct uuid *lb_uuid =
> +&peer_od->nbr->load_balancer[i]->header_.uuid;
> +struct ovn_northd_lb *lb = ovn_northd_lb_find(lbs, lb_uuid);
> +if (!lb) {
> +continue;
> +}
> +
> +for (size_t j = 0; j < lb->n_nb_ls; j++) {
> +   if (lb->nb_ls[j] == od) {
> +   installed = true;
> +   break;
> +   }
> +}
> +if (!installed) {
> +ovn_northd_lb_add_ls(lb, od);
> +}
> +if (lb->nlb) {
> +od->has_lb_vip |= lb_has_vip(lb->nlb);
> +}
> +}
> +}
> +}
> +}
> +
>  /* This must be called after all ports have been processed, i.e., after
>   * build_ports() because the reachability check requires the router ports
>   * networks to have been parsed.
> @@ -4152,6 +4203,7 @@ build_lb_port_related_data(struct hmap *datapaths, 
> struct hmap *ports,
>  build_lrouter_lbs_check(datapaths);
>  build_lrouter_lbs_reachable_ips(datapaths, lbs);
>  build_lb_svcs(input_data, ovnsb_txn, ports, lbs);
> +build_lswitch_lbs_from_lrouter(datapaths, lbs);
>  }
>
>  /* Syncs relevant load balancers (applied to logical switches) to the
> @@ -15378,6 +15430,10 @@ ovnnb_db_run(struct northd_input *input_data,
>   "ignore_lsp_down", true);
>  default_acl_drop = smap_get_bool(&nb->options, "default_acl_drop", 
> false);
>
> +install_ls_lb_from_router = smap_get_bool(&nb->options,
> +  "install_ls_lb_from_router",
> +  false);
> +
>  build_chassis_features(input_data, &data->features);
>  build_datapaths(input_data, ovnsb_txn, &data->datap

[ovs-dev] [PATCH v2] dpcls: add unlisted alias for subtable lookup command

2022-06-07 Thread Harry van Haaren
This patch adds the old name "subtable-lookup-prio-get" as an unlisted command,
to restore a consistency between OVS releases for testing scripts.

Fixes 738c76a503f4 ("dpcls: Change info-get function to fetch dpcls usage 
stats.")
Suggested-by: Eelco Chaudron 
Suggested-by: Ilya Maximets 
Signed-off-by: Harry van Haaren 

---

v2:
- Based on discussion and push back on v1 patch here, this is a v2
  implementing the suggested "alias" method. Suggested by tags added
  for both Eelco (for alias concept) and Ilya (for unlisted concept).
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20220525141014.661907-1-harry.van.haa...@intel.com/

---
 lib/dpif-netdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index ff57b3961..f46b9fe18 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1609,6 +1609,9 @@ dpif_netdev_init(void)
 unixctl_command_register("dpif-netdev/subtable-lookup-info-get", "",
  0, 0, dpif_netdev_subtable_lookup_get,
  NULL);
+unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", NULL,
+ 0, 0, dpif_netdev_subtable_lookup_get,
+ NULL);
 unixctl_command_register("dpif-netdev/dpif-impl-set",
  "dpif_implementation_name",
  1, 1, dpif_netdev_impl_set,
-- 
2.32.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only for last frags upon modify.

2022-06-07 Thread Ilya Maximets
On 6/6/22 12:12, Salem Sol via dev wrote:
> In case of modifying an IPv6 packet src/dst address the checksum should be
> recalculated only for the last frag. Currently it's done for all frags,
> leading to incorrect reassembled packet checksum.
> Fix it by adding a new flag to recalculate the checksum only for last frag.

Hi.  Thanks for working on this!

There is definitely a problem with checksum calculation, but I don't
understand the logic of the fix.  We're talking about re-calculation
of the L4 checksum, but the *last* fragment should not contain the L4
header.  So, we're just corrupting the data of the last fragment.  If
we want to re-calculate the checksum, we should do that for the *first*
fragment instead, because it's the only fragment that will likely
contain the L4 header.  Or am I missing something?

BTW, it looks like we have the same issue for ipv4 packets.  We should
update checksum only for the first fragment.  This can be a separate
fix though.

Some more comments inline.

Best regards, Ilya Maximets.

> 
> Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action")
> Signed-off-by: Salem Sol 
> Acked-by: Mike Pattrick 
> ---
>  lib/packets.c   | 20 ++--
>  tests/system-traffic.at | 35 +++
>  2 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/packets.c b/lib/packets.c
> index d0fba8176..b7aef41a5 100644
> --- a/lib/packets.c
> +++ b/lib/packets.c
> @@ -1148,6 +1148,18 @@ packet_set_ipv4_addr(struct dp_packet *packet,
>  put_16aligned_be32(addr, new_addr);
>  }
>  
> +static bool
> +packet_is_last_ipv6_frag(struct dp_packet *packet)
> +{
> +const struct ovs_16aligned_ip6_frag *frag_hdr;
> +uint8_t *data = dp_packet_l3(packet);
> +
> +data += sizeof (struct ovs_16aligned_ip6_hdr);
> +frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data);
> +return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) &&
> +   !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG);
> +}
> +
>  /* Returns true, if packet contains at least one routing header where
>   * segements_left > 0.
>   *
> @@ -1334,17 +1346,21 @@ packet_set_ipv6(struct dp_packet *packet, const 
> struct in6_addr *src,
>  {
>  struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
>  uint8_t proto = 0;
> +bool recalc_csum;
>  bool rh_present;
>  
>  rh_present = packet_rh_present(packet, &proto);
> +recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ?
> +packet_is_last_ipv6_frag(packet) : true;

The fragmentation header is usually the last extension header,
not the first.  This check will work for the simple case with
only one extension header, but it will not work in the general
case.  We should search for the fragmentation header the same
way as we look for the routing header in packet_rh_present().
We should, probably, modify the packet_rh_present() function
to also say if the packet is fragmented / is it the first
fragment.

>  
>  if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) {
> -packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true);
> +packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32,
> + src, recalc_csum);
>  }
>  
>  if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) {
>  packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst,
> - !rh_present);
> + !rh_present && recalc_csum);
>  }
>  
>  packet_set_ipv6_tc(&nh->ip6_flow, key_tc);
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 239105e89..16ba42d84 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 
> 2 fc00:1::2 | FORMAT_PI
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +AT_SETUP([datapath - ping6 between two ports with header modify])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
> +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54)
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 
> dev p0])
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::2 lladdr e4:11:22:33:44:54 
> dev p0])
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::1 lladdr e4:11:22:33:44:55 
> dev p0])
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +
> +AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0], [], [stdout], [stderr])
> +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 
> in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_field:fc00::2-\>ipv6_dst,ovs-p1],
>  [], [stdout], [stderr])
> +AT_CHECK([ovs-ofctl a

Re: [ovs-dev] User space connection tracking benchmarks

2022-06-07 Thread Robin Jarry
Paolo Valerio, Jun 05, 2022 at 19:37:
> Just a note that may be useful.
> After some tests, I noticed that establishing e.g. two TCP connections,
> and leaving the first one idle after 3whs, once the second connection
> expires (after moving to TIME_WAIT as a result of termination), the
> second doesn't get evicted until any event gets scheduled for the first.
>
> ovs-appctl dpctl/dump-conntrack -s
> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=9090,dport=8080),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=9090),zone=1,timeout=84576,protoinfo=(state=ESTABLISHED)
> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=9091,dport=8080),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=9091),zone=1,timeout=0,protoinfo=(state=TIME_WAIT)
>
> This may be somewhat related to your results as during the
> test, the number of connections may reach the limit so apparently reducing
> the performances.

Indeed, there was an issue in my test procedure. Due to the way T-Rex
generates connections, it is easy to fill the conntrack table after
a few iterations, making the test results inconsistent.

Also, the flows which I had configured were not correct. There was an
extraneous action=NORMAL flow at the end. When the conntrack table is
full and a new packet cannot be tracked, it is marked as +trk+inv and
not dropped. This behaviour is specific to the userspace datapath. The
linux kernel datapath seems to drop the packet when it cannot be added
to connection tracking.

Gaƫtan's series (v4) seems less resilient to the conntrack table being
full, especially when there is more than one PMD core.

I have changed the t-rex script to allow running arbitrary commands in
between traffic iterations. This is leveraged to flush the conntrack
table and run each iteration in the same conditions.

https://github.com/cisco-system-traffic-generator/trex-core/blob/v2.98/scripts/cps_ndr.py

To avoid filling the conntrack table, the max size was increased to 50M.
The DUT configuration can be summarized as the following:

ovs-vsctl set open_vswitch . other_config:dpdk-init=true
ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x15554"
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 pf0 -- set Interface pf0 type=dpdk \
options:dpdk-devargs=:3b:00.0 options:n_rxq=4 options:n_rxq_desc=4096
ovs-vsctl add-port br0 pf1 -- set Interface pf1 type=dpdk \
options:dpdk-devargs=:3b:00.1 options:n_rxq=4 options:n_rxq_desc=4096
ovs-appctl dpctl/ct-set-maxconns 5000
ovs-ofctl add-flow br0 
"table=0,priority=10,ip,ct_state=-trk,actions=ct(table=0)"
ovs-ofctl add-flow br0 
"table=0,priority=10,ip,ct_state=+trk+new,actions=ct(commit),NORMAL"
ovs-ofctl add-flow br0 "table=0,priority=10,ip,ct_state=+trk+est,actions=NORMAL"
ovs-ofctl add-flow br0 "table=0,priority=0,actions=drop"

Short Lived Connections
---

./cps_ndr.py --sample-time 10 --max-iterations 8 --error-threshold 0.01 \
--reset-command "ssh $dut ovs-appctl dpctl/flush-conntrack" \
--udp-percent 1 --num-messages 1 --message-size 20 --server-wait 0 \
--min-cps 10k --max-cps 600k

== === == == ===
Series Num. Flows  CPSPPSBPS
== === == == ===
Baseline   40.1K   79.3K/s556Kp/s347Mb/s
Gaetan v1  60.5K   121K/s 837Kp/s522Mb/s
Gaetan v4  61.4K   122K/s 850Kp/s530Mb/s
Paolo  377K756K/s 5.3Mp/s3.3Gb/s
== === == == ===

Even after fixing the test procedure, Paolo's series still performs
a lot better with short lived connections.

Long Lived Connections
--

./cps_ndr.py --sample-time 30 --max-iterations 8 --error-threshold 0.01 \
--reset-command "ssh $dut ovs-appctl dpctl/flush-conntrack" \
--udp-percent 1 --num-messages 500 --message-size 20 --server-wait 50 \
--min-cps 100 --max-cps 10k

== === == == ===
Series Num. Flows  CPSPPSBPS
== === == == ===
Baseline   17.4K   504/s  633Kp/s422Mb/s
Gaetan v1  80.4K   3.1K/s 4.6Mp/s3.0Gb/s
Gaetan v4  139K5.4K/s 8.2Mp/s5.4Gb/s
Paolo  132K5.2K/s 7.7Mp/s5.2Gb/s
== === == == ===

Thanks to Paolo for his help on this second round of tests.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev