Re: [ovs-dev] [RFC ovn 0/2] Basic eBPF/XDP support in OVN.
On Mon, May 30, 2022 at 5:46 PM wrote: > > From: Numan Siddique > > XDP program - ovn_xdp.c added in this RFC patch series implements basic port > security and drops any packet if the port security check fails. > There are still few TODOs in the port security checks. Like > - Make ovn xdp configurable. > - Removing the ingress Openflow rules from table 73 and 74 if ovn xdp > is enabled. > - Add IPv6 support. > - Enhance the port security xdp program for ARP/IPv6 ND checks. > > This patch adds a basic XDP support in OVN and in future we can > leverage eBPF/XDP features. > > I'm not sure how much value this RFC patch adds to make use of eBPF/XDP > just for port security. Submitting as RFC to get some feedback and > start some conversation on eBPF/XDP in OVN. > Hi Numan, This is really cool. It demonstrates how OVN could leverage eBPF/XDP. On the other hand, for the port-security feature in XDP, I keep thinking about the scenarios and it is still not very clear to me. One advantage I can think of is to prevent DOS attacks from VM/Pod when invalid IP/MAC are used, XDP may perform better and drop packets with lower CPU cost (comparing with OVS kernel datapath). However, I am also wondering why would a attacker use invalid IP/MAC for DOS attacks? Do you have some more thoughts about the use cases? And do you have any performance results comparing with the current OVS implementation? Another question is, would it work with smart NIC HW-offload, where VF representer ports are added to OVS on the smart NIC? I guess XDP doesn't support representer port, right? Thanks, Han > In order to attach and detach xdp programs, libxdp [1] and libbpf is used. > > To test it out locally, please install libxdp-devel and libbpf-devel > and the compile OVN first and then compile ovn_xdp by running "make > bpf". Copy ovn_xdp.o to either /usr/share/ovn/ or /usr/local/share/ovn/ > > > Numan Siddique (2): > RFC: Add basic xdp/eBPF support in OVN. > RFC: ovn-controller: Attach XDP progs to the VIFs of the logical > ports. > > Makefile.am | 6 +- > bpf/.gitignore | 5 + > bpf/automake.mk | 23 +++ > bpf/ovn_xdp.c | 156 +++ > configure.ac| 2 + > controller/automake.mk | 4 +- > controller/binding.c| 45 +++-- > controller/binding.h| 7 + > controller/ovn-controller.c | 79 +++- > controller/xdp.c| 389 > controller/xdp.h| 41 > m4/ovn.m4 | 20 ++ > tests/automake.mk | 1 + > 13 files changed, 753 insertions(+), 25 deletions(-) > create mode 100644 bpf/.gitignore > create mode 100644 bpf/automake.mk > create mode 100644 bpf/ovn_xdp.c > create mode 100644 controller/xdp.c > create mode 100644 controller/xdp.h > > -- > 2.35.3 > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH RFC ovn] nb: Remove possibility of disabling logical datapath groups.
On Tue, Jun 7, 2022 at 8:20 AM Dumitru Ceara wrote: > > In large scale scenarios this option hugely reduces the size of the > Southbound database positively affecting end to end performance. In > such scenarios there's no real reason to ever disable datapath groups. > > In lower scale scenarios any potential overhead due to logical datapath > groups is, very likely, negligible. > > Aside from potential scalability concerns, the > NB.NB_Global.options:use_logical_dp_group knob was kept until now to > ensure that in case of a bug in the logical datapath groups code a CMS > may turn it off and fall back to the mode in which logical flows are not > grouped together. As far as I know, this has never happened until now. > > Moreover, datpath_groups are enabled by default since v21.09.0 (4 stable > releases ago), via 90daa7ce18dc ("northd: Enable logical dp groups by > default."). > > From a testing perspective removing this knob will halve the CI matrix. > This is desirable, especially in the context of more tests being added, > e.g.: > > https://patchwork.ozlabs.org/project/ovn/patch/20220531093318.2270409-1-mh...@redhat.com/ > > Signed-off-by: Dumitru Ceara Thanks Dumitru! I haven't reviewed the patch in detail, but the idea sounds good to me. Han > --- > NEWS| 1 + > northd/northd.c | 77 +++-- > ovn-nb.xml | 19 +++--- > tests/ovn-macros.at | 23 ++-- > tests/ovn-northd.at | 92 - > tests/ovs-macros.at | 4 +- > 6 files changed, 41 insertions(+), 175 deletions(-) > > diff --git a/NEWS b/NEWS > index e015ae8e7..20b4c5d91 100644 > --- a/NEWS > +++ b/NEWS > @@ -4,6 +4,7 @@ Post v22.06.0 > "ovn-encap-df_default" to enable or disable tunnel DF flag. >- Add option "localnet_learn_fdb" to LSP that will allow localnet > ports to learn MAC addresses and store them in FDB table. > + - Removed possibility of disabling logical datapath groups. > > OVN v22.06.0 - XX XXX > -- > diff --git a/northd/northd.c b/northd/northd.c > index 0207f6ce1..a97a321cd 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -4813,10 +4813,6 @@ ovn_lflow_equal(const struct ovn_lflow *a, const struct ovn_datapath *od, > && nullable_string_is_equal(a->ctrl_meter, ctrl_meter)); > } > > -/* If this option is 'true' northd will combine logical flows that differ by > - * logical datapath only by creating a datapath group. */ > -static bool use_logical_dp_groups = false; > - > enum { > STATE_NULL, /* parallelization is off */ > STATE_INIT_HASH_SIZES,/* parallelization is on; hashes sizing needed */ > @@ -4841,8 +4837,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od, > lflow->ctrl_meter = ctrl_meter; > lflow->dpg = NULL; > lflow->where = where; > -if ((parallelization_state != STATE_NULL) > -&& use_logical_dp_groups) { > +if (parallelization_state != STATE_NULL) { > ovs_mutex_init(&lflow->odg_lock); > } > } > @@ -4852,7 +4847,7 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref, > struct ovn_datapath *od) > OVS_NO_THREAD_SAFETY_ANALYSIS > { > -if (!use_logical_dp_groups || !lflow_ref) { > +if (!lflow_ref) { > return false; > } > > @@ -4931,13 +4926,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od, > struct ovn_lflow *old_lflow; > struct ovn_lflow *lflow; > > -if (use_logical_dp_groups) { > -old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match, > - actions, ctrl_meter, hash); > -if (old_lflow) { > -ovn_dp_group_add_with_reference(old_lflow, od); > -return old_lflow; > -} > +old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match, > + actions, ctrl_meter, hash); > +if (old_lflow) { > +ovn_dp_group_add_with_reference(old_lflow, od); > +return old_lflow; > } > > lflow = xmalloc(sizeof *lflow); > @@ -4993,8 +4986,7 @@ ovn_lflow_add_at_with_hash(struct hmap *lflow_map, struct ovn_datapath *od, > struct ovn_lflow *lflow; > > ovs_assert(ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od)); > -if (use_logical_dp_groups > -&& (parallelization_state == STATE_USE_PARALLELIZATION)) { > +if (parallelization_state == STATE_USE_PARALLELIZATION) { > lflow = do_ovn_lflow_add_pd(lflow_map, od, hash, stage, priority, > match, actions, io_port, stage_hint, where, > ctrl_meter); > @@ -5080,8 +5072,7 @@ static void > ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow) > { > if (lflow) { > -if ((parallelization_state != STATE_NULL) > -
[ovs-dev] [PATCH net-next] net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track()-> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Signed-off-by: Jakub Kicinski --- CC: dsah...@kernel.org CC: steffen.klass...@secunet.com CC: jreu...@yaina.de CC: ra...@blackwall.org CC: j...@resnulli.us CC: kgr...@linux.ibm.com CC: ivec...@redhat.com CC: jma...@redhat.com CC: ying@windriver.com CC: lucien@gmail.com CC: a...@arndb.de CC: yajun.d...@linux.dev CC: aten...@kernel.org CC: richardsonn...@google.com CC: hkallwe...@gmail.com CC: linux-h...@vger.kernel.org CC: d...@openvswitch.org CC: linux-s...@vger.kernel.org CC: tipc-discuss...@lists.sourceforge.net --- drivers/net/eql.c | 4 ++-- drivers/net/macsec.c | 4 ++-- drivers/net/macvlan.c | 4 ++-- drivers/net/netconsole.c | 2 +- drivers/net/vrf.c | 8 include/linux/netdevice.h | 24 include/net/xfrm.h | 2 +- net/8021q/vlan_dev.c | 4 ++-- net/ax25/af_ax25.c | 7 --- net/ax25/ax25_dev.c| 6 +++--- net/bridge/br_if.c | 10 +- net/core/dev.c | 10 +- net/core/dev_ioctl.c | 4 ++-- net/core/drop_monitor.c| 5 +++-- net/core/dst.c | 8 net/core/failover.c| 4 ++-- net/core/link_watch.c | 2 +- net/core/neighbour.c | 18 +- net/core/net-sysfs.c | 8 net/core/netpoll.c | 2 +- net/core/pktgen.c | 6 +++--- net/ethtool/ioctl.c| 4 ++-- net/ethtool/netlink.c | 6 +++--- net/ethtool/netlink.h | 2 +- net/ipv4/devinet.c | 4 ++-- net/ipv4/fib_semantics.c | 11 ++- net/ipv4/ipmr.c| 2 +- net/ipv4/route.c | 7 +++ net/ipv4/xfrm4_policy.c| 2 +- net/ipv6/addrconf.c| 4 ++-- net/ipv6/addrconf_core.c | 2 +- net/ipv6/ip6_gre.c | 8 net/ipv6/ip6_tunnel.c | 4 ++-- net/ipv6/ip6_vti.c | 4 ++-- net/ipv6/ip6mr.c | 2 +- net/ipv6/route.c | 10 +- net/ipv6/sit.c | 4 ++-- net/ipv6/xfrm6_policy.c| 4 ++-- net/llc/af_llc.c | 2 +- net/openvswitch/vport-netdev.c | 6 +++--- net/packet/af_packet.c | 12 ++-- net/sched/act_mirred.c | 6 +++--- net/sched/sch_api.c| 2 +- net/sched/sch_generic.c| 11 ++- net/smc/smc_pnet.c | 7 --- net/switchdev/switchdev.c | 4 ++-- net/tipc/bearer.c | 4 ++-- net/xfrm/xfrm_device.c | 2 +- 48 files changed, 141 insertions(+), 137 deletions(-) diff --git a/drivers/net/eql.c b/drivers/net/eql.c index 557ca8ff9dec..ca3e4700a813 100644 --- a/drivers/net/eql.c +++ b/drivers/net/eql.c @@ -225,7 +225,7 @@ static void eql_kill_one_slave(slave_queue_t *queue, slave_t *slave) list_del(&slave->list); queue->num_slaves--; slave->dev->flags &= ~IFF_SLAVE; - dev_put_track(slave->dev, &slave->dev_tracker); + netdev_put(slave->dev, &slave->dev_tracker); kfree(slave); } @@ -399,7 +399,7 @@ static int __eql_insert_slave(slave_queue_t *queue, slave_t *slave) if (duplicate_slave) eql_kill_one_slave(queue, duplicate_slave); - dev_hold_track(slave->dev, &slave->dev_tracker, GFP_ATOMIC); + netdev_hold(slave->dev, &slave->dev_tracker, GFP_ATOMIC); list_add(&slave->list, &queue->all_slaves); queue->num_slaves++; slave->dev->flags |= IFF_SLAVE; diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 817577e713d7..815738c0e067 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -3462,7 +3462,7 @@ static int macsec_dev_init(struct net_device *dev) memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len); /* Get macsec's reference to real_dev */ - dev_hold_track(real_dev, &macsec->dev_tracker, GFP_KERNEL); + netdev_hold(real_dev, &macsec->dev_tracker, GFP_KERNEL); return 0; } @@ -3710,7 +3710,7 @@ static void macsec_free_netdev(struct net_device *dev) free_percpu(macsec->secy.tx_sc.stats); /* Get rid of the macsec's reference to real_dev */ - dev_put_track(macsec->real_dev, &macsec->dev_tracker); + netdev_put(macsec->real_dev, &macsec->dev_tracker); } static void macsec_setup(struct net_device *dev) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index eff75beb1395..5b46a6526fc6 100644 --- a/drivers/net/macvlan.c ++
Re: [ovs-dev] [PATCH v4 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl
Bleep bloop. Greetings Terry Wilson, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. Patch skipped due to previous failure. Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v4 1/2] Handle re-used pids in pidfile_is_running
Bleep bloop. Greetings Terry Wilson, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. git-am: error: sha1 information is lacking or useless (utilities/ovn-ctl). error: could not build fake ancestor hint: Use 'git am --show-current-patch' to see the failed patch Patch failed at 0001 Handle re-used pids in pidfile_is_running When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Patch skipped due to previous failure. Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl
When checking if ovsdb-server is running, ensure that the binary we are going to run matches the one actually running with the the pid that was in our pidfile. Signed-off-by: Terry Wilson --- utilities/ovn-ctl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl index 14d37a3d6..e2f05915b 100755 --- a/utilities/ovn-ctl +++ b/utilities/ovn-ctl @@ -200,7 +200,7 @@ start_ovsdb__() { ovn_install_dir "$ovn_etcdir" # Check and eventually start ovsdb-server for DB -if pidfile_is_running $db_pid_file; then +if pidfile_is_running $db_pid_file ovsdb-server; then return fi -- 2.34.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 1/2] Handle re-used pids in pidfile_is_running
Since pids can be re-used, it is necessary to check that the process that is running with a pid matches the one that we expect. This adds the ability to optionally pass a 'binary' argument to pidfile_is_running, and if it is passed to match the binary against /proc/$pid/exe. Signed-off-by: Terry Wilson --- utilities/ovn-ctl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl index d733aa42d..14d37a3d6 100755 --- a/utilities/ovn-ctl +++ b/utilities/ovn-ctl @@ -42,7 +42,8 @@ ovn_ic_db_conf_file="$ovn_etcdir/ovn-ic-db-params.conf" pidfile_is_running () { pidfile=$1 -test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid" +cmd=$2 +test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid" && [ -z $cmd -o pid_comm_check "$cmd" "$pid" ] } >/dev/null 2>&1 stop_nb_ovsdb() { -- 2.34.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v13 ovn] Implement RARP activation strategy for ports
On Mon, Jun 6, 2022 at 3:48 AM Han Zhou wrote: > > > > On Fri, Jun 3, 2022 at 1:20 PM Ihar Hrachyshka wrote: > > > > When options:activation-strategy is set to "rarp" for LSP, when used in > > combination with multiple chassis names listed in > > options:requested-chassis, additional chassis will install special flows > > that would block all ingress and egress traffic for the port until a > > special activation event happens. > > > > For "rarp" strategy, an observation of a RARP packet sent from the port > > on the additional chassis is such an event. When it occurs, a special > > flow passes control to a controller() action handler that eventually > > removes the installed blocking flows and also marks the port as > > options:additional-chassis-activated in southbound db. > > > > This feature is useful in live migration scenarios where it's not > > advisable to unlock the destination port location prematurily to avoid > > duplicate packets originating from the port. > > > > Signed-off-by: Ihar Hrachyshka > > --- > > v13: use resubmit() action to reinject RARP into pipeline. > > v13: lock / unlock pinctrl_mutex in functions invoked from main thread. > > v13: db_is_port_activated->lport_is_activated_by_activation_strategy. > > Thanks Ihar for the revision. Sorry that I didn't follow up in time after v6. > Please see some of my quick comments for this version regarding I-P: > > > --- > > NEWS| 2 + > > controller/lport.c | 22 +++ > > controller/lport.h | 3 + > > controller/ovn-controller.c | 87 + > > controller/physical.c | 94 ++ > > controller/pinctrl.c| 145 +- > > controller/pinctrl.h| 13 ++ > > include/ovn/actions.h | 3 + > > northd/northd.c | 10 + > > northd/ovn-northd.c | 5 +- > > ovn-nb.xml | 11 ++ > > ovn-sb.xml | 15 ++ > > tests/ovn.at| 365 > > 13 files changed, 772 insertions(+), 3 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index 2ee283a56..7c54670ed 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -29,6 +29,8 @@ OVN v22.06.0 - XX XXX > >- Added support for setting the Next server IP in the DHCP header > > using the private DHCP option - 253 in native OVN DHCPv4 responder. > >- Support list of chassis for > > Logical_Switch_Port:options:requested-chassis. > > + - Support Logical_Switch_Port:options:activation-strategy for live > > migration > > +scenarios. > > > > OVN v22.03.0 - 11 Mar 2022 > > -- > > diff --git a/controller/lport.c b/controller/lport.c > > index bf55d83f2..add7e91aa 100644 > > --- a/controller/lport.c > > +++ b/controller/lport.c > > @@ -197,3 +197,25 @@ get_peer_lport(const struct sbrec_port_binding *pb, > > peer_name); > > return (peer && peer->datapath) ? peer : NULL; > > } > > + > > +bool > > +lport_is_activated_by_activation_strategy(const struct sbrec_port_binding > > *pb, > > + const struct sbrec_chassis > > *chassis) > > +{ > > +const char *activated_chassis = smap_get(&pb->options, > > + > > "additional-chassis-activated"); > > +if (activated_chassis) { > > +char *save_ptr; > > +char *tokstr = xstrdup(activated_chassis); > > +for (const char *chassis_name = strtok_r(tokstr, ",", &save_ptr); > > + chassis_name != NULL; > > + chassis_name = strtok_r(NULL, ",", &save_ptr)) { > > +if (!strcmp(chassis_name, chassis->name)) { > > +free(tokstr); > > +return true; > > +} > > +} > > +free(tokstr); > > +} > > +return false; > > +} > > diff --git a/controller/lport.h b/controller/lport.h > > index 115881655..644c67255 100644 > > --- a/controller/lport.h > > +++ b/controller/lport.h > > @@ -70,4 +70,7 @@ const struct sbrec_port_binding *lport_get_peer( > > const struct sbrec_port_binding *lport_get_l3gw_peer( > > const struct sbrec_port_binding *, > > struct ovsdb_idl_index *sbrec_port_binding_by_name); > > +bool > > +lport_is_activated_by_activation_strategy(const struct sbrec_port_binding > > *pb, > > + const struct sbrec_chassis > > *chassis); > > #endif /* controller/lport.h */ > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index b597c0e37..a37dfcb78 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -1047,6 +1047,50 @@ en_ofctrl_is_connected_run(struct engine_node *node, > > void *data) > > engine_set_node_state(node, EN_UNCHANGED); > > } > > > > +struct ed_type_activated_ports { > > +struct ovs_list *activated_ports; > > +}; > > + > > +static void * > > +en_activated_ports_init(struct engine_node *nod
[ovs-dev] [PATCH ovn v14] Implement RARP activation strategy for ports
When options:activation-strategy is set to "rarp" for LSP, when used in combination with multiple chassis names listed in options:requested-chassis, additional chassis will install special flows that would block all ingress and egress traffic for the port until a special activation event happens. For "rarp" strategy, an observation of a RARP packet sent from the port on the additional chassis is such an event. When it occurs, a special flow passes control to a controller() action handler that eventually removes the installed blocking flows and also marks the port as options:additional-chassis-activated in southbound db. This feature is useful in live migration scenarios where it's not advisable to unlock the destination port location prematurily to avoid duplicate packets originating from the port. Signed-off-by: Ihar Hrachyshka --- NEWS| 2 + controller/lport.c | 22 +++ controller/lport.h | 3 + controller/ovn-controller.c | 97 ++ controller/physical.c | 94 ++ controller/pinctrl.c| 155 ++- controller/pinctrl.h| 13 ++ include/ovn/actions.h | 3 + northd/northd.c | 10 + northd/ovn-northd.c | 5 +- ovn-nb.xml | 11 ++ ovn-sb.xml | 15 ++ tests/ovn.at| 365 13 files changed, 792 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 2ee283a56..7c54670ed 100644 --- a/NEWS +++ b/NEWS @@ -29,6 +29,8 @@ OVN v22.06.0 - XX XXX - Added support for setting the Next server IP in the DHCP header using the private DHCP option - 253 in native OVN DHCPv4 responder. - Support list of chassis for Logical_Switch_Port:options:requested-chassis. + - Support Logical_Switch_Port:options:activation-strategy for live migration +scenarios. OVN v22.03.0 - 11 Mar 2022 -- diff --git a/controller/lport.c b/controller/lport.c index bf55d83f2..add7e91aa 100644 --- a/controller/lport.c +++ b/controller/lport.c @@ -197,3 +197,25 @@ get_peer_lport(const struct sbrec_port_binding *pb, peer_name); return (peer && peer->datapath) ? peer : NULL; } + +bool +lport_is_activated_by_activation_strategy(const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis) +{ +const char *activated_chassis = smap_get(&pb->options, + "additional-chassis-activated"); +if (activated_chassis) { +char *save_ptr; +char *tokstr = xstrdup(activated_chassis); +for (const char *chassis_name = strtok_r(tokstr, ",", &save_ptr); + chassis_name != NULL; + chassis_name = strtok_r(NULL, ",", &save_ptr)) { +if (!strcmp(chassis_name, chassis->name)) { +free(tokstr); +return true; +} +} +free(tokstr); +} +return false; +} diff --git a/controller/lport.h b/controller/lport.h index 115881655..644c67255 100644 --- a/controller/lport.h +++ b/controller/lport.h @@ -70,4 +70,7 @@ const struct sbrec_port_binding *lport_get_peer( const struct sbrec_port_binding *lport_get_l3gw_peer( const struct sbrec_port_binding *, struct ovsdb_idl_index *sbrec_port_binding_by_name); +bool +lport_is_activated_by_activation_strategy(const struct sbrec_port_binding *pb, + const struct sbrec_chassis *chassis); #endif /* controller/lport.h */ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 2793c8687..78f58e312 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1417,6 +1417,100 @@ en_runtime_data_run(struct engine_node *node, void *data) engine_set_node_state(node, EN_UPDATED); } +struct ed_type_activated_ports { +struct ovs_list *activated_ports; +}; + +static void * +en_activated_ports_init(struct engine_node *node OVS_UNUSED, +struct engine_arg *arg OVS_UNUSED) +{ +struct ed_type_activated_ports *data = xzalloc(sizeof *data); +data->activated_ports = NULL; +return data; +} + +static void +en_activated_ports_cleanup(void *data_) +{ +struct ed_type_activated_ports *data = data_; + +struct activated_port *pp; +if (!data->activated_ports) { +return; +} + +LIST_FOR_EACH_POP (pp, list, data->activated_ports) { +free(pp); +} +free(data->activated_ports); +data->activated_ports = NULL; +} + +static void +en_activated_ports_clear_tracked_data(void *data) +{ +en_activated_ports_cleanup(data); +} + +static void +en_activated_ports_run(struct engine_node *node, void *data_) +{ +struct ed_type_activated_ports *data = data_; +enum engine_node_state state = EN_UNCHANGED; + +if (!data->activated_ports) { +get_activated_ports(&data->activated
Re: [ovs-dev] [ovs-discuss] Commit 355fef6f2 seems to break connectivity in my setup
On Tue, Jun 7, 2022 at 12:16 AM Ilya Maximets wrote: > > On 5/31/22 23:48, Ilya Maximets wrote: > > On 5/31/22 21:15, Frode Nordahl wrote: > >> On Mon, May 30, 2022 at 5:25 PM Frode Nordahl > > > > >> I've pushed the first part of the fix here: > >> https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/394450.html > > > > Thanks! I saw that and I tend to think that it is correct. > > I'll try to test it and apply in the next couple of days. > > > > One question about the test above: which entity actually adds > > the ct_state to the packet or at which moment that happens? > > I see it, but I'm not sure I fully understand that. Looks > > like I'm missing smething obvious. > > I found what is going on there - kernel simply tracks everything > if not told to do so, so ICMP packets create the ct entry and > subsequent packets re-use it, so icmp replies have +trk set while > entering OVS. Great, my hunch was that something along these lines was happening as well, I have to admit the test case was found by locating something closest to the real life use case and it proved to work as a good test for this condition. > > > Let's have some summary of the issues discovered here so far, > including a few new issues: > > 1. ct states set externally are not tracked correctly by OVS. >Fix: https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/394450.html >Status: LGTM, will apply soon. >This fixes the problem originally reported by Liam, IIUC. Right? Yes. > 2. Kernel ct() actions are trying to re-use the cached connection >after the tuple changes. >This ends up to be the OVN hairpin issue as reported here: > https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 > >Proposed Fix: > > https://lore.kernel.org/netdev/20220606221140.488984-1-i.maxim...@ovn.org/T/#u > >Status: Needs review. I can confirm that the proposed fix resolves the OVN hairpin issue. It also looks simple enough to be backportable all the way to where we would need it (kernel 5.4.0). I'll have a look at giving this wider exposure in an internal CI environment as a canary for any unintended consequences if that would be helpful. > 3. ct_clear is not a reversible action, because it required >to pass the packet through conntrack again in order to restore >the conntrack state. > >Fix: To move the CT_CLEAR to a group of irreversible actions > in the reversible_actions() in ofproto/ofproto-dpif-xlate.c. > >Status: An easy fix that does nothing useful without fixes for >later issues, because we clear the ct_state before >the patch_port_output() processing and optimizing out >the CT_CLEAR action. > > > 4. (new!) reversible_actions() optimization is logically incorrect. >The reason is that reversible_actions() doesn't look further >than a list of actions of the first OF rule in the second bridge. >If the list of action contains resubmit, there can still be >irreversible actions and packet will be irreversibly modified by >the patch port processing without cloning. > >Simple reproducer: > > diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at > index dbb3b6dda..be30ad5bf 100644 > --- a/tests/ofproto-dpif.at > +++ b/tests/ofproto-dpif.at > @@ -8736,7 +8736,8 @@ OVS_VSWITCHD_START( > AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br1 'meter=1 pktps stats > bands=type=drop rate=2']) > AT_CHECK([ovs-ofctl del-flows br0]) > AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 in_port=local,ip,actions=2,1]) > -AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 > in_port=1,ip,actions=meter:1,3]) > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 > "in_port=1,ip,actions=resubmit(,7)"]) > +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br1 > table=7,in_port=1,ip,actions=meter:1,3]) > > AT_CHECK([ovs-appctl ofproto/trace ovs-dummy > 'in_port(100),eth(src=f8:bc:12:44:34:b6,dst=f8:bc:12:46:58:e0),eth_type(0x0800),ipv4(src=10.1.1.22,dst=10.0.0.3,proto=6,tos=0,ttl=64,frag=no),tcp(src=53295,dst=8080)'], > [0], [stdout]) > AT_CHECK([tail -1 stdout], [0], > --- > >Status: unclear. >One idea is to reverse the logic and check datapath actions for >being reversible when going up in recursion instead of trying to >predict all the future actions while going down. > >Ideas are welcome. >Hammerhead approach: Mark 'resubmit' as irreversible action. But that >will effectively mean that we're always cloning on patch port output, >which is not great, see the issue #8. > > > 5. Packets after the ct() in a non-forked pipeline must be untracked. >Status: unclear. > >Fix for the issue #2 will cover issues for already tracked packets, >but if the packet is supposed to be untracked, but it is tracked, >then we must emit the ct_clear action from the userspace. > >The most viable approach, I guess, is the previously proposed >'pending_ct_clear' fix. Alternative is to always emit ct_clear >
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix internal CT state for non-recirc traffic.
On Tue, Jun 7, 2022 at 10:52 PM Ilya Maximets wrote: > > On 5/30/22 21:07, Frode Nordahl wrote: > > In some circumstances a flow may get its ct_state set without > > conscious intervention by the OVS user space code. > > > > Commit 355fef6f2ccbc optimizes out uneccessary ct_clear actions > > based on an internal struct xlate_ctx->conntracked state flag. > > > > Before this commit the xlate_ctx->conntracked state flag would > > be initialized to 'false' and only set during thawing for > > recirculation. > > > > This patch checks the flow ct_state for the non-recirc case and > > sets the internal conntracked state appropriately. A system > > traffic test is also added to avoid regression. > > > > Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear > > datapath actions.") > > Signed-off-by: Frode Nordahl > > --- > > ofproto/ofproto-dpif-xlate.c | 6 + > > tests/system-traffic.at | 46 > > 2 files changed, 52 insertions(+) > > Thanks! Applied and backported down to 2.13. > > Best regards, Ilya Maximets. Thank you for the merges! -- Frode Nordahl ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl
Bleep bloop. Greetings Terry Wilson, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. Patch skipped due to previous failure. Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] Handle re-used pids in pidfile_is_running
Bleep bloop. Greetings Terry Wilson, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. git-am: error: sha1 information is lacking or useless (utilities/ovn-ctl). error: could not build fake ancestor hint: Use 'git am --show-current-patch' to see the failed patch Patch failed at 0001 Handle re-used pids in pidfile_is_running When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Patch skipped due to previous failure. Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 1/2] Handle re-used pids in pidfile_is_running
Since pids can be re-used, it is necessary to check that the process that is running with a pid matches the one that we expect. This adds the ability to optionally pass a 'binary' argument to pidfile_is_running, and if it is passed to match the binary against /proc/$pid/exe. Signed-off-by: Terry Wilson --- utilities/ovn-ctl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl index d733aa42d..14d37a3d6 100755 --- a/utilities/ovn-ctl +++ b/utilities/ovn-ctl @@ -42,7 +42,8 @@ ovn_ic_db_conf_file="$ovn_etcdir/ovn-ic-db-params.conf" pidfile_is_running () { pidfile=$1 -test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid" +cmd=$2 +test -e "$pidfile" && [ -s "$pidfile" ] && pid=`cat "$pidfile"` && pid_exists "$pid" && [ -z $cmd -o pid_comm_check "$cmd" "$pid" ] } >/dev/null 2>&1 stop_nb_ovsdb() { -- 2.34.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3 2/2] Ensure pid belongs to ovsdb-server in ovn-ctl
When checking if ovsdb-server is running, ensure that the binary we are going to run matches the one actually running with the the pid that was in our pidfile. Signed-off-by: Terry Wilson --- utilities/ovn-ctl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl index 14d37a3d6..e2f05915b 100755 --- a/utilities/ovn-ctl +++ b/utilities/ovn-ctl @@ -200,7 +200,7 @@ start_ovsdb__() { ovn_install_dir "$ovn_etcdir" # Check and eventually start ovsdb-server for DB -if pidfile_is_running $db_pid_file; then +if pidfile_is_running $db_pid_file ovsdb-server; then return fi -- 2.34.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 3/3] netdev-offload-dpdk: use flow transfer proxy mechanism
Hi Eli, On Wed, 1 Jun 2022, Eli Britstein wrote: - Missing proper handling of the testpmd syntax logging. It changes the used port according to "transfer", but the log still uses netdev_dpdk_get_port_id(). Thanks for noticing. I will see to it in the next version. - The usage of the "proxy" port for rte-flow implies that this proxy port is attached to OVS, otherwise it is not "started" and creation of flows will fail. That's the way it is. If there is no proxy for a given port, then the original port value will be used for managing flows. For vendors that don't need the proxy, this will work. For others, it won't. That's OK. -Original Message- From: Ivan Malov Sent: Monday, May 30, 2022 5:16 PM To: d...@openvswitch.org Cc: Andrew Rybchenko ; Ilya Maximets ; Ori Kam ; Eli Britstein ; NBU-Contact-Thomas Monjalon (EXTERNAL) ; Stephen Hemminger ; David Marchand ; Gaetan Rivet ; Maxime Coquelin Subject: [PATCH 3/3] netdev-offload-dpdk: use flow transfer proxy mechanism External email: Use caution opening links or attachments Manage "transfer" flows via the corresponding mechanism. Doing so requires that the traffic source be specified explicitly, via the corresponding pattern item. Signed-off-by: Ivan Malov Acked-by: Andrew Rybchenko --- lib/netdev-dpdk.c | 73 --- lib/netdev-dpdk.h | 2 +- lib/netdev-offload-dpdk.c | 43 ++- 3 files changed, 103 insertions(+), 15 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 45e5d26d2..d0bf4613a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -420,6 +420,7 @@ enum dpdk_hw_ol_features { struct netdev_dpdk { PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0, +dpdk_port_t flow_transfer_proxy_port_id; dpdk_port_t port_id; /* If true, device was attached by rte_eth_dev_attach(). */ @@ -1115,6 +1116,23 @@ dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev) DPDK_PORT_ID_FMT, dev->port_id); } } + +static void +dpdk_eth_dev_init_flow_transfer_proxy(struct netdev_dpdk *dev) { +int ret; + +ret = rte_flow_pick_transfer_proxy(dev->port_id, + &dev->flow_transfer_proxy_port_id, NULL); +if (ret == 0) +return; + +/* + * The PMD does not indicate the proxy port. + * It is OK to assume the proxy is unneeded. + */ +dev->flow_transfer_proxy_port_id = dev->port_id; } #endif /* ALLOW_EXPERIMENTAL_API */ static int @@ -1141,6 +1159,19 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) * Request delivery of such metadata. */ dpdk_eth_dev_init_rx_metadata(dev); + +/* + * Managing "transfer" flows requires that the user communicate them + * via a port which has the privilege to control the embedded switch. + * For some vendors, all ports in a given switching domain have + * this privilege. For other vendors, it's only one port. + * + * Get the proxy port ID and remember it for later use. + */ +dpdk_eth_dev_init_flow_transfer_proxy(dev); +#else /* ! ALLOW_EXPERIMENTAL_API */ +/* It is OK to assume the proxy is unneeded. */ +dev->flow_transfer_proxy_port_id = dev->port_id; #endif /* ALLOW_EXPERIMENTAL_API */ rte_eth_dev_info_get(dev->port_id, &info); @@ -5214,13 +5245,15 @@ out: int netdev_dpdk_rte_flow_destroy(struct netdev *netdev, - struct rte_flow *rte_flow, + bool transfer, struct rte_flow *rte_flow, struct rte_flow_error *error) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int ret; -ret = rte_flow_destroy(dev->port_id, rte_flow, error); +ret = rte_flow_destroy(transfer ? + dev->flow_transfer_proxy_port_id : dev->port_id, + rte_flow, error); return ret; } @@ -5234,7 +5267,19 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev, struct rte_flow *flow; struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); -flow = rte_flow_create(dev->port_id, attr, items, actions, error); +#ifdef ALLOW_EXPERIMENTAL_API +if (!attr->transfer) { +/* + * The 1st item in any pattern is a traffic source one. + * It is unnecessary in the case of non-transfer rules. + */ +++(items); +} +#endif /* ALLOW_EXPERIMENTAL_API */ + +flow = rte_flow_create(attr->transfer ? + dev->flow_transfer_proxy_port_id : dev->port_id, + attr, items, actions, error); return flow; } @@ -5262,7 +5307,8 @@ netdev_dpdk_rte_flow_query_count(struct netdev *netdev, } dev = netdev_dpdk_cast(netdev); -ret = rte_flow_query(dev->port_id, rte_flow, actions, query, error); +ret = rte_flow_query(dev->flow_transfer_proxy_port_id, rte_flow, + actions, query, error); return ret; } @@ -5284,8 +5330,8 @@
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix internal CT state for non-recirc traffic.
On 5/30/22 21:07, Frode Nordahl wrote: > In some circumstances a flow may get its ct_state set without > conscious intervention by the OVS user space code. > > Commit 355fef6f2ccbc optimizes out uneccessary ct_clear actions > based on an internal struct xlate_ctx->conntracked state flag. > > Before this commit the xlate_ctx->conntracked state flag would > be initialized to 'false' and only set during thawing for > recirculation. > > This patch checks the flow ct_state for the non-recirc case and > sets the internal conntracked state appropriately. A system > traffic test is also added to avoid regression. > > Fixes: 355fef6f2ccbc ("ofproto-dpif-xlate: Avoid successive ct_clear datapath > actions.") > Signed-off-by: Frode Nordahl > --- > ofproto/ofproto-dpif-xlate.c | 6 + > tests/system-traffic.at | 46 > 2 files changed, 52 insertions(+) Thanks! Applied and backported down to 2.13. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] classifier: adjust segment boundary to execute prerequisite processing
On 5/24/22 23:35, Aaron Conole wrote: > During flow processing, the flow wildcards are checked as a series of > stages, and these stages are intended to carry dependencies in a single > direction. But when the neighbor discovery processing, for example, was > executed there is an incorrect dependency chain - we need fields from > stage 4 to determine whether we need fields from stage 3. > > We can build a set of flow rules to demonstrate this: > table=0,priority=100,ipv6,ipv6_src=1000::/10 actions=resubmit(,1) > table=0,priority=0 actions=NORMAL > table=1,priority=110,ipv6,ipv6_dst=1000::3 actions=resubmit(,2) > table=1,priority=100,ipv6,ipv6_dst=1000::4 actions=resubmit(,2) > table=1,priority=0 actions=NORMAL > > table=2,priority=120,icmp6,nw_ttl=255,icmp_type=135,icmp_code=0,nd_sll=10:de:ad:be:ef:10 > actions=NORMAL > table=2,priority=100,tcp actions=NORMAL > table=2,priority=100,icmp6 actions=NORMAL > table=2,priority=0 actions=NORMAL > > With this set of flows, any IPv6 packet that executes through this pipeline > will have the corresponding nd_sll field flagged as required match for > classification even if that field doesn't make sense in such a context > (for example, TCP packets). When the corresponding flow is installed into > the kernel datapath, this field is not reflected when the revalidator > executes the dump stage (see net/openvswitch/flow_netlink.c for more details). > > During the sweep stage, revalidator will compare the dumped WC with a > generated WC - these will mismatch because the generated WC will match on > the Neighbor Discovery fields, while the datapath WC will not match on > these fields. We will then invalidate the flow and as a side effect > force an upcall. > > By redefining the boundary, we shift these fields to the l4 subtable, and > cause masks to be generated matching just the requisite fields. The list > of fields being shifted: > > struct in6_addr nd_target; > struct eth_addr arp_sha; > struct eth_addr arp_tha; > ovs_be16 tcp_flags; > ovs_be16 pad2; > struct ovs_key_nsh nsh; > > A standout field would be tcp_flags moving from l3 subtable matches to > the l4 subtable matches. This reverts a partial performance optimization > in the case of stateless firewalling. The tcp_flags field might have > been a good candidate to retain in the l3 segment, but it got overloaded > with ICMPv6 ND matching, and therefore we can't preserve this kind of > optimization. > > Two other approaches were considered - moving the nd_target field alone > and collapsing the l3/l4 segments into a single subtable for matching. > Moving any field individually introduces ABI mismatch, and doesn't > completely address the problems with other neighbor discovery related > fields (such as nd_sll/nd_tll). Collapsing the two subtables creates > an issue with datapath flow explosion, since the l3 and l4 fields will > be unwildcarded together (this can be seen with some of the existing > classifier tests). > > A simple test is added to showcase the behavior. > > Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.") > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773 > Reported-by: Numan Siddique > Suggested-by: Ilya Maximets > Signed-off-by: Aaron Conole > Acked-by: Eelco Chaudron > Acked-by: Cian Ferriter > Tested-by: Numan Siddique > --- > v1->v2: Added missing OVS_VSWITCHD_STOP and adjust whitespace on test case This fixes an important bug that is really hard to track down on real deployments. Testing so far didn't show any significant issues with this approach, so I applied the patch and backported down to 2.13. Thanks! Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovs-tcpdump: Fix error when stopping ovs-tcpdump.
On 6/3/22 17:51, Mike Pattrick wrote: > On Fri, May 20, 2022 at 5:55 AM Han Ding wrote: >> >> >> Sometimes we need to dump packets on more than two interfaces in a bridge >> at the same time. Then when we stop dumping in order, ovs-tcpdump print >> traceback and fail to delete mirror interface for some interface. >> >> For example: >> br-int has two interface tap1 and br-int. We use ovs-tcpdump dump tap1 first >> and dump br-int next. Then stopping tap1 ovs-tcpdump first, and stopping >> br-int second. When we stop ovs-tcpdump for br-int, the screen show the error >> like this: >> __main__.OVSDBException: Unable to delete Mirror m_br-int >> >> Signed-off-by: Han Ding > > I can confirm that this issue does occur, and this patch fixes it. > > Acked-by: Mike Pattrick > > Small nit but instead of closing and reopening the OVSDB object, maybe > we should just atexit.register the call to ovsdb.close_idl. > > Cheers, > M Thanks, Han and Mike! The fully correct solution, I guess, is to periodically call idl.run() to update the current database representation, but it's not clear, how to implement that. And if we're not calling idl.run() server will disconnect us at some point anyway because of inactivity. So, closing the connection before starting tcpdump does make sense. Applied to master and branch 2.17. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] conntrack: remove the IP iterations in nat_get_unique_l4
On 5/5/22 16:00, we...@chinatelecom.cn wrote: > From: wenxu > > Removing the IP iterations, and just picking the IP address > with the hash base on the least-used src-ip/dst-ip/proto triple. > > Signed-off-by: wenxu > Acked-by: Paolo Valerio > --- Thanks, wenxu and Paolo! I applied both patches. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/3] netdev-dpdk: negotiate delivery of per-packet Rx metadata
Hi Eli, Thanks for reviewing the series. On Wed, 1 Jun 2022, Eli Britstein wrote: "TUNNEL_ID" is a bad name, but that's how dpdk called it. There was a discussion about having this knowledge in OVS so we can avoid calling rte_flow_get_restore_info(). How else it is used? As far as I know, avoiding rte_flow_get_restore_info() demands that tunnel ID be retrieved from mbufs themselves, using a dynamic field. That proposal does not discard the need to deliver tunnel data from the NIC to the PMD, and the patch that we discuss is still required. -Original Message- From: Ivan Malov Sent: Monday, May 30, 2022 5:16 PM To: d...@openvswitch.org Cc: Andrew Rybchenko ; Ilya Maximets ; Ori Kam ; Eli Britstein ; NBU-Contact-Thomas Monjalon (EXTERNAL) ; Stephen Hemminger ; David Marchand ; Gaetan Rivet ; Maxime Coquelin Subject: [PATCH 1/3] netdev-dpdk: negotiate delivery of per-packet Rx metadata External email: Use caution opening links or attachments This may be required by some PMDs in offload scenarios. Signed-off-by: Ivan Malov Acked-by: Andrew Rybchenko --- lib/netdev-dpdk.c | 44 1 file changed, 44 insertions(+) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f9535bfb4..45e5d26d2 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1085,6 +1085,38 @@ dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) } } +#ifdef ALLOW_EXPERIMENTAL_API +static void +dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev) { +uint64_t rx_metadata = 0; +int ret; + +/* For the fallback offload (non-"transfer" rules) */ +rx_metadata |= RTE_ETH_RX_METADATA_USER_MARK; +/* For the full offload ("transfer" rules) */ +rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID; + +ret = rte_eth_rx_metadata_negotiate(dev->port_id, &rx_metadata); +if (ret == 0) { +if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) { +VLOG_DBG("The NIC will not provide per-packet USER_MARK on port " + DPDK_PORT_ID_FMT, dev->port_id); +} +if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) { +VLOG_DBG("The NIC will not provide per-packet TUNNEL_ID on port " + DPDK_PORT_ID_FMT, dev->port_id); +} +} else if (ret == -ENOTSUP) { +VLOG_DBG("Rx metadata negotiate procedure is not supported on port " + DPDK_PORT_ID_FMT, dev->port_id); +} else { +VLOG_WARN("Cannot negotiate Rx metadata on port " + DPDK_PORT_ID_FMT, dev->port_id); +} +} +#endif /* ALLOW_EXPERIMENTAL_API */ + static int dpdk_eth_dev_init(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) @@ -1099,6 +1131,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) RTE_ETH_RX_OFFLOAD_TCP_CKSUM | RTE_ETH_RX_OFFLOAD_IPV4_CKSUM; +#ifdef ALLOW_EXPERIMENTAL_API +/* + * Full tunnel offload requires that tunnel ID metadata be + * delivered with "miss" packets from the hardware to the + * PMD. The same goes for megaflow mark metadata which is + * used in MARK + RSS offload scenario. + * + * Request delivery of such metadata. + */ +dpdk_eth_dev_init_rx_metadata(dev); +#endif /* ALLOW_EXPERIMENTAL_API */ + rte_eth_dev_info_get(dev->port_id, &info); if (strstr(info.driver_name, "vf") != NULL) { -- 2.30.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH RFC ovn] nb: Remove possibility of disabling logical datapath groups.
In large scale scenarios this option hugely reduces the size of the Southbound database positively affecting end to end performance. In such scenarios there's no real reason to ever disable datapath groups. In lower scale scenarios any potential overhead due to logical datapath groups is, very likely, negligible. Aside from potential scalability concerns, the NB.NB_Global.options:use_logical_dp_group knob was kept until now to ensure that in case of a bug in the logical datapath groups code a CMS may turn it off and fall back to the mode in which logical flows are not grouped together. As far as I know, this has never happened until now. Moreover, datpath_groups are enabled by default since v21.09.0 (4 stable releases ago), via 90daa7ce18dc ("northd: Enable logical dp groups by default."). >From a testing perspective removing this knob will halve the CI matrix. This is desirable, especially in the context of more tests being added, e.g.: https://patchwork.ozlabs.org/project/ovn/patch/20220531093318.2270409-1-mh...@redhat.com/ Signed-off-by: Dumitru Ceara --- NEWS| 1 + northd/northd.c | 77 +++-- ovn-nb.xml | 19 +++--- tests/ovn-macros.at | 23 ++-- tests/ovn-northd.at | 92 - tests/ovs-macros.at | 4 +- 6 files changed, 41 insertions(+), 175 deletions(-) diff --git a/NEWS b/NEWS index e015ae8e7..20b4c5d91 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,7 @@ Post v22.06.0 "ovn-encap-df_default" to enable or disable tunnel DF flag. - Add option "localnet_learn_fdb" to LSP that will allow localnet ports to learn MAC addresses and store them in FDB table. + - Removed possibility of disabling logical datapath groups. OVN v22.06.0 - XX XXX -- diff --git a/northd/northd.c b/northd/northd.c index 0207f6ce1..a97a321cd 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -4813,10 +4813,6 @@ ovn_lflow_equal(const struct ovn_lflow *a, const struct ovn_datapath *od, && nullable_string_is_equal(a->ctrl_meter, ctrl_meter)); } -/* If this option is 'true' northd will combine logical flows that differ by - * logical datapath only by creating a datapath group. */ -static bool use_logical_dp_groups = false; - enum { STATE_NULL, /* parallelization is off */ STATE_INIT_HASH_SIZES,/* parallelization is on; hashes sizing needed */ @@ -4841,8 +4837,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od, lflow->ctrl_meter = ctrl_meter; lflow->dpg = NULL; lflow->where = where; -if ((parallelization_state != STATE_NULL) -&& use_logical_dp_groups) { +if (parallelization_state != STATE_NULL) { ovs_mutex_init(&lflow->odg_lock); } } @@ -4852,7 +4847,7 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref, struct ovn_datapath *od) OVS_NO_THREAD_SAFETY_ANALYSIS { -if (!use_logical_dp_groups || !lflow_ref) { +if (!lflow_ref) { return false; } @@ -4931,13 +4926,11 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od, struct ovn_lflow *old_lflow; struct ovn_lflow *lflow; -if (use_logical_dp_groups) { -old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match, - actions, ctrl_meter, hash); -if (old_lflow) { -ovn_dp_group_add_with_reference(old_lflow, od); -return old_lflow; -} +old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match, + actions, ctrl_meter, hash); +if (old_lflow) { +ovn_dp_group_add_with_reference(old_lflow, od); +return old_lflow; } lflow = xmalloc(sizeof *lflow); @@ -4993,8 +4986,7 @@ ovn_lflow_add_at_with_hash(struct hmap *lflow_map, struct ovn_datapath *od, struct ovn_lflow *lflow; ovs_assert(ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od)); -if (use_logical_dp_groups -&& (parallelization_state == STATE_USE_PARALLELIZATION)) { +if (parallelization_state == STATE_USE_PARALLELIZATION) { lflow = do_ovn_lflow_add_pd(lflow_map, od, hash, stage, priority, match, actions, io_port, stage_hint, where, ctrl_meter); @@ -5080,8 +5072,7 @@ static void ovn_lflow_destroy(struct hmap *lflows, struct ovn_lflow *lflow) { if (lflow) { -if ((parallelization_state != STATE_NULL) -&& use_logical_dp_groups) { +if (parallelization_state != STATE_NULL) { ovs_mutex_destroy(&lflow->odg_lock); } if (lflows) { @@ -13883,24 +13874,15 @@ build_lswitch_and_lrouter_flows(const struct hmap *datapaths, int index; lsiv = xcalloc(sizeof(*lsiv), build_lflows_pool->size); -i
Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only for last frags upon modify.
> -Original Message- > From: Ilya Maximets > Sent: Tuesday, June 7, 2022 1:14 PM > To: Salem Sol ; d...@openvswitch.org > Cc: Eli Britstein ; Michael Pattrick ; > i.maxim...@ovn.org > Subject: Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only > for last frags upon modify. > > External email: Use caution opening links or attachments > > > On 6/6/22 12:12, Salem Sol via dev wrote: > > In case of modifying an IPv6 packet src/dst address the checksum > > should be recalculated only for the last frag. Currently it's done for > > all frags, leading to incorrect reassembled packet checksum. > > Fix it by adding a new flag to recalculate the checksum only for last frag. > > Hi. Thanks for working on this! > > There is definitely a problem with checksum calculation, but I don't > understand the logic of the fix. We're talking about re-calculation of the L4 > checksum, but the *last* fragment should not contain the L4 header. So, > we're just corrupting the data of the last fragment. If we want to > re-calculate > the checksum, we should do that for the *first* fragment instead, because > it's the only fragment that will likely contain the L4 header. Or am I > missing > something? > > BTW, it looks like we have the same issue for ipv4 packets. We should > update checksum only for the first fragment. This can be a separate fix > though. > > Some more comments inline. > > Best regards, Ilya Maximets. > Thanks for the catch, you're right, my bad, in my testing I used only one extension header and the original patchset seemed to resolve the issue. New implementation can be found in V3. > > > > Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action") > > Signed-off-by: Salem Sol > > Acked-by: Mike Pattrick > > --- > > lib/packets.c | 20 ++-- > > tests/system-traffic.at | 35 +++ > > 2 files changed, 53 insertions(+), 2 deletions(-) > > > > diff --git a/lib/packets.c b/lib/packets.c index d0fba8176..b7aef41a5 > > 100644 > > --- a/lib/packets.c > > +++ b/lib/packets.c > > @@ -1148,6 +1148,18 @@ packet_set_ipv4_addr(struct dp_packet > *packet, > > put_16aligned_be32(addr, new_addr); } > > > > +static bool > > +packet_is_last_ipv6_frag(struct dp_packet *packet) { > > +const struct ovs_16aligned_ip6_frag *frag_hdr; > > +uint8_t *data = dp_packet_l3(packet); > > + > > +data += sizeof (struct ovs_16aligned_ip6_hdr); > > +frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data); > > +return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) && > > + !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG); } > > + > > /* Returns true, if packet contains at least one routing header where > > * segements_left > 0. > > * > > @@ -1334,17 +1346,21 @@ packet_set_ipv6(struct dp_packet *packet, > > const struct in6_addr *src, { > > struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet); > > uint8_t proto = 0; > > +bool recalc_csum; > > bool rh_present; > > > > rh_present = packet_rh_present(packet, &proto); > > +recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ? > > +packet_is_last_ipv6_frag(packet) : true; > > The fragmentation header is usually the last extension header, not the first. > This check will work for the simple case with only one extension header, but > it will not work in the general case. We should search for the fragmentation > header the same way as we look for the routing header in > packet_rh_present(). > We should, probably, modify the packet_rh_present() function to also say if > the packet is fragmented / is it the first fragment. > Thanks for the comment, I followed your suggestion and changed packet_rh_present() to update if it's the first frag in V3. > > > > if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) { > > -packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true); > > +packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, > > + src, recalc_csum); > > } > > > > if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) { > > packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst, > > - !rh_present); > > + !rh_present && recalc_csum); > > } > > > > packet_set_ipv6_tc(&nh->ip6_flow, key_tc); diff --git > > a/tests/system-traffic.at b/tests/system-traffic.at index > > 239105e89..16ba42d84 100644 > > --- a/tests/system-traffic.at > > +++ b/tests/system-traffic.at > > @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i > > 0.3 -w 2 fc00:1::2 | FORMAT_PI OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > > > +AT_SETUP([datapath - ping6 between two ports with header modify]) > > +OVS_TRAFFIC_VSWITCHD_START() > > + > > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) > > + > > +ADD_NAMESPACES(at_ns0, at_ns1) > > + > > +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55) > > +ADD_VETH(p1, at_n
[ovs-dev] [PATCH v3 1/1] packets: Re-calculate IPv6 checksum only for first frag upon modify.
In case of modifying an IPv6 packet src/dst address the L4 checksum should be recalculated only for the first frag. Currently it's done for all frags, leading to incorrect reassembled packet checksum. Fix it by adding a new flag to recalculate the checksum only for last frag. Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action") Signed-off-by: Salem Sol --- lib/packets.c | 12 tests/system-traffic.at | 35 +++ 2 files changed, 43 insertions(+), 4 deletions(-) Revision history: - v3 -> v2: Changed the checksum calculation to be on the first frag - v2 -> v1: Removed redundant line - Acked-by: Mike Pattrick . Link to passing github actions can be found in [1]. [1]: https://github.com/salemsol/ovs/actions/runs/2455292688 diff --git a/lib/packets.c b/lib/packets.c index d0fba8176..8f5976bbd 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -1153,7 +1153,7 @@ packet_set_ipv4_addr(struct dp_packet *packet, * * This function assumes that L3 and L4 offsets are set in the packet. */ static bool -packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr) +packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr, bool *first_frag) { const struct ovs_16aligned_ip6_hdr *nh; size_t len; @@ -1203,6 +1203,8 @@ packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr) const struct ovs_16aligned_ip6_frag *frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data); +*first_frag = !(frag_hdr->ip6f_offlg & IP6F_OFF_MASK) && + (frag_hdr->ip6f_offlg & IP6F_MORE_FRAG); *nexthdr = frag_hdr->ip6f_nxt; len = sizeof *frag_hdr; } else if (*nexthdr == IPPROTO_ROUTING) { @@ -1333,18 +1335,20 @@ packet_set_ipv6(struct dp_packet *packet, const struct in6_addr *src, uint8_t key_hl) { struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet); +bool recalc_csum = true; uint8_t proto = 0; bool rh_present; -rh_present = packet_rh_present(packet, &proto); +rh_present = packet_rh_present(packet, &proto, &recalc_csum); if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) { -packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true); +packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, + src, recalc_csum); } if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) { packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst, - !rh_present); + !rh_present && recalc_csum); } packet_set_ipv6_tc(&nh->ip6_flow, key_tc); diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 239105e89..588e1b2e2 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PI OVS_TRAFFIC_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([datapath - ping6 between two ports with header modify]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55) +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54) +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 dev p0]) +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::2 lladdr e4:11:22:33:44:54 dev p0]) +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::1 lladdr e4:11:22:33:44:55 dev p0]) + +dnl Linux seems to take a little time to get its IPv6 stack in order. Without +dnl waiting, we get occasional failures due to the following error: +dnl "connect: Cannot assign requested address" +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) + +AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0]) +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_field:fc00::2-\>ipv6_dst,ovs-p1]) +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 in_port=ovs-p1,ipv6,ipv6_dst=fc00::1,ipv6_src=fc00::2,actions=set_field:fc00::3-\>ipv6_src,ovs-p0]) + +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + AT_SETUP([datapath - ping over bond]) OVS_TRAFFIC_VSWITCHD_START() -- 2.21.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 ovn] northd: add the capability to inherit logical routers lbs on logical switches
On Mon, Jun 6, 2022 at 6:44 PM Lorenzo Bianconi wrote: > > Add the capability to automatically deploy a load-balancer on each > logical-switch connected to a logical router where the load-balancer > has been installed by the CMS. This patch allow to overcome the > distributed gw router scenario limitation where a load-balancer must be > installed on each datapath to properly reach the load-balancer. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2043543 > Signed-off-by: Lorenzo Bianconi Thanks for the rebase. Acked-by: Numan Siddique Numan > --- > Changes since v2: > - rebase on top of ovn master > > Changes since v1: > - rebase on top of ovn master > - add NEWS entry > - improve selftests > --- > NEWS| 5 +++ > northd/northd.c | 56 + > northd/ovn-northd.8.xml | 8 + > tests/ovn-northd.at | 80 + > 4 files changed, 149 insertions(+) > > diff --git a/NEWS b/NEWS > index e015ae8e7..8b4f91553 100644 > --- a/NEWS > +++ b/NEWS > @@ -4,6 +4,11 @@ Post v22.06.0 > "ovn-encap-df_default" to enable or disable tunnel DF flag. >- Add option "localnet_learn_fdb" to LSP that will allow localnet > ports to learn MAC addresses and store them in FDB table. > + - northd: introduce the capability to automatically deploy a load-balancer > +on each logical-switch connected to a logical router where the > +load-balancer has been installed by the CMS. In order to enable the > +feature the CMS has to set install_ls_lb_from_router to true in option > +column of NB_Global table. > > OVN v22.06.0 - XX XXX > -- > diff --git a/northd/northd.c b/northd/northd.c > index 0207f6ce1..a95a5148e 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -63,6 +63,8 @@ static bool lflow_hash_lock_initialized = false; > > static bool check_lsp_is_up; > > +static bool install_ls_lb_from_router; > + > /* MAC allocated for service monitor usage. Just one mac is allocated > * for this purpose and ovn-controller's on each chassis will make use > * of this mac when sending out the packets to monitor the services > @@ -4140,6 +4142,55 @@ build_lrouter_lbs_reachable_ips(struct hmap > *datapaths, struct hmap *lbs) > } > } > > +static void > +build_lswitch_lbs_from_lrouter(struct hmap *datapaths, struct hmap *lbs) > +{ > +if (!install_ls_lb_from_router) { > +return; > +} > + > +struct ovn_datapath *od; > +HMAP_FOR_EACH (od, key_node, datapaths) { > +if (!od->nbs) { > +continue; > +} > + > +struct ovn_port *op; > +LIST_FOR_EACH (op, dp_node, &od->port_list) { > +if (!lsp_is_router(op->nbsp)) { > +continue; > +} > +if (!op->peer) { > +continue; > +} > + > +struct ovn_datapath *peer_od = op->peer->od; > +for (size_t i = 0; i < peer_od->nbr->n_load_balancer; i++) { > +bool installed = false; > +const struct uuid *lb_uuid = > +&peer_od->nbr->load_balancer[i]->header_.uuid; > +struct ovn_northd_lb *lb = ovn_northd_lb_find(lbs, lb_uuid); > +if (!lb) { > +continue; > +} > + > +for (size_t j = 0; j < lb->n_nb_ls; j++) { > + if (lb->nb_ls[j] == od) { > + installed = true; > + break; > + } > +} > +if (!installed) { > +ovn_northd_lb_add_ls(lb, od); > +} > +if (lb->nlb) { > +od->has_lb_vip |= lb_has_vip(lb->nlb); > +} > +} > +} > +} > +} > + > /* This must be called after all ports have been processed, i.e., after > * build_ports() because the reachability check requires the router ports > * networks to have been parsed. > @@ -4152,6 +4203,7 @@ build_lb_port_related_data(struct hmap *datapaths, > struct hmap *ports, > build_lrouter_lbs_check(datapaths); > build_lrouter_lbs_reachable_ips(datapaths, lbs); > build_lb_svcs(input_data, ovnsb_txn, ports, lbs); > +build_lswitch_lbs_from_lrouter(datapaths, lbs); > } > > /* Syncs relevant load balancers (applied to logical switches) to the > @@ -15378,6 +15430,10 @@ ovnnb_db_run(struct northd_input *input_data, > "ignore_lsp_down", true); > default_acl_drop = smap_get_bool(&nb->options, "default_acl_drop", > false); > > +install_ls_lb_from_router = smap_get_bool(&nb->options, > + "install_ls_lb_from_router", > + false); > + > build_chassis_features(input_data, &data->features); > build_datapaths(input_data, ovnsb_txn, &data->datap
[ovs-dev] [PATCH v2] dpcls: add unlisted alias for subtable lookup command
This patch adds the old name "subtable-lookup-prio-get" as an unlisted command, to restore a consistency between OVS releases for testing scripts. Fixes 738c76a503f4 ("dpcls: Change info-get function to fetch dpcls usage stats.") Suggested-by: Eelco Chaudron Suggested-by: Ilya Maximets Signed-off-by: Harry van Haaren --- v2: - Based on discussion and push back on v1 patch here, this is a v2 implementing the suggested "alias" method. Suggested by tags added for both Eelco (for alias concept) and Ilya (for unlisted concept). https://patchwork.ozlabs.org/project/openvswitch/patch/20220525141014.661907-1-harry.van.haa...@intel.com/ --- lib/dpif-netdev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index ff57b3961..f46b9fe18 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1609,6 +1609,9 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/subtable-lookup-info-get", "", 0, 0, dpif_netdev_subtable_lookup_get, NULL); +unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", NULL, + 0, 0, dpif_netdev_subtable_lookup_get, + NULL); unixctl_command_register("dpif-netdev/dpif-impl-set", "dpif_implementation_name", 1, 1, dpif_netdev_impl_set, -- 2.32.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/1] packets: Re-calculate IPv6 checksum only for last frags upon modify.
On 6/6/22 12:12, Salem Sol via dev wrote: > In case of modifying an IPv6 packet src/dst address the checksum should be > recalculated only for the last frag. Currently it's done for all frags, > leading to incorrect reassembled packet checksum. > Fix it by adding a new flag to recalculate the checksum only for last frag. Hi. Thanks for working on this! There is definitely a problem with checksum calculation, but I don't understand the logic of the fix. We're talking about re-calculation of the L4 checksum, but the *last* fragment should not contain the L4 header. So, we're just corrupting the data of the last fragment. If we want to re-calculate the checksum, we should do that for the *first* fragment instead, because it's the only fragment that will likely contain the L4 header. Or am I missing something? BTW, it looks like we have the same issue for ipv4 packets. We should update checksum only for the first fragment. This can be a separate fix though. Some more comments inline. Best regards, Ilya Maximets. > > Fixes: bc7a5acdff08 ("datapath: add ipv6 'set' action") > Signed-off-by: Salem Sol > Acked-by: Mike Pattrick > --- > lib/packets.c | 20 ++-- > tests/system-traffic.at | 35 +++ > 2 files changed, 53 insertions(+), 2 deletions(-) > > diff --git a/lib/packets.c b/lib/packets.c > index d0fba8176..b7aef41a5 100644 > --- a/lib/packets.c > +++ b/lib/packets.c > @@ -1148,6 +1148,18 @@ packet_set_ipv4_addr(struct dp_packet *packet, > put_16aligned_be32(addr, new_addr); > } > > +static bool > +packet_is_last_ipv6_frag(struct dp_packet *packet) > +{ > +const struct ovs_16aligned_ip6_frag *frag_hdr; > +uint8_t *data = dp_packet_l3(packet); > + > +data += sizeof (struct ovs_16aligned_ip6_hdr); > +frag_hdr = ALIGNED_CAST(struct ovs_16aligned_ip6_frag *, data); > +return (frag_hdr->ip6f_offlg & IP6F_OFF_MASK) && > + !(frag_hdr->ip6f_offlg & IP6F_MORE_FRAG); > +} > + > /* Returns true, if packet contains at least one routing header where > * segements_left > 0. > * > @@ -1334,17 +1346,21 @@ packet_set_ipv6(struct dp_packet *packet, const > struct in6_addr *src, > { > struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet); > uint8_t proto = 0; > +bool recalc_csum; > bool rh_present; > > rh_present = packet_rh_present(packet, &proto); > +recalc_csum = nh->ip6_nxt == IPPROTO_FRAGMENT ? > +packet_is_last_ipv6_frag(packet) : true; The fragmentation header is usually the last extension header, not the first. This check will work for the simple case with only one extension header, but it will not work in the general case. We should search for the fragmentation header the same way as we look for the routing header in packet_rh_present(). We should, probably, modify the packet_rh_present() function to also say if the packet is fragmented / is it the first fragment. > > if (memcmp(&nh->ip6_src, src, sizeof(ovs_be32[4]))) { > -packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, src, true); > +packet_set_ipv6_addr(packet, proto, nh->ip6_src.be32, > + src, recalc_csum); > } > > if (memcmp(&nh->ip6_dst, dst, sizeof(ovs_be32[4]))) { > packet_set_ipv6_addr(packet, proto, nh->ip6_dst.be32, dst, > - !rh_present); > + !rh_present && recalc_csum); > } > > packet_set_ipv6_tc(&nh->ip6_flow, key_tc); > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > index 239105e89..16ba42d84 100644 > --- a/tests/system-traffic.at > +++ b/tests/system-traffic.at > @@ -192,6 +192,41 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w > 2 fc00:1::2 | FORMAT_PI > OVS_TRAFFIC_VSWITCHD_STOP > AT_CLEANUP > > +AT_SETUP([datapath - ping6 between two ports with header modify]) > +OVS_TRAFFIC_VSWITCHD_START() > + > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) > + > +ADD_NAMESPACES(at_ns0, at_ns1) > + > +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55) > +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54) > +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 > dev p0]) > +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::2 lladdr e4:11:22:33:44:54 > dev p0]) > +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::1 lladdr e4:11:22:33:44:55 > dev p0]) > + > +dnl Linux seems to take a little time to get its IPv6 stack in order. Without > +dnl waiting, we get occasional failures due to the following error: > +dnl "connect: Cannot assign requested address" > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) > + > +AT_CHECK([ovs-ofctl del-flows -OOpenFlow15 br0], [], [stdout], [stderr]) > +AT_CHECK([ovs-ofctl add-flow -OOpenFlow15 br0 > in_port=ovs-p0,ipv6,ipv6_dst=fc00::3,ipv6_src=fc00::1,actions=set_field:fc00::2-\>ipv6_dst,ovs-p1], > [], [stdout], [stderr]) > +AT_CHECK([ovs-ofctl a
Re: [ovs-dev] User space connection tracking benchmarks
Paolo Valerio, Jun 05, 2022 at 19:37: > Just a note that may be useful. > After some tests, I noticed that establishing e.g. two TCP connections, > and leaving the first one idle after 3whs, once the second connection > expires (after moving to TIME_WAIT as a result of termination), the > second doesn't get evicted until any event gets scheduled for the first. > > ovs-appctl dpctl/dump-conntrack -s > tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=9090,dport=8080),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=9090),zone=1,timeout=84576,protoinfo=(state=ESTABLISHED) > tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=9091,dport=8080),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=9091),zone=1,timeout=0,protoinfo=(state=TIME_WAIT) > > This may be somewhat related to your results as during the > test, the number of connections may reach the limit so apparently reducing > the performances. Indeed, there was an issue in my test procedure. Due to the way T-Rex generates connections, it is easy to fill the conntrack table after a few iterations, making the test results inconsistent. Also, the flows which I had configured were not correct. There was an extraneous action=NORMAL flow at the end. When the conntrack table is full and a new packet cannot be tracked, it is marked as +trk+inv and not dropped. This behaviour is specific to the userspace datapath. The linux kernel datapath seems to drop the packet when it cannot be added to connection tracking. Gaƫtan's series (v4) seems less resilient to the conntrack table being full, especially when there is more than one PMD core. I have changed the t-rex script to allow running arbitrary commands in between traffic iterations. This is leveraged to flush the conntrack table and run each iteration in the same conditions. https://github.com/cisco-system-traffic-generator/trex-core/blob/v2.98/scripts/cps_ndr.py To avoid filling the conntrack table, the max size was increased to 50M. The DUT configuration can be summarized as the following: ovs-vsctl set open_vswitch . other_config:dpdk-init=true ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x15554" ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev ovs-vsctl add-port br0 pf0 -- set Interface pf0 type=dpdk \ options:dpdk-devargs=:3b:00.0 options:n_rxq=4 options:n_rxq_desc=4096 ovs-vsctl add-port br0 pf1 -- set Interface pf1 type=dpdk \ options:dpdk-devargs=:3b:00.1 options:n_rxq=4 options:n_rxq_desc=4096 ovs-appctl dpctl/ct-set-maxconns 5000 ovs-ofctl add-flow br0 "table=0,priority=10,ip,ct_state=-trk,actions=ct(table=0)" ovs-ofctl add-flow br0 "table=0,priority=10,ip,ct_state=+trk+new,actions=ct(commit),NORMAL" ovs-ofctl add-flow br0 "table=0,priority=10,ip,ct_state=+trk+est,actions=NORMAL" ovs-ofctl add-flow br0 "table=0,priority=0,actions=drop" Short Lived Connections --- ./cps_ndr.py --sample-time 10 --max-iterations 8 --error-threshold 0.01 \ --reset-command "ssh $dut ovs-appctl dpctl/flush-conntrack" \ --udp-percent 1 --num-messages 1 --message-size 20 --server-wait 0 \ --min-cps 10k --max-cps 600k == === == == === Series Num. Flows CPSPPSBPS == === == == === Baseline 40.1K 79.3K/s556Kp/s347Mb/s Gaetan v1 60.5K 121K/s 837Kp/s522Mb/s Gaetan v4 61.4K 122K/s 850Kp/s530Mb/s Paolo 377K756K/s 5.3Mp/s3.3Gb/s == === == == === Even after fixing the test procedure, Paolo's series still performs a lot better with short lived connections. Long Lived Connections -- ./cps_ndr.py --sample-time 30 --max-iterations 8 --error-threshold 0.01 \ --reset-command "ssh $dut ovs-appctl dpctl/flush-conntrack" \ --udp-percent 1 --num-messages 500 --message-size 20 --server-wait 50 \ --min-cps 100 --max-cps 10k == === == == === Series Num. Flows CPSPPSBPS == === == == === Baseline 17.4K 504/s 633Kp/s422Mb/s Gaetan v1 80.4K 3.1K/s 4.6Mp/s3.0Gb/s Gaetan v4 139K5.4K/s 8.2Mp/s5.4Gb/s Paolo 132K5.2K/s 7.7Mp/s5.2Gb/s == === == == === Thanks to Paolo for his help on this second round of tests. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev