Re: [ovs-dev] [PATCH] datapath: use after free flow mask on destroy flow table
Hi TongHao and Ilya,

This bug was reported at
https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
and it was fixed by patch 1f3a090 ("net: openvswitch: introduce common
code for flushing flows").

Best regards,
Wentao Jia
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] datapath: use after free flow mask on destroy flow table
Hi TongHao and Ilya,

I guess this bug was fixed by patch 1f3a090 ("net: openvswitch: introduce
common code for flushing flows"), but that patch was not intended to fix
the use-after-free of the flow mask.

Best regards,
Wentao Jia

>> On 3/24/22 05:17, Wentao Jia wrote:
>> >
>> > On destroy of a flow table instance, referenced flow masks may be
>> > released too. In function ovs_flow_tbl_destroy(), the flow masks are
>> > released first and the flow table instance is destroyed afterwards.
>> > This triggers a kernel panic on destroy of the datapath:
>> >
>> > [  377.647756] kernel BUG at .../datapath/linux/flow_table.c:272!
>> > [  377.653794] invalid opcode: [#1] SMP PTI
>> > [  377.666827] RIP: 0010:table_instance_flow_free.isra.7+0x148/0x150
>> > [  377.711465] Call Trace:
>> > [  377.715238]
>> > [  377.718964] table_instance_destroy+0xbe/0x160 [openvswitch]
>> > [  377.722793] destroy_dp_rcu+0x12/0x40 [openvswitch]
>> > [  377.726651] rcu_process_callbacks+0x297/0x460
>> > [  377.736795] __do_softirq+0xe3/0x30a
>> > [  377.740654] ? ktime_get+0x36/0xa0
>> > [  377.744490] irq_exit+0x100/0x110
>> > [  377.748514] smp_apic_timer_interrupt+0x74/0x140
>> > [  377.752817] apic_timer_interrupt+0xf/0x20
>> > [  377.758802]
>> >
>> > Fixes: 6d1cf7f3e ("datapath: fix possible memleak on destroy
>> > flow-table")
>
> for linux upstream, fix tag:
> Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on
> destroy flow-table")
>
>> > Signed-off-by: Wentao Jia
>> > Signed-off-by: Chuanjie Zeng
>> > ---
>>
>> Hi, Wentao Jia. Thanks for the patch!
>>
>> Please, send it to the mainline linux kernel ('netdev' mailing list,
>> keeping the ovs-dev in CC) using the linux kernel process for
>> submitting patches.
>>
>> When it is accepted to the upstream kernel, it can be backported to
>> the OOT kernel module in OVS repository.
>>
>> Best regards, Ilya Maximets.
>>
>> >  datapath/flow_table.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/datapath/flow_table.c b/datapath/flow_table.c
>> > index 650338fb0..b2f4b1108 100644
>> > --- a/datapath/flow_table.c
>> > +++ b/datapath/flow_table.c
>> > @@ -415,8 +415,8 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
>> >      struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
>> >
>> >      free_percpu(table->mask_cache);
>> > -    kfree(rcu_dereference_raw(table->mask_array));
>> >      table_instance_destroy(table, ti, ufid_ti, false);
>> > +    kfree(rcu_dereference_raw(table->mask_array));
>> >  }
>
> --
> Best regards, Tonghao
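The ordering issue that the one-line move above fixes can be modeled in plain user-space C. The following is an illustrative sketch with invented types and function names (not the actual kernel datapath code): table-instance teardown still dereferences the mask array, so the masks must be freed only after table_instance_destroy() has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Invented stand-ins for the kernel structures; illustrative only. */
struct mask_array { bool freed; };
struct table_instance { struct mask_array *masks; };

static bool use_after_free;   /* set if freed masks are dereferenced */

/* Models table_instance_flow_free() dropping mask references during
 * teardown: it must still be able to read the mask array. */
static void table_instance_destroy(struct table_instance *ti)
{
    if (ti->masks->freed) {
        use_after_free = true;    /* models the kernel BUG at flow_table.c:272 */
    }
    free(ti);
}

/* Fixed ovs_flow_tbl_destroy() ordering: destroy the table instance
 * first, free the mask array last. */
static void flow_tbl_destroy(struct table_instance *ti,
                             struct mask_array *ma)
{
    table_instance_destroy(ti);   /* may still reference 'ma' */
    ma->freed = true;             /* kfree(mask_array) comes after */
    free(ma);
}
```

Swapping the two calls in flow_tbl_destroy() would trip the use-after-free flag, which is exactly the reordering the patch performs in ovs_flow_tbl_destroy().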
Re: [ovs-dev] [PATCH ovn v6] Add a northbound interface to program MAC_Binding table
References: <20220324214158.137613-1-svc.eng.git-m...@nutanix.com>

Bleep bloop. Greetings svc.eng.git-mail,

I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting. See the details below.

checkpatch:
WARNING: Line lacks whitespace around operator
#1033 FILE: utilities/ovn-nbctl.c:456:
static-mac-binding-add LOGICAL_PORT IP MAC\n\

WARNING: Line lacks whitespace around operator
#1035 FILE: utilities/ovn-nbctl.c:458:
static-mac-binding-del LOGICAL_PORT IP\n\

WARNING: Line lacks whitespace around operator
#1037 FILE: utilities/ovn-nbctl.c:460:
static-mac-binding-list List all Static_MAC_Binding entries\n\

Lines checked: 1209, Warnings: 3, Errors: 0

Please check this out. If you feel there has been an error, please email
acon...@redhat.com

Thanks,
0-day Robot
Re: [ovs-dev] [PATCH ovn v5] Add a northbound interface to program MAC_Binding table
Hi,

I've updated the patch and sent out v6. There was a memory leak in
ovn-nbctl.c which is fixed now. Hopefully the build should pass now.

Thank you,
Karthik C

On 3/24/22, 9:27 AM, "Lorenzo Bianconi" wrote:

> From: Karthik Chandrashekar <karthi...@nutanix.com>
>
> Add a new NB and SB table for managing Static MAC_Binding entries.
> This table is currently supported for logical routers. OVN northd
> is responsible for propagating the values from NB to SB. OVN controller
> is responsible for installing MAC lookup flows. The priority of
> the installed flows is based on the override_dynamic_mac flag. This helps
> control the precedence of statically programmed vs dynamically learnt
> MAC Bindings.

Hi Karthik,

thx for working on v5. I think this is mostly fine. Just two comments
inline.

Regards,
Lorenzo

> Signed-off-by: Karthik Chandrashekar <karthi...@nutanix.com>
> ---
>  controller/lflow.c             | 103 -
>  controller/lflow.h             |  10 +-
>  controller/ovn-controller.c    |  38 +++-
>  lib/automake.mk                |   2 +
>  lib/static-mac-binding-index.c |  43 +
>  lib/static-mac-binding-index.h |  27 ++
>  northd/en-northd.c             |   8 ++
>  northd/inc-proc-northd.c       |  14 ++-
>  northd/northd.c                |  76 
>  northd/northd.h                |   5 +
>  ovn-nb.ovsschema               |  15 ++-
>  ovn-nb.xml                     |  29 ++
>  ovn-sb.ovsschema               |  15 ++-
>  ovn-sb.xml                     |  27 ++
>  tests/ovn-nbctl.at             |  69 ++
>  tests/ovn-northd.at            |  25 -
>  tests/ovn.at                   |  90 ++
>  utilities/ovn-nbctl.c          | 162 -
>  18 files changed, 718 insertions(+), 40 deletions(-)
>  create mode 100644 lib/static-mac-binding-index.c
>  create mode 100644 lib/static-mac-binding-index.h
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index e169edef1..075373e7b 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -1622,40 +1622,50 @@ static void
>  consider_neighbor_flow(struct ovsdb_idl_index *sbrec_port_binding_by_name,
>                         const struct hmap *local_datapaths,
>                         const struct sbrec_mac_binding *b,
> -                       struct ovn_desired_flow_table *flow_table)
> +                       const struct sbrec_static_mac_binding *smb,
> +                       struct ovn_desired_flow_table *flow_table,
> +                       uint16_t priority)
>  {
> +    if (!b && !smb) {
> +        return;
> +    }
> +
> +    char *logical_port = !b ? smb->logical_port : b->logical_port;
> +    char *ip = !b ? smb->ip : b->ip;
> +    char *mac = !b ? smb->mac : b->mac;
> +
>      const struct sbrec_port_binding *pb
> -        = lport_lookup_by_name(sbrec_port_binding_by_name, b->logical_port);
> +        = lport_lookup_by_name(sbrec_port_binding_by_name, logical_port);
>      if (!pb || !get_local_datapath(local_datapaths,
>                                     pb->datapath->tunnel_key)) {
>          return;
>      }
>
> -    struct eth_addr mac;
> -    if (!eth_addr_from_string(b->mac, &mac)) {
> +    struct eth_addr mac_addr;
> +    if (!eth_addr_from_string(mac, &mac_addr)) {
>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> -        VLOG_WARN_RL(&rl, "bad 'mac' %s", b->mac);
> +        VLOG_WARN_RL(&rl, "bad 'mac' %s", mac);
>          return;
>      }
>
>      struct match get_arp_match = MATCH_CATCHALL_INITIALIZER;
>      struct match lookup_arp_match = MATCH_CATCHALL_INITIALIZER;
>
> -    if (strchr(b->ip, '.')) {
> -        ovs_be32 ip;
> -        if (!ip_parse(b->ip, &ip)) {
> +    if (strchr(ip, '.')) {
> +        ovs_be32 ip_addr;
> +        if (!ip_parse(ip, &ip_addr)) {
>              static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> -            VLOG_WARN_RL(&rl, "bad 'ip' %s", b->ip);
> +            VLOG_WARN_RL(&rl, "bad 'ip' %s", ip);
>              return;
>          }
> -        match_set_reg(&get_arp_match, 0, ntohl(ip));
> -        match_set_reg(&lookup_arp_match, 0, ntohl(ip));
> +        match_set_reg(&get_arp_match, 0, ntohl(ip_addr));
> +        match_set_reg(&lookup_arp_match, 0, ntohl(ip_addr));
>          match_set_dl_type(&lookup_arp_match, htons(ETH_TYPE_ARP));
>      } else {
>          struct in6_addr ip6;
> -        if (!ipv6_parse(b->ip, &ip6)) {
> +        if (!ipv6_parse(ip, &ip6)) {
>              static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> -            VLOG_WARN_RL(&rl, "bad 'ip' %s", b->ip);
> +            VLOG_WARN_RL(&rl, "bad 'ip' %s", ip);
>              return;
>          }
>          ovs_be128 value;
> @@ -1678,20 +1688,22 @@ consider_neighbor_flow(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
>      uint64_t stub[1024 / 8];
>      struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
>
[ovs-dev] [PATCH ovn v6] Add a northbound interface to program MAC_Binding table
From: Karthik Chandrashekar

Add a new NB and SB table for managing Static MAC_Binding entries.
This table is currently supported for logical routers. OVN northd
is responsible for propagating the values from NB to SB. OVN controller
is responsible for installing MAC lookup flows. The priority of
the installed flows is based on the override_dynamic_mac flag. This helps
control the precedence of statically programmed vs dynamically learnt
MAC Bindings.

Signed-off-by: Karthik Chandrashekar
---
 controller/lflow.c             | 103 -
 controller/lflow.h             |  10 +-
 controller/ovn-controller.c    |  38 +++-
 lib/automake.mk                |   2 +
 lib/static-mac-binding-index.c |  43 +
 lib/static-mac-binding-index.h |  27 ++
 northd/en-northd.c             |   8 ++
 northd/inc-proc-northd.c       |  14 ++-
 northd/northd.c                |  76 +++
 northd/northd.h                |   5 +
 ovn-nb.ovsschema               |  15 ++-
 ovn-nb.xml                     |  29 ++
 ovn-sb.ovsschema               |  15 ++-
 ovn-sb.xml                     |  27 ++
 tests/ovn-nbctl.at             |  69 ++
 tests/ovn-northd.at            |  25 -
 tests/ovn.at                   |  90 ++
 utilities/ovn-nbctl.c          | 164 -
 18 files changed, 720 insertions(+), 40 deletions(-)
 create mode 100644 lib/static-mac-binding-index.c
 create mode 100644 lib/static-mac-binding-index.h

diff --git a/controller/lflow.c b/controller/lflow.c
index e169edef1..075373e7b 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1622,40 +1622,50 @@ static void
 consider_neighbor_flow(struct ovsdb_idl_index *sbrec_port_binding_by_name,
                        const struct hmap *local_datapaths,
                        const struct sbrec_mac_binding *b,
-                       struct ovn_desired_flow_table *flow_table)
+                       const struct sbrec_static_mac_binding *smb,
+                       struct ovn_desired_flow_table *flow_table,
+                       uint16_t priority)
 {
+    if (!b && !smb) {
+        return;
+    }
+
+    char *logical_port = !b ? smb->logical_port : b->logical_port;
+    char *ip = !b ? smb->ip : b->ip;
+    char *mac = !b ? smb->mac : b->mac;
+
     const struct sbrec_port_binding *pb
-        = lport_lookup_by_name(sbrec_port_binding_by_name, b->logical_port);
+        = lport_lookup_by_name(sbrec_port_binding_by_name, logical_port);
     if (!pb || !get_local_datapath(local_datapaths,
                                    pb->datapath->tunnel_key)) {
         return;
     }

-    struct eth_addr mac;
-    if (!eth_addr_from_string(b->mac, &mac)) {
+    struct eth_addr mac_addr;
+    if (!eth_addr_from_string(mac, &mac_addr)) {
         static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-        VLOG_WARN_RL(&rl, "bad 'mac' %s", b->mac);
+        VLOG_WARN_RL(&rl, "bad 'mac' %s", mac);
         return;
     }

     struct match get_arp_match = MATCH_CATCHALL_INITIALIZER;
     struct match lookup_arp_match = MATCH_CATCHALL_INITIALIZER;

-    if (strchr(b->ip, '.')) {
-        ovs_be32 ip;
-        if (!ip_parse(b->ip, &ip)) {
+    if (strchr(ip, '.')) {
+        ovs_be32 ip_addr;
+        if (!ip_parse(ip, &ip_addr)) {
             static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-            VLOG_WARN_RL(&rl, "bad 'ip' %s", b->ip);
+            VLOG_WARN_RL(&rl, "bad 'ip' %s", ip);
             return;
         }
-        match_set_reg(&get_arp_match, 0, ntohl(ip));
-        match_set_reg(&lookup_arp_match, 0, ntohl(ip));
+        match_set_reg(&get_arp_match, 0, ntohl(ip_addr));
+        match_set_reg(&lookup_arp_match, 0, ntohl(ip_addr));
         match_set_dl_type(&lookup_arp_match, htons(ETH_TYPE_ARP));
     } else {
         struct in6_addr ip6;
-        if (!ipv6_parse(b->ip, &ip6)) {
+        if (!ipv6_parse(ip, &ip6)) {
             static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
-            VLOG_WARN_RL(&rl, "bad 'ip' %s", b->ip);
+            VLOG_WARN_RL(&rl, "bad 'ip' %s", ip);
             return;
         }
         ovs_be128 value;
@@ -1678,20 +1688,22 @@ consider_neighbor_flow(struct ovsdb_idl_index
*sbrec_port_binding_by_name,
     uint64_t stub[1024 / 8];
     struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
     uint8_t value = 1;
-    put_load(mac.ea, sizeof mac.ea, MFF_ETH_DST, 0, 48, &ofpacts);
+    put_load(mac_addr.ea, sizeof mac_addr.ea, MFF_ETH_DST, 0, 48, &ofpacts);
     put_load(&value, sizeof value, MFF_LOG_FLAGS, MLF_LOOKUP_MAC_BIT, 1,
              &ofpacts);
-    ofctrl_add_flow(flow_table, OFTABLE_MAC_BINDING, 100,
-                    b->header_.uuid.parts[0], &get_arp_match,
-                    &ofpacts, &b->header_.uuid);
+    ofctrl_add_flow(flow_table, OFTABLE_MAC_BINDING, priority,
+                    !b ? smb->header_.uuid.parts[0] : b->header_.uuid.parts[0],
+
Re: [ovs-dev] [PATCH ovn] rhel: fix logrotate user config option
Thanks Numan.

Regards,
Vladislav Odintsov

> On 24 Mar 2022, at 17:54, Numan Siddique wrote:
>
> On Tue, Mar 22, 2022 at 2:33 PM Mark Michelson wrote:
>>
>> Acked-by: Mark Michelson
>>
>
> Thanks.
>
> I applied this patch to the main branch and backported up to branch-21.03.
>
> Numan
>
>> On 3/22/22 13:37, Vladislav Odintsov wrote:
>>> If the rpm is built with libcapng (the default), the user in the
>>> /etc/logrotate.d/ovn config file is sed'ed to 'ovn ovn' in the
>>> postinstall section.
>>>
>>> If one runs logrotate, the following error occurs:
>>> ```
>>> [root@host ~]# logrotate /etc/logrotate.d/ovn
>>> error: /etc/logrotate.d/ovn:9 unknown user 'ovn'
>>> error: found error in /var/log/ovn/*.log , skipping
>>> ```
>>>
>>> Replace the 'ovn' user with 'openvswitch' to fix the issue.
>>>
>>> Signed-off-by: Vladislav Odintsov
>>> ---
>>>  rhel/ovn-fedora.spec.in | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/rhel/ovn-fedora.spec.in b/rhel/ovn-fedora.spec.in
>>> index 3fb854a37..821eb03cc 100644
>>> --- a/rhel/ovn-fedora.spec.in
>>> +++ b/rhel/ovn-fedora.spec.in
>>> @@ -323,7 +323,7 @@ ln -sf ovn_detrace.py %{_bindir}/ovn-detrace
>>>  %if %{with libcapng}
>>>  if [ $1 -eq 1 ]; then
>>>      sed -i 's:^#OVN_USER_ID=:OVN_USER_ID=:' %{_sysconfdir}/sysconfig/ovn
>>> -    sed -i 's:\(.*su\).*:\1 ovn ovn:' %{_sysconfdir}/logrotate.d/ovn
>>> +    sed -i 's:\(.*su\).*:\1 openvswitch openvswitch:' %{_sysconfdir}/logrotate.d/ovn
>>>  fi
>>>  %endif
Re: [ovs-dev] [PATCH ovn] northd: Use separate SNAT for already-DNATted traffic.
On Thu, Mar 24, 2022 at 4:47 PM Mark Michelson wrote:
>
> Hi Numan,
>
> This message got buried in my inbox, so sorry about only replying now. I
> have some replies in-line below
>
> On 3/18/22 16:34, Numan Siddique wrote:
> > On Wed, Mar 9, 2022 at 2:38 PM Mark Michelson wrote:
> >>
> >> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
> >> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
> >> accounts for certain situations, such as hairpinned traffic, where we
> >> still need a separate SNAT and DNAT zone.
> >>
> >> However, one situation it did not account for was when traffic traverses
> >> a logical router and is DNATted as a result of a load balancer, then
> >> when the traffic egresses the router, it needs to be SNATted. In this
> >> situation, we try to use the same CT zone for the SNAT as for the load
> >> balancer DNAT, which does not work.
> >>
> >> This commit fixes the issue by installing a flow in the OUT_SNAT stage
> >> of the logical router egress pipeline for each SNAT. This flow will fall
> >> back to using a separate CT zone for SNAT if the ct_label.natted flag is
> >> set.
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127
> >>
> >> Signed-off-by: Mark Michelson
> >
> > Hi Mark,
> >
> > Few comments,
> >
> > - There are a few test cases failing with this patch.
>
> Interesting. I'll have to double-check about that but I hadn't noticed
> it happening locally.
>
> > - I'm pretty sure the issue will not be seen if the load balancer is
> > also associated with the logical switch. In that case, dnat would
> > happen in the logical switch pipeline.
> > But still we should address this issue as it's a regression
> > introduced by the commit 4deac4509abb.
> >
> > - Only small concern I've with this patch is the usage of
> > ct_label.natted field in the egress pipeline. In the ingress pipeline
> > of the router we send the pkt to conntrack and then do NAT.
> > I'm just worried what if the ct states/labels are not valid or they
> > get cleared when the pkt enters ingress to egress pipeline. Looks
> > like we don't clear the conntrack state now.
> > But future patches could. I think we should document this somewhere.
>
> The tricky thing here is that I don't know if there is any other
> reliable way to know in the egress pipeline if the packet was natted in
> the ingress pipeline. AFAIK, there is no scenario where the router
> egress pipeline can run on a different node than the router ingress
> pipeline, so at least the way it currently works should be reliable.
>
> Where is the best place to document this?

I guess a code comment in northd.c would be helpful. I'd suggest to see
if we can document it in ovn-northd.8.xml while updating this file with
the newly added logical flow(s). I don't have any strong preference, so
please see if it makes sense to document it in ovn-northd.8.xml too.

Numan

> > - This patch needs to add the relevant documentation in the
> > ovn-northd.8.xml for the new flows added.
> >
> > - Instead of adding the flows in the "lr_out_snat" stage, I'd
> > suggest to add the below flow in the "lr_out_chk_dnat_local" pipeline
> >
> > table=0 (lr_out_chk_dnat_local), priority=50 , match=(ip &&
> > ct_label.natted == 1), action=(reg9[4] = 1; next;)
> >
> > We already have flows in the "lr_out_snat" stage to make use of
> > "ct_snat" action instead of "ct_snat_in_czone" if reg9[4] is set.
> > Can you please see if this suggestion is possible ?
>
> That sounds perfectly reasonable. I'll test it out and see if it works.
>
> > Thanks
> > Numan
>
> >> ---
> >> Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
> >> fail on some systems, and at this point it's not clear what causes it.
> >> On my development system running Fedora 33 and kernel 5.14.18-100, the
> >> test fails because the pings fail. In a container running Fedora 33 and
> >> kernel 5.10.19-200, the test succeeds. It's unknown if it's the kernel
> >> version or some other setting that has a bearing on the success or
> >> failure of the test. The OpenFlow that ovn-controller generates is
> >> identical on both successful and failure scenarios.
> >>
> >> If this inconsistency causes issues with CI, then this patch may need to
> >> be re-examined to determine if there are other parameters that may be
> >> needed to correct the referenced issue.
> >> ---
> >>  northd/northd.c     |  53 ++--
> >>  tests/ovn-northd.at |  36 ++
> >>  tests/system-ovn.at | 117 
> >>  3 files changed, 190 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index b264fb850..866a81310 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -12961,23 +12961,44 @@ build_lrouter_out_snat_flow(struct hmap *lflows,
> >> struct ovn_datapath *od,
> >>
Re: [ovs-dev] [PATCH ovn] northd: Use separate SNAT for already-DNATted traffic.
Hi Numan,

This message got buried in my inbox, so sorry about only replying now. I
have some replies in-line below

On 3/18/22 16:34, Numan Siddique wrote:
> On Wed, Mar 9, 2022 at 2:38 PM Mark Michelson wrote:
>>
>> Commit 4deac4509abbedd6ffaecf27eed01ddefccea40a introduced functionality
>> in ovn-northd to use a single zone for SNAT and DNAT when possible. It
>> accounts for certain situations, such as hairpinned traffic, where we
>> still need a separate SNAT and DNAT zone.
>>
>> However, one situation it did not account for was when traffic traverses
>> a logical router and is DNATted as a result of a load balancer, then
>> when the traffic egresses the router, it needs to be SNATted. In this
>> situation, we try to use the same CT zone for the SNAT as for the load
>> balancer DNAT, which does not work.
>>
>> This commit fixes the issue by installing a flow in the OUT_SNAT stage
>> of the logical router egress pipeline for each SNAT. This flow will fall
>> back to using a separate CT zone for SNAT if the ct_label.natted flag is
>> set.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2044127
>>
>> Signed-off-by: Mark Michelson
>
> Hi Mark,
>
> Few comments,
>
> - There are a few test cases failing with this patch.

Interesting. I'll have to double-check about that but I hadn't noticed
it happening locally.

> - I'm pretty sure the issue will not be seen if the load balancer is
> also associated with the logical switch. In that case, dnat would
> happen in the logical switch pipeline.
> But still we should address this issue as it's a regression
> introduced by the commit 4deac4509abb.
>
> - Only small concern I've with this patch is the usage of
> ct_label.natted field in the egress pipeline. In the ingress pipeline
> of the router we send the pkt to conntrack and then do NAT.
> I'm just worried what if the ct states/labels are not valid or they
> get cleared when the pkt enters ingress to egress pipeline. Looks
> like we don't clear the conntrack state now.
> But future patches could. I think we should document this somewhere.

The tricky thing here is that I don't know if there is any other
reliable way to know in the egress pipeline if the packet was natted in
the ingress pipeline. AFAIK, there is no scenario where the router
egress pipeline can run on a different node than the router ingress
pipeline, so at least the way it currently works should be reliable.

Where is the best place to document this?

> - This patch needs to add the relevant documentation in the
> ovn-northd.8.xml for the new flows added.
>
> - Instead of adding the flows in the "lr_out_snat" stage, I'd
> suggest to add the below flow in the "lr_out_chk_dnat_local" pipeline
>
> table=0 (lr_out_chk_dnat_local), priority=50 , match=(ip &&
> ct_label.natted == 1), action=(reg9[4] = 1; next;)
>
> We already have flows in the "lr_out_snat" stage to make use of
> "ct_snat" action instead of "ct_snat_in_czone" if reg9[4] is set.
> Can you please see if this suggestion is possible ?

That sounds perfectly reasonable. I'll test it out and see if it works.

> Thanks
> Numan

>> ---
>> Note: The "SNAT in separate zone from DNAT" test in system-ovn.at can
>> fail on some systems, and at this point it's not clear what causes it.
>> On my development system running Fedora 33 and kernel 5.14.18-100, the
>> test fails because the pings fail. In a container running Fedora 33 and
>> kernel 5.10.19-200, the test succeeds. It's unknown if it's the kernel
>> version or some other setting that has a bearing on the success or
>> failure of the test. The OpenFlow that ovn-controller generates is
>> identical on both successful and failure scenarios.
>>
>> If this inconsistency causes issues with CI, then this patch may need to
>> be re-examined to determine if there are other parameters that may be
>> needed to correct the referenced issue.
>> ---
>>  northd/northd.c     |  53 ++--
>>  tests/ovn-northd.at |  36 ++
>>  tests/system-ovn.at | 117 
>>  3 files changed, 190 insertions(+), 16 deletions(-)
>>
>> diff --git a/northd/northd.c b/northd/northd.c
>> index b264fb850..866a81310 100644
>> --- a/northd/northd.c
>> +++ b/northd/northd.c
>> @@ -12961,23 +12961,44 @@ build_lrouter_out_snat_flow(struct hmap *lflows,
>> struct ovn_datapath *od,
>>                                priority, ds_cstr(match),
>>                                ds_cstr(actions), &nat->header_);
>> -    if (!stateless) {
>> -        ds_put_cstr(match, " && "REGBIT_DST_NAT_IP_LOCAL" == 1");
>> -        ds_clear(actions);
>> -        if (distributed) {
>> -            ds_put_format(actions, "eth.src = "ETH_ADDR_FMT"; ",
>> -                          ETH_ADDR_ARGS(mac));
>> -        }
>> -        ds_put_format(actions, REGBIT_DST_NAT_IP_LOCAL" = 0; ct_snat(%s",
>> -                      nat->external_ip);
>> -        if (nat->external_port_range[0]) {
>> -            ds_put_format(actions, ",%s", nat->external_port_range);
>> -        }
>> -        ds_put_format(actions, ");");
>> -        ovn_lflow_ad
Re: [ovs-dev] [PATCH ovn branch-22.03 1/2] Set release date for 22.03.0.
On 3/24/22 12:32, Ilya Maximets wrote:
> On 3/10/22 21:49, Mark Michelson wrote:
>> Signed-off-by: Mark Michelson
>> ---
>>  NEWS             | 2 +-
>>  debian/changelog | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 9f3ce3cf3..0f8e068b1 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,4 +1,4 @@
>> -OVN v22.03.0 - XX XXX 
>> +OVN v22.03.0 - 11 Mar 2022
>>  --
>>    - Refactor CoPP commands introducing a unique name index in CoPP NB table.
>>      Add following new CoPP commands to manage CoPP table:
>
> Hey, Mark. It looks like the version of the patch that got applied to
> the branch-22.03 doesn't have the NEWS update above. We also need this
> patch ported to the main branch to update all the dates. (And it looks
> like I myself forgot to do that for OVS 2.17...)
>
> Best regards, Ilya Maximets.

Thanks Ilya,

I pushed corrections as follows:

I cherry-picked this change to main and updated NEWS to have the correct
release date. I pushed a commit to branch-22.03 that updates NEWS to have
the correct release date.

>> diff --git a/debian/changelog b/debian/changelog
>> index fe0e7f488..a26420b7a 100644
>> --- a/debian/changelog
>> +++ b/debian/changelog
>> @@ -2,7 +2,7 @@ ovn (22.03.0-1) unstable; urgency=low
>>
>>    * New upstream version
>>
>> - -- OVN team  Thu, 24 Feb 2022 16:55:45 -0500
>> + -- OVN team  Thu, 10 Mar 2022 15:47:41 -0500
>>
>>  ovn (21.12.0-1) unstable; urgency=low
Re: [ovs-dev] [PATCH branch-2.17 0/2] Release patches for v2.17.0.
On 2/17/22 23:13, Ilya Maximets wrote:
> On 2/17/22 19:45, Ilya Maximets wrote:
>>
>> Ilya Maximets (2):
>>   Set release date for 2.17.0.
>>   Prepare for 2.17.1.
>>
>>  NEWS             | 7 ++-
>>  configure.ac     | 2 +-
>>  debian/changelog | 8 +++-
>>  3 files changed, 14 insertions(+), 3 deletions(-)
>>
>
> Applied. Thanks, Flavio!

I did apply the patches to branch-2.17, but I forgot to apply the first
one to the master branch. Fixed now.

> Will send release announce soon.
>
> Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH] system-traffic.at: Fix flaky DNAT load balancing test.
Hello Ilya,

Ilya Maximets writes:

> 'conntrack - DNAT load balancing' test fails from time to time
> because not all the group buckets are getting hit.
>
> In short, the test creates a group with 3 buckets with the same
> weight. It creates 12 TCP sessions and expects that each bucket
> will be used at least once. However, there is a solid chance that
> this will not happen. The probability of having at least one
> empty bucket is:
>
>     C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N
>
> Where N is the number of distinct TCP sessions. For N=12, the
> probability is about 0.023, i.e. there is a 2.3% chance for a
> test to fail, which is not great for CI.
>
> Increasing the number of sessions to 50 to reduce the probability
> of failure down to 4.7e-9. In my testing, the configuration with
> 50 TCP sessions didn't fail after 6000 runs. Should be good
> enough for CI systems.
>
> Fixes: 2c66ebe47a88 ("ofp-actions: Allow conntrack action in group buckets.")
> Signed-off-by: Ilya Maximets
> ---

The patch LGTM,

Acked-by: Paolo Valerio
Re: [ovs-dev] [PATCH] system-traffic.at: Fix flaky DNAT load balancing test.
> -----Original Message-----
> From: Ilya Maximets
> Sent: Thursday 24 March 2022 11:20
> To: ovs-dev@openvswitch.org
> Cc: Aaron Conole; Phelan, Michael; Ilya Maximets
> Subject: [PATCH] system-traffic.at: Fix flaky DNAT load balancing test.
>
> 'conntrack - DNAT load balancing' test fails from time to time because
> not all the group buckets are getting hit.
>
> In short, the test creates a group with 3 buckets with the same weight.
> It creates 12 TCP sessions and expects that each bucket will be used at
> least once. However, there is a solid chance that this will not happen.
> The probability of having at least one empty bucket is:
>
>     C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N
>
> Where N is the number of distinct TCP sessions. For N=12, the
> probability is about 0.023, i.e. there is a 2.3% chance for a test to
> fail, which is not great for CI.
>
> Increasing the number of sessions to 50 to reduce the probability of
> failure down to 4.7e-9. In my testing, the configuration with 50 TCP
> sessions didn't fail after 6000 runs. Should be good enough for CI
> systems.
>
> Fixes: 2c66ebe47a88 ("ofp-actions: Allow conntrack action in group buckets.")
> Signed-off-by: Ilya Maximets
> ---
>  tests/system-traffic.at | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 95383275a..4a7fa49fc 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -6471,7 +6471,7 @@ on_exit 'ovs-appctl revalidator/purge'
>  on_exit 'ovs-appctl dpif/dump-flows br0'
>
>  dnl Should work with the virtual IP address through NAT
> -for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
> +for i in $(seq 1 50); do
>      echo Request $i
>      NS_CHECK_EXEC([at_ns1], [wget 10.1.1.64 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
>  done
> --
> 2.34.1

Acked-by: Michael Phelan
Re: [ovs-dev] [PATCH v2] dpif-netdev: Allow cross-NUMA polling on selected ports
On 24/03/2022 16:35, Jan Scheurich wrote: Hi Kevin, This was a bit of a misunderstanding. We didn't check your RFC patch carefully enough to realize that you had meant to encompass our cross-numa-polling function in that RFC patch. Sorry for the confusion. I wouldn't say we are particularly keen on upstreaming exactly our implementation of cross-numa-polling for ALB, as long as we get the functionality with a per-interface configuration option (preferably as in out patch, so that we can maintain backward compatibility with our downstream solution). I suggest we have a closer look at your RFC and come back with comments on that. No problems. It doesn't have selection logic/tests/docs etc, it was just a way of seeing how the additions below could be added. It is mainly just reorganising the schedule() fn. to not be always: select numa first, then select pmd from numa. And some splitting things into functions to help with checking load on all pmds. 'roundrobin' and 'cycles', are dependent on RR of cores per numa, so I agree it makes sense in those cases to still RR the numas. Otherwise a mix of cross- numa enabled and cross-numa disabled interfaces would conflict with each other when selecting a core. So even though the user has selected to ignore numa for this interface, we don't have a choice but to still RR the numa. For 'group' it is about finding the lowest loaded pmd core. In that case we don't need to RR numa. We can just find the lowest loaded pmd core from any numa. This is better because the user is choosing to remove numa based selection for the interface and as the rxq scheduling algorthim is not dependent on it, we can fully remove it too by checking pmds from all numas. I have done this in my RFC. It is also better to do this where possible because there may not be same amount of pmd cores on each numa, or one numa could already be more heavily loaded than the other. 
Another difference is with above for 'group' I added a tiebreaker for a local-to-interface numa pmd to be selected if multiple pmds from different numas were available with the same load. This is most likely to be helpful for initial selection when there is no load on any pmds. At first glance I agree with your reasoning. Choosing the least-loaded PMD from all NUMAs using NUMA-locality as a tie-breaker makes sense in 'group' algorithm. How do you select between equally loaded PMDs on the same NUMA node? Just pick any? Yes, that is how the group algorithm currently operates. A further way to tiebreak on local numa could be if one of the remaining pmds was already polling that rxq, or on number of rxqs assigned to the pmd. thanks, Kevin. BR, Jan
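The selection logic discussed above (pick the least-loaded PMD from any NUMA node, using NUMA locality only as a tie-breaker) can be sketched in a few lines. This is an illustrative model of the idea, not the actual dpif-netdev code; the tuple layout is invented for the example.

```python
# Illustrative model of the 'group' rxq-scheduling idea discussed above:
# pick the least-loaded PMD from any NUMA node; on a tie, prefer a PMD
# local to the interface's NUMA node.  Not the real dpif-netdev code.

def select_pmd(pmds, port_numa):
    """pmds: list of (core_id, numa_id, load_pct); returns chosen core_id."""
    # Sort by load first; among equally loaded PMDs, non-local ones sort
    # after local ones (False < True in Python, exploited via the key).
    best = min(pmds, key=lambda p: (p[2], p[1] != port_numa))
    return best[0]

pmds = [(2, 0, 10), (3, 1, 10), (4, 1, 30)]
# Core 2 (numa 0) and core 3 (numa 1) are tied at 10% load; for a port
# on numa 1 the local core 3 wins the tie-break.
assert select_pmd(pmds, port_numa=1) == 3
# For a port on numa 0, the same tie-break prefers core 2 instead.
assert select_pmd(pmds, port_numa=0) == 2
```

A further tie-break (as suggested above) could extend the key with "is this pmd already polling the rxq" or the pmd's current rxq count.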
Re: [ovs-dev] [PATCH v2] dpif-netdev: Allow cross-NUMA polling on selected ports
Hi Kevin, This was a bit of a misunderstanding. We didn't check your RFC patch carefully enough to realize that you had meant to encompass our cross-numa-polling function in that RFC patch. Sorry for the confusion. I wouldn't say we are particularly keen on upstreaming exactly our implementation of cross-numa-polling for ALB, as long as we get the functionality with a per-interface configuration option (preferably as in out patch, so that we can maintain backward compatibility with our downstream solution). I suggest we have a closer look at your RFC and come back with comments on that. > > 'roundrobin' and 'cycles', are dependent on RR of cores per numa, so I agree > it > makes sense in those cases to still RR the numas. Otherwise a mix of cross- > numa enabled and cross-numa disabled interfaces would conflict with each > other when selecting a core. So even though the user has selected to ignore > numa for this interface, we don't have a choice but to still RR the numa. > > For 'group' it is about finding the lowest loaded pmd core. In that case we > don't > need to RR numa. We can just find the lowest loaded pmd core from any numa. > > This is better because the user is choosing to remove numa based selection for > the interface and as the rxq scheduling algorthim is not dependent on it, we > can fully remove it too by checking pmds from all numas. I have done this in > my > RFC. > > It is also better to do this where possible because there may not be same > amount of pmd cores on each numa, or one numa could already be more > heavily loaded than the other. > > Another difference is with above for 'group' I added tiebreaker for a > local-to- > interface numa pmd to be selected if multiple pmds from different numas were > available with same load. This is most likely to be helpful for initial > selection > when there is no load on any pmds. At first glance I agree with your reasoning. 
Choosing the least-loaded PMD from all NUMAs using NUMA-locality as a tie-breaker makes sense in 'group' algorithm. How do you select between equally loaded PMDs on the same NUMA node? Just pick any? BR, Jan
Re: [ovs-dev] [PATCH ovn branch-22.03 1/2] Set release date for 22.03.0.
On 3/10/22 21:49, Mark Michelson wrote: > Signed-off-by: Mark Michelson > --- > NEWS | 2 +- > debian/changelog | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/NEWS b/NEWS > index 9f3ce3cf3..0f8e068b1 100644 > --- a/NEWS > +++ b/NEWS > @@ -1,4 +1,4 @@ > -OVN v22.03.0 - XX XXX > +OVN v22.03.0 - 11 Mar 2022 > -- >- Refactor CoPP commands introducing a unique name index in CoPP NB table. > Add following new CoPP commands to manage CoPP table: Hey, Mark. It looks like the version of the patch that got applied to the branch-22.03 doesn't have the NEWS update above. We also need this patch ported to the main branch to update all the dates. (And it looks like I myself forgot to do that for OVS 2.17...) Best regards, Ilya Maximets. > diff --git a/debian/changelog b/debian/changelog > index fe0e7f488..a26420b7a 100644 > --- a/debian/changelog > +++ b/debian/changelog > @@ -2,7 +2,7 @@ ovn (22.03.0-1) unstable; urgency=low > > * New upstream version > > - -- OVN team Thu, 24 Feb 2022 16:55:45 -0500 > + -- OVN team Thu, 10 Mar 2022 15:47:41 -0500 > > ovn (21.12.0-1) unstable; urgency=low >
Re: [ovs-dev] [PATCH ovn v4] northd: Add support for NAT with multiple DGP
References: <20220324161239.61857-1-sangana.abhi...@nutanix.com> Bleep bloop. Greetings Abhiram Sangana, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line is 275 characters long (recommended limit is 79) #1210 FILE: utilities/ovn-nbctl.8.xml:1145: [--may-exist] [--stateless] [--gateway_port=GATEWAY_PORT] lr-nat-add router type external_ip logical_ip [logical_port external_mac] WARNING: Line is 143 characters long (recommended limit is 79) #1255 FILE: utilities/ovn-nbctl.8.xml:1222: [--if-exists] lr-nat-del router [type [ip] [gateway_port]] WARNING: Line lacks whitespace around operator WARNING: Line lacks whitespace around operator WARNING: Line lacks whitespace around operator #1301 FILE: utilities/ovn-nbctl.c:381: [--gateway-port=GATEWAY_PORT]\n\ WARNING: Line lacks whitespace around operator #1306 FILE: utilities/ovn-nbctl.c:385: lr-nat-del ROUTER [TYPE [IP] [GATEWAY_PORT]]\n\ Lines checked: 1592, Warnings: 6, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot
[ovs-dev] [PATCH ovn v4] northd: Add support for NAT with multiple DGP
Currently, if multiple distributed gateway ports (DGP) are configured on a logical router, NAT is disabled as part of commit 15348b7 (northd: Multiple distributed gateway port support.) This patch adds a new column called "gateway_port" in NAT table that references a distributed gateway port in the Logical_Router_Port table. A NAT rule is only applied on matching packets entering or leaving the DGP configured for the rule, when a router has multiple DGPs. If a router has a single DGP, NAT rules are applied at the DGP even if the "gateway_port" column is not set. It is an error to not set this column for a NAT rule when the router has multiple DGPs. This patch also updates the NAT commands in ovn-nbctl to support the new column. Signed-off-by: Abhiram Sangana --- NEWS | 1 + northd/northd.c | 184 +-- northd/ovn-northd.8.xml | 27 +++--- ovn-architecture.7.xml| 6 +- ovn-nb.ovsschema | 10 +- ovn-nb.xml| 37 ++- tests/ovn-nbctl.at| 197 +- tests/ovn-northd.at | 172 +++-- tests/ovn.at | 2 +- utilities/ovn-nbctl.8.xml | 48 ++ utilities/ovn-nbctl.c | 170 +++- 11 files changed, 647 insertions(+), 207 deletions(-) diff --git a/NEWS b/NEWS index 5ba9c3cf4..458db45a0 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,7 @@ Post v22.03.0 - - Support IGMP and MLD snooping on transit logical switches that connect different OVN Interconnection availability zones. + - Support NAT for logical routers with multiple distributed gateway ports. OVN v22.03.0 - XX XXX -- diff --git a/northd/northd.c b/northd/northd.c index a2cf8d6fc..df890d97d 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -606,11 +606,11 @@ struct ovn_datapath { /* Applies to only logical router datapath. * True if logical router is a gateway router. i.e options:chassis is set. - * If this is true, then 'l3dgw_port' will be ignored. */ + * If this is true, then 'l3dgw_ports' will be ignored. */ bool is_gw_router; -/* OVN northd only needs to know about the logical router gateway port for - * NAT on a distributed router. 
The "distributed gateway ports" are +/* OVN northd only needs to know about logical router gateway ports for + * NAT/LB on a distributed router. The "distributed gateway ports" are * populated only when there is a gateway chassis or ha chassis group * specified for some of the ports on the logical router. Otherwise this * will be NULL. */ @@ -2702,8 +2702,9 @@ join_logical_ports(struct northd_input *input_data, * by one or more IP addresses, and if the port is a distributed gateway * port, followed by 'is_chassis_resident("LPORT_NAME")', where the * LPORT_NAME is the name of the L3 redirect port or the name of the - * logical_port specified in a NAT rule. These strings include the - * external IP addresses of all NAT rules defined on that router, and all + * logical_port specified in a NAT rule. These strings include the + * external IP addresses of NAT rules defined on that router which have + * gateway_port not set or have gateway_port as the router port 'op', and all * of the IP addresses used in load balancer VIPs defined on that router. * * The caller must free each of the n returned strings with free(), @@ -2716,8 +2717,7 @@ get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only, struct eth_addr mac; if (!op || !op->nbrp || !op->od || !op->od->nbr || (!op->od->nbr->n_nat && !op->od->has_lb_vip) -|| !eth_addr_from_string(op->nbrp->mac, &mac) -|| op->od->n_l3dgw_ports > 1) { +|| !eth_addr_from_string(op->nbrp->mac, &mac)) { *n = n_nats; return NULL; } @@ -2746,6 +2746,12 @@ get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only, continue; } +/* Not including external IP of NAT rules whose gateway_port is + * not 'op'. */ +if (nat->gateway_port && nat->gateway_port != op->nbrp) { +continue; +} + /* Determine whether this NAT rule satisfies the conditions for * distributed NAT processing. 
*/ if (op->od->n_l3dgw_ports && !strcmp(nat->type, "dnat_and_snat") @@ -2816,9 +2822,9 @@ get_nat_addresses(const struct ovn_port *op, size_t *n, bool routable_only, if (central_ip_address) { /* Gratuitous ARP for centralized NAT rules on distributed gateway * ports should be restricted to the gateway chassis. */ -if (op->od->n_l3dgw_ports) { +if (is_l3dgw_port(op)) { ds_put_format(&c_addresses, " is_chassis_resident(%s)", - op->od->l3dgw_ports[0]->cr_port->json_key); + op->cr_port->j
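The gateway_port selection rule described in the commit message above (a single-DGP router applies NAT at its one DGP even with the column unset; a multi-DGP router requires the column) can be sketched as a small validation helper. This is a hypothetical illustration of the stated semantics, not northd code; the function and port names are invented.

```python
# Sketch of the gateway_port semantics described in the commit message:
# with a single DGP, a NAT rule applies at that DGP even when
# 'gateway_port' is unset; with multiple DGPs, an unset 'gateway_port'
# is a configuration error.  Hypothetical helper, not northd code.

def nat_gateway_port(dgw_ports, gateway_port=None):
    """Return the DGP at which a NAT rule takes effect, or raise."""
    if gateway_port is not None:
        if gateway_port not in dgw_ports:
            raise ValueError("gateway_port is not a DGP of this router")
        return gateway_port
    if len(dgw_ports) == 1:
        return dgw_ports[0]
    raise ValueError("gateway_port must be set: router has multiple DGPs")

assert nat_gateway_port(["lr0-public"]) == "lr0-public"
assert nat_gateway_port(["dgp1", "dgp2"], "dgp2") == "dgp2"
try:
    nat_gateway_port(["dgp1", "dgp2"])
except ValueError:
    pass  # multiple DGPs without gateway_port is rejected
else:
    raise AssertionError("multiple DGPs without gateway_port must fail")
```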
[ovs-dev] OVS DPDK DMA-Dev library/Design Discussion
Hi All, This meeting is a follow up to the call earlier this week. This week Sunil presented 3 different approaches to integrating DMA-Dev with OVS along with the performance impacts. https://github.com/Sunil-Pai-G/OVS-DPDK-presentation-share/blob/main/OVS%20vhost%20async%20datapath%20design%202022.pdf The approaches were as follows: * Defer work. * Tx completions from Rx context. * Tx completions from Rx context + lockless ring. The pros and cons of each approach were discussed but there was no clear solution reached. As such a follow up call was suggested to continue discussion and to reach a clear decision on the approach to take. Please see agenda as it stands below: Agenda * Opens * Continue discussion of 3x approaches from last week (Defer work, "V3", V4, links to patches in Sunil's slides above) * Design Feedback (please review solutions of above & slide-deck from last week before call to be informed) * Dynamic Allocation of DMA engine per queue * Code Availability (DPDK GitHub, OVS GitHub branches) Please feel free to respond with any other items to be added to the agenda. Google Meet: https://meet.google.com/hme-pygf-bfb Regards Ian
Re: [ovs-dev] [PATCH] system-traffic.at: Fix flaky DNAT load balancing test.
On 24 Mar 2022, at 12:20, Ilya Maximets wrote: > 'conntrack - DNAT load balancing' test fails from time to time > because not all the group buckets are getting hit. > > In short, the test creates a group with 3 buckets with the same > weight. It creates 12 TCP sessions and expects that each bucket > will be used at least once. However, there is a solid chance that > this will not happen. The probability of having at least one > empty bucket is: > > C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N > > Where N is the number of distinct TCP sessions. For N=12, the > probability is about 0.023, i.e. there is a 2.3% chance for a > test to fail, which is not great for CI. > > Increasing the number of sessions to 50 to reduce the probability > of failure down to 4.7e-9. In my testing, the configuration with > 50 TCP sessions didn't fail after 6000 runs. Should be good > enough for CI systems. > > Fixes: 2c66ebe47a88 ("ofp-actions: Allow conntrack action in group buckets.") > Signed-off-by: Ilya Maximets > --- Change looks good to me, for test past 6 times ;) Acked-by: Eelco Chaudron
Re: [ovs-dev] [External] Re: [PATCH] dpif-netdev: add an option to assign pmd rxq to all numas
Hi Anurag, On 16/03/2022 12:29, Anurag Agarwal wrote: Hello Kevin, Thanks for your inputs. In this scenario we have one VM each on NUMA0 and NUMA1 (VM1 is on NUMA0, VM2 is on NUMA1), dpdk port is on NUMA1. Without cross-numa-polling, VM/VHU queue traffic is evenly distributed based on load on their respective NUMA sockets. However, DPDK traffic is only load balanced on NUMA1 PMDs, thereby exhibiting aggregate load imbalance in the system (i.e.NUMA1 PMDs having more load v/s NUMA0 PMDs) Please refer example below (cross-numa-polling is not enabled) pmd thread numa_id 0 core_id 2: isolated : false port: vhu-vm1p1 queue-id: 2 (enabled) pmd usage: 11 % port: vhu-vm1p1 queue-id: 4 (enabled) pmd usage: 0 % overhead: 0 % pmd thread numa_id 1 core_id 3: isolated : false port: dpdk0 queue-id: 0 (enabled) pmd usage: 13 % port: dpdk0 queue-id: 2 (enabled) pmd usage: 15 % port: vhu-vm2p1 queue-id: 3 (enabled) pmd usage: 9 % port: vhu-vm2p1 queue-id: 4 (enabled) pmd usage: 0 % overhead: 0 % With cross-numa-polling enabled, the rxqs from DPDK port are distributed to both NUMAs, and then the 'group' scheduling algorithm assigns the rxqs to PMDs based on load. Please refer example below, after cross-numa-polling is enabled on dpdk0 port. pmd thread numa_id 0 core_id 2: isolated : false port: dpdk0 queue-id: 5 (enabled) pmd usage: 11 % port: vhu-vm1p1 queue-id: 3 (enabled) pmd usage: 4 % port: vhu-vm1p1 queue-id: 5 (enabled) pmd usage: 4 % overhead: 2 % pmd thread numa_id 1 core_id 3: isolated : false port: dpdk0 queue-id: 2 (enabled) pmd usage: 10 % port: vhu-vm2p1 queue-id: 0 (enabled) pmd usage: 4 % port: vhu-vm2p1 queue-id: 6 (enabled) pmd usage: 4 % overhead: 3 % Yes, this illustrates the operation well. We can imply that if the traffic rate was increased that the cross-numa disabled case would hit a bottleneck first by virtue of having 2 dpdk rxq + 2 vm rxq on one core. Kevin. 
Regards, Anurag -Original Message- From: Kevin Traynor Sent: Thursday, March 10, 2022 11:02 PM To: Anurag Agarwal ; Jan Scheurich ; Wan Junjie Cc: d...@openvswitch.org Subject: Re: [External] Re: [PATCH] dpif-netdev: add an option to assign pmd rxq to all numas On 04/03/2022 17:57, Anurag Agarwal wrote: Hello Kevin, I have prepared a patch for "per port cross-numa-polling" and attached herewith. The results are captured in 'cross-numa-results.txt'. We see PMD to RxQ assignment evenly balanced across all PMDs with this patch. Please take a look and let us know your inputs. Hi Anurag, I think what this is showing is more related to txqs used for sending to the VM. As you are allowing the rxqs from the phy port to be handled by more pmds, and all those rxqs have traffic then in turn more txqs are used for sending to the VM. The result of using more txqs when sending to the VM in this case is that the traffic is returned on the more rxqs. Allowing cross-numa does not guarantee that the different pmd cores will poll rxqs from an interface. At least with group algorithm, the pmds will be selected purely on load. The right way to ensure that all VM txqs(/rxqs) are used is to enable the Tx-steering feature [0]. So you might be seeing some benefit in this case, but to me it's not the core use case of cross-numa polling. That is more about allowing the pmds on every numa to be used when the traffic load is primarily coming from one numa. Kevin. [0] https://docs.openvswitch.org/en/latest/topics/userspace-tx-steering/ Please find some of my inputs inline, in response to your comments. 
Regards, Anurag -Original Message- From: Kevin Traynor Sent: Thursday, February 24, 2022 7:54 PM To: Jan Scheurich ; Wan Junjie Cc: d...@openvswitch.org; Anurag Agarwal Subject: Re: [External] Re: [PATCH] dpif-netdev: add an option to assign pmd rxq to all numas Hi Jan, On 17/02/2022 14:21, Jan Scheurich wrote: Hi Kevin, We have done extensive benchmarking and found that we get better overall PMD load balance and resulting OVS performance when we do not statically pin any rx queues and instead let the auto-load-balancing find the optimal distribution of phy rx queues over both NUMA nodes to balance an asymmetric load of vhu rx queues (polled only on the local NUMA node). Cross-NUMA polling of vhu rx queues comes with a very high latency cost due to cross-NUMA access to volatile virtio ring pointers in every iteration (not only when actually copying packets). Cross-NUMA polling of phy rx queues doesn't have a similar issue. I agree that for vhost rxq polling, it always causes a performance penalty w
Re: [ovs-dev] [PATCH v2] dpif-netdev: Allow cross-NUMA polling on selected ports
Hi Anurag/Jan, On 21/03/2022 09:42, Anurag Agarwal wrote: From: Jan Scheurich Today dpif-netdev considers PMD threads on a non-local NUMA node for automatic assignment of the rxqs of a port only if there are no local, non-isolated PMDs. On typical servers with both physical ports on one NUMA node, this often leaves the PMDs on the other NUMA node under-utilized, wasting CPU resources. The alternative, to manually pin the rxqs to PMDs on remote NUMA nodes, also has drawbacks as it limits OVS' ability to auto load-balance the rxqs. This patch introduces a new interface configuration option to allow ports to be automatically polled by PMDs on any NUMA node: ovs-vsctl set interface other_config:cross-numa-polling=true If this option is not present or set to false, legacy behaviour applies. I mentioned some additional features to improve this and you seemed to agree they were useful so I sent an RFC with them for you to check out. https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392310.html Then you sent this patch, so I am a bit confused. Maybe you missed it? I will highlight some differences between the patchsets on rxq scheduling below. I think general comments on the trade-offs of this feature in the other thread are still welcome and maybe more important than implementation discussions. Signed-off-by: Jan Scheurich Signed-off-by: Anurag Agarwal --- Documentation/topics/dpdk/pmd.rst | 24 ++-- lib/dpif-netdev.c | 31 +++ tests/pmd.at | 30 ++ vswitchd/vswitch.xml | 20 4 files changed, 95 insertions(+), 10 deletions(-) diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst index b259cc8b3..f6c9671d7 100644 --- a/Documentation/topics/dpdk/pmd.rst +++ b/Documentation/topics/dpdk/pmd.rst @@ -132,8 +132,28 @@ or can be triggered by using:: Port/Rx Queue assignment to PMD threads by manual pinning ~ -Rx queues may be manually pinned to cores. 
This will change the default Rx -queue assignment to PMD threads:: + +Normally, Rx queues are assigned to PMD threads automatically. By default +OVS only assigns Rx queues to PMD threads executing on the same NUMA +node in order to avoid unnecessary latency for accessing packet buffers +across the NUMA boundary. Typically this overhead is higher for vhostuser +ports than for physical ports due to the packet copy that is done for all +rx packets. + +On NUMA servers with physical ports only on one NUMA node, the NUMA-local +polling policy can lead to an under-utilization of the PMD threads on the +remote NUMA node. For the overall OVS performance it may in such cases be +beneficial to utilize the spare capacity and allow polling of a physical +port's rxqs across NUMA nodes despite the overhead involved. +The policy can be set per port with the following configuration option:: + +$ ovs-vsctl set Interface \ +other_config:cross-numa-polling=true|false + +The default value is false. + +Rx queues may also be manually pinned to cores. This will change the default +Rx queue assignment to PMD threads:: $ ovs-vsctl set Interface \ other_config:pmd-rxq-affinity= diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 720818e30..4fda8d7a0 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -466,6 +466,7 @@ struct dp_netdev_port { char *type; /* Port type as requested by user. */ char *rxq_affinity_list;/* Requested affinity of rx queues. 
*/ enum txq_req_mode txq_requested_mode; +bool cross_numa_polling;/* If true cross polling will be enabled */ }; static bool dp_netdev_flow_ref(struct dp_netdev_flow *); @@ -5018,6 +5019,7 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no, bool emc_enabled = smap_get_bool(cfg, "emc-enable", true); const char *tx_steering_mode = smap_get(cfg, "tx-steering"); enum txq_req_mode txq_mode; +bool cross_numa_polling = smap_get_bool(cfg, "cross-numa-polling", false); ovs_rwlock_wrlock(&dp->port_rwlock); error = get_port_by_number(dp, port_no, &port); @@ -5086,6 +5088,14 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no, dp_netdev_request_reconfigure(dp); } +if (cross_numa_polling != port->cross_numa_polling) { +port->cross_numa_polling = cross_numa_polling; +VLOG_INFO("%s:cross-numa-polling has been %s.", + netdev_get_name(port->netdev), + (cross_numa_polling)? "enabled" : "disabled"); +dp_netdev_request_reconfigure(dp); +} + unlock: ovs_rwlock_unlock(&dp->port_rwlock); return error; @@ -5885,7 +5895,7 @@ sched_numa_list_schedule(struct sched_numa_list *numa_list, { struct d
Re: [ovs-dev] [PATCH ovn] rhel: fix logrotate user config option
On Tue, Mar 22, 2022 at 2:33 PM Mark Michelson wrote: > > Acked-by: Mark Michelson > Thanks. I applied this patch to the main branch and backported up to branch-21.03. Numan > On 3/22/22 13:37, Vladislav Odintsov wrote: > > If rpm is built with libcapng (default), /etc/logrotate.d/ovn > > config file user is SEDed in postinstall section with 'ovn ovn'. > > > > If one runs logrotate, the following error occurs: > > ``` > > [root@host ~]# logrotate /etc/logrotate.d/ovn > > error: /etc/logrotate.d/ovn:9 unknown user 'ovn' > > error: found error in /var/log/ovn/*.log , skipping > > ``` > > > > Replace 'ovn' user with 'openvswitch' to fix the issue. > > > > Signed-off-by: Vladislav Odintsov > > --- > > rhel/ovn-fedora.spec.in | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/rhel/ovn-fedora.spec.in b/rhel/ovn-fedora.spec.in > > index 3fb854a37..821eb03cc 100644 > > --- a/rhel/ovn-fedora.spec.in > > +++ b/rhel/ovn-fedora.spec.in > > @@ -323,7 +323,7 @@ ln -sf ovn_detrace.py %{_bindir}/ovn-detrace > > %if %{with libcapng} > > if [ $1 -eq 1 ]; then > > sed -i 's:^#OVN_USER_ID=:OVN_USER_ID=:' %{_sysconfdir}/sysconfig/ovn > > -sed -i 's:\(.*su\).*:\1 ovn ovn:' %{_sysconfdir}/logrotate.d/ovn > > +sed -i 's:\(.*su\).*:\1 openvswitch openvswitch:' > > %{_sysconfdir}/logrotate.d/ovn > > fi > > %endif > >
Re: [ovs-dev] Possible backport candidate
On Tue, Mar 22, 2022 at 12:31 PM Mark Michelson wrote: > > Hi everyone, > > Last month, commit d7514abe1 (acl-log: Log the direction (logical > pipeline) of the matching ACL.) was added to the main branch of OVN. > This commit was made because of the issue report at > https://bugzilla.redhat.com/show_bug.cgi?id=1992641 . > > At the time the change was added, it was not considered for a backport > to OVN 21.12 because it modifies the contents of ACL logs. The fear was > that if people have written parsers for ACL logs, the addition of a new > field could cause issues. This is not a very friendly thing to do during > a minor update. > > However, the change was made in an attempt to correct a deficiency in > the ACL logs. Without the change, it appears that there are duplicate > ACL logs since there is nothing to distinguish the pipeline on which the > ACL matched. Also, there is no documented contract about ACL log format; > they are intended to be human readable, and that's all. > > My question is if the community would be OK with backporting this fix to > OVN 21.12 in light of the change to the ACL log contents. Please let me > know what you think. Hi Mark, No objections from me. Numan > > Thanks, > Mark Michelson
Re: [ovs-dev] [PATCH v5 3/5] pmd.at: Add tests for multi non-local numa pmds.
On Wed, Mar 23, 2022 at 3:09 PM Kevin Traynor wrote: > > Ensure that if there are no local numa PMD thread > cores available that pmd cores from all other non-local > numas will be used. > > Signed-off-by: Kevin Traynor Acked-by: David Marchand -- David Marchand
Re: [ovs-dev] [PATCH] datapath: use after free flow mask on destroy flow table
On Thu, Mar 24, 2022 at 6:32 PM Ilya Maximets wrote: > > On 3/24/22 05:17, Wentao Jia wrote: > > > > > > on destroy flow table instance, referenced flow mask may be released > > too. function ovs_flow_tbl_destroy(), release flow mask first and then > > destroy flow table instance. this will trigger kernel panic on destroy > > datapath > > > > > > [ 377.647756] kernel BUG at .../datapath/linux/flow_table.c:272! > > [ 377.653794] invalid opcode: [#1] SMP PTI > > [ 377.666827] RIP: 0010:table_instance_flow_free.isra.7+0x148/0x150 > > [ 377.711465] Call Trace: > > [ 377.715238] > > [ 377.718964] table_instance_destroy+0xbe/0x160 [openvswitch] > > [ 377.722793] destroy_dp_rcu+0x12/0x40 [openvswitch] > > [ 377.726651] rcu_process_callbacks+0x297/0x460 > > [ 377.736795] __do_softirq+0xe3/0x30a > > [ 377.740654] ? ktime_get+0x36/0xa0 > > [ 377.744490] irq_exit+0x100/0x110 > > [ 377.748514] smp_apic_timer_interrupt+0x74/0x140 > > [ 377.752817] apic_timer_interrupt+0xf/0x20 > > [ 377.758802] > > > > > > Fixes: 6d1cf7f3e ("datapath: fix possible memleak on destroy > > flow-table") for linux upstream, fix tag: Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table") > > > > Signed-off-by: Wentao Jia > > Signed-off-by: Chuanjie Zeng > > --- > > Hi, Wentao Jia. Thanks for the patch! > > Please, send it to the mainline linux kernel ('netdev' mailing list, > keeping the ovs-dev in CC) using the linux kernel process for > submitting patches. > > When it is accepted to the upstream kernel, it can be backported to > the OOT kernel module in OVS repository. > > Best regards, Ilya Maximets. 
> > > datapath/flow_table.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/datapath/flow_table.c b/datapath/flow_table.c > > index 650338fb0..b2f4b1108 100644 > > --- a/datapath/flow_table.c > > +++ b/datapath/flow_table.c > > @@ -415,8 +415,8 @@ void ovs_flow_tbl_destroy(struct flow_table *table) > > struct table_instance *ufid_ti = > > rcu_dereference_raw(table->ufid_ti); > > > > > > free_percpu(table->mask_cache); > > - kfree(rcu_dereference_raw(table->mask_array)); > > table_instance_destroy(table, ti, ufid_ti, false); > > + kfree(rcu_dereference_raw(table->mask_array)); > > } > -- Best regards, Tonghao
[ovs-dev] [PATCH] system-traffic.at: Fix flaky DNAT load balancing test.
'conntrack - DNAT load balancing' test fails from time to time because not all the group buckets are getting hit. In short, the test creates a group with 3 buckets with the same weight. It creates 12 TCP sessions and expects that each bucket will be used at least once. However, there is a solid chance that this will not happen. The probability of having at least one empty bucket is: C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N Where N is the number of distinct TCP sessions. For N=12, the probability is about 0.023, i.e. there is a 2.3% chance for a test to fail, which is not great for CI. Increasing the number of sessions to 50 to reduce the probability of failure down to 4.7e-9. In my testing, the configuration with 50 TCP sessions didn't fail after 6000 runs. Should be good enough for CI systems. Fixes: 2c66ebe47a88 ("ofp-actions: Allow conntrack action in group buckets.") Signed-off-by: Ilya Maximets --- tests/system-traffic.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/system-traffic.at b/tests/system-traffic.at index 95383275a..4a7fa49fc 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -6471,7 +6471,7 @@ on_exit 'ovs-appctl revalidator/purge' on_exit 'ovs-appctl dpif/dump-flows br0' dnl Should work with the virtual IP address through NAT -for i in 1 2 3 4 5 6 7 8 9 10 11 12; do +for i in $(seq 1 50); do echo Request $i NS_CHECK_EXEC([at_ns1], [wget 10.1.1.64 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) done -- 2.34.1
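The inclusion-exclusion estimate in the commit message above is easy to double-check with a few lines of Python (illustrative only; the function name is invented):

```python
# Probability that at least one of `buckets` equally weighted group
# buckets receives none of `n_sessions` independent sessions, computed
# by inclusion-exclusion exactly as in the commit message:
#   C(3, 1) x (2/3)^N - C(3, 2) x (1/3)^N
from math import comb

def p_empty_bucket(n_sessions, buckets=3):
    return (comb(buckets, 1) * ((buckets - 1) / buckets) ** n_sessions
            - comb(buckets, 2) * (1 / buckets) ** n_sessions)

# ~2.3% failure chance with 12 sessions; ~4.7e-9 with 50, matching the
# figures quoted in the commit message.
assert abs(p_empty_bucket(12) - 0.023) < 0.001
assert p_empty_bucket(50) < 5e-9
```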
Re: [ovs-dev] [PATCH] datapath: use after free flow mask on destroy flow table
On 3/24/22 05:17, Wentao Jia wrote: > > > on destroy flow table instance, referenced flow mask may be released > too. function ovs_flow_tbl_destroy(), release flow mask first and then > destroy flow table instance. this will trigger kernel panic on destroy > datapath > > > [ 377.647756] kernel BUG at .../datapath/linux/flow_table.c:272! > [ 377.653794] invalid opcode: [#1] SMP PTI > [ 377.666827] RIP: 0010:table_instance_flow_free.isra.7+0x148/0x150 > [ 377.711465] Call Trace: > [ 377.715238] > [ 377.718964] table_instance_destroy+0xbe/0x160 [openvswitch] > [ 377.722793] destroy_dp_rcu+0x12/0x40 [openvswitch] > [ 377.726651] rcu_process_callbacks+0x297/0x460 > [ 377.736795] __do_softirq+0xe3/0x30a > [ 377.740654] ? ktime_get+0x36/0xa0 > [ 377.744490] irq_exit+0x100/0x110 > [ 377.748514] smp_apic_timer_interrupt+0x74/0x140 > [ 377.752817] apic_timer_interrupt+0xf/0x20 > [ 377.758802] > > > Fixes: 6d1cf7f3e ("datapath: fix possible memleak on destroy > flow-table") > > > Signed-off-by: Wentao Jia > Signed-off-by: Chuanjie Zeng > --- Hi, Wentao Jia. Thanks for the patch! Please, send it to the mainline linux kernel ('netdev' mailing list, keeping the ovs-dev in CC) using the linux kernel process for submitting patches. When it is accepted to the upstream kernel, it can be backported to the OOT kernel module in OVS repository. Best regards, Ilya Maximets. 
> datapath/flow_table.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > > diff --git a/datapath/flow_table.c b/datapath/flow_table.c > index 650338fb0..b2f4b1108 100644 > --- a/datapath/flow_table.c > +++ b/datapath/flow_table.c > @@ -415,8 +415,8 @@ void ovs_flow_tbl_destroy(struct flow_table *table) > struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti); > > > free_percpu(table->mask_cache); > - kfree(rcu_dereference_raw(table->mask_array)); > table_instance_destroy(table, ti, ufid_ti, false); > + kfree(rcu_dereference_raw(table->mask_array)); > }
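The ordering bug the diff fixes (the shared mask array was freed before the table instances whose flows still reference the masks) can be modelled in a few lines. This is a toy Python model of the idea only, not the openvswitch kernel code; all names here are invented for the illustration.

```python
# Toy model of the destroy-ordering bug: flows hold references into a
# shared mask array, and freeing a flow must look its mask up there.
# If the array is torn down first, per-flow cleanup hits a dangling
# reference -- the Python analogue of the kernel use-after-free.

class FlowTable:
    def __init__(self):
        self.mask_array = {"mask0": 2}      # mask -> refcount
        self.flows = ["mask0", "mask0"]     # each flow references a mask

    def table_instance_destroy(self):
        for mask in self.flows:
            # Analogue of table_instance_flow_free(): drop a mask ref.
            self.mask_array[mask] -= 1      # KeyError if array is gone
        self.flows = []

def destroy(table, fixed):
    if fixed:
        table.table_instance_destroy()      # free flows first...
        table.mask_array = {}               # ...then the mask array
    else:
        table.mask_array = {}               # buggy order: array freed
        table.table_instance_destroy()      # first -> dangling lookup

destroy(FlowTable(), fixed=True)            # fixed ordering: fine
try:
    destroy(FlowTable(), fixed=False)
except KeyError:
    pass                                    # buggy ordering blows up
else:
    raise AssertionError("buggy order should have failed")
```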
Re: [ovs-dev] [PATCH v5 00/15] Fix undefined behavior in loop macros
On 3/23/22 17:24, Ilya Maximets wrote: > On 3/23/22 17:11, Adrian Moreno wrote: >> >> >> On 3/23/22 14:25, Eelco Chaudron wrote: >>> One small nit in patch 4, the rest of the changes from v4 to v5 look good. >>> >> >> Sorry about that. Dumitru said he might take a look at the latest version. >> I'll give him some time and resend. > > If there will be no further comments from Dumitru, I had another look at v5 and I acked all remaining patches. > I can fix that one comment on commit. Sounds good to me. Thanks, Dumitru
Re: [ovs-dev] [PATCH v5 12/15] rculist: use multi-variable helpers for loop macros
On 3/23/22 12:56, Adrian Moreno wrote:
> Use multi-variable iteration helpers to rewrite the rculist loop macros.
>
> There is an important behavior change compared with the previous
> implementation: when the loop ends normally (i.e., not via "break;"),
> the object pointer provided by the user is NULL. This is safer because
> it is not guaranteed that it would otherwise end up pointing to a valid
> address.
>
> Acked-by: Eelco Chaudron
> Signed-off-by: Adrian Moreno
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH v5 10/15] hindex: use multi-variable iterators
On 3/23/22 12:56, Adrian Moreno wrote:
> Re-write hindex's loops using multi-variable helpers.
>
> For safe loops, use the LONG version to maintain backwards
> compatibility.
>
> Acked-by: Eelco Chaudron
> Signed-off-by: Adrian Moreno
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH v5 09/15] cmap: use multi-variable iterators
On 3/23/22 12:56, Adrian Moreno wrote:
> Re-write cmap's loops using multi-variable helpers.
>
> For iterators based on cmap_cursor, we just need to make sure the NODE
> variable is not used after the loop, so we set it to NULL.
>
> Acked-by: Eelco Chaudron
> Signed-off-by: Adrian Moreno
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH v5 06/15] hmap: use multi-variable helpers for hmap loops
On 3/23/22 12:56, Adrian Moreno wrote:
> Rewrite hmap's loops using multi-variable helpers.
>
> For SAFE loops, use the LONG version of the multi-variable macros to
> keep backwards compatibility.
>
> Acked-by: Eelco Chaudron
> Signed-off-by: Adrian Moreno
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH v5 04/15] list: use multi-variable helpers for list loops
On 3/23/22 12:56, Adrian Moreno wrote:
> Use multi-variable iteration helpers to rewrite non-safe loops.
>
> There is an important behavior change compared with the previous
> implementation: when the loop ends normally (i.e., not via "break;"),
> the object pointer provided by the user is NULL. This is safer because
> it is not guaranteed that it would otherwise end up pointing to a valid
> address.
>
> For the pop iterator, set the variable to NULL when the loop ends.
>
> Clang-analyzer successfully picked up the potential null-pointer
> dereference in the code that triggered this change (bond.c), and
> nothing else has been detected.
>
> For _SAFE loops, use the LONG version for backwards compatibility.
>
> Acked-by: Eelco Chaudron
> Signed-off-by: Adrian Moreno
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH v5 02/15] util: add safe multi-variable iterators
On 3/23/22 12:56, Adrian Moreno wrote:
> The safe versions of the multi-variable iterator helpers declare an
> internal variable to temporarily store the next value of the iterator.
>
> Two versions of the macro are provided: one that still uses the NEXT
> variable for backwards compatibility, and a shorter one that does not
> require an additional user-provided variable.
>
> Signed-off-by: Adrian Moreno
> Acked-by: Eelco Chaudron
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH v5 01/15] util: add multi-variable loop iterator macros
On 3/23/22 12:56, Adrian Moreno wrote:
> Multi-variable loop iterators avoid potential undefined behavior by
> using an internal iterator variable to perform the iteration and only
> referencing the containing object (via OBJECT_CONTAINING) if the
> iterator has been validated by the second expression of the for
> statement.
>
> That way, the user can easily implement a loop that never tries to
> obtain the object containing NULL or stack-allocated non-contained
> nodes.
>
> When the loop ends normally (not via "break;"), the user-provided
> variable is set to NULL.
>
> Signed-off-by: Adrian Moreno
> Acked-by: Eelco Chaudron
> ---

Acked-by: Dumitru Ceara
Re: [ovs-dev] [PATCH] ofproto/bond: Add knob "all_slaves_active"
On Wed, Mar 23, 2022 at 10:12 PM Mike Pattrick wrote:
> On Thu, Mar 17, 2022 at 11:54 AM Christophe Fontaine wrote:
>>
>> This config param allows the delivery of broadcast and multicast packets
>> to the secondary interface of non-LACP bonds, identical to the same option
>> for kernel bonds.
>>
>> Tested with an ovs-dpdk balance-slb bond with 2 virtual functions,
>> and a kernel bond with LACP on 2 other virtual functions.
>>
>> Scapy sending 10 broadcast packets from 10 different MAC addresses:
>> - with "ovs-vsctl set port dpdkbond other_config:all_slaves_active=false"
>>   (default, unmodified behavior), 5 packets are received;
>> - with "other_config:all_slaves_active=true", all 10 packets are received.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1720935
>> Signed-off-by: Christophe Fontaine
>> ---
>
> Hello Chris,
>
> This feature seems like it would be useful, and looks correctly
> implemented. However, the nomenclature used throughout the code and
> documentation is member in place of slave.

Hi Mike,

Sure, I'll fix that in v2.

Thanks,
Christophe

> Cheers,
> M
>
>>  NEWS                 |  1 +
>>  ofproto/bond.c       |  5 ++++-
>>  ofproto/bond.h       |  2 ++
>>  vswitchd/bridge.c    |  3 +++
>>  vswitchd/vswitch.xml | 20 ++++++++++++++++++++
>>  5 files changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/NEWS b/NEWS
>> index df633e8e2..07f72fd5a 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -16,6 +16,7 @@ v2.17.0 - xx xxx
>>       * Removed experimental tag for PMD Auto Load Balance.
>>       * New configuration knob 'other_config:n-offload-threads' to change the
>>         number of HW offloading threads.
>> +     * New configuration knob 'all_slaves_active' for non lacp bonds
>>     - DPDK:
>>       * EAL argument --socket-mem is no longer configured by default upon
>>         start-up. If dpdk-socket-mem and dpdk-alloc-mem are not specified,
>>
>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>> index cdfdf0b9d..8bf166a41 100644
>> --- a/ofproto/bond.c
>> +++ b/ofproto/bond.c
>> @@ -145,6 +145,7 @@ struct bond {
>>      struct eth_addr active_member_mac; /* MAC address of the active member. */
>>      /* Legacy compatibility. */
>>      bool lacp_fallback_ab; /* Fallback to active-backup on LACP failure. */
>> +    bool all_slaves_active;
>>
>>      struct ovs_refcount ref_cnt;
>>  };
>> @@ -448,6 +449,7 @@ bond_reconfigure(struct bond *bond, const struct bond_settings *s)
>>
>>      bond->updelay = s->up_delay;
>>      bond->downdelay = s->down_delay;
>> +    bond->all_slaves_active = s->all_slaves_active;
>>
>>      if (bond->lacp_fallback_ab != s->lacp_fallback_ab_cfg) {
>>          bond->lacp_fallback_ab = s->lacp_fallback_ab_cfg;
>> @@ -893,7 +895,8 @@ bond_check_admissibility(struct bond *bond, const void *member_,
>>
>>      /* Drop all multicast packets on inactive members. */
>>      if (eth_addr_is_multicast(eth_dst)) {
>> -        if (bond->active_member != member) {
>> +        if (bond->active_member != member &&
>> +            bond->all_slaves_active == false) {
>>              goto out;
>>          }
>>      }
>>
>> diff --git a/ofproto/bond.h b/ofproto/bond.h
>> index 1683ec878..4e8c1872a 100644
>> --- a/ofproto/bond.h
>> +++ b/ofproto/bond.h
>> @@ -62,6 +62,8 @@ struct bond_settings {
>>                                    ovs run. */
>>      bool use_lb_output_action; /* Use lb_output action. Only applicable for
>>                                    bond mode BALANCE TCP. */
>> +    bool all_slaves_active;    /* For non LACP bond, also accept multicast
>> +                                  packets on secondary interface. */
>>  };
>>
>>  /* Program startup. */
>>
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index 5223aa897..53be7ddf6 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -4615,6 +4615,9 @@ port_configure_bond(struct port *port, struct bond_settings *s)
>>      s->use_lb_output_action = (s->balance == BM_TCP)
>>                                && smap_get_bool(&port->cfg->other_config,
>>                                                 "lb-output-action", false);
>> +    /* all_slaves_active is disabled by default */
>> +    s->all_slaves_active = smap_get_bool(&port->cfg->other_config,
>> +                                         "all_slaves_active", false);
>>  }
>>
>>  /* Returns true if 'port' is synthetic, that is, if we constructed it locally
>>
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 0c6632617..d420397a3 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -2083,6 +2083,26 @@
>>            true).
>>
>> +      <column name="other_config" key="all_slaves_active"
>> +              type='{"type": "boolean"}'>
>> +          Enable/Disable delivery of broadcast/multicast packets on secondary
>> +          interface of a bond. Relevant only when is
>> +