Re: [ovs-dev] [PATCH RFC v2 0/6] conntrack: Introduce buckets and reduce contention.
Hi, just some thought on the results. the test-conntrack.c now only prepares one batch size of packets for each thread, and repeat running ct on the same batch until the packet number reaches to the parameter of n_pkt, which is, in your case, 16777216 So I guess the optimization results largely comes from the fact that you remove the global lock on the timeout lists, which is taken/released by each packet. the split bucket patch reduces the contention and thus increase the rate of connection setup, I think maybe we need a new benchmark on the setup rate. Paolo Valerio 于2022年3月7日周一 20:39写道: > This series aims to share the work done so far, and to start a > discussion of a slightly different approach in terms of the way we > handle the connections. > It's not considered ready as further work and extensive tests are > needed. > > These are some performance numbers obtained in the following way: > > ./tests/ovstest test-conntrack benchmark $i 16777216 32 1 > > $i baseline ct-scale ct-bucket > 2 16480 ms 3527 ms3180 ms > 4 37375 ms 4997 ms4477 ms > 8 63239 ms 9692 ms8954 ms > 16 131583 ms 19303 ms 18062 ms > > Both ct-scale and ct-bucket introduce a noticeable improvement in > terms of performace. It's worth to mention that the ct-scale, keeping > the exp_lists, is more efficient during the sweeping. > > There are two main logical changes that the series tries to introduce. > The first one is the reintroduction of buckets, the second one is the > removal of the expiration lists. > > The first two patches are a prereq and were picked (untouched) from > this series (by Gaetan): > > > https://patchwork.ozlabs.org/project/openvswitch/list/?series=249027&state=* > > The third patch is a follow up of the following: > > > https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/ > > Some logic has changed, but the overall idea remained the same. > Additional details in the patch description. > > The fourth patch introduces the buckets and removes the expiration > lists. > The buckets got reintroduced as cmaps instead of hmaps, trying to > narrow down the critical sections as well. > > An alternative approach may involve the replacement of cmaps with > linked lists (without increasing the number of locks), increasing the > number of buckets (it could involve the ability to change the buckets > number when the user changes the connection limit). This could improve > the stale conns eviction during the processing of the packet, but this > is currently out of scope for this series (can be discussed, though). > > The insertion, namely the creation of the connection, can probably be > improved (or at least an improvement can be evaluated), as, in case of > nat, it acquires the bucket locks calling nat_get_unique_tuple_lock(). > This is needed because we may need to know the reverse tuple before > locking in order to acquire the locks in the right way to avoid ABBA > deadlocks. > > Further details about locking may be found in patch #4 description. > > There's a known limitation, and it is related mostly to zone > limit. Keeping the stale entries, leads to a mismatch between the > current number of per zone connections and the actual number of valid > connections. This means that if we have a small zone limit number, we > may wait a whole "next_wakeup" time before the entries get cleaned up > (assuming we have candidates for removal). To avoid this, a zone clean > up from the packet path has been added, but alternative approaches as > triggering a per zone cleanup when a threshold of connections has > been reached (storing the connections per bucket and per zone during > the creation) may be still viable. > > Expiration lists have been removed for the sake of evaluation, but > they could be included and duplicated among buckets. In this way we > could distribute the contention (as a matter of fact reducing the > number of connections per list) that affects the write access to a > single data structure during every connection update. > > Patch #6 reintegrates the command below with multiple buckets: > > ovs-appctl dpctl/ct-bkts > > A couple of minor patches have been kept out of the series, for the > time being, as they strongly depend on #4. > > Gaetan Rivet (3): > conntrack: Use a cmap to store zone limits > conntrack-tp: Use a cmap to store timeout policies > conntrack: Use an atomic conn expiration value > > Paolo Valerio (2): > conntrack: Split single cmap to multiple buckets. > conntrack: Make ovs-appctl dpctl/ct-bkts work with multiple buckets > > Peng He (1): > conntrack: Replaces nat_conn introducing key directionality. > > > lib/conntrack-private.h | 60 ++- > lib/conntrack-tp.c | 102 ++-- > lib/conntrack.c | 1062 +++ > lib/conntrack.h |6 +- > lib/dpif-netdev.c |5 +- > tests/system-traffic.at |5 +- > 6 files
Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: fix uAPI incompatibility with existing user space
Hello: This patch was applied to netdev/net-next.git (master) by Jakub Kicinski : On Wed, 9 Mar 2022 23:20:33 +0100 you wrote: > Few years ago OVS user space made a strange choice in the commit [1] > to define types only valid for the user space inside the copy of a > kernel uAPI header. '#ifndef __KERNEL__' and another attribute was > added later. > > This leads to the inevitable clash between user space and kernel types > when the kernel uAPI is extended. The issue was unveiled with the > addition of a new type for IPv6 extension header in kernel uAPI. > > [...] Here is the summary with links: - [net-next,v2] net: openvswitch: fix uAPI incompatibility with existing user space https://git.kernel.org/netdev/net-next/c/1926407a4ab0 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v20 3/8] dpif-offload-provider: Introduce dpif-offload-provider layer
On 2022-03-04 6:59 PM, Eelco Chaudron wrote: We are waiting on Ilya’s feedback on this, which he told us he will look at after the v2.17 release. https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F2025025312.786953-4-cmi%40nvidia.com%2F&data=04%7C01%7Ccmi%40nvidia.com%7C01938f4df4e54cb7fd1008d9fdce2259%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637819884060258877%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=6iEQpwH%2FxEJyFuhdtqey%2F53I5cT0rSsz4ueKbHAiWJU%3D&reserved=0 Looking at the thread again, I still feel like we need to remove the 1:1 mapping, if not we should remove the entire offload class. OK. Let's wait on Ilya's feedback. If the design is finalized, I'll change it accordingly. Thanks, Chris //Eelco On 15 Feb 2022, at 10:17, Chris Mi wrote: Some offload actions require functionality that is not netdev based, but dpif. For example, sFlow action requires to create a psample netlink socket to receive the sampled packets from TC or kernel driver. Create dpif-offload-provider layer to support such actions. Issue: 2181036 Change-Id: I880b8b2aebbf6886fd8064409108ee59974ff195 Signed-off-by: Chris Mi Reviewed-by: Eli Britstein --- lib/automake.mk | 2 ++ lib/dpif-offload-provider.h | 52 + lib/dpif-offload.c | 43 ++ lib/dpif-provider.h | 4 ++- 4 files changed, 100 insertions(+), 1 deletion(-) create mode 100644 lib/dpif-offload-provider.h create mode 100644 lib/dpif-offload.c diff --git a/lib/automake.mk b/lib/automake.mk index a23cdc4ad..781fba47a 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -129,6 +129,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-private.h \ lib/dpif-netdev-perf.c \ lib/dpif-netdev-perf.h \ + lib/dpif-offload.c \ + lib/dpif-offload-provider.h \ lib/dpif-provider.h \ lib/dpif.c \ lib/dpif.h \ diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h new file mode 100644 index 0..3b94740df --- /dev/null +++ b/lib/dpif-offload-provider.h @@ -0,0 +1,52 @@ +/* + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.apache.org%2Flicenses%2FLICENSE-2.0&data=04%7C01%7Ccmi%40nvidia.com%7C01938f4df4e54cb7fd1008d9fdce2259%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637819884060258877%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=z5yeYkZ654e15ykYLHQjYeAMGL%2B9N94LTkoFyNg%2Fdgs%3D&reserved=0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_OFFLOAD_PROVIDER_H +#define DPIF_OFFLOAD_PROVIDER_H + +struct dpif; +struct dpif_offload_sflow; + +/* Datapath interface offload structure, to be defined by each implementation + * of a datapath interface. + */ +struct dpif_offload_class { +/* Type of dpif offload in this class, e.g. "system", "netdev", etc. + * + * One of the providers should supply a "system" type, since this is + * the type assumed if no type is specified when opening a dpif. */ +const char *type; + +/* Called when the dpif offload provider is registered. */ +int (*init)(void); + +/* Free all dpif offload resources. */ +void (*destroy)(void); + +/* Arranges for the poll loop for an upcall handler to wake up when psample + * has a message queued to be received. */ +void (*sflow_recv_wait)(void); + +/* Polls for an upcall from psample for an upcall handler. + * Return 0 for success. */ +int (*sflow_recv)(struct dpif_offload_sflow *sflow); +}; + +void dpif_offload_sflow_recv_wait(const struct dpif *dpif); +int dpif_offload_sflow_recv(const struct dpif *dpif, +struct dpif_offload_sflow *sflow); + +#endif /* DPIF_OFFLOAD_PROVIDER_H */ diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c new file mode 100644 index 0..f2bf3e634 --- /dev/null +++ b/lib/dpif-offload.c @@ -0,0 +1,43 @@ +/* + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * https://nam11.safelinks.protection.outlook.com/?url=http%
Re: [ovs-dev] [PATCH v20 5/8] dpif-offload-netlink: Implement dpif-offload-provider API
On 2022-03-04 6:50 PM, Eelco Chaudron wrote: I’m re-adding some of my v18 comments, which I know are depending on Ilya’s feedback, although I still think we should have provider classes, i.e., not based on the datapath class. Some other questions could still be answered regardless of Ilya’s feedback (will mark them with **). Please answer them inline, before sending another revision so we can have some discussion around them if needed. OK. //Eelco On 15 Feb 2022, at 10:17, Chris Mi wrote: Implement dpif-offload API for netlink datapath. And implement a dummy dpif-offload API for netdev datapath to make tests pass. ** Why do we need a dummy? If there is no hardware offload class we should just pass, shouldn’t we? According to one of your below comment, I moved dp_offload_class_lookup from dpif to dpif-netlink. After that, dpif-offload for dummy is not needed. Issue: 2181036 Change-Id: I950c3cc3c7092c3d87602da8928ad4aa2df4f38a Signed-off-by: Chris Mi Reviewed-by: Eli Britstein --- lib/automake.mk | 2 + lib/dpif-netdev.c | 8 +- lib/dpif-netlink.c | 4 +- lib/dpif-offload-netdev.c | 43 +++ lib/dpif-offload-netlink.c | 221 lib/dpif-offload-provider.h | 25 +++- lib/dpif-offload.c | 157 + lib/dpif-provider.h | 6 +- lib/dpif.c | 17 ++- 9 files changed, 476 insertions(+), 7 deletions(-) create mode 100644 lib/dpif-offload-netdev.c create mode 100644 lib/dpif-offload-netlink.c diff --git a/lib/automake.mk b/lib/automake.mk index 781fba47a..9ed61029a 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -130,6 +130,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-perf.c \ lib/dpif-netdev-perf.h \ lib/dpif-offload.c \ + lib/dpif-offload-netdev.c \ lib/dpif-offload-provider.h \ lib/dpif-provider.h \ lib/dpif.c \ @@ -453,6 +454,7 @@ lib_libopenvswitch_la_SOURCES += \ lib/dpif-netlink.h \ lib/dpif-netlink-rtnl.c \ lib/dpif-netlink-rtnl.h \ + lib/dpif-offload-netlink.c \ lib/if-notifier.c \ lib/netdev-linux.c \ lib/netdev-linux.h \ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e28e0b554..5ebd127b9 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1683,7 +1683,8 @@ create_dpif_netdev(struct dp_netdev *dp) ovs_refcount_ref(&dp->ref_cnt); dpif = xmalloc(sizeof *dpif); -dpif_init(&dpif->dpif, dp->class, dp->name, netflow_id >> 8, netflow_id); +dpif_init(&dpif->dpif, dp->class, NULL, dp->name, netflow_id >> 8, + netflow_id); dpif->dp = dp; dpif->last_port_seq = seq_read(dp->port_seq); @@ -9639,6 +9640,10 @@ dpif_dummy_override(const char *type) if (error == 0 || error == EAFNOSUPPORT) { dpif_dummy_register__(type); } +error = dp_offload_unregister_provider(type); +if (error == 0 || error == EAFNOSUPPORT) { +dpif_offload_dummy_register(type); +} } void @@ -9659,6 +9664,7 @@ dpif_dummy_register(enum dummy_level level) } dpif_dummy_register__("dummy"); +dpif_offload_dummy_register("dummy"); unixctl_command_register("dpif-dummy/change-port-number", "dp port new-number", diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 71e35ccdd..75f85c254 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -461,8 +461,8 @@ open_dpif(const struct dpif_netlink_dp *dp, struct dpif **dpifp) dpif->port_notifier = NULL; fat_rwlock_init(&dpif->upcall_lock); -dpif_init(&dpif->dpif, &dpif_netlink_class, dp->name, - dp->dp_ifindex, dp->dp_ifindex); +dpif_init(&dpif->dpif, &dpif_netlink_class, &dpif_offload_netlink_class, + dp->name, dp->dp_ifindex, dp->dp_ifindex); dpif->dp_ifindex = dp->dp_ifindex; dpif->user_features = dp->user_features; diff --git a/lib/dpif-offload-netdev.c b/lib/dpif-offload-netdev.c new file mode 100644 index 0..35dac2cc3 --- /dev/null +++ b/lib/dpif-offload-netdev.c @@ -0,0 +1,43 @@ +/* + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.apache.org%2Flicenses%2FLICENSE-2.0&data=04%7C01%7Ccmi%40nvidia.com%7Cadd31a18c66c46ea056408d9fdccd97d%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637819878551136144%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=A5fNu%2BjqL48flvxwkfz1BHOAiu6LLuBTdp%2F4HkYVEdw%3D&reserved=0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "A
[ovs-dev] [PATCH] ovsdb: raft: Fix inability to read the database with DNS host names.
Clustered OVSDB allows to use DNS names as addresses of raft members. However, if DNS resolution fails during the initial database read, this causes a fatal failure and exit of the ovsdb-server process. Also, if DNS name of a joining server is not resolvable for one of the followers, this follower will reject append requests for a new server to join until the name is successfully resolved. This makes a follower effectively non-functional while DNS is unavailable. To fix the problem relaxing the address verification. Allowing validation to pass if only name resolution failed and the address is valid otherwise. This will allow addresses to be added to the database, so connections could be established later when the DNS is available. Additionally fixing missed initialization of the dns-resolve module. Wihtout it, DNS requests are blocking. This causes unexpected delays in runtime. Fixes: 771680d96fb6 ("DNS: Add basic support for asynchronous DNS resolving") Reported-at: https://bugzilla.redhat.com/2055097 Signed-off-by: Ilya Maximets --- lib/dns-resolve.c| 2 +- lib/socket-util.c| 39 lib/socket-util.h| 3 ++- lib/stream.c | 2 +- ofproto/ofproto-dpif-sflow.c | 3 ++- ovsdb/ovsdb-server.c | 3 +++ ovsdb/raft-private.c | 5 - 7 files changed, 39 insertions(+), 18 deletions(-) diff --git a/lib/dns-resolve.c b/lib/dns-resolve.c index d34451434..8bcecb90c 100644 --- a/lib/dns-resolve.c +++ b/lib/dns-resolve.c @@ -265,7 +265,7 @@ resolve_callback__(void *req_, int err, struct ub_result *result) if (err != 0 || (result->qtype == ns_t_ && !result->havedata)) { ub_resolve_free(result); req->state = RESOLVE_ERROR; -VLOG_ERR_RL(&rl, "%s: failed to resolve", req->name); +VLOG_WARN_RL(&rl, "%s: failed to resolve", req->name); return; } diff --git a/lib/socket-util.c b/lib/socket-util.c index 4f1ffecf5..38705cc51 100644 --- a/lib/socket-util.c +++ b/lib/socket-util.c @@ -62,7 +62,8 @@ static bool parse_sockaddr_components(struct sockaddr_storage *ss, const char *port_s, uint16_t default_port, const char *s, - bool resolve_host); + bool resolve_host, + bool *dns_failure); /* Sets 'fd' to non-blocking mode. Returns 0 if successful, otherwise a * positive errno value. */ @@ -438,7 +439,7 @@ parse_sockaddr_components_dns(struct sockaddr_storage *ss OVS_UNUSED, dns_resolve(host_s, &tmp_host_s); if (tmp_host_s != NULL) { parse_sockaddr_components(ss, tmp_host_s, port_s, - default_port, s, false); + default_port, s, false, NULL); free(tmp_host_s); return true; } @@ -450,11 +451,15 @@ parse_sockaddr_components(struct sockaddr_storage *ss, char *host_s, const char *port_s, uint16_t default_port, const char *s, - bool resolve_host) + bool resolve_host, bool *dns_failure) { struct sockaddr_in *sin = sin_cast(sa_cast(ss)); int port; +if (dns_failure) { +*dns_failure = false; +} + if (port_s && port_s[0]) { if (!str_to_int(port_s, 10, &port) || port < 0 || port > 65535) { VLOG_ERR("%s: bad port number \"%s\"", s, port_s); @@ -501,10 +506,15 @@ parse_sockaddr_components(struct sockaddr_storage *ss, return true; resolve: -if (resolve_host && parse_sockaddr_components_dns(ss, host_s, port_s, - default_port, s)) { -return true; -} else if (!resolve_host) { +if (resolve_host) { +if (parse_sockaddr_components_dns(ss, host_s, port_s, + default_port, s)) { +return true; +} +if (dns_failure) { +*dns_failure = true; +} +} else { VLOG_ERR("%s: bad IP address \"%s\"", s, host_s); } exit: @@ -521,10 +531,12 @@ exit: * It resolves the host if 'resolve_host' is true. * * On success, returns true and stores the parsed remote address into '*ss'. - * On failure, logs an error, stores zeros into '*ss', and returns false. */ + * On failure, logs an error, stores zeros into '*ss', and returns false, + * '*dns_failure' indicates if the host resolution failed. */ bool inet_parse_active(const char *target_, int default_port, - struct sockaddr_storage *ss, bool resolve_host) + struct sockaddr_storage *ss, + bool resolve_host, bool *dns_failure) { char *target = xstrdup(target_); char *port, *ho
[ovs-dev] [PATCH ovn branch-21.12 2/2] Prepare for 21.12.2.
Signed-off-by: Mark Michelson --- NEWS | 3 +++ configure.ac | 2 +- debian/changelog | 6 ++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index b00644b59..4cb40cd7f 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,6 @@ +OVN v21.12.2 - xx xxx +-- + OVN v21.12.1 - 10 Mar 2022 -- - Bug fixes diff --git a/configure.ac b/configure.ac index e44c86c74..c4bf08db7 100644 --- a/configure.ac +++ b/configure.ac @@ -13,7 +13,7 @@ # limitations under the License. AC_PREREQ(2.63) -AC_INIT(ovn, 21.12.1, b...@openvswitch.org) +AC_INIT(ovn, 21.12.2, b...@openvswitch.org) AC_CONFIG_MACRO_DIR([m4]) AC_CONFIG_AUX_DIR([build-aux]) AC_CONFIG_HEADERS([config.h]) diff --git a/debian/changelog b/debian/changelog index eda96af03..9f3d2ebfb 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,3 +1,9 @@ +OVN (21.12.2-1) unstable; urgency=low + [ OVN team ] + * New upstream version + + -- OVN team Thu, 10 Mar 2022 15:47:33 -0500 + OVN (21.12.1-1) unstable; urgency=low [ OVN team ] * New upstream version -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-21.12 0/2] Release patches for v21.12.1.
Mark Michelson (2): Set release date for 21.12.1. Prepare for 21.12.2. NEWS | 6 +- configure.ac | 2 +- debian/changelog | 8 +++- 3 files changed, 13 insertions(+), 3 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.03 2/2] Prepare for 22.03.1.
Signed-off-by: Mark Michelson --- NEWS | 3 +++ configure.ac | 2 +- debian/changelog | 6 ++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index 0f8e068b1..15fa545d2 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,6 @@ +OVN v22.03.1 - xx xxx +-- + OVN v22.03.0 - 11 Mar 2022 -- - Refactor CoPP commands introducing a unique name index in CoPP NB table. diff --git a/configure.ac b/configure.ac index 283381b4e..70f86e1f5 100644 --- a/configure.ac +++ b/configure.ac @@ -13,7 +13,7 @@ # limitations under the License. AC_PREREQ(2.63) -AC_INIT(ovn, 22.03.0, b...@openvswitch.org) +AC_INIT(ovn, 22.03.1, b...@openvswitch.org) AC_CONFIG_MACRO_DIR([m4]) AC_CONFIG_AUX_DIR([build-aux]) AC_CONFIG_HEADERS([config.h]) diff --git a/debian/changelog b/debian/changelog index a26420b7a..a22ba6e91 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,3 +1,9 @@ +OVN (22.03.1-1) unstable; urgency=low + [ OVN team ] + * New upstream version + + -- OVN team Thu, 10 Mar 2022 15:47:41 -0500 + ovn (22.03.0-1) unstable; urgency=low * New upstream version -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.03 0/2] Release patches for v22.03.0.
Mark Michelson (2): Set release date for 22.03.0. Prepare for 22.03.1. NEWS | 5 - configure.ac | 2 +- debian/changelog | 8 +++- 3 files changed, 12 insertions(+), 3 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-22.03 1/2] Set release date for 22.03.0.
Signed-off-by: Mark Michelson --- NEWS | 2 +- debian/changelog | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 9f3ce3cf3..0f8e068b1 100644 --- a/NEWS +++ b/NEWS @@ -1,4 +1,4 @@ -OVN v22.03.0 - XX XXX +OVN v22.03.0 - 11 Mar 2022 -- - Refactor CoPP commands introducing a unique name index in CoPP NB table. Add following new CoPP commands to manage CoPP table: diff --git a/debian/changelog b/debian/changelog index fe0e7f488..a26420b7a 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,7 +2,7 @@ ovn (22.03.0-1) unstable; urgency=low * New upstream version - -- OVN team Thu, 24 Feb 2022 16:55:45 -0500 + -- OVN team Thu, 10 Mar 2022 15:47:41 -0500 ovn (21.12.0-1) unstable; urgency=low -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn branch-21.12 1/2] Set release date for 21.12.1.
Signed-off-by: Mark Michelson --- NEWS | 3 ++- debian/changelog | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 4043ecf20..b00644b59 100644 --- a/NEWS +++ b/NEWS @@ -1,5 +1,6 @@ -OVN v21.12.1 - xx xxx +OVN v21.12.1 - 10 Mar 2022 -- + - Bug fixes OVN v21.12.0 - 22 Dec 2021 -- diff --git a/debian/changelog b/debian/changelog index 0d8593083..eda96af03 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,7 +2,7 @@ OVN (21.12.1-1) unstable; urgency=low [ OVN team ] * New upstream version - -- OVN team Wed, 22 Dec 2021 09:45:52 -0500 + -- OVN team Thu, 10 Mar 2022 15:47:33 -0500 ovn (21.12.0-1) unstable; urgency=low -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2] ovn-northd: Don't commit to conntrack if the conntrack was skipped.
From: Numan Siddique Presently, if the inport or outport is a peer port (of router port), then we skip sending the packet to conntrack by not setting the reg0[0]/reg0[1] bits. But the packet still goes through the stages - ls_in_acl_hint, ls_in_acl and ls_in_stateful in the ingress pipeline and similarly ls_out_acl_hint, ls_out_acl and ls_out_stateful in the egress pipeline of the logical switch. In these mentioned stages, the logical flows match on the ct states (ct.new, ct.est etc) and this can be problematic if the packet was sent to conntrack and committed in the ingress pipeline. When that packet enters the egress pipeline with outport set to the peer port, we skip sending the packet to conntrack (which is expected) but ct state values carry over from the ingress state. If the ct.new flag was set in the ingress pipeline, then the below flows are hit and the packet gets committed in the conntrack zone of the inport logical port. table=4 (ls_out_acl ), priority=1, match=(ip && (!ct.est || (ct.est && ct_label.blocked == 1))), action=(reg0[1] = 1; next;) table=7 (ls_out_stateful), priority=100 , match=(reg0[1] == 1 && reg0[13] == 0), action=(ct_commit { ct_label.blocked = 0; }; next;) With this extra commit to the same conntrack zone, sometimes the packet gets dropped or doesn't reach the destination. It is not very clear how the packet gets misplaced, but avoiding this resolves the issue. And OVN ideally should not do this extra commit. This patch addresses this issue by adding the below logical flow both in ls_in_acl and ls_out_acl stage priority=2, match=(reg0[0] == 0 && reg0[1] == 0), action=(next;) Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2060688 Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2062431 Reported-by: Michael Washer Signed-off-by: Numan Siddique --- v1 -> v2 --- * Changed the approach to solve the issue. northd/northd.c | 17 +++-- northd/ovn-northd.8.xml | 6 ++ tests/ovn-northd.at | 5 + 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/northd/northd.c b/northd/northd.c index b264fb850b..23ab5ac260 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -6529,7 +6529,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows, ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL_AFTER_LB, 0, "1", "next;"); if (has_stateful) { -/* Ingress and Egress ACL Table (Priority 1). +/* Ingress and Egress ACL Table (Priority 1 and 2). * * By default, traffic is allowed. This is partially handled by * the Priority 0 ACL flows added earlier, but we also need to @@ -6549,7 +6549,20 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows, * We need to set ct_label.blocked=0 to let the connection continue, * which will be done by ct_commit() in the "stateful" stage. * Subsequent packets will hit the flow at priority 0 that just - * uses "next;". */ + * uses "next;". + * + * However, we don't want to commit if the packet was not sent to + * conntrack in the first place. + * Eg. if inport or outport is a router peer port. */ +ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, 2, + REGBIT_CONNTRACK_DEFRAG" == 0 && " + REGBIT_CONNTRACK_NAT" == 0", + "next;"); +ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, 2, + REGBIT_CONNTRACK_DEFRAG" == 0 && " + REGBIT_CONNTRACK_NAT" == 0", + "next;"); + ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, 1, "ip && (!ct.est || (ct.est && ct_label.blocked == 1))", REGBIT_CONNTRACK_COMMIT" = 1; next;"); diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 4784bff041..d788efe70b 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -784,6 +784,12 @@ + +A priority-2 flow that advances the packet to the next stage if the +packet was not sent to conntrack earlier, by matching on +reg0[0] == 0 && reg0[1] == 1. + + A priority-1 flow that sets the hint to commit IP traffic to the connection tracker (with action reg0[1] = 1; next;). This diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 60d91a7717..bfd0f2382b 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -2265,6 +2265,7 @@ AT_CHECK([ovn-sbctl lflow-list ls | grep -e ls_in_acl_hint -e ls_out_acl_hint -e table=4 (ls_out_acl ), priority=1, match=(ip && (!ct.est || (ct.est && ct_label.blocked == 1))), action=(reg0[[1]] = 1; next;) table=4 (ls_out_acl ), priority=1001 , match=(reg0[[7]] == 1 && (ip)), action=(reg0[[1]] = 1; next;) table=4 (ls_out_acl ), priority=1001 , match=(reg0[[8]] == 1 && (ip)), action=(n
Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: fix uAPI incompatibility with existing user space
Ilya Maximets writes: > Few years ago OVS user space made a strange choice in the commit [1] > to define types only valid for the user space inside the copy of a > kernel uAPI header. '#ifndef __KERNEL__' and another attribute was > added later. > > This leads to the inevitable clash between user space and kernel types > when the kernel uAPI is extended. The issue was unveiled with the > addition of a new type for IPv6 extension header in kernel uAPI. > > When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the > older user space application, application tries to parse it as > OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as > malformed. Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with > every IPv6 packet that goes to the user space, IPv6 support is fully > broken. > > Fixing that by bringing these user space attributes to the kernel > uAPI to avoid the clash. Strictly speaking this is not the problem > of the kernel uAPI, but changing it is the only way to avoid breakage > of the older user space applications at this point. > > These 2 types are explicitly rejected now since they should not be > passed to the kernel. Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved > out from the '#ifdef __KERNEL__' as there is no good reason to hide > it from the userspace. And it's also explicitly rejected now, because > it's for in-kernel use only. > > Comments with warnings were added to avoid the problem coming back. > > (1 << type) converted to (1ULL << type) to avoid integer overflow on > OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now. > > [1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") > > Fixes: 28a3f0601727 ("net: openvswitch: IPv6: Add IPv6 extension header > support") > Link: > https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8...@nvidia.com > Link: > https://github.com/openvswitch/ovs/commit/beb75a40fdc295bfd6521b0068b4cd12f6de507c > Reported-by: Roi Dayan > Signed-off-by: Ilya Maximets > --- Acked-by: Aaron Conole ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [External] Re: [PATCH] dpif-netdev: add an option to assign pmd rxq to all numas
On 04/03/2022 17:57, Anurag Agarwal wrote: Hello Kevin, I have prepared a patch for "per port cross-numa-polling" and attached herewith. The results are captured in 'cross-numa-results.txt'. We see PMD to RxQ assignment evenly balanced across all PMDs with this patch. Please take a look and let us know your inputs. Hi Anurag, I think what this is showing is more related to txqs used for sending to the VM. As you are allowing the rxqs from the phy port to be handled by more pmds, and all those rxqs have traffic then in turn more txqs are used for sending to the VM. The result of using more txqs when sending to the VM in this case is that the traffic is returned on the more rxqs. Allowing cross-numa does not guarantee that the different pmd cores will poll rxqs from an interface. At least with group algorithm, the pmds will be selected purely on load. The right way to ensure that all VM txqs(/rxqs) are used is to enable the Tx-steering feature [0]. So you might be seeing some benefit in this case, but to me it's not the core use case of cross-numa polling. That is more about allowing the pmds on every numa to be used when the traffic load is primarily coming from one numa. Kevin. [0] https://docs.openvswitch.org/en/latest/topics/userspace-tx-steering/ Please find some of my inputs inline, in response to your comments. Regards, Anurag -Original Message- From: Kevin Traynor Sent: Thursday, February 24, 2022 7:54 PM To: Jan Scheurich ; Wan Junjie Cc: d...@openvswitch.org; Anurag Agarwal Subject: Re: [External] Re: [PATCH] dpif-netdev: add an option to assign pmd rxq to all numas Hi Jan, On 17/02/2022 14:21, Jan Scheurich wrote: Hi Kevin, We have done extensive benchmarking and found that we get better overall PMD load balance and resulting OVS performance when we do not statically pin any rx queues and instead let the auto-load-balancing find the optimal distribution of phy rx queues over both NUMA nodes to balance an asymmetric load of vhu rx queues (polled only on the local NUMA node). Cross-NUMA polling of vhu rx queues comes with a very high latency cost due to cross-NUMA access to volatile virtio ring pointers in every iteration (not only when actually copying packets). Cross-NUMA polling of phy rx queues doesn't have a similar issue. I agree that for vhost rxq polling, it always causes a performance penalty when there is cross-numa polling. For polling phy rxq, when phy and vhost are in different numas, I don't see any additional penalty for cross-numa polling the phy rxq. For the case where phy and vhost are both in the same numa, if I change to poll the phy rxq cross-numa, then I see about a >20% tput drop for traffic from phy - vhost. Are you seeing that too? Yes, but the performance drop is mostly due to the extra cost of copying the packets across the UPI bus to the virtio buffers on the other NUMA, not because of polling the phy rxq on the other NUMA. Just to be clear, phy and vhost are on the same numa in my test. I see the drop when polling the phy rxq with a pmd from a different numa. Also, the fact that a different numa can poll the phy rxq after every rebalance means that the ability of the auto-load-balancer to estimate and trigger a rebalance is impacted. Agree, there is some inaccuracy in the estimation of the load a phy rx queue creates when it is moved to another NUMA node. So far we have not seen that as a practical problem. It seems like simple pinning some phy rxqs cross-numa would avoid all the issues above and give most of the benefit of cross-numa polling for phy rxqs. That is what we have done in the past (far a lack of alternatives). But any static pinning reduces the ability of the auto-load balancer to do its job. Consider the following scenarios: 1. The phy ingress traffic is not evenly distributed by RSS due to lack of entropy (Examples for this are IP-IP encapsulated traffic, e.g. Calico, or MPLSoGRE encapsulated traffic). 2. VM traffic is very asymmetric, e.g. due to a large dual-NUMA VM whose vhu ports are all on NUMA 0. In all such scenarios, static pinning of phy rxqs may lead to unnecessarily uneven PMD load and loss of overall capacity. I agree that static pinning may cause a bottleneck if you have more than one rx pinned on a core. On the flip side, pinning removes uncertainty about the ability of OVS to make good assignments and ALB. [Anurag] Echoing what Jan said, static pinning wouldn't allow rebalancing in case the traffic across DPDK and VHU queues is asymmetric. With the introduction of per port cross-numa-polling, the user has one more option in his tool box, to allow full auto load balancing without worrying at all about the rxq to PMD assignments. This also makes the deployment of OVS much simpler. The user only now needs to provide the list of CPUs, enable AUTO LB and cross-numa-polling (in case necessary). All of the rest is handled in software.
Re: [ovs-dev] [PATCH ovn] ovn-northd: Skip conntrack related stages for router ports in switch pipeline.
On Thu, Mar 10, 2022 at 4:22 AM Dumitru Ceara wrote: > > On 3/10/22 04:04, num...@ovn.org wrote: > > From: Numan Siddique > > > > Hi Numan, > > > Presently, if the inport or outport is a peer port (of router port), > > then we skip sending the packet to conntrack by not setting the > > reg0[0]/reg0[1] bits. But the packet still goes through the > > stages - ls_in_acl_hint, ls_in_acl and ls_in_stateful in the ingress > > pipeline and similarly ls_out_acl_hint, ls_out_acl and ls_out_stateful > > in the egress pipeline of the logical switch. > > > > In these mentioned stages, the logical flows match on the ct states > > (ct.new, ct.est etc) and this can be problematic if the inport or > > outport is peer port. It is more problematic if the packet was > > sent to conntrack and committed in the ingress pipeline. When > > that packet enters the egress pipeline with outport set to the peer > > port, we skip sending the packet to conntrack (which is expected) > > but ct state values carry over from the ingress state. If the ct.new > > flag was set in the ingress pipeline, then the below flows are hit and > > the packet gets committed in the conntrack zone of the inport logical port. > > > > table=4 (ls_out_acl ), priority=1, > > match=(ip && (!ct.est || (ct.est && ct_label.blocked == 1))), > > action=(reg0[1] = 1; next;) > > table=7 (ls_out_stateful), priority=100 , > > match=(reg0[1] == 1 && reg0[13] == 0), > > action=(ct_commit { ct_label.blocked = 0; }; next;) > > > > With this extra commit to the same conntrack zone, sometimes the packet gets > > dropped or doesn't reach the destination. It is not very clear how > > the packet gets misplaced, but avoiding this resolves the issue. > > And OVN ideally should not do this extra commit. > > > > To address this issue, this patch adds the logical flows to skip all > > the conntrack related stages if the inport or outport is peer logical > > port. > > > > table=5 (ls_in_pre_acl ), priority=110 , > > match=(ip && inport == "sw0-lr0"), > > action=(next(pipeline=ingress,table=18);) > > > > table=0 (ls_out_pre_lb ), priority=110 , > > match=(ip && outport == "sw0-lr0"), > > action=(next(pipeline=egress,table=8);) > > > > This is a bit worrying. We're skipping a lot of stages, some of which > are not stateful at all, e.g., "ls_in_qos_mark", "ls_in_qos_meter". > > Also, I think we would be breaking the scenario in which LB happens on > traffic received from a logical router. Simplified setup with a client > connecting to VIP1: > > client (IP1) --> LS1 --> LR (VIP1 in the subnet of LS2) --> LS2 (LB: > VIP1->IP1) > > Without your patch the hairpin stage on LS2 would have performed the > DNAT and SNAT (for hairpin) properly. I'm not sure that still works > with your patch. > > Wouldn't it be safer and simpler to change the 110-priority flows in > stages ls_in/out_pre_acl and ls_in/out_pre_lb to apply action="ct_clear; > next;" instead? Hi Dumitru, Thanks for the comments. I agree that skipping many pipelines would not be prudent as there are qos related stages too. I'll submit another patch which skips commiting to the conntrack if the packet was not sent to conntrack in the pre_stateful stage. we could also do ct_clear in table 38/39 when the packet transitions from the ingress stage to egress stage. I'll look into that too. But I'll do that as a follow up. Thanks Numan > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2060688 > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2062431 > > Reported-by: Michael Washer > > Signed-off-by: Numan Siddique > > --- > > Regards, > Dumitru > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [External] Re: [PATCH] dpif-netdev: add an option to assign pmd rxq to all numas
Hi Jan, On 09/03/2022 15:48, Jan Scheurich wrote: Thanks for sharing your experience with it. My fear with the proposal is that someone turns this on and then tells us performance is worse and/or OVS assignments/ALB are broken, because it has an impact on their case. In terms of limiting possible negative effects, - it can be opt-in and recommended only for phy ports - could print a warning when it is enabled - ALB is currently disabled with cross-numa polling (except a limited case) but it's clear you want to remove that restriction too - for ALB, a user could increase the improvement threshold to account for any reassignments triggered by inaccuracies [Jan] Yes, we want to enable cross-NUMA polling of selected (typically phy) ports in ALB "group" mode as an opt-in config option (default off). Based on our observations we are not too much concerned with the loss of ALB prediction accuracy but increasing the threshold may be a way of taking that into account, if wanted. There is also some improvements that can be made to the proposed method when used with group assignment, - we can prefer local numa where there is no difference between pmd cores. (e.g. two unused cores available, pick the local numa one) - we can flatten the list of pmds, so best pmd can be selected. This will remove issues with RR numa when there are different num of pmd cores or loads per numa. - I wrote an RFC that does these two items, I can post when(/if!) consensus is reached on the broader topic [Jan] In our alternative version of the current upstream "group" ALB [1] we already maintained a flat list of PMDs. So we would support that feature. Using NUMA-locality as a tie-breaker makes sense. [1] https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384546.html In summary, it's a trade-off, With no cross-numa polling (current): - won't have any impact to OVS assignment or ALB accuracy - there could be a bottleneck on one numa pmds while other numa pmd cores are idle and unused With cross-numa rx pinning (current): - will have access to pmd cores on all numas - may require more cycles for some traffic paths - won't have any impact to OVS assignment or ALB accuracy - >1 pinned rxqs per core may cause a bottleneck depending on traffic With cross-numa interface setting (proposed): - will have access to all pmd cores on all numas (i.e. no unused pmd cores during highest load) - will require more cycles for some traffic paths - will impact on OVS assignment and ALB accuracy Anything missing above, or is it a reasonable summary? I think that is a reasonable summary, albeit I would have characterized the third option a bit more positively: - Gives ALB maximum freedom to balance load of PMDs on all NUMA nodes (in the likely scenario of uneven VM load on the NUMAs) - Accepts an increase of cycles on cross-NUMA paths for a better utilization of a free PMD cycles - Mostly suitable for phy ports due to limited cycle increase for cross-NUMA polling of phy rx queues - Could negatively impact the ALB prediction accuracy in certain scenarios It's the estimation accuracy during the assignments. That might be be seen as part of the ALB dry-run, but it could also be during an actual reconfigure/reassignment itself. e.g. we think pmdx is the lowest loaded pmd and assign another rxq, only to find out a previously assigned rxq now requires 30% more cycles after changing numa and it was a bad selection to add more rxqs. Now there's overload on that pmdx etc. OTOH, in some cases you are now also getting to utilize more pmds from other numas, so there is more net compute power. That means less risk of overload on any pmd, but I'm sure people will still try and push it to the limits. I think we're both on the same page regarding functionality, pros and cons etc. It's just the inaccuracy and likelihood of problems occurring where we are viewing differently. You are saying you think it's low risk as that is your experience, while I am a bit more cautious about it. We will post a new version of our patch [2] for cross-numa polling on selected ports adapted to the current OVS master shortly. [2] https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384547.html I think we should give a bit more time for more eyes and any discussion at the high level before progressing too much on the code, but as you are talking about it...I mentioned an RFC so I put it up here [0]. I didn't add the user enabling part, that's straightforward, I was just working on how adapt the current rxq scheduling for not always selecting a numa first (as it was based like this), flattening numa and a local numa tiebreaker. [0] https://github.com/kevintraynor/ovs/commits/crossnuma thanks, Kevin. Thanks, Jan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn 21.12] controller: reconfigure ovs meters for ovn meters
Thanks Lorenzo for this backport. I did a quick sanity check on the 21.12 branch and it looks good to me. So I Acked and pushed the patch to branch-21.12. On 3/10/22 11:01, Lorenzo Bianconi wrote: At the moment ovs meters are reconfigured by ovn just when a meter is allocated or removed while updates for an already allocated meter are ignored. This issue can be easily verified with the following reproducer: $ovn-nbctl meter-add meter0 drop 10 pktps $ovn-nbctl --log --meter=meter0 acl-add sw0 to-lport 1000 'tcp.dst == 80' drop $ovn-nbctl --may-exist meter-add meter0 drop 20 pktps $ovs-ofctl -O OpenFlow15 dump-meters br-int In order to fix the issue introduce SB_meter node in the incremetal processing engine and add it as input for lflow_output one. Sync OVS meters in ofctl_put() to push changes in SB meter/SB meter bands tables. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1939524 Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi Signed-off-by: Han Zhou --- controller/ofctrl.c | 212 +++- controller/ovn-controller.c | 25 - lib/extend-table.c | 14 +++ lib/extend-table.h | 4 + tests/ovn-performance.at| 17 +++ tests/ovn.at| 73 + tests/system-ovn.at | 17 +++ 7 files changed, 332 insertions(+), 30 deletions(-) diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 08fcfed8b..22f80620a 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -335,6 +335,22 @@ static struct ovn_extend_table *groups; /* A reference to the meter_table. */ static struct ovn_extend_table *meters; +/* Installed meter bands. */ +struct meter_band_data { +int64_t burst_size; +int64_t rate; +}; + +struct meter_band_entry { +struct meter_band_data *bands; +size_t n_bands; +}; + +static struct shash meter_bands; + +static void ofctrl_meter_bands_destroy(void); +static void ofctrl_meter_bands_clear(void); + /* MFF_* field ID for our Geneve option. In S_TLV_TABLE_MOD_SENT, this is * the option we requested (we don't know whether we obtained it yet). In * S_CLEAR_FLOWS or S_UPDATE_FLOWS, this is really the option we have. */ @@ -372,6 +388,7 @@ ofctrl_init(struct ovn_extend_table *group_table, ovn_init_symtab(&symtab); groups = group_table; meters = meter_table; +shash_init(&meter_bands); } /* S_NEW, for a new connection. @@ -615,6 +632,7 @@ run_S_CLEAR_FLOWS(void) /* Clear existing meters, to match the state of the switch. */ if (meters) { ovn_extend_table_clear(meters, true); +ofctrl_meter_bands_clear(); } /* All flow updates are irrelevant now. */ @@ -789,6 +807,7 @@ ofctrl_destroy(void) rconn_packet_counter_destroy(tx_counter); expr_symtab_destroy(&symtab); shash_destroy(&symtab); +ofctrl_meter_bands_destroy(); } uint64_t @@ -1802,26 +1821,14 @@ add_meter_string(struct ovn_extend_table_info *m_desired, } static void -add_meter(struct ovn_extend_table_info *m_desired, - const struct sbrec_meter_table *meter_table, - struct ovs_list *msgs) +update_ovs_meter(struct ovn_extend_table_info *entry, + const struct sbrec_meter *sb_meter, int cmd, + struct ovs_list *msgs) { -const struct sbrec_meter *sb_meter; -SBREC_METER_TABLE_FOR_EACH (sb_meter, meter_table) { -if (!strcmp(m_desired->name, sb_meter->name)) { -break; -} -} - -if (!sb_meter) { -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); -VLOG_ERR_RL(&rl, "could not find meter named \"%s\"", m_desired->name); -return; -} struct ofputil_meter_mod mm; -mm.command = OFPMC13_ADD; -mm.meter.meter_id = m_desired->table_id; +mm.command = cmd; +mm.meter.meter_id = entry->table_id; mm.meter.flags = OFPMF13_STATS; if (!strcmp(sb_meter->unit, "pktps")) { @@ -1854,6 +1861,152 @@ add_meter(struct ovn_extend_table_info *m_desired, free(mm.meter.bands); } +static void +ofctrl_meter_bands_clear(void) +{ +struct shash_node *node, *next; +SHASH_FOR_EACH_SAFE (node, next, &meter_bands) { +struct meter_band_entry *mb = node->data; +shash_delete(&meter_bands, node); +free(mb->bands); +free(mb); +} +} + +static void +ofctrl_meter_bands_destroy(void) +{ +ofctrl_meter_bands_clear(); +shash_destroy(&meter_bands); +} + +static bool +ofctrl_meter_bands_is_equal(const struct sbrec_meter *sb_meter, +struct meter_band_entry *mb) +{ +if (mb->n_bands != sb_meter->n_bands) { +return false; +} + +for (int i = 0; i < sb_meter->n_bands; i++) { +int j; +for (j = 0; j < mb->n_bands; j++) { +if (sb_meter->bands[i]->rate == mb->bands[j].rate && +sb_meter->bands[i]->burst_size == mb->bands[j].bu
Re: [ovs-dev] [PATCH ovn 21.12] controller: reconfigure ovs meters for ovn meters
Bleep bloop. Greetings Lorenzo Bianconi, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. git-am: error: Failed to merge in the changes. hint: Use 'git am --show-current-patch' to see the failed patch Patch failed at 0001 controller: reconfigure ovs meters for ovn meters When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn 21.12] controller: reconfigure ovs meters for ovn meters
At the moment ovs meters are reconfigured by ovn just when a meter is allocated or removed while updates for an already allocated meter are ignored. This issue can be easily verified with the following reproducer: $ovn-nbctl meter-add meter0 drop 10 pktps $ovn-nbctl --log --meter=meter0 acl-add sw0 to-lport 1000 'tcp.dst == 80' drop $ovn-nbctl --may-exist meter-add meter0 drop 20 pktps $ovs-ofctl -O OpenFlow15 dump-meters br-int In order to fix the issue introduce SB_meter node in the incremetal processing engine and add it as input for lflow_output one. Sync OVS meters in ofctl_put() to push changes in SB meter/SB meter bands tables. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1939524 Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi Signed-off-by: Han Zhou --- controller/ofctrl.c | 212 +++- controller/ovn-controller.c | 25 - lib/extend-table.c | 14 +++ lib/extend-table.h | 4 + tests/ovn-performance.at| 17 +++ tests/ovn.at| 73 + tests/system-ovn.at | 17 +++ 7 files changed, 332 insertions(+), 30 deletions(-) diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 08fcfed8b..22f80620a 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -335,6 +335,22 @@ static struct ovn_extend_table *groups; /* A reference to the meter_table. */ static struct ovn_extend_table *meters; +/* Installed meter bands. */ +struct meter_band_data { +int64_t burst_size; +int64_t rate; +}; + +struct meter_band_entry { +struct meter_band_data *bands; +size_t n_bands; +}; + +static struct shash meter_bands; + +static void ofctrl_meter_bands_destroy(void); +static void ofctrl_meter_bands_clear(void); + /* MFF_* field ID for our Geneve option. In S_TLV_TABLE_MOD_SENT, this is * the option we requested (we don't know whether we obtained it yet). In * S_CLEAR_FLOWS or S_UPDATE_FLOWS, this is really the option we have. */ @@ -372,6 +388,7 @@ ofctrl_init(struct ovn_extend_table *group_table, ovn_init_symtab(&symtab); groups = group_table; meters = meter_table; +shash_init(&meter_bands); } /* S_NEW, for a new connection. @@ -615,6 +632,7 @@ run_S_CLEAR_FLOWS(void) /* Clear existing meters, to match the state of the switch. */ if (meters) { ovn_extend_table_clear(meters, true); +ofctrl_meter_bands_clear(); } /* All flow updates are irrelevant now. */ @@ -789,6 +807,7 @@ ofctrl_destroy(void) rconn_packet_counter_destroy(tx_counter); expr_symtab_destroy(&symtab); shash_destroy(&symtab); +ofctrl_meter_bands_destroy(); } uint64_t @@ -1802,26 +1821,14 @@ add_meter_string(struct ovn_extend_table_info *m_desired, } static void -add_meter(struct ovn_extend_table_info *m_desired, - const struct sbrec_meter_table *meter_table, - struct ovs_list *msgs) +update_ovs_meter(struct ovn_extend_table_info *entry, + const struct sbrec_meter *sb_meter, int cmd, + struct ovs_list *msgs) { -const struct sbrec_meter *sb_meter; -SBREC_METER_TABLE_FOR_EACH (sb_meter, meter_table) { -if (!strcmp(m_desired->name, sb_meter->name)) { -break; -} -} - -if (!sb_meter) { -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1); -VLOG_ERR_RL(&rl, "could not find meter named \"%s\"", m_desired->name); -return; -} struct ofputil_meter_mod mm; -mm.command = OFPMC13_ADD; -mm.meter.meter_id = m_desired->table_id; +mm.command = cmd; +mm.meter.meter_id = entry->table_id; mm.meter.flags = OFPMF13_STATS; if (!strcmp(sb_meter->unit, "pktps")) { @@ -1854,6 +1861,152 @@ add_meter(struct ovn_extend_table_info *m_desired, free(mm.meter.bands); } +static void +ofctrl_meter_bands_clear(void) +{ +struct shash_node *node, *next; +SHASH_FOR_EACH_SAFE (node, next, &meter_bands) { +struct meter_band_entry *mb = node->data; +shash_delete(&meter_bands, node); +free(mb->bands); +free(mb); +} +} + +static void +ofctrl_meter_bands_destroy(void) +{ +ofctrl_meter_bands_clear(); +shash_destroy(&meter_bands); +} + +static bool +ofctrl_meter_bands_is_equal(const struct sbrec_meter *sb_meter, +struct meter_band_entry *mb) +{ +if (mb->n_bands != sb_meter->n_bands) { +return false; +} + +for (int i = 0; i < sb_meter->n_bands; i++) { +int j; +for (j = 0; j < mb->n_bands; j++) { +if (sb_meter->bands[i]->rate == mb->bands[j].rate && +sb_meter->bands[i]->burst_size == mb->bands[j].burst_size) { +break; +} +} +if (j == mb->n_bands) { +return false; +} +} +return true; +} + +static void +ofctrl_meter_bands_alloc(const struct sbrec_meter *sb_meter, +
Re: [ovs-dev] [PATCH] Fix Apache license notice.
On Thu, Mar 10, 2022 at 1:51 PM Ilya Maximets wrote: > > On 3/10/22 13:12, David Marchand wrote: > > The hacking flake8 plugin generates a lot of noise in GHA logs about an > > issue on the Apache license contained in OVS python files: > > https://github.com/openvswitch/ovs/runs/5453426047?#step:11:1519 > > > > Those prints seem like a bug in the hacking plugin [1] as the check > > about the license is not enabled in OVS checks. > > > > Though, when comparing with https://www.apache.org/licenses/LICENSE-2.0, > > OVS files have an additional ':'. > > > > Align all files in OVS with the official notice and enable the check to > > avoid reintroducing a wrong notice. > > > > 1: > > https://github.com/openstack/hacking/blob/4.1.0/hacking/checks/comments.py#L168 > > > > Signed-off-by: David Marchand > > --- > > Hi, David. > > OVS doesn't enable H103 check, so I think that the problem is in > the hacking plugin itself. It prints the license comparison > unconditionally even if the check wasn't requested. > > I actually prepared a fix for hacking a couple of weeks ago: > https://review.opendev.org/c/openstack/hacking/+/830537 Thanks, that saves me the time to fix it :-). > Had no responses yet, though... :( I reviewed your change, let's see if it helps... -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Fix Apache license notice.
On 3/10/22 13:12, David Marchand wrote: > The hacking flake8 plugin generates a lot of noise in GHA logs about an > issue on the Apache license contained in OVS python files: > https://github.com/openvswitch/ovs/runs/5453426047?#step:11:1519 > > Those prints seem like a bug in the hacking plugin [1] as the check > about the license is not enabled in OVS checks. > > Though, when comparing with https://www.apache.org/licenses/LICENSE-2.0, > OVS files have an additional ':'. > > Align all files in OVS with the official notice and enable the check to > avoid reintroducing a wrong notice. > > 1: > https://github.com/openstack/hacking/blob/4.1.0/hacking/checks/comments.py#L168 > > Signed-off-by: David Marchand > --- Hi, David. OVS doesn't enable H103 check, so I think that the problem is in the hacking plugin itself. It prints the license comparison unconditionally even if the check wasn't requested. I actually prepared a fix for hacking a couple of weeks ago: https://review.opendev.org/c/openstack/hacking/+/830537 Had no responses yet, though... :( Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [[PATCH RFC] 13/17] Enable IP checksum offloading by default.
On Mon, Jan 24, 2022 at 02:21:35PM -0500, Mike Pattrick wrote: > On Tue, Dec 7, 2021 at 11:54 AM Flavio Leitner wrote: > > > > The netdev receiving packets is supposed to provide the flags > > indicating if the IP csum was verified and it is OK or BAD, > > otherwise the stack will check when appropriate by software. > > > > If the packet comes with good checksum, then postpone the > > checksum calculation to the egress device if needed. > > > > When encapsulate a packet with that flag, set the checksum > > of the inner IP header since that is not yet supported. > > > > Calculate the IP csum when the packet is going to be sent over > > a device that doesn't support the feature. > > > > Linux devices don't support IP csum offload alone, so the > > support is not enabled. > > > > Signed-off-by: Flavio Leitner > > --- > > lib/conntrack.c | 12 ++--- > > lib/dp-packet.c | 12 + > > lib/dp-packet.h | 63 --- > > lib/dpif.h | 2 +- > > lib/flow.c | 16 -- > > lib/ipf.c | 9 ++-- > > lib/netdev-dpdk.c | 78 ++-- > > lib/netdev-dummy.c | 21 > > lib/netdev-native-tnl.c | 19 +-- > > lib/netdev.c| 22 > > lib/odp-execute.c | 21 ++-- > > lib/packets.c | 34 ++--- > > ofproto/ofproto-dpif-upcall.c | 14 +++-- > > tests/automake.mk | 1 + > > tests/system-userspace-offload.at | 79 + > > tests/system-userspace-testsuite.at | 1 + > > 16 files changed, 322 insertions(+), 82 deletions(-) > > create mode 100644 tests/system-userspace-offload.at > > > > diff --git a/lib/conntrack.c b/lib/conntrack.c > > index 2392a2ea4..5b4ca4dfc 100644 > > --- a/lib/conntrack.c > > +++ b/lib/conntrack.c > > @@ -2089,16 +2089,12 @@ conn_key_extract(struct conntrack *ct, struct > > dp_packet *pkt, ovs_be16 dl_type, > > ctx->key.dl_type = dl_type; > > > > if (ctx->key.dl_type == htons(ETH_TYPE_IP)) { > > -bool hwol_bad_l3_csum = dp_packet_ol_ip_csum_bad(pkt); > > -if (hwol_bad_l3_csum) { > > +if (dp_packet_ol_ip_csum_bad(pkt)) { > > ok = false; > > COVERAGE_INC(conntrack_l3csum_err); > > } else { > > -bool hwol_good_l3_csum = dp_packet_ol_ip_csum_good(pkt) > > - || dp_packet_ol_tx_ipv4(pkt); > > -/* Validate the checksum only when hwol is not supported. */ > > ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), > > NULL, > > - !hwol_good_l3_csum); > > + !dp_packet_ol_ip_csum_good(pkt)); > > } > > } else if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { > > ok = extract_l3_ipv6(&ctx->key, l3, dp_packet_l3_size(pkt), NULL); > > @@ -3402,7 +3398,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct > > conn_lookup_ctx *ctx, > > } > > if (seq_skew) { > > ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; > > -if (!dp_packet_ol_tx_ipv4(pkt)) { > > +if (dp_packet_ol_tx_ip_csum(pkt)) { > > +dp_packet_ol_reset_ip_csum_good(pkt); > > +} else { > > l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, > > l3_hdr->ip_tot_len, > > htons(ip_len)); > > This is more of a general comment for the whole patch series, but I > see that a lot of the diffs use the motif: > > if (dp_packet_ol_tx_ip_csum(pkt)) { > dp_packet_ol_reset_ip_csum_good(pkt); > } else { > recalc_csumXX() > } > > Would it make sense instead to simply flag for non-offload tainted > checksum, and then only one call to csum() on packet egress? That's a good point. I see that in most cases it recalculates only what has changed, so it is supposed to be faster because 1) cache is hot and 2) fewer operations. If we leave to the egress port, then we need to calculate full checksum and the headers may not be in the cache. Therefore, to avoid regressions in the non offloaded case, I left the original behavior. Maybe my assumption doesn't make sense. fbl > > -M > > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c > > index a4ca5a052..369f3561e 100644 > > --- a/lib/dp-packet.c > > +++ b/lib/dp-packet.c > > @@ -21,6 +21,7 @@ > > #include "dp-packet.h" > > #include "netdev-afxdp.h" > > #include "netdev-dpdk.h" > > +#include "netdev-provider.h" > > #include "openvswitch/dynamic-string.h" > > #include "util.h" > > > > @@ -506,3 +507,14 @@ dp_packet_resize_l2(struct dp_packet *p, int increment) > >
Re: [ovs-dev] [[PATCH RFC] 06/17] dp-packet: Use p for packet and b for batch.
On Mon, Jan 24, 2022 at 01:39:57PM -0500, Mike Pattrick wrote: > On Tue, Dec 7, 2021 at 11:53 AM Flavio Leitner wrote: > > > > Currently 'p' and 'b' and used for packets, so use > > a convention that struct dp_packet is 'p' and > > struct dp_packet_batch is 'b'. > > > > Some comments needed new formatting to not pass the > > 80 column. > > > > Some variables were using 'p' or 'b' were renamed > > as well. > > > > There should be no functional change with this patch. > > > > Signed-off-by: Flavio Leitner > > --- > > I think this patch goes a long way to improve the readability of Would > it also make sense to change b to p in lib/packets.c and > lib/netdev-linux.c as well? I think 'b' meant buffer before, but IMHO 'b' for batches and 'p' for packets make more sense and then we should keep the consistency in all files, unless someone else has a better idea. -- fbl ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] ovn-northd: Skip conntrack related stages for router ports in switch pipeline.
On 3/10/22 04:04, num...@ovn.org wrote: > From: Numan Siddique > Hi Numan, > Presently, if the inport or outport is a peer port (of router port), > then we skip sending the packet to conntrack by not setting the > reg0[0]/reg0[1] bits. But the packet still goes through the > stages - ls_in_acl_hint, ls_in_acl and ls_in_stateful in the ingress > pipeline and similarly ls_out_acl_hint, ls_out_acl and ls_out_stateful > in the egress pipeline of the logical switch. > > In these mentioned stages, the logical flows match on the ct states > (ct.new, ct.est etc) and this can be problematic if the inport or > outport is peer port. It is more problematic if the packet was > sent to conntrack and committed in the ingress pipeline. When > that packet enters the egress pipeline with outport set to the peer > port, we skip sending the packet to conntrack (which is expected) > but ct state values carry over from the ingress state. If the ct.new > flag was set in the ingress pipeline, then the below flows are hit and > the packet gets committed in the conntrack zone of the inport logical port. > > table=4 (ls_out_acl ), priority=1, > match=(ip && (!ct.est || (ct.est && ct_label.blocked == 1))), > action=(reg0[1] = 1; next;) > table=7 (ls_out_stateful), priority=100 , > match=(reg0[1] == 1 && reg0[13] == 0), > action=(ct_commit { ct_label.blocked = 0; }; next;) > > With this extra commit to the same conntrack zone, sometimes the packet gets > dropped or doesn't reach the destination. It is not very clear how > the packet gets misplaced, but avoiding this resolves the issue. > And OVN ideally should not do this extra commit. > > To address this issue, this patch adds the logical flows to skip all > the conntrack related stages if the inport or outport is peer logical > port. > > table=5 (ls_in_pre_acl ), priority=110 , > match=(ip && inport == "sw0-lr0"), > action=(next(pipeline=ingress,table=18);) > > table=0 (ls_out_pre_lb ), priority=110 , > match=(ip && outport == "sw0-lr0"), > action=(next(pipeline=egress,table=8);) > This is a bit worrying. We're skipping a lot of stages, some of which are not stateful at all, e.g., "ls_in_qos_mark", "ls_in_qos_meter". Also, I think we would be breaking the scenario in which LB happens on traffic received from a logical router. Simplified setup with a client connecting to VIP1: client (IP1) --> LS1 --> LR (VIP1 in the subnet of LS2) --> LS2 (LB: VIP1->IP1) Without your patch the hairpin stage on LS2 would have performed the DNAT and SNAT (for hairpin) properly. I'm not sure that still works with your patch. Wouldn't it be safer and simpler to change the 110-priority flows in stages ls_in/out_pre_acl and ls_in/out_pre_lb to apply action="ct_clear; next;" instead? > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2060688 > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2062431 > Reported-by: Michael Washer > Signed-off-by: Numan Siddique > --- Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: fix uAPI incompatibility with existing user space
Le 09/03/2022 à 23:20, Ilya Maximets a écrit : > Few years ago OVS user space made a strange choice in the commit [1] > to define types only valid for the user space inside the copy of a > kernel uAPI header. '#ifndef __KERNEL__' and another attribute was > added later. > > This leads to the inevitable clash between user space and kernel types > when the kernel uAPI is extended. The issue was unveiled with the > addition of a new type for IPv6 extension header in kernel uAPI. > > When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the > older user space application, application tries to parse it as > OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as > malformed. Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with > every IPv6 packet that goes to the user space, IPv6 support is fully > broken. > > Fixing that by bringing these user space attributes to the kernel > uAPI to avoid the clash. Strictly speaking this is not the problem > of the kernel uAPI, but changing it is the only way to avoid breakage > of the older user space applications at this point. > > These 2 types are explicitly rejected now since they should not be > passed to the kernel. Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved > out from the '#ifdef __KERNEL__' as there is no good reason to hide > it from the userspace. And it's also explicitly rejected now, because > it's for in-kernel use only. > > Comments with warnings were added to avoid the problem coming back. > > (1 << type) converted to (1ULL << type) to avoid integer overflow on > OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now. > > [1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") > > Fixes: 28a3f0601727 ("net: openvswitch: IPv6: Add IPv6 extension header > support") > Link: > https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8...@nvidia.com > Link: > https://github.com/openvswitch/ovs/commit/beb75a40fdc295bfd6521b0068b4cd12f6de507c > Reported-by: Roi Dayan > Signed-off-by: Ilya Maximets Thanks for finally doing this. I also suggest it months ago: https://lore.kernel.org/lkml/a4894aef-b82a-8224-611d-07be229f5...@6wind.com/ Acked-by: Nicolas Dichtel ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev