Re: [ovs-dev] [PATCH ovn] lflow: Use learn() action to generate LB hairpin reply flows.
On Mon, Feb 8, 2021 at 4:02 PM Dumitru Ceara wrote:
>
> On 2/8/21 11:09 AM, Numan Siddique wrote:
> > On Sat, Feb 6, 2021 at 4:00 AM Dumitru Ceara wrote:
> >>
> >> The main trait of load balancer hairpin traffic is that it never leaves
> >> the local hypervisor.  Essentially this means that only hairpin
> >> openflows installed for logical switches that have at least one logical
> >> switch port bound locally can ever be hit.
> >>
> >> Until now, if a load balancer was applied on multiple logical switches
> >> that are connected through a distributed router, ovn-controller would
> >> install flows to detect hairpin replies for all logical switches.  In
> >> practice this leads to a very high number of openflows, out of which
> >> most will never be used.
> >>
> >> Instead we now use an additional action, learn(), on flows that match on
> >> packets that create the hairpin session.  The learn() action will then
> >> generate the necessary flows to handle hairpin replies, but only for
> >> the local datapaths which actually generate hairpin traffic.
> >>
> >> For example, simulating how ovn-k8s uses load balancers for services,
> >> in a "switch per node" scenario, the script below would generate
> >> 10K (n_nodes * n_vips * n_backends) openflows on every node in table=69
> >> (hairpin reply).  With this patch the maximum number of openflows that
> >> can be created for hairpin replies is 200 (n_vips * n_backends).
> >>
> >> In general, for deployments that leverage switch-per-node topologies,
> >> the number of openflows is reduced by a factor of N, where N is the
> >> number of nodes.
> >>
> >> $ cat lbs.sh
> >> NODES=50
> >> VIPS=20
> >> BACKENDS=10
> >> ovn-nbctl lr-add rtr
> >> for ((i = 1; i <= $NODES; i++)); do
> >>     ovn-nbctl \
> >>         -- ls-add ls$i \
> >>         -- lsp-add ls$i vm$i \
> >>         -- lsp-add ls$i ls$i-rtr \
> >>         -- lsp-set-type ls$i-rtr router \
> >>         -- lsp-set-options ls$i-rtr router-port=rtr-ls$i \
> >>         -- lrp-add rtr rtr-ls$i 00:00:00:00:01:00 42.42.42.$i/24
> >> done
> >>
> >> for ((i = 1; i <= $VIPS; i++)); do
> >>     lb=lb$i
> >>     vip=10.10.10.$i:1
> >>     bip=20.20.20.1:2
> >>     for ((j = 2; j <= $BACKENDS; j++)); do
> >>         bip="$bip,20.20.20.$j:2"
> >>     done
> >>     ovn-nbctl lb-add $lb $vip $bip
> >> done
> >>
> >> for ((i = 1; i <= $NODES; i++)); do
> >>     for ((j = 1; j <= $VIPS; j++)); do
> >>         ovn-nbctl ls-lb-add ls$i lb$j
> >>     done
> >> done
> >>
> >> ovs-vsctl add-port br-int vm1 \
> >>     -- set interface vm1 type=internal \
> >>     -- set interface vm1 external-ids:iface-id=vm1
> >>
> >> Suggested-by: Ilya Maximets
> >> Signed-off-by: Dumitru Ceara
> >
> > Hi Dumitru,
>
> Hi Numan,
>
> > Thanks for this patch.  The patch LGTM.  I tested it out with a huge NB
> > DB with lots of load balancers and backends.  I see a significant
> > reduction in OF flows.
>
> Thanks for the review!
>
> > Please see the few comments below.
> >
> > Thanks
> > Numan
> >
> >> ---
> >>  controller/lflow.c      | 204 ---
> >>  tests/ofproto-macros.at |   5 +-
> >>  tests/ovn.at            | 516 +---
> >>  3 files changed, 455 insertions(+), 270 deletions(-)
> >>
> >> diff --git a/controller/lflow.c b/controller/lflow.c
> >> index 946c1e0..2b7d356 100644
> >> --- a/controller/lflow.c
> >> +++ b/controller/lflow.c
> >> @@ -1171,6 +1171,178 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> >>      }
> >>  }
> >>
> >> +/* Builds the "learn()" action to be triggered by packets initiating a
> >> + * hairpin session.
> >> + *
> >> + * This will generate flows in table OFTABLE_CHK_LB_HAIRPIN_REPLY of the form:
> >> + * - match:
> >> + *     metadata=,ip/ipv6,ip.src=,ip.dst=
> >> + *     nw_proto='lb_proto',tp_src_port=
> >> + * - action:
> >> + *     set MLF_LOOKUP_LB_HAIRPIN_BIT=1
> >> + */
> >> +static void
> >> +add_lb_vip_hairpin_reply_action(struct in6_addr *vip6, ovs_be32 vip,
> >> +                                uint8_t lb_proto, bool has_l4_port,
> >> +                                uint64_t cookie, struct ofpbuf *ofpacts)
> >> +{
> >> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> >> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> >> +    struct ofpact_learn_spec *ol_spec;
> >> +    unsigned int imm_bytes;
> >> +    uint8_t *src_imm;
> >> +
> >> +    /* Once learned, hairpin reply flows are permanent until the VIP/backend
> >> +     * is removed.
> >> +     */
> >> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> >> +    ol->idle_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->priority = OFP_DEFAULT_PRIORITY;
> >> +    ol->table_id = OFTABLE_CHK_LB_HAIRPIN_REPLY;
> >> +    ol->cookie = htonll(cookie);
> >> +
> >> +    /* Match on metadata of the
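The flow-count claim in the commit message above can be sketched as a toy model (this is illustrative Python, not OVS code; the function name is invented). It reproduces the arithmetic for the lbs.sh scenario: 50 nodes, 20 VIPs, 10 backends.

```python
# Toy model of the commit message's arithmetic: without learn(), every
# node pre-installs a hairpin-reply flow for every
# (logical switch, VIP, backend) combination; with learn(), reply flows
# are generated on demand, only for local datapaths that actually
# source hairpin traffic.

def hairpin_reply_flows(n_nodes, n_vips, n_backends, use_learn):
    if use_learn:
        # Worst case per node: every VIP/backend pair of the locally
        # bound logical switch creates a hairpin session.
        return n_vips * n_backends
    # Pre-installed for all switches sharing the load balancers.
    return n_nodes * n_vips * n_backends

print(hairpin_reply_flows(50, 20, 10, use_learn=False))  # 10000
print(hairpin_reply_flows(50, 20, 10, use_learn=True))   # 200
```

This is where the "reduced by a factor of N (number of nodes)" claim comes from: the n_nodes factor drops out of the per-node flow count.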
Re: [ovs-dev] [PATCH ovn v2 10/10] lflow-cache: Make max cache memory usage configurable.
On Thu, Feb 4, 2021 at 6:57 PM Dumitru Ceara wrote:
>
> Add a new OVS external-id, "ovn-memlimit-lflow-cache-kb", through which
> users can specify the maximum amount of memory (in KB) ovn-controller
> can use for caching logical flows.
>
> To maintain backwards compatibility the default behavior is to not
> enforce any memory limit on the size of the cache.
>
> Add lflow cache reports to "ovn-appctl -t ovn-controller memory/show".
>
> The memory usage for cache entries of type LCACHE_T_EXPR is computed by
> doing another pass through the expression tree.  While this adds a few
> extra cycles, entries of type LCACHE_T_EXPR shouldn't be very common
> because they are created only for flows with matches that include
> "is_chassis_resident()" but do not reference port groups and address
> sets.
>
> Signed-off-by: Dumitru Ceara

Hi Dumitru,

Thanks for working on this series.  I have one comment.  I think it
will be useful to report the memory used by the cache in the reply to
the command - lflow-cache/show-stats.

Thanks
Numan

> ---
>  NEWS                            |  8 +++--
>  controller/chassis.c            | 21 +
>  controller/lflow-cache.c        | 63 ++-
>  controller/lflow-cache.h        | 13 ++--
>  controller/lflow.c              |  8 +++--
>  controller/ovn-controller.8.xml |  7
>  controller/ovn-controller.c     |  8 -
>  controller/test-lflow-cache.c   | 31 +--
>  include/ovn/expr.h              |  3 +-
>  lib/expr.c                      | 39
>  tests/ovn-lflow-cache.at        | 54 +++--
>  11 files changed, 213 insertions(+), 42 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 6e10557..ddb5977 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,9 +16,11 @@ Post-v20.12.0
>     - Add a new option to Load_Balancer.options, "hairpin_snat_ip", to allow
>       users to explicitly select which source IP should be used for load
>       balancer hairpin traffic.
> -   - ovn-controller: Add a configuration knob, through OVS external-id
> -     "ovn-limit-lflow-cache", to allow enforcing a limit for the size of the
> -     logical flow cache.
> +   - ovn-controller: Add configuration knobs, through OVS external-ids
> +     "ovn-limit-lflow-cache" and "ovn-memlimit-lflow-cache-kb", to allow
> +     enforcing a limit for the size of the logical flow cache based on
> +     maximum number of entries and/or memory usage.
> +   - ovn-controller: Add lflow cache related memory reports.
>
>  OVN v20.12.0 - 18 Dec 2020
>  --------------------------
> diff --git a/controller/chassis.c b/controller/chassis.c
> index c66837a..a4a264c 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -50,6 +50,7 @@ struct ovs_chassis_cfg {
>      const char *chassis_macs;
>      const char *enable_lflow_cache;
>      const char *limit_lflow_cache;
> +    const char *memlimit_lflow_cache;
>
>      /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
>      struct sset encap_type_set;
> @@ -142,6 +143,12 @@ get_limit_lflow_cache(const struct smap *ext_ids)
>  }
>
>  static const char *
> +get_memlimit_lflow_cache(const struct smap *ext_ids)
> +{
> +    return smap_get_def(ext_ids, "ovn-memlimit-lflow-cache-kb", "");
> +}
> +
> +static const char *
>  get_encap_csum(const struct smap *ext_ids)
>  {
>      return smap_get_def(ext_ids, "ovn-encap-csum", "true");
> @@ -264,6 +271,8 @@ chassis_parse_ovs_config(const struct ovsrec_open_vswitch_table *ovs_table,
>      ovs_cfg->chassis_macs = get_chassis_mac_mappings(&cfg->external_ids);
>      ovs_cfg->enable_lflow_cache = get_enable_lflow_cache(&cfg->external_ids);
>      ovs_cfg->limit_lflow_cache = get_limit_lflow_cache(&cfg->external_ids);
> +    ovs_cfg->memlimit_lflow_cache =
> +        get_memlimit_lflow_cache(&cfg->external_ids);
>
>      if (!chassis_parse_ovs_encap_type(encap_type, &ovs_cfg->encap_type_set)) {
>          return false;
>      }
> @@ -293,6 +302,7 @@ chassis_build_other_config(struct smap *config, const char *bridge_mappings,
>                             const char *iface_types,
>                             const char *enable_lflow_cache,
>                             const char *limit_lflow_cache,
> +                           const char *memlimit_lflow_cache,
>                             bool is_interconn)
>  {
>      smap_replace(config, "ovn-bridge-mappings", bridge_mappings);
> @@ -301,6 +311,7 @@ chassis_build_other_config(struct smap *config, const char *bridge_mappings,
>      smap_replace(config, "ovn-monitor-all", monitor_all);
>      smap_replace(config, "ovn-enable-lflow-cache", enable_lflow_cache);
>      smap_replace(config, "ovn-limit-lflow-cache", limit_lflow_cache);
> +    smap_replace(config, "ovn-memlimit-lflow-cache-kb", memlimit_lflow_cache);
>      smap_replace(config, "iface-types", iface_types);
>      smap_replace(config, "ovn-chassis-mac-mappings", chassis_macs);
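The memory-limit behavior described in the patch above (unset limit means "no limit", for backwards compatibility; entries that would exceed the budget are not cached) can be sketched as follows. This is a hedged Python illustration, not the ovn-controller C implementation; the class and method names are invented.

```python
class LflowCacheSketch:
    """Toy model of a memory-capped cache; entry sizes are in KB."""

    def __init__(self, memlimit_kb=None):
        # None mirrors the default behavior: no memory limit enforced.
        self.memlimit_kb = memlimit_kb
        self.usage_kb = 0
        self.entries = {}

    def add(self, key, size_kb):
        # Refuse entries that would push usage past the configured cap.
        if (self.memlimit_kb is not None
                and self.usage_kb + size_kb > self.memlimit_kb):
            return False
        self.entries[key] = size_kb
        self.usage_kb += size_kb
        return True

cache = LflowCacheSketch(memlimit_kb=10)
assert cache.add("lflow-1", 6)
assert not cache.add("lflow-2", 6)   # 6 + 6 KB would exceed 10 KB

unlimited = LflowCacheSketch()       # default: no limit enforced
assert unlimited.add("lflow-3", 10**6)
```

Numan's review comment (reporting used memory in lflow-cache/show-stats) would correspond to exposing `usage_kb` in this model's statistics.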
Re: [ovs-dev] [PATCH ovn v2 09/10] lflow-cache: Make maximum number of cache entries configurable.
On Thu, Feb 4, 2021 at 6:57 PM Dumitru Ceara wrote:
>
> Add a new OVS external-id, "ovn-limit-lflow-cache", through which users
> can specify the maximum size of the ovn-controller logical flow cache.
>
> To maintain backwards compatibility the default behavior is to not
> enforce any limit on the size of the cache.
>
> When the cache becomes full, the rule is to prefer more "important"
> cache entries over less "important" ones.  That is, evict entries of
> type LCACHE_T_CONJ_ID if there's no room to add an entry of type
> LCACHE_T_EXPR.  Similarly, evict entries of type LCACHE_T_CONJ_ID or
> LCACHE_T_EXPR if there's no room to add an entry of type
> LCACHE_T_MATCHES.
>
> Signed-off-by: Dumitru Ceara

Acked-by: Numan Siddique

I think the above rules description can also be added as a comment in
the function lflow_cache_make_room__().

Thanks
Numan

> ---
>  NEWS                            |   3 +
>  controller/chassis.c            |  23 -
>  controller/lflow-cache.c        |  48 +-
>  controller/lflow-cache.h        |   2 -
>  controller/ovn-controller.8.xml |  16 ++
>  controller/ovn-controller.c     |   8 +++
>  controller/test-lflow-cache.c   |  12 +++--
>  tests/ovn-lflow-cache.at        | 104 +++
>  8 files changed, 206 insertions(+), 10 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 2044671..6e10557 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,9 @@ Post-v20.12.0
>     - Add a new option to Load_Balancer.options, "hairpin_snat_ip", to allow
>       users to explicitly select which source IP should be used for load
>       balancer hairpin traffic.
> +   - ovn-controller: Add a configuration knob, through OVS external-id
> +     "ovn-limit-lflow-cache", to allow enforcing a limit for the size of the
> +     logical flow cache.
>
>  OVN v20.12.0 - 18 Dec 2020
>  --------------------------
> diff --git a/controller/chassis.c b/controller/chassis.c
> index b4d4b0e..c66837a 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -49,6 +49,7 @@ struct ovs_chassis_cfg {
>      const char *monitor_all;
>      const char *chassis_macs;
>      const char *enable_lflow_cache;
> +    const char *limit_lflow_cache;
>
>      /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
>      struct sset encap_type_set;
> @@ -135,6 +136,12 @@ get_enable_lflow_cache(const struct smap *ext_ids)
>  }
>
>  static const char *
> +get_limit_lflow_cache(const struct smap *ext_ids)
> +{
> +    return smap_get_def(ext_ids, "ovn-limit-lflow-cache", "");
> +}
> +
> +static const char *
>  get_encap_csum(const struct smap *ext_ids)
>  {
>      return smap_get_def(ext_ids, "ovn-encap-csum", "true");
> @@ -256,6 +263,7 @@ chassis_parse_ovs_config(const struct ovsrec_open_vswitch_table *ovs_table,
>      ovs_cfg->monitor_all = get_monitor_all(&cfg->external_ids);
>      ovs_cfg->chassis_macs = get_chassis_mac_mappings(&cfg->external_ids);
>      ovs_cfg->enable_lflow_cache = get_enable_lflow_cache(&cfg->external_ids);
> +    ovs_cfg->limit_lflow_cache = get_limit_lflow_cache(&cfg->external_ids);
>
>      if (!chassis_parse_ovs_encap_type(encap_type, &ovs_cfg->encap_type_set)) {
>          return false;
>      }
> @@ -283,13 +291,16 @@ chassis_build_other_config(struct smap *config, const char *bridge_mappings,
>                             const char *datapath_type, const char *cms_options,
>                             const char *monitor_all, const char *chassis_macs,
>                             const char *iface_types,
> -                           const char *enable_lflow_cache, bool is_interconn)
> +                           const char *enable_lflow_cache,
> +                           const char *limit_lflow_cache,
> +                           bool is_interconn)
>  {
>      smap_replace(config, "ovn-bridge-mappings", bridge_mappings);
>      smap_replace(config, "datapath-type", datapath_type);
>      smap_replace(config, "ovn-cms-options", cms_options);
>      smap_replace(config, "ovn-monitor-all", monitor_all);
>      smap_replace(config, "ovn-enable-lflow-cache", enable_lflow_cache);
> +    smap_replace(config, "ovn-limit-lflow-cache", limit_lflow_cache);
>      smap_replace(config, "iface-types", iface_types);
>      smap_replace(config, "ovn-chassis-mac-mappings", chassis_macs);
>      smap_replace(config, "is-interconn", is_interconn ? "true" : "false");
> @@ -305,6 +316,7 @@ chassis_other_config_changed(const char *bridge_mappings,
>                               const char *monitor_all,
>                               const char *chassis_macs,
>                               const char *enable_lflow_cache,
> +                             const char *limit_lflow_cache,
>                               const struct ds *iface_types,
>                               bool is_interconn,
>                               const struct sbrec_chassis *chassis_rec)
> @@ -344,6 +356,13 @@ chassis_other_config_changed(const char
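The eviction preference described in the commit message above ("prefer more important cache entries over less important ones") can be sketched as follows. This is an illustration only, not the `lflow_cache_make_room__()` implementation; the priority ordering (CONJ_ID < EXPR < MATCHES) is taken from the commit message.

```python
# Importance order from the commit message: conj-id entries are the
# least "important", matches entries the most.
PRIORITY = {"LCACHE_T_CONJ_ID": 0, "LCACHE_T_EXPR": 1, "LCACHE_T_MATCHES": 2}

def make_room(entries, new_type):
    """Evict one entry of a strictly less important type, if any exists.

    'entries' maps cache key -> entry type.  Returns the evicted key,
    or None if nothing less important than 'new_type' is cached.
    """
    for key, etype in entries.items():
        if PRIORITY[etype] < PRIORITY[new_type]:
            del entries[key]
            return key
    return None

entries = {"a": "LCACHE_T_CONJ_ID", "b": "LCACHE_T_MATCHES"}
# Adding a MATCHES entry to a full cache evicts the conj-id entry first.
assert make_room(entries, "LCACHE_T_MATCHES") == "a"
# Nothing is less important than a conj-id entry: no eviction possible.
assert make_room(entries, "LCACHE_T_CONJ_ID") is None
```

Numan's suggestion in the reply, recording these rules as a code comment near the eviction function, is essentially what the docstring above does in this sketch.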
Re: [ovs-dev] [PATCH ovn v2 08/10] lflow-cache: Reclaim heap memory after cache flush.
On Thu, Feb 4, 2021 at 6:57 PM Dumitru Ceara wrote:
>
> If possible, automatically reclaim heap memory when the lflow cache is
> flushed.  This can be an expensive operation but cache flushing is not
> a very common operation.
>
> This change is inspired by Ilya Maximets' OVS commit:
> f38f98a2c0dd ("ovsdb-server: Reclaim heap memory after compaction.")
>
> Additionally, when flushing the cache, also shrink the backing hmap.
>
> Signed-off-by: Dumitru Ceara

Acked-by: Numan Siddique

Numan

> ---
>  configure.ac             | 1 +
>  controller/lflow-cache.c | 9 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/configure.ac b/configure.ac
> index b2d0843..ebb09a5 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -96,6 +96,7 @@ OVN_CHECK_DOT
>  OVS_CHECK_IF_DL
>  OVS_CHECK_STRTOK_R
>  AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
> +AC_CHECK_DECLS([malloc_trim], [], [], [[#include <malloc.h>]])
>  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
>                   [], [], [[#include <sys/stat.h>]])
>  AC_CHECK_MEMBERS([struct ifreq.ifr_flagshigh], [], [], [[#include <net/if.h>]])
> diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
> index 1549034..2453b10 100644
> --- a/controller/lflow-cache.c
> +++ b/controller/lflow-cache.c
> @@ -17,6 +17,10 @@
>
>  #include <config.h>
>
> +#if HAVE_DECL_MALLOC_TRIM
> +#include <malloc.h>
> +#endif
> +
>  #include "coverage.h"
>  #include "lib/ovn-sb-idl.h"
>  #include "lflow-cache.h"
> @@ -87,7 +91,12 @@ lflow_cache_flush(struct lflow_cache *lc)
>          HMAP_FOR_EACH_SAFE (lce, lce_next, node, &lc->entries[i]) {
>              lflow_cache_delete__(lc, lce);
>          }
> +        hmap_shrink(&lc->entries[i]);
>      }
> +
> +#if HAVE_DECL_MALLOC_TRIM
> +    malloc_trim(0);
> +#endif
>  }
>
>  void
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 05/10] lflow-cache: Add unit tests.
On Thu, Feb 4, 2021 at 6:56 PM Dumitru Ceara wrote:
>
> Signed-off-by: Dumitru Ceara

Acked-by: Numan Siddique

Numan

> ---
>  controller/test-lflow-cache.c  | 224 +++
>  controller/test-ofctrl-seqno.c |  18 ---
>  tests/automake.mk              |   8 +
>  tests/ovn-lflow-cache.at       | 257
>  tests/ovn.at                   |  68 +++
>  tests/test-utils.c             |  49
>  tests/test-utils.h             |  26
>  tests/testsuite.at             |   1
>  8 files changed, 633 insertions(+), 18 deletions(-)
>  create mode 100644 controller/test-lflow-cache.c
>  create mode 100644 tests/ovn-lflow-cache.at
>  create mode 100644 tests/test-utils.c
>  create mode 100644 tests/test-utils.h
>
> diff --git a/controller/test-lflow-cache.c b/controller/test-lflow-cache.c
> new file mode 100644
> index 000..d6e962e
> --- /dev/null
> +++ b/controller/test-lflow-cache.c
> @@ -0,0 +1,224 @@
> +/* Copyright (c) 2021, Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#include "ovn/expr.h"
> +#include "ovn-sb-idl.h"
> +#include "tests/ovstest.h"
> +#include "tests/test-utils.h"
> +#include "util.h"
> +
> +#include "lflow-cache.h"
> +
> +static void
> +test_lflow_cache_add__(struct lflow_cache *lc, const char *op_type,
> +                       const struct sbrec_logical_flow *lflow,
> +                       unsigned int conj_id_ofs,
> +                       struct expr *e)
> +{
> +    printf("ADD %s:\n", op_type);
> +    printf("  conj-id-ofs: %u\n", conj_id_ofs);
> +
> +    if (!strcmp(op_type, "conj-id")) {
> +        lflow_cache_add_conj_id(lc, lflow, conj_id_ofs);
> +    } else if (!strcmp(op_type, "expr")) {
> +        lflow_cache_add_expr(lc, lflow, conj_id_ofs, expr_clone(e));
> +    } else if (!strcmp(op_type, "matches")) {
> +        struct hmap *matches = xmalloc(sizeof *matches);
> +        ovs_assert(expr_to_matches(e, NULL, NULL, matches) == 0);
> +        ovs_assert(hmap_count(matches) == 1);
> +        lflow_cache_add_matches(lc, lflow, matches);
> +    } else {
> +        OVS_NOT_REACHED();
> +    }
> +}
> +
> +static void
> +test_lflow_cache_lookup__(struct lflow_cache *lc,
> +                          const struct sbrec_logical_flow *lflow)
> +{
> +    struct lflow_cache_value *lcv = lflow_cache_get(lc, lflow);
> +
> +    printf("LOOKUP:\n");
> +    if (!lcv) {
> +        printf("  not found\n");
> +        return;
> +    }
> +
> +    printf("  conj_id_ofs: %"PRIu32"\n", lcv->conj_id_ofs);
> +    switch (lcv->type) {
> +    case LCACHE_T_CONJ_ID:
> +        printf("  type: conj-id\n");
> +        break;
> +    case LCACHE_T_EXPR:
> +        printf("  type: expr\n");
> +        break;
> +    case LCACHE_T_MATCHES:
> +        printf("  type: matches\n");
> +        break;
> +    case LCACHE_T_NONE:
> +        OVS_NOT_REACHED();
> +        break;
> +    }
> +}
> +
> +static void
> +test_lflow_cache_delete__(struct lflow_cache *lc,
> +                          const struct sbrec_logical_flow *lflow)
> +{
> +    printf("DELETE\n");
> +    lflow_cache_delete(lc, lflow);
> +}
> +
> +static void
> +test_lflow_cache_stats__(struct lflow_cache *lc)
> +{
> +    struct lflow_cache_stats *lcs = lflow_cache_get_stats(lc);
> +
> +    if (!lcs) {
> +        return;
> +    }
> +    printf("Enabled: %s\n", lflow_cache_is_enabled(lc) ? "true" : "false");
> +    for (size_t i = 0; i < LCACHE_T_MAX; i++) {
> +        printf("  %s: %"PRIuSIZE"\n", lflow_cache_type_names[i],
> +               lcs->n_entries[i]);
> +    }
> +    free(lcs);
> +}
> +
> +static void
> +test_lflow_cache_operations(struct ovs_cmdl_context *ctx)
> +{
> +    struct lflow_cache *lc = lflow_cache_create();
> +    struct expr *e = expr_create_boolean(true);
> +    bool enabled = !strcmp(ctx->argv[1], "true");
> +    unsigned int shift = 2;
> +    unsigned int n_ops;
> +
> +    lflow_cache_enable(lc, enabled);
> +    test_lflow_cache_stats__(lc);
> +
> +    if (!test_read_uint_value(ctx, shift++, "n_ops", &n_ops)) {
> +        goto done;
> +    }
> +
> +    for (unsigned int i = 0; i < n_ops; i++) {
> +        const char *op = test_read_value(ctx, shift++, "op");
> +
> +        if (!op) {
> +            goto done;
> +        }
> +
> +        struct sbrec_logical_flow lflow;
> +        uuid_generate(&lflow.header_.uuid);
> +
> +        if (!strcmp(op, "add")) {
Re: [ovs-dev] [PATCH v2] netlink: ignore IFLA_WIRELESS events
On 1/14/2021 1:09 AM, Michal Kazior wrote:

From: Michal Kazior

Some older wireless drivers - ones relying on the old and long
deprecated wireless extension ioctl system - can generate quite a bit
of IFLA_WIRELESS events depending on their configuration and runtime
conditions.  These are delivered as RTNLGRP_LINK via RTM_NEWLINK
messages.

These tend to be relatively easily identifiable because they report
the change mask being 0.  This isn't guaranteed but in practice it
shouldn't be a problem.  None of the wireless events that I ever
observed actually carry any unique information about netdev states
that ovs-vswitchd is interested in.  Hence ignoring these shouldn't
cause any problems.

These events can be responsible for a significant CPU churn as
ovs-vswitchd attempts to do plenty of work for each and every netlink
message regardless of what that message carries.  On low-end devices
such as consumer-grade routers these can lead to a lot of CPU cycles
being wasted, adding up to heat output and reducing performance.

It could be argued that wireless drivers in question should be fixed,
but that isn't exactly a trivial thing to do.  Patching ovs seems far
more viable while still making sense.

This change requires a slight tweak to the netlink abstraction code so
that the events can be discarded as soon as possible before getting a
chance to cause the churn.

Signed-off-by: Michal Kazior

Hi Michal,

The patch looks fine to me.  Applies cleanly to master and did not
cause any regressions in 'make check'.

I'm curious, since I don't ever use OVS with wireless ports, whether
this issue has been reported somewhere.  If so then we should add it
to the commit message.
Otherwise,

Acked-by: Greg Rose

---
Notes:
    v2:
      - fix bracing style [0day robot / checkpatch]

 lib/netdev-linux.c     |  4 ++--
 lib/netlink-notifier.c |  4 +++-
 lib/rtnetlink.c        | 24 +---
 lib/rtnetlink.h        |  2 +-
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 6be23dbee..301dd9060 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -742,7 +742,7 @@ netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
         if (!error) {
             struct rtnetlink_change change;

-            if (rtnetlink_parse(&buf, &change)) {
+            if (rtnetlink_parse(&buf, &change) > 0) {
                 struct netdev *netdev_ = NULL;
                 char dev_name[IFNAMSIZ];

@@ -6343,7 +6343,7 @@ netdev_linux_update_via_netlink(struct netdev_linux *netdev)
         return error;
     }

-    if (rtnetlink_parse(reply, change)
+    if (rtnetlink_parse(reply, change) > 0
         && change->nlmsg_type == RTM_NEWLINK) {
         bool changed = false;
         error = 0;
diff --git a/lib/netlink-notifier.c b/lib/netlink-notifier.c
index dfecb9778..ab5a84abb 100644
--- a/lib/netlink-notifier.c
+++ b/lib/netlink-notifier.c
@@ -189,8 +189,10 @@ nln_run(struct nln *nln)
         if (!error) {
             int group = nln->parse(&buf, nln->change);

-            if (group != 0) {
+            if (group > 0) {
                 nln_report(nln, nln->change, group);
+            } else if (group == 0) {
+                /* ignore some events */
             } else {
                 VLOG_WARN_RL(&rl, "unexpected netlink message contents");
                 nln_report(nln, NULL, 0);
diff --git a/lib/rtnetlink.c b/lib/rtnetlink.c
index 125802925..316524c0f 100644
--- a/lib/rtnetlink.c
+++ b/lib/rtnetlink.c
@@ -82,7 +82,7 @@ rtnetlink_parse_link_info(const struct nlattr *nla,
 /* Parses a rtnetlink message 'buf' into 'change'.  If 'buf' is unparseable,
  * leaves 'change' untouched and returns false.  Otherwise, populates 'change'
  * and returns true. */
-bool
+int
 rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change)
 {
     const struct nlmsghdr *nlmsg = buf->data;
@@ -99,6 +99,7 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change)
             [IFLA_MTU]      = { .type = NL_A_U32,    .optional = true },
             [IFLA_ADDRESS]  = { .type = NL_A_UNSPEC, .optional = true },
             [IFLA_LINKINFO] = { .type = NL_A_NESTED, .optional = true },
+            [IFLA_WIRELESS] = { .type = NL_A_UNSPEC, .optional = true },
         };

         struct nlattr *attrs[ARRAY_SIZE(policy)];
@@ -111,6 +112,23 @@ rtnetlink_parse(struct ofpbuf *buf, struct rtnetlink_change *change)

         ifinfo = ofpbuf_at(buf, NLMSG_HDRLEN, sizeof *ifinfo);

+        /* Wireless events can be spammy and cause a
+         * lot of unnecessary churn and CPU load in
+         * ovs-vswitchd.  The best way to filter them out
+         * is to rely on the IFLA_WIRELESS and
+         * ifi_change.  As per rtnetlink_ifinfo_prep() in
+         * the kernel, the ifi_change = 0.  That combined
+         * with the fact wireless events never really
+         * change
Re: [ovs-dev] [PATCH] netdev-afxdp: Add start qid support.
On Mon, Feb 8, 2021 at 4:58 AM Ilya Maximets wrote:
>
> On 2/7/21 5:05 PM, Toshiaki Makita wrote:
> > On 2021/02/07 2:00, William Tu wrote:
> >> On Fri, Feb 5, 2021 at 1:08 PM Gregory Rose wrote:
> >>> On 2/4/2021 7:08 PM, William Tu wrote:
> >>>> On Thu, Feb 4, 2021 at 3:17 PM Gregory Rose wrote:
> >>>>> On 2/3/2021 1:21 PM, William Tu wrote:
> >>>>>> Mellanox cards have a different XSK design.  It requires users to
> >>>>>> create dedicated queues for XSK.  Unlike Intel's NICs, which load
> >>>>>> the XDP program on all queues, Mellanox only loads the XDP program
> >>>>>> on a subset of its queues.
> >>>>>>
> >>>>>> When OVS uses AF_XDP with mlx5, it doesn't replace the existing RX
> >>>>>> and TX queues in the channel with XSK RX and XSK TX queues, but it
> >>>>>> creates an additional pair of queues for XSK in that channel.  To
> >>>>>> distinguish regular and XSK queues, mlx5 uses a different range of
> >>>>>> qids.  That means, if the card has 24 queues, queues 0..11
> >>>>>> correspond to regular queues, and queues 12..23 are XSK queues.
> >>>>>> In this case, we should attach the netdev-afxdp with 'start-qid=12'.
> >>>>>>
> >>>>>> I tested using a Mellanox ConnectX-6 Dx, by setting 'start-qid=1',
> >>>>>> and:
> >>>>>>   $ ethtool -L enp2s0f0np0 combined 1
> >>>>>>   # queue 0 is for non-XDP traffic, queue 1 is for XSK
> >>>>>>   $ ethtool -N enp2s0f0np0 flow-type udp4 action 1
> >>>>>> note: we need to additionally add a flow-redirect rule to queue 1
> >>>>>
> >>>>> Seems awfully hardware dependent.  Is this just for Mellanox or
> >>>>> does it have general usefulness?
> >>>>
> >>>> It is just Mellanox's design which requires pre-configuring the
> >>>> flow-director.  I only have cards from Intel and Mellanox so I don't
> >>>> know about other vendors.
> >>>>
> >>>> Thanks,
> >>>> William
> >>>
> >>> I think we need to abstract the HW layer a little bit.  This
> >>> start-qid option is specific to a single piece of HW, at least at
> >>> this point.  We should expect that further HW specific requirements
> >>> for different NIC vendors will come up in the future.  I suggest
> >>> adding a hw_options:mellanox:start-qid type hierarchy so that as new
> >>> HW requirements come up we can easily scale.  It will also make
> >>> adding new vendors easier in the future.
> >>>
> >>> Even with NIC vendors you can't always count on each new generation
> >>> design to always keep old requirements and methods for feature
> >>> enablement.
> >>>
> >>> What do you think?
> >>
> >> Thanks for the feedback.
> >> So far I don't know whether other vendors will need this option or not.
> >
> > FWIU, this api "The lower half of the available amount of RX queues
> > are regular queues, and the upper half are XSK RX queues." is the
> > result of a long discussion to support dedicated/isolated XSK rings,
> > which is not meant to be a mellanox-specific feature.
> >
> > https://patchwork.ozlabs.org/project/netdev/cover/20190524093431.20887-1-maxi...@mellanox.com/
> > https://patchwork.ozlabs.org/project/netdev/cover/20190612155605.22450-1-maxi...@mellanox.com/
> >
> > Toshiaki Makita
>
> Thanks for the links.  Very helpful.
>
> From what I understand, the lower half of the queues should still work,
> i.e. it should still be possible to attach AF_XDP sockets to them.  But
> they will not work in zero-copy mode ("generic" only?).
> William, could you check that?  Does it work, and which mode does
> "best-effort" end up with?  And what kind of errors does libbpf return
> if we're trying to enable zero-copy?

Thanks for your feedback.

Yes, only zero-copy mode needs to be aware of this, meaning zero-copy
mode has to use the upper half of the queues (the start-qid option
here).  Native mode and SKB mode work OK on upper and lower queues.

When attaching a zero-copy XSK to a lower-half queue, libbpf returns
EINVAL at xsk_socket__create().

William
Re: [ovs-dev] [PATCH v2 ovn 00/10] ovn-controller: Make lflow cache size configurable.
Hi Dumitru,

Thanks for the v2.  I had one small nit on patch 6.  Aside from that
(and Numan's comments), looks good.

Acked-by: Mark Michelson

On 2/4/21 8:24 AM, Dumitru Ceara wrote:

Scale tests have identified the lflow cache to be one of the main
memory consumers in ovn-controller.  This series refactors the lflow
cache code and adds configuration knobs to limit the size (in lines
and/or memory) of the cache.

Patches 1 and 6 fix issues with the already existing lflow cache code.
Even though patch 6 is a bug fix, it's easier to add it later in the
series because it uses the new lflow cache statistics (from patch 4) to
add a unit test that exercises the buggy scenario.

Changes in v2:
- Added two bug fixes for already existing problems (patches 1 and 6).
- Added unit tests as requested by Mark.
- Added support for evicting "less important" entries when the cache
  limit is reached.
- Improved cache entries memory accounting.

Dumitru Ceara (10):
      lflow: Fix cache update when I-P engine aborts.
      lflow: Refactor convert_match_to_expr() to explicitly consume prereqs.
      lflow-cache: Move the lflow cache to its own module.
      lflow-cache: Add lflow-cache/show-stats command.
      lflow-cache: Add unit tests.
      lflow: Do not cache non-conjunctive flows that use address sets/portgroups.
      lflow-cache: Add coverage counters.
      lflow-cache: Reclaim heap memory after cache flush.
      lflow-cache: Make maximum number of cache entries configurable.
      lflow-cache: Make max cache memory usage configurable.

 NEWS                            |    5
 configure.ac                    |    1
 controller/automake.mk          |    2
 controller/chassis.c            |   44
 controller/lflow-cache.c        |  363 +++
 controller/lflow-cache.h        |   89 +
 controller/lflow.c              |  376 +---
 controller/lflow.h              |   10 -
 controller/ovn-controller.8.xml |   23 ++
 controller/ovn-controller.c     |  114 ---
 controller/test-lflow-cache.c   |  239 +++
 controller/test-ofctrl-seqno.c  |   18 --
 include/ovn/expr.h              |    3
 lib/expr.c                      |   43
 tests/automake.mk               |    8 +
 tests/ovn-lflow-cache.at        |  405 +++
 tests/ovn.at                    |   82
 tests/test-utils.c              |   49 +
 tests/test-utils.h              |   26 +++
 tests/testsuite.at              |    1
 20 files changed, 1593 insertions(+), 308 deletions(-)
 create mode 100644 controller/lflow-cache.c
 create mode 100644 controller/lflow-cache.h
 create mode 100644 controller/test-lflow-cache.c
 create mode 100644 tests/ovn-lflow-cache.at
 create mode 100644 tests/test-utils.c
 create mode 100644 tests/test-utils.h
Re: [ovs-dev] [PATCH ovn v2 06/10] lflow: Do not cache non-conjunctive flows that use address sets/portgroups.
Hi Dumitru, just a small nit below On 2/4/21 8:25 AM, Dumitru Ceara wrote: Caching the conjunction id offset for flows that refer to address sets and/or port groups but do not currently generate conjunctive matches, e.g., because the port group has only one locally bound port, is wrong. If ports are later added to the port group and/or addresses are added to the address set, this flow will be reconsidered but at this point will generate conjunctive matches. We cannot use the cached conjunction id offset because it might create collisions with other conjunction ids created for unrelated flows. Fixes: 1213bc827040 ("ovn-controller: Cache logical flow expr matches.") Signed-off-by: Dumitru Ceara --- controller/lflow.c |2 +- tests/ovn.at | 12 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/controller/lflow.c b/controller/lflow.c index 3e3605a..c493652 100644 --- a/controller/lflow.c +++ b/controller/lflow.c @@ -884,7 +884,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow, lflow_cache_add_expr(l_ctx_out->lflow_cache, lflow, conj_id_ofs, cached_expr); cached_expr = NULL; -} else { +} else if (n_conjs) { lflow_cache_add_conj_id(l_ctx_out->lflow_cache, lflow, conj_id_ofs); } diff --git a/tests/ovn.at b/tests/ovn.at index 0e114cf..4bb92d6 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -22300,6 +22300,18 @@ AT_CHECK([test "$(($conj_id_cnt + 2))" = "$(get_cache_count cache-conj-id)"], [0 AT_CHECK([test "$expr_cnt" = "$(get_cache_count cache-expr)"], [0], []) AT_CHECK([test "$matches_cnt" = "$(get_cache_count cache-matches)"], [0], []) +AS_BOX([Check no caching caching for non-conjunctive port group/address set matches]) s/caching caching/caching/ +conj_id_cnt=$(get_cache_count cache-conj-id) +expr_cnt=$(get_cache_count cache-expr) +matches_cnt=$(get_cache_count cache-matches) + +check ovn-nbctl acl-add ls1 from-lport 1 'inport == @pg2 && outport == @pg2 && is_chassis_resident("lsp1")' drop +check ovn-nbctl --wait=hv sync + +AT_CHECK([test 
"$conj_id_cnt" = "$(get_cache_count cache-conj-id)"], [0], []) +AT_CHECK([test "$expr_cnt" = "$(get_cache_count cache-expr)"], [0], []) +AT_CHECK([test "$matches_cnt" = "$(get_cache_count cache-matches)"], [0], []) + OVN_CLEANUP([hv1]) AT_CLEANUP ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] northd: improve OVN BFD documentation
On Mon, Feb 08, 2021 at 11:49:03AM +0100, Lorenzo Bianconi wrote: > Signed-off-by: Lorenzo Bianconi Thanks! Acked-by: Ben Pfaff I suggest folding in the following as well to make it clearer the roles of the different groups of columns. diff --git a/ovn-nb.xml b/ovn-nb.xml index 58083f101445..cd297a12abcb 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -3763,6 +3763,10 @@ + +ovn-northd reads configuration from these columns. + + OVN logical port when BFD engine is running. @@ -3802,6 +3806,10 @@ + +ovn-northd writes BFD status into these columns. + + BFD port logical states. Possible values are:
[ovs-dev] Vhost Server Ports future plans and container networking
Hi! I would like to know what the plans are for going forward with vhost server ports. Judging from the documentation, it seems they will be deprecated and removed at a future point in time. This presents a problem for container networking today. The container networking solution suggested by DPDK leverages VIRTIO PMD Virtual Devices linking up on a socket being hosted by the vhost server. This is the vhost-user port we have today in OVS. The vhost-user-client relies on QEMU or an external process to create the socket and then a VIRTIO PMD to link up with it for communications. This is not ideal for container networking where it is better to have OVS start this server as we've seen so far. This also unifies the socket creation mechanism to OVS rather than having it be distribution specific. The vhost-user-client does not directly interact with DPDK's vhost-server option (beyond registration of its existence). It registers with the socket and the two acknowledge each other, but no initialization is done to bring the link up. This requires a virtio PMD vdev to initialize the vhost socket correctly (only function to actually send messages during initialization-vhost-client does not do this) as we've seen so far. For more info on this, check drivers/net/virtio/virtio_user_ethdev.c for virtio_user_dev_init() Or virtio_user_dev_init definition at drivers/net/virtio/virtio_user/virtio_user_dev.c The above is the actual interaction between a VIRTIO PMD and Vhost server, this is the only location I see this invoked where everything is set and the 'add_device' callback is actually setup on the vhost side. IPC use cases are not addressed with the current plans going forward for vhost (from what we have seen), how does OVS team plan to maintain support for IPC and container networking? We use vhost-server ovs ports with virtio vdevs on the containers. 
If the current plan is still to deprecate this method, then our next plan will be to use Virtio PMD Vdev ports on OVS with an external listener to reset/add/delete the vdev port when the socket goes down and when it is created again. This method is in fact better for keeping up time on the containers if OVS goes down for the same reason vhost-user-client is better than vhost-user for QEMU environments. In short, vhost-user-client does not work with vhost-user server on DPDK for IPC applications as things stand now. Vhost-user server is still the predominant choice for container networking with Virtio Vdevs. Any feedback or suggestions are welcome! If there are other (better) methods to support IPC/container networking with OVS, please let us know. (Looks like DPDK Rings were removed some time back as well) Thanks for your time and help! Regards, Marc Magnuson
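For readers less familiar with the two port types being contrasted in this thread, the difference boils down to which side creates the socket. A rough sketch (socket paths, bridge, and port names are illustrative; check your build's run directory):

```shell
# vhost-user: OVS creates and listens on the socket (server mode), so a
# container-side DPDK virtio-user PMD can attach to it as a client:
ovs-vsctl add-port br0 vhu0 -- set Interface vhu0 type=dpdkvhostuser
# container-side DPDK app (virtio-user PMD connects as the client), e.g.:
#   testpmd --vdev=virtio_user0,path=/var/run/openvswitch/vhu0 ...

# vhost-user-client: OVS connects to a socket that QEMU (or some other
# external process) must create first:
ovs-vsctl add-port br0 vhuc0 -- set Interface vhuc0 \
    type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhuc0.sock
```

This is what the author means by OVS "unifying the socket creation mechanism": with `dpdkvhostuser`, socket lifetime is tied to OVS itself rather than to an external process.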
Re: [ovs-dev] [PATCH] netdev-afxdp: Add start qid support.
On 2/6/2021 9:00 AM, William Tu wrote: On Fri, Feb 5, 2021 at 1:08 PM Gregory Rose wrote: On 2/4/2021 7:08 PM, William Tu wrote: On Thu, Feb 4, 2021 at 3:17 PM Gregory Rose wrote: On 2/3/2021 1:21 PM, William Tu wrote: Mellanox card has different XSK design. It requires users to create dedicated queues for XSK. Unlike Intel's NIC which loads XDP program to all queues, Mellanox only loads XDP program to a subset of its queue. When OVS uses AF_XDP with mlx5, it doesn't replace the existing RX and TX queues in the channel with XSK RX and XSK TX queues, but it creates an additional pair of queues for XSK in that channel. To distinguish regular and XSK queues, mlx5 uses a different range of qids. That means, if the card has 24 queues, queues 0..11 correspond to regular queues, and queues 12..23 are XSK queues. In this case, we should attach the netdev-afxdp with 'start-qid=12'. I tested using Mellanox Connect-X 6Dx, by setting 'start-qid=1', and: $ ethtool -L enp2s0f0np0 combined 1 # queue 0 is for non-XDP traffic, queue 1 is for XSK $ ethtool -N enp2s0f0np0 flow-type udp4 action 1 note: we need additionally add flow-redirect rule to queue 1 Seems awfully hardware dependent. Is this just for Mellanox or does it have general usefulness? It is just Mellanox's design which requires pre-configure the flow-director. I only have cards from Intel and Mellanox so I don't know about other vendors. Thanks, William I think we need to abstract the HW layer a little bit. This start-qid option is specific to a single piece of HW, at least at this point. We should expect that further HW specific requirements for different NIC vendors will come up in the future. I suggest adding a hw_options:mellanox:start-qid type hierarchy so that as new HW requirements come up we can easily scale. It will also make adding new vendors easier in the future. Even with NIC vendors you can't always count on each new generation design to always keep old requirements and methods for feature enablement. 
What do you think? Thanks for the feedback. So far I don't know whether other vendors will need this option or not. I think adding another "hw_options" is a little confusing because this is already an option on the device. Looking at AF_XDP driver at DPDK, it also has similar option: see start_queue https://doc.dpdk.org/guides/nics/af_xdp.html OK, makes sense in this context then. Thanks, - Greg
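Concretely, for the Mellanox example given earlier in the thread, the setup under review would look something like the sketch below. Note that `start-qid` is the new option this patch proposes (it only exists with the patch applied), and the interface/bridge names are illustrative:

```shell
# Queue 0 handles regular (non-XDP) traffic; queue 1 becomes the XSK queue.
ethtool -L enp2s0f0np0 combined 1
# mlx5 needs an explicit flow-steering rule to land traffic on the XSK queue.
ethtool -N enp2s0f0np0 flow-type udp4 action 1
# Attach AF_XDP starting from qid 1 instead of the default 0.
ovs-vsctl add-port br-int enp2s0f0np0 -- \
    set interface enp2s0f0np0 type=afxdp options:start-qid=1
```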
Re: [ovs-dev] Re: Re: [PATCH V3 2/4] Add GSO support for DPDK data path
On Mon, Feb 8, 2021 at 6:36 AM Ilya Maximets wrote: > > On 2/8/21 1:56 AM, Yi Yang (杨燚)-云服务集团 wrote: > > Yes, GSO is ok, but GRO may be have that issue, I didn't see that issue in > > my openstack environment, so maybe it will be great if we can have a test > > case to trigger that issue. > > > > -邮件原件- > > 发件人: William Tu [mailto:u9012...@gmail.com] > > 发送时间: 2021年2月7日 23:46 > > 收件人: Yi Yang (杨燚)-云服务集团 > > 抄送: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org; > > f...@sysclose.org > > 主题: Re: [ovs-dev] 答复: [PATCH V3 2/4] Add GSO support for DPDK data path > > > > On Tue, Oct 27, 2020 at 6:02 PM Yi Yang (杨燚)-云服务集团 > > wrote: > >> > >> -邮件原件- > >> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Ilya Maximets > >> 发送时间: 2020年10月27日 21:03 > >> 收件人: yang_y...@163.com; ovs-dev@openvswitch.org > >> 抄送: f...@sysclose.org; i.maxim...@ovn.org > >> 主题: Re: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path > >> > >> On 8/7/20 12:56 PM, yang_y...@163.com wrote: > >>> From: Yi Yang > >>> > >>> GSO(Generic Segment Offload) can segment large UDP and TCP packet > >>> to small packets per MTU of destination , especially for the case > >>> that physical NIC can't do hardware offload VXLAN TSO and VXLAN UFO, > >>> GSO can make sure userspace TSO can still work but not drop. > >>> > >>> In addition, GSO can help improve UDP performane when UFO is enabled > >>> in VM. > >>> > >>> GSO can support TCP, UDP, VXLAN TCP, VXLAN UDP, it is done in Tx > >>> function of physical NIC. > >>> > >>> Signed-off-by: Yi Yang > >>> --- > >>> lib/dp-packet.h| 21 +++- > >>> lib/netdev-dpdk.c | 358 > >>> + > >>> lib/netdev-linux.c | 17 ++- > >>> lib/netdev.c | 67 +++--- > >>> 4 files changed, 417 insertions(+), 46 deletions(-) > > > > snip > > > >>> > >>> @@ -2339,24 +2428,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk > >>> *dev, struct rte_mbuf **pkts, > >>> return cnt; > >>> } > >>> > >>> -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. 
Takes > >>> ownership of > >>> - * 'pkts', even in case of failure. > >>> - * > >>> - * Returns the number of packets that weren't transmitted. */ > >>> static inline int -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int > >>> qid, > >>> - struct rte_mbuf **pkts, int cnt) > >>> +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, > >>> + struct rte_mbuf **pkts, int cnt) > >>> { > >>> uint32_t nb_tx = 0; > >>> -uint16_t nb_tx_prep = cnt; > >>> +uint32_t nb_tx_prep; > >>> > >>> -if (userspace_tso_enabled()) { > >>> -nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); > >>> -if (nb_tx_prep != cnt) { > >>> -VLOG_WARN_RL(, "%s: Output batch contains invalid > >>> packets. " > >>> - "Only %u/%u are valid: %s", dev->up.name, > >>> nb_tx_prep, > >>> - cnt, rte_strerror(rte_errno)); > >>> -} > >>> +nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); > >>> +if (nb_tx_prep != cnt) { > >>> +VLOG_WARN_RL(, "%s: Output batch contains invalid packets. " > >>> + "Only %u/%u are valid: %s", > >>> + dev->up.name, nb_tx_prep, > >>> + cnt, rte_strerror(rte_errno)); > >>> } > >>> > >>> while (nb_tx != nb_tx_prep) { > >>> @@ -2384,6 +2468,200 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, > >>> int qid, > >>> return cnt - nb_tx; > >>> } > >>> > >>> +static inline void > >>> +set_multiseg_udptcp_cksum(struct rte_mbuf *mbuf) > >> > >> I didn't review the patch, only had a quick glance, but this part bothers > >> me. OVS doesn't support multi-segment mbufs, so it should not be possible > >> for such mbufs being transmitted by OVS. So, I do not understand why this > >> function needs to work with such mbufs. > >> > >> [Yi Yang] Only DPDK driver/Tx function will use it, not OVS, > >> set_multiseg_udptcp_cksum is called in GSO part, it is last step before Tx > >> function, it is a big external mbuf before rte_gso_segment, that isn't a > >> multi-segmented mbuf. > >> > > > > Hi Ilya, > > > > Now I understand Yi Yang's point better and I agree with him. 
> > Looks like the patch does the GSO at the DPDK TX function. > > It creates multi-seg mbuf after rte_gso_segment(), but will immediately > > send out the multi-seg mbuf to DPDK port, without traversing inside other > > part of OVS code. I guess this case it should work OK? > > Hi. GSO itself should be possible to implement, but we should > not enable support for multi-segment mbufs in the NIC, since this > might end up in multi-segment packets being received by OVS > causing lots of trouble (for example dp_packet_clone() uses simple > memcpy() that will result in invalid memory reads.) I see your point, thanks! I guess you're saying
Re: [ovs-dev] Re: Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization
On Mon, Feb 8, 2021 at 8:57 AM Ilya Maximets wrote: > > On 2/6/21 5:15 PM, William Tu wrote: > > On Mon, Feb 1, 2021 at 5:48 PM Yi Yang (杨燚)-云服务集团 > > wrote: > >> > >> Thanks Ilya, net_tap PMD is handling tap device on host side, so it can > >> leverage vnet header to do TSO/GSO, maybe net_pmd authors don't know how > >> to do this, from source code, tap fd isn't enabled vnet header and TSO. > >> > > thanks, learned a lot from these discussions. > > > > I looked at the DPDK net_tap and indeed it doesn't support virtio net hdr. > > Do you guys think it makes sense to add TSO at dpdk net_tap? > > Or simply using the current OVS's userspace-enable-tso on tap/veth is > > good enough? > > (using type=system, not using dpdk port type on tap/veth.) > > > > Regards, > > William > > > > I didn't benchmark all types of interfaces, but I'd say that, if you > need more or less high performance solution for userspace<->kernel > communication, you should, probably, take a look at virtio-user > ports with vhost kernel backend: > https://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html > This should be the fastest and also feature-rich solution. Thanks! I will give it a try. > > Tap devices are not designed for high performance in general, > so I'd not suggest any of them for highly loaded ports. > If it's only for some small management traffic, it should be fine > to just use netdev-linux implementation. That's what I thought until Flavio enables vnet header. > > netdev-afxdp with pmd or non-pmd modes on a veth devices is another > (potentially high performance) solution. When testing intra-host container to container performance, Tap device becomes much faster than netdev-afxdp, especially with iperf TCP. Mostly due to vnet header's TSO and csum offload feature. It's a big limitation for XDP frame which couldn't carry large buffer or carry the partial csum information. 
I reached the conclusion that for intra-host container to container TCP performance, the configurations rank from fastest to slowest (ns: namespace):

0) dpdk vhostuser in ns0 -> vhostuser - OVS userspace (but requires TCP in userspace and application modification)
1) veth0 in ns0 -> veth with TSO - OVS kernel module - veth with TSO -> veth1 in ns1
2) tap0 in ns0 -> virtio_user - OVS userspace - virtio_user -> tap1 in ns1
3) tap0 in ns0 -> recv_tap - OVS with userspace-tso - tap_batch_send -> tap1 in ns1
4) veth0 in ns0 -> af_packet sock - OVS with userspace-tso - af_packet_sock -> veth1 in ns1
5) veth0 in ns0 -> netdev-afxdp - OVS - netdev-afxdp -> veth1 in ns1

I also tested Toshiaki's XDP offload patch, https://www.mail-archive.com/ovs-dev@openvswitch.org/msg45930.html I would guess it's between 2 and 4. Regards, William
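For reference, the virtio-user exceptional-path setup Ilya points to (option 2 in the ranking above) looks roughly like the following. This is a sketch based on the linked DPDK howto; the exact devargs and interface names are assumptions and may differ across DPDK/OVS versions:

```shell
# The kernel vhost backend must be available first.
modprobe vhost-net

# Add a DPDK virtio-user vdev port to OVS; DPDK creates a matching tap
# device in the kernel, and the virtio-net header gives TSO/checksum
# offload across the userspace<->kernel boundary.
ovs-vsctl add-port br0 virtiouser0 -- set Interface virtiouser0 type=dpdk \
    options:dpdk-devargs=virtio_user0,path=/dev/vhost-net,queue_size=1024,iface=tap0
```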
Re: [ovs-dev] Re: Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization
On 2/6/21 5:15 PM, William Tu wrote: > On Mon, Feb 1, 2021 at 5:48 PM Yi Yang (杨燚)-云服务集团 wrote: >> >> Thanks Ilya, net_tap PMD is handling tap device on host side, so it can >> leverage vnet header to do TSO/GSO, maybe net_pmd authors don't know how to >> do this, from source code, tap fd isn't enabled vnet header and TSO. >> > thanks, learned a lot from these discussions. > > I looked at the DPDK net_tap and indeed it doesn't support virtio net hdr. > Do you guys think it makes sense to add TSO at dpdk net_tap? > Or simply using the current OVS's userspace-enable-tso on tap/veth is > good enough? > (using type=system, not using dpdk port type on tap/veth.) > > Regards, > William > I didn't benchmark all types of interfaces, but I'd say that, if you need more or less high performance solution for userspace<->kernel communication, you should, probably, take a look at virtio-user ports with vhost kernel backend: https://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html This should be the fastest and also feature-rich solution. Tap devices are not designed for high performance in general, so I'd not suggest any of them for highly loaded ports. If it's only for some small management traffic, it should be fine to just use netdev-linux implementation. netdev-afxdp with pmd or non-pmd modes on a veth devices is another (potentially high performance) solution. Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload
On Mon, Feb 8, 2021 at 7:33 PM Eli Britstein wrote: > > > On 2/8/2021 3:11 PM, Sriharsha Basavapatna wrote: > > On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein wrote: > >> > >> On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote: > >>> On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna > >>> wrote: > On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein wrote: > > VXLAN decap in OVS-DPDK configuration consists of two flows: > > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) > > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0 > > > > F1 is a classification flow. It has outer headers matches and it > > classifies the packet as a VXLAN packet, and using tnl_pop action the > > packet continues processing in F2. > > F2 is a flow that has matches on tunnel metadata as well as on the inner > > packet headers (as any other flow). > > > > In order to fully offload VXLAN decap path, both F1 and F2 should be > > offloaded. As there are more than one flow in HW, it is possible that > > F1 is done by HW but F2 is not. Packet is received by SW, and should be > > processed starting from F2 as F1 was already done by HW. > > Rte_flows are applicable only on physical port IDs. Vport flows (e.g. > > F2) > > are applied on uplink ports attached to OVS. > > > > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API > > for tunnel offloads. 
> > > > Travis: > > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552 > > > > GitHub Actions: > > v1: https://github.com/elibritstein/OVS/actions/runs/515334647 > > > > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html > > > > Eli Britstein (13): > > netdev-offload: Add HW miss packet state recover API > > netdev-dpdk: Introduce DPDK tunnel APIs > > netdev-offload-dpdk: Implement flow dump create/destroy APIs > > netdev-dpdk: Add flow_api support for netdev vxlan vports > > netdev-offload-dpdk: Implement HW miss packet recover for vport > > dpif-netdev: Add HW miss packet state recover logic > > netdev-offload-dpdk: Change log rate limits > > netdev-offload-dpdk: Support tunnel pop action > > netdev-offload-dpdk: Refactor offload rule creation > > netdev-dpdk: Introduce an API to query if a dpdk port is an uplink > > port > > netdev-offload-dpdk: Map netdev and ufid to offload objects > > netdev-offload-dpdk: Support vports flows offload > > netdev-dpdk-offload: Add vxlan pattern matching function > > > > Ilya Maximets (2): > > netdev-offload: Allow offloading to netdev without ifindex. > > netdev-offload: Disallow offloading to unrelated tunneling vports. > > > >Documentation/howto/dpdk.rst | 1 + > >NEWS | 2 + > >lib/dpif-netdev.c | 49 +- > >lib/netdev-dpdk.c | 135 ++ > >lib/netdev-dpdk.h | 104 - > >lib/netdev-offload-dpdk.c | 851 > > +- > >lib/netdev-offload-provider.h | 5 + > >lib/netdev-offload-tc.c | 8 + > >lib/netdev-offload.c | 29 +- > >lib/netdev-offload.h | 1 + > >10 files changed, 1033 insertions(+), 152 deletions(-) > > > > -- > > 2.28.0.546.g385c171 > > > Hi Eli, > > Thanks for posting this new patchset to support tunnel decap action > offload. > > I haven't looked at the entire patchset yet. But I focused on the > patches that introduce 1-to-many mapping between an OVS flow (f2) and > HW offloaded flows. > > Here is a representation of the design proposed in this patchset. 
A > flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which > the underlying uplink/physical port is P0, gets offloaded to not only > P0, but also to other physical ports P1, P2... and so on. > > P0 <> VxLAN-vPort <> VFRep-1 > > P1 > P2 > ... > Pn > > IMO, the problems with this design are: > > - Offloading a flow to an unrelated physical device that has nothing > to do with that flow (invalid device for the flow). > - Offloading to not just one, but several such invalid physical devices. > - Consuming HW resources for a flow that is never seen or intended to > be processed by those physical devices. > - Impacts flow scale on other physical devices, since it would consume > their HW resources with a large number of such invalid flows. > - The indirect list used to track these multiple mappings complicates > the offload layer implementation. > - The addition of flow_dump_create() to offload APIs, just to
Re: [ovs-dev] Re: Re: [PATCH V3 2/4] Add GSO support for DPDK data path
On 2/8/21 1:56 AM, Yi Yang (杨燚)-云服务集团 wrote: > Yes, GSO is ok, but GRO may be have that issue, I didn't see that issue in > my openstack environment, so maybe it will be great if we can have a test > case to trigger that issue. > > -邮件原件- > 发件人: William Tu [mailto:u9012...@gmail.com] > 发送时间: 2021年2月7日 23:46 > 收件人: Yi Yang (杨燚)-云服务集团 > 抄送: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org; > f...@sysclose.org > 主题: Re: [ovs-dev] 答复: [PATCH V3 2/4] Add GSO support for DPDK data path > > On Tue, Oct 27, 2020 at 6:02 PM Yi Yang (杨燚)-云服务集团 > wrote: >> >> -邮件原件- >> 发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Ilya Maximets >> 发送时间: 2020年10月27日 21:03 >> 收件人: yang_y...@163.com; ovs-dev@openvswitch.org >> 抄送: f...@sysclose.org; i.maxim...@ovn.org >> 主题: Re: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path >> >> On 8/7/20 12:56 PM, yang_y...@163.com wrote: >>> From: Yi Yang >>> >>> GSO(Generic Segment Offload) can segment large UDP and TCP packet >>> to small packets per MTU of destination , especially for the case >>> that physical NIC can't do hardware offload VXLAN TSO and VXLAN UFO, >>> GSO can make sure userspace TSO can still work but not drop. >>> >>> In addition, GSO can help improve UDP performane when UFO is enabled >>> in VM. >>> >>> GSO can support TCP, UDP, VXLAN TCP, VXLAN UDP, it is done in Tx >>> function of physical NIC. >>> >>> Signed-off-by: Yi Yang >>> --- >>> lib/dp-packet.h| 21 +++- >>> lib/netdev-dpdk.c | 358 >>> + >>> lib/netdev-linux.c | 17 ++- >>> lib/netdev.c | 67 +++--- >>> 4 files changed, 417 insertions(+), 46 deletions(-) > > snip > >>> >>> @@ -2339,24 +2428,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk >>> *dev, struct rte_mbuf **pkts, >>> return cnt; >>> } >>> >>> -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes >>> ownership of >>> - * 'pkts', even in case of failure. >>> - * >>> - * Returns the number of packets that weren't transmitted. 
*/ >>> static inline int -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int >>> qid, >>> - struct rte_mbuf **pkts, int cnt) >>> +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, >>> + struct rte_mbuf **pkts, int cnt) >>> { >>> uint32_t nb_tx = 0; >>> -uint16_t nb_tx_prep = cnt; >>> +uint32_t nb_tx_prep; >>> >>> -if (userspace_tso_enabled()) { >>> -nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); >>> -if (nb_tx_prep != cnt) { >>> -VLOG_WARN_RL(, "%s: Output batch contains invalid packets. " >>> - "Only %u/%u are valid: %s", dev->up.name, >>> nb_tx_prep, >>> - cnt, rte_strerror(rte_errno)); >>> -} >>> +nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); >>> +if (nb_tx_prep != cnt) { >>> +VLOG_WARN_RL(, "%s: Output batch contains invalid packets. " >>> + "Only %u/%u are valid: %s", >>> + dev->up.name, nb_tx_prep, >>> + cnt, rte_strerror(rte_errno)); >>> } >>> >>> while (nb_tx != nb_tx_prep) { >>> @@ -2384,6 +2468,200 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, >>> int qid, >>> return cnt - nb_tx; >>> } >>> >>> +static inline void >>> +set_multiseg_udptcp_cksum(struct rte_mbuf *mbuf) >> >> I didn't review the patch, only had a quick glance, but this part bothers >> me. OVS doesn't support multi-segment mbufs, so it should not be possible >> for such mbufs being transmitted by OVS. So, I do not understand why this >> function needs to work with such mbufs. >> >> [Yi Yang] Only DPDK driver/Tx function will use it, not OVS, >> set_multiseg_udptcp_cksum is called in GSO part, it is last step before Tx >> function, it is a big external mbuf before rte_gso_segment, that isn't a >> multi-segmented mbuf. >> > > Hi Ilya, > > Now I understand Yi Yang's point better and I agree with him. > Looks like the patch does the GSO at the DPDK TX function. > It creates multi-seg mbuf after rte_gso_segment(), but will immediately send > out the multi-seg mbuf to DPDK port, without traversing inside other part of > OVS code. 
I guess in this case it should work OK? Hi. GSO itself should be possible to implement, but we should not enable support for multi-segment mbufs in the NIC, since this might end up in multi-segment packets being received by OVS, causing lots of trouble (for example, dp_packet_clone() uses a simple memcpy() that will result in invalid memory reads.) GSO should be implemented in the same way TSO is implemented. Right now the TSO implementation has no software fallback to segment large packets if the netdev doesn't support it -- it simply drops them. However, a software fallback should be implemented someday, and it should be implemented in a generic way for all the netdev types instead of
Re: [ovs-dev] [PATCH] dpif-netdev: display EMC used entries for PMDs
Hi Kevin, thank you for your feedback Kevin Traynor writes: > Hi Paolo, > > On 14/01/2021 13:19, Paolo Valerio wrote: >> adds "emc entries" to "ovs-appctl dpif-netdev/pmd-stats-show" in order >> to show the number of alive entries. >> > > Thanks for working on this. Not a full review, but a few high level > comments. I would like to ask about the motivation - is it so a user can > judge the effectiveness of using the EMC to decide to disable it? or to > tune the EMC insertion rate? > the purpose is more like giving an extra bit of information that of course can be used for tuning. > If I run this and increase the flows, I see: > emc hits: 1961314 > emc entries: 6032 (73.63%) > smc hits: 0 > megaflow hits: 688946 > > If I look at pmd-perf-stats (below), it shows the % split between emc > and megaflow and I think I would use that to judge the effectiveness of > the emc. I'm not too fussed if the emc occupancy is high or low, I just > want to know if it is being effective at getting hits for my pkts, so > I'm not sure that 'emc entries: 6032 (73.63%)' helps me decide to > enable/disable it. > > - EMC hits: 1972766 ( 74.0 %) > - SMC hits: 0 ( 0.0 %) > - Megaflow hits: 693022 ( 26.0 %, 1.00 subtbl lookups/hit) > > Of course it doesn't account for the differing cost between them, but if > emc hits were a low %, I'd want to experiment with disabling it. > When you are approaching a reasonably high number of alive entries, and you expect for any reason that the number of flows for that PMD will grow significantly, it gives you an insight that the hits rate may decrease. > Depending on your motivation, you might also want to take a look at the > really nice fdb stats that Eelco did, it might give some ideas. > Thanks for the heads up. Paolo > Kevin. 
> >> Signed-off-by: Paolo Valerio >> --- >> NEWS | 2 ++ >> lib/dpif-netdev-perf.h | 2 ++ >> lib/dpif-netdev.c | 76 +++--- >> tests/pmd.at | 6 ++-- >> 4 files changed, 65 insertions(+), 21 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 617fe8e6a..e5d53a83b 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -20,6 +20,8 @@ Post-v2.14.0 >> * Add generic IP protocol support to conntrack. With this change, all >> none UDP, TCP, and ICMP traffic will be treated as general L3 >> traffic, i.e. using 3 tupples. >> + * EMC alive entries counter has been added to command >> + "ovs-appctl dpif-netdev/pmd-stats-show" >> - The environment variable OVS_UNBOUND_CONF, if set, is now used >> as the DNS resolver's (unbound) configuration file. >> - Linux datapath: >> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h >> index 72645b6b3..80f50eae9 100644 >> --- a/lib/dpif-netdev-perf.h >> +++ b/lib/dpif-netdev-perf.h >> @@ -157,6 +157,8 @@ struct pmd_perf_stats { >> uint64_t last_tsc; >> /* Used to space certain checks in time. */ >> uint64_t next_check_tsc; >> +/* Exact Match Cache used entries counter. */ >> +atomic_uint32_t emc_n_entries; >> /* If non-NULL, outermost cycle timer currently running in PMD. */ >> struct cycle_timer *cur_timer; >> /* Set of PMD counters with their zero offsets. 
*/ >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 300861ca5..3eb70ccd5 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -913,7 +913,7 @@ dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread >> *pmd, >> odp_port_t in_port); >> >> static inline bool emc_entry_alive(struct emc_entry *ce); >> -static void emc_clear_entry(struct emc_entry *ce); >> +static void emc_clear_entry(struct emc_entry *ce, struct pmd_perf_stats *s); >> static void smc_clear_entry(struct smc_bucket *b, int idx); >> >> static void dp_netdev_request_reconfigure(struct dp_netdev *dp); >> @@ -955,13 +955,16 @@ dfc_cache_init(struct dfc_cache *flow_cache) >> } >> >> static void >> -emc_cache_uninit(struct emc_cache *flow_cache) >> +emc_cache_uninit(struct emc_cache *flow_cache, struct pmd_perf_stats *s) >> { >> int i; >> >> + >> for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { >> -emc_clear_entry(_cache->entries[i]); >> +emc_clear_entry(_cache->entries[i], s); >> } >> + >> +atomic_store_relaxed(>emc_n_entries, 0); >> } >> >> static void >> @@ -977,21 +980,21 @@ smc_cache_uninit(struct smc_cache *smc) >> } >> >> static void >> -dfc_cache_uninit(struct dfc_cache *flow_cache) >> +dfc_cache_uninit(struct dfc_cache *flow_cache, struct pmd_perf_stats *s) >> { >> smc_cache_uninit(_cache->smc_cache); >> -emc_cache_uninit(_cache->emc_cache); >> +emc_cache_uninit(_cache->emc_cache, s); >> } >> >> /* Check and clear dead flow references slowly (one entry at each >> * invocation). */ >> static void >> -emc_cache_slow_sweep(struct emc_cache *flow_cache) >>
Re: [ovs-dev] [PATCH ovn v2 04/10] lflow-cache: Add lflow-cache/show-stats command.
On Thu, Feb 4, 2021 at 6:55 PM Dumitru Ceara wrote: > > Signed-off-by: Dumitru Ceara Hi Dumitru, Please see below for a few comments. Thanks Numan > --- > controller/lflow-cache.c| 21 + > controller/lflow-cache.h|7 +++ > controller/ovn-controller.c | 29 + > 3 files changed, 57 insertions(+) > > diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c > index ce129ac..8e1c71e 100644 > --- a/controller/lflow-cache.c > +++ b/controller/lflow-cache.c > @@ -21,6 +21,12 @@ > #include "lflow-cache.h" > #include "ovn/expr.h" > > +const char *lflow_cache_type_names[LCACHE_T_MAX] = { > +[LCACHE_T_CONJ_ID] = "cache-conj-id", > +[LCACHE_T_EXPR]= "cache-expr", > +[LCACHE_T_MATCHES] = "cache-matches", > +}; > + > struct lflow_cache { > struct hmap entries[LCACHE_T_MAX]; > bool enabled; > @@ -103,6 +109,21 @@ lflow_cache_is_enabled(struct lflow_cache *lc) > return lc && lc->enabled; > } > > +struct lflow_cache_stats * > +lflow_cache_get_stats(const struct lflow_cache *lc) > +{ > +if (!lc) { > +return NULL; > +} > + > +struct lflow_cache_stats *stats = xmalloc(sizeof *stats); > + > +for (size_t i = 0; i < LCACHE_T_MAX; i++) { > +stats->n_entries[i] = hmap_count(>entries[i]); > +} > +return stats; > +} > + Instead of exposing 'struct lflow_cache_stats', I think it's better if this function takes a 'struct ds *' as out parameter and fills in all the stat related details. This way, there is no need to extern the 'lflow_cache_type_names'. > void > lflow_cache_add_conj_id(struct lflow_cache *lc, > const struct sbrec_logical_flow *lflow, > diff --git a/controller/lflow-cache.h b/controller/lflow-cache.h > index 74a5b81..54873ee 100644 > --- a/controller/lflow-cache.h > +++ b/controller/lflow-cache.h > @@ -42,6 +42,8 @@ enum lflow_cache_type { > LCACHE_T_NONE = LCACHE_T_MAX, /* Not found in cache. 
*/ > }; > > +extern const char *lflow_cache_type_names[LCACHE_T_MAX]; > + > struct lflow_cache_value { > enum lflow_cache_type type; > uint32_t conj_id_ofs; > @@ -52,11 +54,16 @@ struct lflow_cache_value { > }; > }; > > +struct lflow_cache_stats { > +size_t n_entries[LCACHE_T_MAX]; > +}; > + > struct lflow_cache *lflow_cache_create(void); > void lflow_cache_flush(struct lflow_cache *); > void lflow_cache_destroy(struct lflow_cache *); > void lflow_cache_enable(struct lflow_cache *, bool enabled); > bool lflow_cache_is_enabled(struct lflow_cache *); > +struct lflow_cache_stats *lflow_cache_get_stats(const struct lflow_cache *); > > void lflow_cache_add_conj_id(struct lflow_cache *, > const struct sbrec_logical_flow *, > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 7d247fd..6ee6119 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -82,6 +82,7 @@ static unixctl_cb_func debug_pause_execution; > static unixctl_cb_func debug_resume_execution; > static unixctl_cb_func debug_status_execution; > static unixctl_cb_func lflow_cache_flush_cmd; > +static unixctl_cb_func lflow_cache_show_stats_cmd; > static unixctl_cb_func debug_delay_nb_cfg_report; > > #define DEFAULT_BRIDGE_NAME "br-int" > @@ -2662,6 +2663,9 @@ main(int argc, char *argv[]) > unixctl_command_register("lflow-cache/flush", "", 0, 0, > lflow_cache_flush_cmd, > &flow_output_data->pd); > +unixctl_command_register("lflow-cache/show-stats", "", 0, 0, > + lflow_cache_show_stats_cmd, > + &flow_output_data->pd); > > bool reset_ovnsb_idl_min_index = false; > unixctl_command_register("sb-cluster-state-reset", "", 0, 0, > @@ -3270,6 +3274,31 @@ lflow_cache_flush_cmd(struct unixctl_conn *conn OVS_UNUSED, > } > > static void > +lflow_cache_show_stats_cmd(struct unixctl_conn *conn, int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, void *arg_) > +{ > +struct flow_output_persistent_data *fo_pd = arg_; > +struct lflow_cache *lc = fo_pd->lflow_cache; > +struct ds ds = DS_EMPTY_INITIALIZER; > + > +ds_put_format(&ds, "Enabled: %s\n", > + lflow_cache_is_enabled(lc) ? "true" : "false"); > + > +struct lflow_cache_stats *stats = lflow_cache_get_stats(lc); > +if (stats) { > +for (size_t i = 0; i < LCACHE_T_MAX; i++) { > +ds_put_format(&ds, "%-16s: %"PRIuSIZE"\n", > + lflow_cache_type_names[i], > + stats->n_entries[i]); > +} > +} > +unixctl_command_reply(conn, ds_cstr(&ds)); > + > +ds_destroy(&ds); > +free(stats); > +} > + Please document this command in ovn-controller.8.xml. Thanks Numan > +static void > cluster_state_reset_cmd(struct unixctl_conn *conn, int argc OVS_UNUSED, > const
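The handler above snapshots per-type entry counts via lflow_cache_get_stats() and prints one line per cache type, leaving the caller to free the snapshot. A standalone sketch of that snapshot-and-format pattern — toy names and plain malloc/snprintf stand in for the OVS ds/hmap APIs, for illustration only:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum toy_cache_type { TOY_T_CONJ_ID, TOY_T_EXPR, TOY_T_MATCHES, TOY_T_MAX };

static const char *toy_type_names[TOY_T_MAX] = {
    "cache-conj-id", "cache-expr", "cache-matches",
};

struct toy_cache { size_t n_entries[TOY_T_MAX]; };
struct toy_cache_stats { size_t n_entries[TOY_T_MAX]; };

/* Heap-allocated snapshot of the counters; caller frees.  A NULL cache
 * yields NULL stats, mirroring the NULL-tolerant style of the patch. */
static struct toy_cache_stats *toy_cache_get_stats(const struct toy_cache *c)
{
    if (!c) {
        return NULL;
    }
    struct toy_cache_stats *s = malloc(sizeof *s);
    memcpy(s->n_entries, c->n_entries, sizeof s->n_entries);
    return s;
}

/* Formats one "%-16s: N" line per cache type; caller frees the string. */
static char *toy_format_stats(const struct toy_cache_stats *s)
{
    char *buf = malloc(256);
    buf[0] = '\0';
    for (int i = 0; i < TOY_T_MAX; i++) {
        char line[64];
        snprintf(line, sizeof line, "%-16s: %zu\n",
                 toy_type_names[i], s->n_entries[i]);
        strcat(buf, line);
    }
    return buf;
}
```

The snapshot keeps the unixctl handler decoupled from cache internals: the cache can change its storage layout without touching the formatting code.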
Re: [ovs-dev] [PATCH ovn v2 03/10] lflow-cache: Move the lflow cache to its own module.
On Thu, Feb 4, 2021 at 6:56 PM Dumitru Ceara wrote: > > This abstracts the implementation details of the ovn-controller logical > flow cache and also refactors how the cache is being used, allowing > further commits to expand its functionality. > > Signed-off-by: Dumitru Ceara Hi Dumitru, Acked-by: Numan Siddique Please see below for a comment. Thanks Numan > --- > controller/automake.mk |2 > controller/lflow-cache.c| 230 + > controller/lflow-cache.h| 77 ++ > controller/lflow.c | 336 > ++- > controller/lflow.h |8 - > controller/ovn-controller.c | 57 +++ > lib/expr.c |4 + > 7 files changed, 447 insertions(+), 267 deletions(-) > create mode 100644 controller/lflow-cache.c > create mode 100644 controller/lflow-cache.h > > diff --git a/controller/automake.mk b/controller/automake.mk > index 480578e..75119a6 100644 > --- a/controller/automake.mk > +++ b/controller/automake.mk > @@ -14,6 +14,8 @@ controller_ovn_controller_SOURCES = \ > controller/ip-mcast.h \ > controller/lflow.c \ > controller/lflow.h \ > + controller/lflow-cache.c \ > + controller/lflow-cache.h \ > controller/lport.c \ > controller/lport.h \ > controller/ofctrl.c \ > diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c > new file mode 100644 > index 000..ce129ac > --- /dev/null > +++ b/controller/lflow-cache.c > @@ -0,0 +1,230 @@ > +/* > + * Copyright (c) 2015, 2016 Nicira, Inc. > + * Copyright (c) 2021, Red Hat, Inc. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#include <config.h> > + > +#include "lib/ovn-sb-idl.h" > +#include "lflow-cache.h" > +#include "ovn/expr.h" > + > +struct lflow_cache { > +struct hmap entries[LCACHE_T_MAX]; > +bool enabled; > +}; > + > +struct lflow_cache_entry { > +struct hmap_node node; > +struct uuid lflow_uuid; /* key */ > + > +struct lflow_cache_value value; > +}; > + > +static struct lflow_cache_value *lflow_cache_add__( > +struct lflow_cache *lc, > +const struct sbrec_logical_flow *lflow, > +enum lflow_cache_type type); > +static void lflow_cache_delete__(struct lflow_cache *lc, > + struct lflow_cache_entry *lce); > + > +struct lflow_cache * > +lflow_cache_create(void) > +{ > +struct lflow_cache *lc = xmalloc(sizeof *lc); > + > +for (size_t i = 0; i < LCACHE_T_MAX; i++) { > +hmap_init(&lc->entries[i]); > +} > + > +lc->enabled = true; > +return lc; > +} > + > +void > +lflow_cache_flush(struct lflow_cache *lc) > +{ > +if (!lc) { > +return; > +} > + > +for (size_t i = 0; i < LCACHE_T_MAX; i++) { > +struct lflow_cache_entry *lce; > +struct lflow_cache_entry *lce_next; > + > +HMAP_FOR_EACH_SAFE (lce, lce_next, node, &lc->entries[i]) { > +lflow_cache_delete__(lc, lce); > +} > +} > +} > + > +void > +lflow_cache_destroy(struct lflow_cache *lc) > +{ > +if (!lc) { > +return; > +} > + > +lflow_cache_flush(lc); > +for (size_t i = 0; i < LCACHE_T_MAX; i++) { > +hmap_destroy(&lc->entries[i]); > +} > +free(lc); > +} > + > +void > +lflow_cache_enable(struct lflow_cache *lc, bool enabled) > +{ > +if (!lc) { > +return; > +} > + > +if (lc->enabled && !enabled) { > +lflow_cache_flush(lc); > +} > +lc->enabled = enabled; > +} > + > +bool > +lflow_cache_is_enabled(struct lflow_cache *lc) > +{ > +return lc && lc->enabled; > +} > + > +void > +lflow_cache_add_conj_id(struct lflow_cache *lc, > +const struct sbrec_logical_flow *lflow, > +uint32_t conj_id_ofs) 'struct sbrec_logical_flow' is used in the entire file for
the lflow uuid. I'd suggest for all these functions which take 'sbrec_logical_flow' as an argument to instead take - 'struct uuid lflow_uuid'. Since we are planning to refactor the code base to separate the IDL access, it would help in that regard. Numan > +{ > +struct lflow_cache_value *lcv = > +lflow_cache_add__(lc, lflow, LCACHE_T_CONJ_ID); > + > +if (!lcv) { > +return; > +} > +lcv->conj_id_ofs = conj_id_ofs; > +} > + > +void > +lflow_cache_add_expr(struct lflow_cache *lc, > + const struct sbrec_logical_flow
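The lflow_cache_enable() semantics quoted above — disabling an enabled cache flushes its entries, and every entry point tolerates a NULL cache — can be modeled in a few lines. This is a hypothetical toy re-creation, not the real OVN structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct lflow_cache: a single counter replaces
 * the per-type hmaps, which is enough to show the state machine. */
struct toy_lcache {
    bool enabled;
    size_t n_entries;
};

static void toy_lcache_flush(struct toy_lcache *c)
{
    if (c) {
        c->n_entries = 0;   /* drop everything; NULL-safe like the patch */
    }
}

static void toy_lcache_enable(struct toy_lcache *c, bool enabled)
{
    if (!c) {
        return;
    }
    if (c->enabled && !enabled) {
        toy_lcache_flush(c);   /* entries are dropped on the way down */
    }
    c->enabled = enabled;
}

static bool toy_lcache_is_enabled(const struct toy_lcache *c)
{
    return c && c->enabled;
}
```

Flushing on the enabled-to-disabled transition means a re-enabled cache always starts empty, so no stale entries can survive a disable/enable cycle.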
Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload
On 2/8/2021 3:11 PM, Sriharsha Basavapatna wrote: On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein wrote: On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote: On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna wrote: On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein wrote: VXLAN decap in OVS-DPDK configuration consists of two flows: F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789) F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0 F1 is a classification flow. It has outer headers matches and it classifies the packet as a VXLAN packet, and using tnl_pop action the packet continues processing in F2. F2 is a flow that has matches on tunnel metadata as well as on the inner packet headers (as any other flow). In order to fully offload VXLAN decap path, both F1 and F2 should be offloaded. As there are more than one flow in HW, it is possible that F1 is done by HW but F2 is not. Packet is received by SW, and should be processed starting from F2 as F1 was already done by HW. Rte_flows are applicable only on physical port IDs. Vport flows (e.g. F2) are applied on uplink ports attached to OVS. This patch-set makes use of [1] introduced in DPDK 20.11, that adds API for tunnel offloads. 
Travis: v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552 GitHub Actions: v1: https://github.com/elibritstein/OVS/actions/runs/515334647 [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html Eli Britstein (13): netdev-offload: Add HW miss packet state recover API netdev-dpdk: Introduce DPDK tunnel APIs netdev-offload-dpdk: Implement flow dump create/destroy APIs netdev-dpdk: Add flow_api support for netdev vxlan vports netdev-offload-dpdk: Implement HW miss packet recover for vport dpif-netdev: Add HW miss packet state recover logic netdev-offload-dpdk: Change log rate limits netdev-offload-dpdk: Support tunnel pop action netdev-offload-dpdk: Refactor offload rule creation netdev-dpdk: Introduce an API to query if a dpdk port is an uplink port netdev-offload-dpdk: Map netdev and ufid to offload objects netdev-offload-dpdk: Support vports flows offload netdev-dpdk-offload: Add vxlan pattern matching function Ilya Maximets (2): netdev-offload: Allow offloading to netdev without ifindex. netdev-offload: Disallow offloading to unrelated tunneling vports. Documentation/howto/dpdk.rst | 1 + NEWS | 2 + lib/dpif-netdev.c | 49 +- lib/netdev-dpdk.c | 135 ++ lib/netdev-dpdk.h | 104 - lib/netdev-offload-dpdk.c | 851 +- lib/netdev-offload-provider.h | 5 + lib/netdev-offload-tc.c | 8 + lib/netdev-offload.c | 29 +- lib/netdev-offload.h | 1 + 10 files changed, 1033 insertions(+), 152 deletions(-) -- 2.28.0.546.g385c171 Hi Eli, Thanks for posting this new patchset to support tunnel decap action offload. I haven't looked at the entire patchset yet. But I focused on the patches that introduce 1-to-many mapping between an OVS flow (f2) and HW offloaded flows. Here is a representation of the design proposed in this patchset. A flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which the underlying uplink/physical port is P0, gets offloaded to not only P0, but also to other physical ports P1, P2... and so on. P0 <> VxLAN-vPort <> VFRep-1 P1 P2 ... 
Pn IMO, the problems with this design are: - Offloading a flow to an unrelated physical device that has nothing to do with that flow (invalid device for the flow). - Offloading to not just one, but several such invalid physical devices. - Consuming HW resources for a flow that is never seen or intended to be processed by those physical devices. - Impacts flow scale on other physical devices, since it would consume their HW resources with a large number of such invalid flows. - The indirect list used to track these multiple mappings complicates the offload layer implementation. - The addition of flow_dump_create() to offload APIs, just to parse and get a list of user datapath netdevs is confusing and not needed. I have been exploring an alternate design to address this problem of figuring out the right physical device for a given tunnel inner-flow. I will send a patch, please take a look so we can continue the discussion. I just posted this patch, please see the link below; this is currently based on the decap offload patchset (but it can be rebased if needed, without the decap patchset too). The patch provides changes to pass physical port (orig_in_port) information to the offload layer in the context of flow F2. Note that additional changes would be needed in the decap patchset to utilize this, if we agree on this approach. https://patchwork.ozlabs.org/project/openvswitch/patch/20210205180413.43337-1-sriharsha.basavapa...@broadcom.com/
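The cover letter's split between classification flow F1 (outer match plus tnl_pop) and tunnel flow F2 (tunnel metadata plus inner match), with software resuming at F2 when hardware already executed F1, can be sketched as a toy two-stage pipeline. Names are illustrative only, not the OVS datapath API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two-flow VXLAN decap path: F1 pops the tunnel,
 * F2 processes the inner packet.  When HW partially processed the
 * packet (F1 done), SW "recovers" its state and resumes at F2. */
enum stage { STAGE_F1, STAGE_F2, STAGE_DONE };

struct toy_pkt {
    bool hw_did_tnl_pop;   /* set when HW already executed F1 */
    int sw_stages_run;     /* how many flow stages SW had to execute */
};

static enum stage recover_start_stage(const struct toy_pkt *p)
{
    /* The "HW miss packet state recover" step from the series. */
    return p->hw_did_tnl_pop ? STAGE_F2 : STAGE_F1;
}

static void sw_datapath(struct toy_pkt *p)
{
    for (enum stage s = recover_start_stage(p); s != STAGE_DONE; s++) {
        p->sw_stages_run++;   /* execute F1 (tnl_pop) or F2 (inner match) */
    }
}
```

The point of contention in the thread is not this recovery step itself but *which* physical ports the offloaded copy of F2 should be installed on.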
Re: [ovs-dev] [PATCH] netdev-afxdp: Add start qid support.
On 2/7/21 5:05 PM, Toshiaki Makita wrote: > On 2021/02/07 2:00, William Tu wrote: >> On Fri, Feb 5, 2021 at 1:08 PM Gregory Rose wrote: >>> On 2/4/2021 7:08 PM, William Tu wrote: On Thu, Feb 4, 2021 at 3:17 PM Gregory Rose wrote: > On 2/3/2021 1:21 PM, William Tu wrote: >> Mellanox card has different XSK design. It requires users to create >> dedicated queues for XSK. Unlike Intel's NIC which loads XDP program >> to all queues, Mellanox only loads XDP program to a subset of its queue. >> >> When OVS uses AF_XDP with mlx5, it doesn't replace the existing RX and TX >> queues in the channel with XSK RX and XSK TX queues, but it creates an >> additional pair of queues for XSK in that channel. To distinguish >> regular and XSK queues, mlx5 uses a different range of qids. >> That means, if the card has 24 queues, queues 0..11 correspond to >> regular queues, and queues 12..23 are XSK queues. >> In this case, we should attach the netdev-afxdp with 'start-qid=12'. >> >> I tested using Mellanox Connect-X 6Dx, by setting 'start-qid=1', and: >> $ ethtool -L enp2s0f0np0 combined 1 >> # queue 0 is for non-XDP traffic, queue 1 is for XSK >> $ ethtool -N enp2s0f0np0 flow-type udp4 action 1 >> note: we need additionally add flow-redirect rule to queue 1 > > Seems awfully hardware dependent. Is this just for Mellanox or does > it have general usefulness? > It is just Mellanox's design which requires pre-configure the flow-director. I only have cards from Intel and Mellanox so I don't know about other vendors. Thanks, William >>> >>> I think we need to abstract the HW layer a little bit. This start-qid >>> option is specific to a single piece of HW, at least at this point. >>> We should expect that further HW specific requirements for >>> different NIC vendors will come up in the future. I suggest >>> adding a hw_options:mellanox:start-qid type hierarchy so that >>> as new HW requirements come up we can easily scale. It will >>> also make adding new vendors easier in the future. 
>>> >>> Even with NIC vendors you can't always count on each new generation >>> design to always keep old requirements and methods for feature >>> enablement. >>> >>> What do you think? >>> >> Thanks for the feedback. >> So far I don't know whether other vendors will need this option or not. > > FWIU, this api "The lower half of the available amount of RX queues are > regular queues, and the upper half are XSK RX queues." is the result of long > discussion to support dedicated/isolated XSK rings, which is not meant for a > mellanox-specific feature. > > https://patchwork.ozlabs.org/project/netdev/cover/20190524093431.20887-1-maxi...@mellanox.com/ > https://patchwork.ozlabs.org/project/netdev/cover/20190612155605.22450-1-maxi...@mellanox.com/ > > Toshiaki Makita Thanks for the links. Very helpful. From what I understand lower half of queues should still work, i.e. it should still be possible to attach AF_XDP socket to them. But they will not work in zero-copy mode ("generic" only?). William, could you check that? Does it work and with which mode "best-effort" ends up with? And what kind of errors libbpf returns if we're trying to enable zero-copy? There are still few unanswered questions in those discussions and a clear lack of documentation. It seems that it's a work-in-progress, intermediate step towards some better user API. However it's unclear how this API will look like and when it will be implemented. For BPF list and maintainers: Is it possible to have some of this behavior documented? How can application determine which netdevs are using this upper/bottom half schema for their regular and xsk channels/queues? How users should do that without digging into the kernel code or spending few hours googling for presentations from some conferences? And I actually failed to find any written reference to the fact that I have to manually configure redirection of the traffic in order to receive it on XSK queues/channels and not on regular ones. 
This is very confusing and hard to understand, especially for a regular OVS user who is not familiar with XDP and kernel internals and just wants to utilize faster userspace networking. > >> I think adding another "hw_options" is a little confusing because this >> is already an option on the device. >> Looking at AF_XDP driver at DPDK, it also has similar option: >> see start_queue >> https://doc.dpdk.org/guides/nics/af_xdp.html This option for DPDK mainly positioned as a way to utilize multi-queue and a way to create different DPDK ports from different ranges of queues of the same device, so it's not exactly the same thing. But I see that it likely had a double purpose that wasn't mentioned in a commit message or documentation. I think, for now we can have this option in OVS in a same way, i.e. as a way to partition the device or
Re: [ovs-dev] [PATCH ovn v2 01/10] lflow: Fix cache update when I-P engine aborts.
On 2/8/21 12:01 PM, Numan Siddique wrote: On Thu, Feb 4, 2021 at 6:55 PM Dumitru Ceara wrote: If the I-P engine aborts in the middle of a run, the 'flow_output' node change handlers or run callback might not be called. As a side effect this may cause that Logical_Flow IDL tracked changes are not processed during the iteration. As a consequence, if a Logical_Flow was removed from the Southound DB, then its associated cache entry (if any) will not be flushed. This has two side effects: 1. Stale entries are kept in the cache until a full recompute or cache flush happens. 2. If a new Logical_Flow is added to the Southbound and it happens to have a UUID that matches one of a stale cache entry then ovn-controller will install incorrect openflows. IDL tracked changes are cleared at every iteration of ovn-controller. Skipping the ovsdb_idl_track_clear() call if the I-P engine aborted is not a valid option for now because it might lead to some of the IDL changes to be processed twice. Instead, lflow_handle_cached_flows() is called now at every iteration of ovn-controller making sure deleted flows are removed from the cache. Also, rename the 'flush-lflow-cache' unixctl command to 'lflow-cache/flush' to better match the style of other command names. Hi Dumitru, Hi Numan, The patch LGTM. 2 comments. 1. I think it's better to keep the command - flush-lflow-cache just so that we don't break any existing users. Lets say If CMS has scripted to run this command once a while, then it will break after this patch. So I would suggest that to have both the commands which do the same. We could add "deprecated" to the description of flush-lflow-cache when the user runs - ovn-appctl -t ovn-controller list-commands. Makes sense, I'll do that in v3. 2. Can you please document the new command - lflow-cache/flush in ovn-controller.8.xml. Sure. My bad that I missed adding documentation to flush-lflow-cache. No problem. 
Thanks Numan Thanks, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
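The fix discussed above boils down to: on every iteration, walk the tracked Logical_Flow changes and evict cache entries for deleted rows *before* the tracked changes are cleared, so a later flow reusing the same UUID cannot hit a stale entry. A toy model of that maintenance step, using an array-based cache and invented names purely for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FLOWS 8

struct tracked_flow {
    unsigned int uuid;   /* stand-in for struct uuid */
    bool deleted;
};

struct toy_flow_cache {
    unsigned int uuids[MAX_FLOWS];
    size_t n;
};

static void toy_cache_evict(struct toy_flow_cache *c, unsigned int uuid)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->uuids[i] == uuid) {
            c->uuids[i] = c->uuids[--c->n];   /* swap-remove */
            return;
        }
    }
}

/* Must run every iteration, before tracked changes are cleared;
 * skipping it (e.g. on an I-P engine abort) leaves stale entries. */
static void handle_tracked(struct toy_flow_cache *c,
                           const struct tracked_flow *flows, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (flows[i].deleted) {
            toy_cache_evict(c, flows[i].uuid);
        }
    }
}
```

In the real patch this corresponds to moving the eviction out of lflow_run() into lflow_handle_cached_flows(), which is called unconditionally each loop.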
Re: [ovs-dev] [PATCH ovn v2 02/10] lflow: Refactor convert_match_to_expr() to explicitly consume prereqs.
On Thu, Feb 4, 2021 at 6:55 PM Dumitru Ceara wrote: > > It was (and still is) the responsibility of the caller of > convert_match_to_expr() to explicitly free any 'prereqs' expression that > was passed as argument if the expression parsing of the 'lflow' match failed. > > However, convert_match_to_expr() now updates the value of '*prereqs' setting > it to NULL in the successful case. This makes it easier for callers of the > function because 'expr_destroy(prereqs)' can now be called unconditionally. > > Signed-off-by: Dumitru Ceara Acked-by: Numan Siddique Numan > --- > Note: This patch doesn't change the callers of convert_match_to_expr() to > take advantage of the new semantic but following patches do. > --- > controller/lflow.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/controller/lflow.c b/controller/lflow.c > index 04ba0d1..495653f 100644 > --- a/controller/lflow.c > +++ b/controller/lflow.c > @@ -758,11 +758,12 @@ add_matches_to_flow_table(const struct > sbrec_logical_flow *lflow, > /* Converts the match and returns the simplified expr tree. > * > * The caller should evaluate the conditions and normalize the expr tree. > + * If parsing is successful, '*prereqs' is also consumed. 
> */ > static struct expr * > convert_match_to_expr(const struct sbrec_logical_flow *lflow, >const struct sbrec_datapath_binding *dp, > - struct expr *prereqs, > + struct expr **prereqs, >const struct shash *addr_sets, >const struct shash *port_groups, >struct lflow_resource_ref *lfrr, > @@ -795,8 +796,9 @@ convert_match_to_expr(const struct sbrec_logical_flow *lflow, > sset_destroy(&port_groups_ref); > > if (!error) { > -if (prereqs) { > -e = expr_combine(EXPR_T_AND, e, prereqs); > +if (*prereqs) { > +e = expr_combine(EXPR_T_AND, e, *prereqs); > +*prereqs = NULL; > } > e = expr_annotate(e, &symtab, &error); > } > @@ -854,7 +856,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow, > .n_tables = LOG_PIPELINE_LEN, > .cur_ltable = lflow->table_id, > }; > -struct expr *prereqs; > +struct expr *prereqs = NULL; > char *error; > > error = ovnacts_parse_string(lflow->actions, &pp, &ovnacts, &prereqs); > @@ -885,7 +887,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow, > struct expr *expr = NULL; > if (!l_ctx_out->lflow_cache_map) { > /* Caching is disabled. */ > -expr = convert_match_to_expr(lflow, dp, prereqs, l_ctx_in->addr_sets, > +expr = convert_match_to_expr(lflow, dp, &prereqs, l_ctx_in->addr_sets, > l_ctx_in->port_groups, l_ctx_out->lfrr, > NULL); > if (!expr) { > @@ -949,7 +951,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow, > > bool pg_addr_set_ref = false; > if (!expr) { > -expr = convert_match_to_expr(lflow, dp, prereqs, l_ctx_in->addr_sets, > +expr = convert_match_to_expr(lflow, dp, &prereqs, l_ctx_in->addr_sets, > l_ctx_in->port_groups, l_ctx_out->lfrr, > &pg_addr_set_ref); > if (!expr) { > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
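The ownership contract this patch introduces — the callee takes `struct expr **prereqs` and consumes it only on success, setting `*prereqs` to NULL so the caller may free unconditionally — can be demonstrated with toy types (not the real OVN expr API; names are ours):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_expr { int combined; };

/* NULL-safe destroy, like expr_destroy(). */
static void toy_expr_destroy(struct toy_expr *e)
{
    free(e);
}

/* On success: combine/steal *prereqs and set it to NULL (consumed).
 * On failure: return NULL and leave *prereqs untouched, still owned
 * by the caller.  Either way, the caller's cleanup is unconditional. */
static struct toy_expr *toy_convert(bool parse_ok, struct toy_expr **prereqs)
{
    if (!parse_ok) {
        return NULL;
    }
    struct toy_expr *e = calloc(1, sizeof *e);
    if (*prereqs) {
        e->combined = 1;        /* stands in for expr_combine(EXPR_T_AND, ...) */
        free(*prereqs);
        *prereqs = NULL;        /* consumed */
    }
    return e;
}
```

This is why the commit message says callers can now run `expr_destroy(prereqs)` unconditionally: after a successful call the pointer is NULL and the destroy is a no-op, while after a failure it still frees the caller-owned prerequisites.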
Re: [ovs-dev] [PATCH ovn v2 01/10] lflow: Fix cache update when I-P engine aborts.
On Thu, Feb 4, 2021 at 6:55 PM Dumitru Ceara wrote: > > If the I-P engine aborts in the middle of a run, the 'flow_output' node > change handlers or run callback might not be called. > > As a side effect this may cause that Logical_Flow IDL tracked changes > are not processed during the iteration. As a consequence, if a > Logical_Flow was removed from the Southound DB, then its associated > cache entry (if any) will not be flushed. > > This has two side effects: > 1. Stale entries are kept in the cache until a full recompute or cache >flush happens. > 2. If a new Logical_Flow is added to the Southbound and it happens to >have a UUID that matches one of a stale cache entry then >ovn-controller will install incorrect openflows. > > IDL tracked changes are cleared at every iteration of ovn-controller. > Skipping the ovsdb_idl_track_clear() call if the I-P engine aborted is > not a valid option for now because it might lead to some of the IDL > changes to be processed twice. > > Instead, lflow_handle_cached_flows() is called now at every iteration > of ovn-controller making sure deleted flows are removed from the cache. > > Also, rename the 'flush-lflow-cache' unixctl command to > 'lflow-cache/flush' to better match the style of other command names. Hi Dumitru, The patch LGTM. 2 comments. 1. I think it's better to keep the command - flush-lflow-cache just so that we don't break any existing users. Lets say If CMS has scripted to run this command once a while, then it will break after this patch. So I would suggest that to have both the commands which do the same. We could add "deprecated" to the description of flush-lflow-cache when the user runs - ovn-appctl -t ovn-controller list-commands. 2. Can you please document the new command - lflow-cache/flush in ovn-controller.8.xml. My bad that I missed adding documentation to flush-lflow-cache. 
Thanks Numan > > Fixes: 2662498bfd13 ("ovn-controller: Persist the conjunction ids allocated > for conjuctive matches.") > Signed-off-by: Dumitru Ceara > --- > controller/lflow.c | 30 +- > controller/lflow.h |4 +++- > controller/ovn-controller.c | 20 > tests/ovn.at|2 +- > 4 files changed, 37 insertions(+), 19 deletions(-) > > diff --git a/controller/lflow.c b/controller/lflow.c > index 946c1e0..04ba0d1 100644 > --- a/controller/lflow.c > +++ b/controller/lflow.c > @@ -1425,19 +1425,6 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct > lflow_ctx_out *l_ctx_out) > { > COVERAGE_INC(lflow_run); > > -/* when lflow_run is called, it's possible that some of the logical flows > - * are deleted. We need to delete the lflow cache for these > - * lflows (if present), otherwise, they will not be deleted at all. */ > -if (l_ctx_out->lflow_cache_map) { > -const struct sbrec_logical_flow *lflow; > -SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_TRACKED (lflow, > - > l_ctx_in->logical_flow_table) { > -if (sbrec_logical_flow_is_deleted(lflow)) { > -lflow_cache_delete(l_ctx_out->lflow_cache_map, lflow); > -} > -} > -} > - > add_logical_flows(l_ctx_in, l_ctx_out); > add_neighbor_flows(l_ctx_in->sbrec_port_binding_by_name, > l_ctx_in->mac_binding_table, > l_ctx_in->local_datapaths, > @@ -1446,6 +1433,23 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct > lflow_ctx_out *l_ctx_out) > l_ctx_out->flow_table); > } > > +/* Should be called at every ovn-controller iteration before IDL tracked > + * changes are cleared to avoid maintaining cache entries for flows that > + * don't exist anymore. 
> + */ > +void > +lflow_handle_cached_flows(struct hmap *lflow_cache_map, > + const struct sbrec_logical_flow_table *flow_table) > +{ > +const struct sbrec_logical_flow *lflow; > + > +SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_TRACKED (lflow, flow_table) { > +if (sbrec_logical_flow_is_deleted(lflow)) { > +lflow_cache_delete(lflow_cache_map, lflow); > +} > +} > +} > + > void > lflow_destroy(void) > { > diff --git a/controller/lflow.h b/controller/lflow.h > index ba79cc3..cf4f0e8 100644 > --- a/controller/lflow.h > +++ b/controller/lflow.h > @@ -156,7 +156,9 @@ struct lflow_ctx_out { > }; > > void lflow_init(void); > -void lflow_run(struct lflow_ctx_in *, struct lflow_ctx_out *); > +void lflow_run(struct lflow_ctx_in *, struct lflow_ctx_out *); > +void lflow_handle_cached_flows(struct hmap *lflow_cache, > + const struct sbrec_logical_flow_table *); > bool lflow_handle_changed_flows(struct lflow_ctx_in *, struct lflow_ctx_out > *); > bool lflow_handle_changed_ref(enum ref_type, const char *ref_name, >struct lflow_ctx_in *, struct lflow_ctx_out *, > diff --git
[ovs-dev] [PATCH ovn] northd: improve OVN BFD documentation
Signed-off-by: Lorenzo Bianconi --- ovn-nb.xml | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/ovn-nb.xml b/ovn-nb.xml index 58083f101..76c3a3d86 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -129,10 +129,11 @@ options are described individually below. - + These options apply when ovn-controller configures - BFD on tunnels interfaces. + OVS BFD on tunnels interfaces. Please note these parameters refer + to legacy OVS BFD implementation and not to OVN BFD one. @@ -2281,11 +2282,10 @@ When more than one gateway chassis is specified, OVN only uses -one at a time. OVN uses BFD to monitor gateway connectivity, -preferring the highest-priority gateway that is online. -Priorities are specified in the priority column -of or . +one at a time. OVN can rely on OVS BFD implementation to monitor +gateway connectivity, preferring the highest-priority gateway +that is online. Priorities are specified in the priority +column of or . @@ -3759,7 +3759,14 @@ - Contains BFD parameter for ovn-controller bfd configuration. + Contains BFD parameter for ovn-controller BFD configuration. + OVN BFD implementation is used to provide detection of failures in the + path between adjacent forwarding engines, including the OVN interfaces. + OVN BFD provides link status info to OVN northd in order to update + logical flows according to the status of BFD endpoints. In the current + implementation OVN BFD is used to check next-hop status for ECMP routes. + Please note BFD table refers to OVN BFD implementation and not to OVS + legacy one. -- 2.29.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] tests: Fix Port_Binding up test.
Thanks, Gongming On Feb 8, 2021, at 6:01 PM, Numan Siddique wrote: On Mon, Feb 8, 2021 at 2:40 PM Dumitru Ceara wrote: > > On 2/7/21 3:52 AM, gmingchen(陈供明) wrote: > > From: Gongming Chen > > > > After setting the iface-id, immediately check the up status of the port > > binding, it will occasionally fail, especially when the port binding > > status is reported later. > > > > When it fails, the following will be output: > > Checking values in sb Port_Binding with logical_port=lsp1 against false... found false > > ovs-vsctl add-port br-int lsp1 -- set Interface lsp1 external-ids:iface-id=lsp1 > > ./ovn-macros.at:307: "$@" > > Checking values in sb Port_Binding with logical_port=lsp1 against true... found false > > _uuid : 15ebabb6-3dbb-4806-aa85-d1c03e3b39f6 > > logical_port: lsp1 > > up : true > > ./ovn-macros.at:393: hard failure > > > > Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows are installed.") > > Signed-off-by: Gongming Chen > > --- > > Thanks for the fix! > > Acked-by: Dumitru Ceara Thanks for the fix. I applied this patch to master. I also added your name to the AUTHORS list. Numan > > ___ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] lflow: Use learn() action to generate LB hairpin reply flows.
On 2/8/21 11:09 AM, Numan Siddique wrote:

On Sat, Feb 6, 2021 at 4:00 AM Dumitru Ceara wrote:

The main trait of load balancer hairpin traffic is that it never leaves
the local hypervisor. Essentially this means that only hairpin
openflows installed for logical switches that have at least one logical
switch port bound locally can ever be hit.

Until now, if a load balancer was applied on multiple logical switches
that are connected through a distributed router, ovn-controller would
install flows to detect hairpin replies for all logical switches. In
practice this leads to a very high number of openflows out of which
most will never be used.

Instead we now use an additional action, learn(), on flows that match on
packets that create the hairpin session. The learn() action will then
generate the necessary flows to handle hairpin replies, but only for
the local datapaths which actually generate hairpin traffic.

For example, simulating how ovn-k8s uses load balancer for services,
in a "switch per node" scenario, the script below would generate
10K (n_nodes * n_vips * n_backends) openflows on every node in table=69
(hairpin reply). With this patch the maximum number of openflows that
can be created for hairpin replies is 200 (n_vips * n_backends).

In general, for deployments that leverage switch-per-node topologies,
the number of openflows is reduced by a factor of N, where N is the
number of nodes.
$ cat lbs.sh
NODES=50
VIPS=20
BACKENDS=10
ovn-nbctl lr-add rtr
for ((i = 1; i <= $NODES; i++)); do
    ovn-nbctl \
        -- ls-add ls$i \
        -- lsp-add ls$i vm$i \
        -- lsp-add ls$i ls$i-rtr \
        -- lsp-set-type ls$i-rtr router \
        -- lsp-set-options ls$i-rtr router-port=rtr-ls$i \
        -- lrp-add rtr rtr-ls$i 00:00:00:00:01:00 42.42.42.$i/24
done

for ((i = 1; i <= $VIPS; i++)); do
    lb=lb$i
    vip=10.10.10.$i:1
    bip=20.20.20.1:2
    for ((j = 2; j <= $BACKENDS; j++)); do
        bip="$bip,20.20.20.$j:2"
    done
    ovn-nbctl lb-add $lb $vip $bip
done

for ((i = 1; i <= $NODES; i++)); do
    for ((j = 1; j <= $VIPS; j++)); do
        ovn-nbctl ls-lb-add ls$i lb$j
    done
done

ovs-vsctl add-port br-int vm1 \
    -- set interface vm1 type=internal \
    -- set interface vm1 external-ids:iface-id=vm1

Suggested-by: Ilya Maximets
Signed-off-by: Dumitru Ceara

Hi Dumitru,

Hi Numan,

Thanks for this patch. The patch LGTM. I tested it out with a huge NB
DB with lots of load balancers and backends. I see a significant
reduction in OF flows.

Thanks for the review!

Please see the few comments below.

Thanks
Numan

---
 controller/lflow.c      | 204 ---
 tests/ofproto-macros.at |   5 +-
 tests/ovn.at            | 516 +---
 3 files changed, 455 insertions(+), 270 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 946c1e0..2b7d356 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1171,6 +1171,178 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
     }
 }

+/* Builds the "learn()" action to be triggered by packets initiating a
+ * hairpin session.
+ *
+ * This will generate flows in table OFTABLE_CHK_LB_HAIRPIN_REPLY of the form:
+ * - match:
+ *     metadata=,ip/ipv6,ip.src=,ip.dst=
+ *     nw_proto='lb_proto',tp_src_port=
+ * - action:
+ *     set MLF_LOOKUP_LB_HAIRPIN_BIT=1
+ */
+static void
+add_lb_vip_hairpin_reply_action(struct in6_addr *vip6, ovs_be32 vip,
+                                uint8_t lb_proto, bool has_l4_port,
+                                uint64_t cookie, struct ofpbuf *ofpacts)
+{
+    struct match match = MATCH_CATCHALL_INITIALIZER;
+    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
+    struct ofpact_learn_spec *ol_spec;
+    unsigned int imm_bytes;
+    uint8_t *src_imm;
+
+    /* Once learned, hairpin reply flows are permanent until the VIP/backend
+     * is removed.
+     */
+    ol->flags = NX_LEARN_F_DELETE_LEARNED;
+    ol->idle_timeout = OFP_FLOW_PERMANENT;
+    ol->hard_timeout = OFP_FLOW_PERMANENT;
+    ol->priority = OFP_DEFAULT_PRIORITY;
+    ol->table_id = OFTABLE_CHK_LB_HAIRPIN_REPLY;
+    ol->cookie = htonll(cookie);
+
+    /* Match on metadata of the packet that created the hairpin session. */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+
+    ol_spec->dst.field = mf_from_id(MFF_METADATA);
+    ol_spec->dst.ofs = 0;
+    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
+    ol_spec->n_bits = ol_spec->dst.n_bits;
+    ol_spec->dst_type = NX_LEARN_DST_MATCH;
+    ol_spec->src_type = NX_LEARN_SRC_FIELD;
+    ol_spec->src.field = mf_from_id(MFF_METADATA);
+
+    /* Match on the same ETH type as the packet that created the hairpin
+     * session.
+     */
+    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
+    ol_spec->dst.field = mf_from_id(MFF_ETH_TYPE);
+    ol_spec->dst.ofs =
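[Editor's note] The excerpt above is OVN's real learn() action construction. As a rough conceptual model of why it helps, the toy Python sketch below shows the key idea: a reply-direction match is installed only on a datapath that has actually forwarded a hairpin packet, instead of being pre-installed on every node for every logical switch. The tuple layout and helper names here are assumptions for illustration, not OVN code.

```python
# Toy model of learn()-based hairpin reply detection (illustration only;
# not the real OpenFlow pipeline).

hairpin_reply_table = set()  # stands in for OFTABLE_CHK_LB_HAIRPIN_REPLY

def handle_hairpin_packet(metadata, src, dst, proto, tp_dst):
    """Called for a packet initiating a hairpin session; models the
    learn() action installing the reply-direction match."""
    # Replies travel in the opposite direction, so src/dst and the L4
    # port role are swapped in the learned match.
    hairpin_reply_table.add((metadata, dst, src, proto, tp_dst))

def is_hairpin_reply(metadata, src, dst, proto, tp_src):
    """Models a lookup in the learned table (a hit corresponds to
    setting MLF_LOOKUP_LB_HAIRPIN_BIT)."""
    return (metadata, src, dst, proto, tp_src) in hairpin_reply_table

# A hairpin session on datapath 1, addresses reused from the lbs.sh
# example: backend 20.20.20.1 reached through VIP 10.10.10.1.
handle_hairpin_packet(1, "20.20.20.1", "10.10.10.1", 6, 1)
print(is_hairpin_reply(1, "10.10.10.1", "20.20.20.1", 6, 1))  # True
print(is_hairpin_reply(2, "10.10.10.1", "20.20.20.1", 6, 1))  # False

# The commit message's scale arithmetic for the lbs.sh scenario:
n_nodes, n_vips, n_backends = 50, 20, 10
print(n_nodes * n_vips * n_backends)  # 10000 reply flows pre-installed
print(n_vips * n_backends)            # 200 at most per node with learn()
```

Only the second lookup fails because datapath 2 never carried a hairpin packet, which is exactly the per-datapath saving the patch describes.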
Re: [ovs-dev] [PATCH ovn] expr: Combine multiple ipv4 with wildcard mask.
On 2/7/21 1:25 PM, gmingchen(陈供明) wrote:
> From: Gongming Chen

Hi Gongming,

First of all, thanks for the contribution! This is not a full review,
just some comments for now.

It seems that there's a memory leak with your patch applied.
AddressSanitizer reports:

Direct leak of 12 byte(s) in 1 object(s) allocated from:
    #0 0x49644d in malloc (/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/tests/ovstest+0x49644d)
    #1 0x538604 in xmalloc /home/runner/work/ovn/ovn/ovs_src/lib/util.c:138:15
    #2 0x62636f in expr_const_sets_add /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../lib/expr.c:1237:18
    #3 0x4cf8f6 in create_addr_sets /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:230:5
    #4 0x4cf8f6 in test_parse_expr__ /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:296:5
    #5 0x4dfd04 in ovs_cmdl_run_command__ /home/runner/work/ovn/ovn/ovs_src/lib/command-line.c:247:17
    #6 0x4c6810 in test_ovn_main /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:1635:5
    #7 0x4c6810 in ovstest_wrapper_test_ovn_main__ /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:1638:1
    #8 0x4dfd04 in ovs_cmdl_run_command__ /home/runner/work/ovn/ovn/ovs_src/lib/command-line.c:247:17
    #9 0x4c5fe3 in main /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/ovstest.c:150:9
    #10 0x7f71aa6ddbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

Full reports are available in the ovsrobot OVN github CI run artifacts:
https://github.com/ovsrobot/ovn/actions/runs/545416492

Just a note, if you push the branch to your own fork it will trigger the
github action to run CI and the "linux clang test asan" job will also
enable AddressSanitizer.
There are also a few style related issues (e.g., sizeof args), please see
the guidelines here:
https://github.com/ovn-org/ovn/blob/master/Documentation/internals/contributing/coding-style.rst

In the ovn security group, each host ip corresponds to at least 4 flows
(different connection states). As the number of hosts using the security
group increases, the number of ovs security group flows will increase
sharply, especially when the remote group feature in OpenStack is
applied. This patch merges ipv4 addresses with wildcard masks, and
replaces these ipv4 addresses with the merged ip/mask. This will greatly
reduce the number of ovs security group flows, especially when the
number of hosts is large. After being used in a production environment,
the number of ovs flows is reduced by at least 50% in most scenarios
when the remote group in OpenStack is applied.

I think it would be great to describe the algorithm here, in the commit
log, but also in the comments in the code.

Analysis in the simplest scenario: a 1.1.1.0/24 network, with the
OpenStack security group remote group feature enabled and 253 virtual
machine ports created (1.1.1.2-1.1.1.254). Focusing only on the number
of ip addresses, in the table=44 table:

./configure --disable-combine-ipv4:
1.1.1.2-1.1.1.254 (253 flows) * 4 (connection states) * 1 (local net of
localport) = 1012

./configure --enable-combine-ipv4 (default):
1.1.1.2/31 1.1.1.4/30 1.1.1.8/29 1.1.1.16/28 1.1.1.32/27 1.1.1.64/26
1.1.1.128/26 1.1.1.192/27 1.1.1.224/28 1.1.1.240/29 1.1.1.248/30
1.1.1.252/31 1.1.1.254
13 flows * 4 (connection states) * 1 (local net of localport) = 52

Reduced from 1012 flows to 52, a 19.4 times reduction.

Some cases are similar to the following:
1.1.1.2, 1.1.1.6
After the combine:
1.1.1.2/255.255.255.251

This will slightly increase the difficulty of finding the flow
corresponding to a single address, such as:
ovs-ofctl dump-flows br-int | grep 1.1.1.6
The result is empty.
1.1.1.6 will match 1.1.1.2/255.255.255.251

Would it make sense and potentially make the life of the users easier by
also adding a utility to automatically do the IP/MASK match? I'm
thinking of something like:

$ ovs-ofctl dump-flows br-int | ovn-match-ip 1.1.1.6
[..]

I didn't look at the implementation in detail but I'm surprised by the
following:

- change test-ovn.c to define $set1 = {1.1.1.2, .. 1.1.1.254}, basically
  the example you gave above.
- then run:

$ echo 'ip4.src == $set1' | ./tests/ovstest test-ovn expr-to-flows
ip,nw_src=1.1.1.2/31
ip,nw_src=1.1.1.4
$

Shouldn't this output actually match the 13 entries in your sample
output?

Signed-off-by: Gongming Chen
---
 configure.ac     |   1 +
 lib/expr.c       | 217 +++
 m4/ovn.m4        |  21 
 tests/atlocal.in |   1 +
 tests/ovn.at     | 286 +--
 5 files changed, 443 insertions(+), 83 deletions(-)

diff --git a/configure.ac b/configure.ac
index b2d084318..6dc51a5cc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -131,6 +131,7 @@ OVS_LIBTOOL_VERSIONS
 OVS_CHECK_CXX
 AX_FUNC_POSIX_MEMALIGN
 OVN_CHECK_UNBOUND
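[Editor's note] Two points in the exchange above can be checked independently. The contiguous part of the merge (1.1.1.2-1.1.1.254 becoming 13 prefixes) is a standard range-to-CIDR decomposition, and the wildcard-mask match (1.1.1.6 matching 1.1.1.2/255.255.255.251) is plain bitwise arithmetic. The Python sketch below demonstrates both; it is not the patch's algorithm (which also merges addresses under non-contiguous masks), and the helper name is made up for illustration, roughly in the spirit of the suggested `ovn-match-ip` utility.

```python
import ipaddress

# 1) Contiguous case: the minimal CIDR cover of 1.1.1.2-1.1.1.254 is
#    exactly the 13-block list quoted in the commit message.
blocks = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("1.1.1.2"), ipaddress.IPv4Address("1.1.1.254")))
print(len(blocks))            # 13
print(blocks[0], blocks[-1])  # 1.1.1.2/31 1.1.1.254/32

# 2) Wildcard (non-contiguous) masks: 1.1.1.2 and 1.1.1.6 differ only in
#    the 0x4 bit, so both fall under 1.1.1.2/255.255.255.251.
def matches(addr, spec):
    """Does addr fall under ip/mask, where mask may be non-contiguous
    (dotted-quad wildcard) or a plain prefix length?"""
    ip_s, _, mask_s = spec.partition("/")
    ip = int(ipaddress.IPv4Address(ip_s))
    if "." in mask_s:
        mask = int(ipaddress.IPv4Address(mask_s))
    else:
        mask = (0xFFFFFFFF << (32 - int(mask_s))) & 0xFFFFFFFF
    return (int(ipaddress.IPv4Address(addr)) & mask) == (ip & mask)

print(matches("1.1.1.6", "1.1.1.2/255.255.255.251"))  # True
print(matches("1.1.1.3", "1.1.1.2/255.255.255.251"))  # False
```

Piping `ovs-ofctl dump-flows` output through a filter built on `matches()` is one way the proposed lookup utility could work.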
Re: [ovs-dev] [PATCH ovn] lflow: Use learn() action to generate LB hairpin reply flows.
On Sat, Feb 6, 2021 at 4:00 AM Dumitru Ceara wrote: > > The main trait of load balancer hairpin traffic is that it never leaves > the local hypervisor. Essentially this means that only hairpin > openflows installed for logical switches that have at least one logical > switch port bound locally can ever be hit. > > Until now, if a load balancer was applied on multiple logical switches > that are connected through a distributed router, ovn-controller would > install flows to detect hairpin replies for all logical switches. In > practice this leads to a very high number of openflows out of which > most will never be used. > > Instead we now use an additional action, learn(), on flows that match on > packets that create the hairpin session. The learn() action will then > generate the necessary flows to handle hairpin replies, but only for > the local datapaths which actually generate hairpin traffic. > > For example, simulating how ovn-k8s uses load balancer for services, > in a "switch per node" scenario, the script below would generate > 10K (n_nodes * n_vips * n_backends) openflows on every node in table=69 > (hairpin reply). With this patch the maximum number of openflows that > can be created for hairpin replies is 200 (n_vips * n_backends). > > In general, for deployments that leverage switch-per-node topologies, > the number of openflows is reduced by a factor of N, where N is the > number of nodes. 
>
> $ cat lbs.sh
> NODES=50
> VIPS=20
> BACKENDS=10
> ovn-nbctl lr-add rtr
> for ((i = 1; i <= $NODES; i++)); do
>     ovn-nbctl \
>         -- ls-add ls$i \
>         -- lsp-add ls$i vm$i \
>         -- lsp-add ls$i ls$i-rtr \
>         -- lsp-set-type ls$i-rtr router \
>         -- lsp-set-options ls$i-rtr router-port=rtr-ls$i \
>         -- lrp-add rtr rtr-ls$i 00:00:00:00:01:00 42.42.42.$i/24
> done
>
> for ((i = 1; i <= $VIPS; i++)); do
>     lb=lb$i
>     vip=10.10.10.$i:1
>     bip=20.20.20.1:2
>     for ((j = 2; j <= $BACKENDS; j++)); do
>         bip="$bip,20.20.20.$j:2"
>     done
>     ovn-nbctl lb-add $lb $vip $bip
> done
>
> for ((i = 1; i <= $NODES; i++)); do
>     for ((j = 1; j <= $VIPS; j++)); do
>         ovn-nbctl ls-lb-add ls$i lb$j
>     done
> done
>
> ovs-vsctl add-port br-int vm1 \
>     -- set interface vm1 type=internal \
>     -- set interface vm1 external-ids:iface-id=vm1
>
> Suggested-by: Ilya Maximets
> Signed-off-by: Dumitru Ceara

Hi Dumitru,

Thanks for this patch. The patch LGTM. I tested it out with a huge NB
DB with lots of load balancers and backends. I see a significant
reduction in OF flows.

Please see the few comments below.

Thanks
Numan

> ---
> controller/lflow.c      | 204 ---
> tests/ofproto-macros.at |   5 +-
> tests/ovn.at            | 516 +---
> 3 files changed, 455 insertions(+), 270 deletions(-)
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index 946c1e0..2b7d356 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -1171,6 +1171,178 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
>     }
> }
>
> +/* Builds the "learn()" action to be triggered by packets initiating a
> + * hairpin session.
> + *
> + * This will generate flows in table OFTABLE_CHK_LB_HAIRPIN_REPLY of the
> + * form:
> + * - match:
> + *     metadata=,ip/ipv6,ip.src=,ip.dst=
> + *     nw_proto='lb_proto',tp_src_port=
> + * - action:
> + *     set MLF_LOOKUP_LB_HAIRPIN_BIT=1
> + */
> +static void
> +add_lb_vip_hairpin_reply_action(struct in6_addr *vip6, ovs_be32 vip,
> +                                uint8_t lb_proto, bool has_l4_port,
> +                                uint64_t cookie, struct ofpbuf *ofpacts)
> +{
> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> +    struct ofpact_learn_spec *ol_spec;
> +    unsigned int imm_bytes;
> +    uint8_t *src_imm;
> +
> +    /* Once learned, hairpin reply flows are permanent until the
> +     * VIP/backend is removed.
> +     */
> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> +    ol->idle_timeout = OFP_FLOW_PERMANENT;
> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> +    ol->priority = OFP_DEFAULT_PRIORITY;
> +    ol->table_id = OFTABLE_CHK_LB_HAIRPIN_REPLY;
> +    ol->cookie = htonll(cookie);
> +
> +    /* Match on metadata of the packet that created the hairpin session. */
> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> +
> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> +    ol_spec->dst.ofs = 0;
> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> +
> +    /* Match on the same ETH type as the packet that created the hairpin
> +     * session.
> +     */
> +    ol_spec =
Re: [ovs-dev] [PATCH ovn] tests: Fix Port_Binding up test.
On Mon, Feb 8, 2021 at 2:40 PM Dumitru Ceara wrote:
>
> On 2/7/21 3:52 AM, gmingchen(陈供明) wrote:
> > From: Gongming Chen
> >
> > After setting the iface-id, immediately checking the up status of the
> > port binding will occasionally fail, especially when the port binding
> > status is reported later.
> >
> > When it fails, the following will be output:
> > Checking values in sb Port_Binding with logical_port=lsp1 against false... found false
> > ovs-vsctl add-port br-int lsp1 -- set Interface lsp1 external-ids:iface-id=lsp1
> > ./ovn-macros.at:307: "$@"
> > Checking values in sb Port_Binding with logical_port=lsp1 against true... found false
> > _uuid : 15ebabb6-3dbb-4806-aa85-d1c03e3b39f6
> > logical_port: lsp1
> > up : true
> > ./ovn-macros.at:393: hard failure
> >
> > Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows are installed.")
> > Signed-off-by: Gongming Chen
> > ---
>
> Thanks for the fix!
>
> Acked-by: Dumitru Ceara

Thanks for the fix. I applied this patch to master. I also added your
name to the AUTHORS list.

Numan
Re: [ovs-dev] [PATCH ovn] tests: Fix Port_Binding up test.
On 2/7/21 3:52 AM, gmingchen(陈供明) wrote:

From: Gongming Chen

After setting the iface-id, immediately checking the up status of the
port binding will occasionally fail, especially when the port binding
status is reported later.

When it fails, the following will be output:

Checking values in sb Port_Binding with logical_port=lsp1 against false... found false
ovs-vsctl add-port br-int lsp1 -- set Interface lsp1 external-ids:iface-id=lsp1
./ovn-macros.at:307: "$@"
Checking values in sb Port_Binding with logical_port=lsp1 against true... found false
_uuid : 15ebabb6-3dbb-4806-aa85-d1c03e3b39f6
logical_port: lsp1
up : true
./ovn-macros.at:393: hard failure

Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows are installed.")
Signed-off-by: Gongming Chen
---

Thanks for the fix!

Acked-by: Dumitru Ceara