Re: [ovs-dev] [PATCH] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.
On Tue, Jun 4, 2024 at 12:05 AM Dumitru Ceara wrote: > It improves the debugging experience if we can easily get a list of > OpenFlow rules and groups that contribute to the creation of a datapath > flow. > > The suggested workflow is: > a. dump datapath flows (along with UUIDs), this also prints the core IDs > (PMD IDs) when applicable. > > $ ovs-appctl dpctl/dump-flows -m > flow-dump from pmd on cpu core: 7 > ufid:7460db8f..., recirc_id(0), > > b. dump related OpenFlow rules and groups: > $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7 > cookie=0x12345678, table=0 > priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1) > cookie=0x0, table=1 priority=200,actions=group:1 > > group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2)) > cookie=0x0, table=2 actions=output:1 > > The new command only shows rules and groups attached to ukeys that are > in states UKEY_VISIBLE or UKEY_OPERATIONAL. That should be fine as all > other ukeys should not be relevant for the use case presented above. > > For ukeys that don't have an xcache populated yet, the command goes > ahead and populates one. In theory this is creates a slight overhead as > those ukeys might not need an xcache until they see traffic (and get > revalidated) but in practice the overhead should be minimal. > > This commit tries to mimic the output format of the ovs-ofctl > dump-flows/dump-groups commands. For groups it actually uses > ofputil_group/_bucket functions for formatting. For rules it's not that > straightforward or clean as 'struct rule' uses its own versions of > fields to describe the match, actions, and other relevant fields and > 'ovs-ofctl dump-flows' operates on the equivalent 'struct > ofputil_flow_stats'. > > Signed-off-by: Dumitru Ceara > --- > NEWS| 3 + > include/openvswitch/ofp-group.h | 7 ++ > lib/ofp-group.c | 110 +++- > ofproto/ofproto-dpif-upcall.c | 87 + > ofproto/ofproto-dpif.c | 33 ++ > ofproto/ofproto-dpif.h | 4 ++ > tests/ofproto-dpif.at | 39 +++ > tests/ofproto-macros.at | 4 ++ > 8 files changed, 243 insertions(+), 44 deletions(-) > > diff --git a/NEWS b/NEWS > index 5ae0108d552b..1bc085f97045 100644 > --- a/NEWS > +++ b/NEWS > @@ -9,6 +9,9 @@ Post-v3.3.0 > https://github.com/openvswitch/ovs.git > - DPDK: > * OVS validated with DPDK 23.11.1. > + - ovs-appctl: > + * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules > and > + groups that contributed to the creation of a specific datapath > flow. > > > v3.3.0 - 16 Feb 2024 > diff --git a/include/openvswitch/ofp-group.h > b/include/openvswitch/ofp-group.h > index cd7af0ebff9c..79fcb3a4c0d1 100644 > --- a/include/openvswitch/ofp-group.h > +++ b/include/openvswitch/ofp-group.h > @@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct > ovs_list *, > bool ofputil_bucket_check_duplicate_id(const struct ovs_list *); > struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *); > struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *); > +void ofputil_bucket_format(const struct ofputil_bucket *, > + enum ofp11_group_type, enum ofp_version, > + const struct ofputil_port_map *, > + const struct ofputil_table_map *, > + struct ds *); > > static inline bool > ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket) > @@ -88,6 +93,8 @@ struct ofputil_group_props { > void ofputil_group_properties_destroy(struct ofputil_group_props *); > void ofputil_group_properties_copy(struct ofputil_group_props *to, > const struct ofputil_group_props > *from); > +void ofputil_group_properties_format(const struct ofputil_group_props *, > + struct ds *); > /* Protocol-independent group_mod. */ > struct ofputil_group_mod { > uint16_t command; /* One of OFPGC15_*. */ > diff --git a/lib/ofp-group.c b/lib/ofp-group.c > index 737f48047b10..8f7614c22f0e 100644 > --- a/lib/ofp-group.c > +++ b/lib/ofp-group.c > @@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t > *group_idp) > return true; > } > > -/* Appends to 's' a string representation of the OpenFlow group ID > 'group_id'. > - * Most groups' string representation is just the number, but for special > - * groups, e.g. OFPG_ALL, it is the name, e.g. "ALL". */ > +/* Appends to 's' a string representation of the OpenFlow group. > 'group_id'. > + * Most groups' string representation is just 'group_id=' followed by the > ID, > + * but for special groups, e.g. OFPG_ALL, the ID is replaced by the name, > + * e.g. "ALL". */ > void > ofputil_format_group(uint32_t group_id, struct ds *s) > { > char name[M
Re: [ovs-dev] [PATCH net-next v2 6/9] net: openvswitch: store sampling probability in cb.
Hi Adrian, kernel test robot noticed the following build errors: [auto build test ERROR on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Adrian-Moreno/net-psample-add-user-cookie/20240604-030055 base: net-next/main patch link: https://lore.kernel.org/r/20240603185647.2310748-7-amorenoz%40redhat.com patch subject: [PATCH net-next v2 6/9] net: openvswitch: store sampling probability in cb. config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20240604/202406041339.ytdrh41v-...@intel.com/config) compiler: m68k-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240604/202406041339.ytdrh41v-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202406041339.ytdrh41v-...@intel.com/ All errors (new ones prefixed by >>, old ones prefixed by <<): WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hid/hid-zydacron.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hid/hid-viewsonic.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hid/hid-waltop.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hid/hid-winwing.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/of/of_test.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/fbtft/fbtft.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-bootrom.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-spilib.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-hid.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-light.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-log.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-loopback.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-power-supply.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-raw.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-vibrator.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-audio-manager.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-gbphy.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-gpio.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-i2c.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-pwm.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-sdio.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-spi.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-uart.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-usb.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/platform/goldfish/goldfish_pipe.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/platform/chrome/cros_kunit_proto_test.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mailbox/mtk-cmdq-mailbox.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_simpleondemand.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_performance.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_powersave.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_userspace.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwtracing/intel_th/intel_th_msu_sink.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem-apple-efuses.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem_brcm_nvram.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem_u-boot-env.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx-interconnect.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mm-interconnect.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mq-interconnect.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mn-interconnect.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mp-interconnect.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hte/hte-tegra194-test.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/vdpa.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/parport/parport.o WARNING: modpost: drivers/parport/parport_amiga: section mismatch in reference: amiga_parallel_driver+0x8 (section: .data) -> amiga_parallel_remove (section: .exit.text) WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/parsers/brcm_u-
Re: [ovs-dev] [PATCH ovn v4] Text respresntations for drop sampling.
On Mon, Jun 3, 2024 at 9:47 PM Jacob Tanenbaum wrote: > Created a new column in the southbound database to hardcode a human > readable > description for flows. This first use is describing why the flow is > dropping packets. > The new column is called flow_desc and will create southbound database > entries like this > > _uuid : 20f1897b-477e-47ae-a32c-c546d83ec097 > actions : > "sample(probability=65535,collector_set=123,obs_domain=1,obs_point=$cookie); > /* drop */" > controller_meter: [] > external_ids: {source="northd.c:8721", stage-name=ls_in_l2_unknown} > flow_desc : "No L2 destination" > logical_datapath: [] > logical_dp_group: ee3c3db5-98a2-4f34-8a84-409deae140a7 > match : "outport == \"none\"" > pipeline: ingress > priority: 50 > table_id: 27 > tags: {} > hash: 0 > > future work includes entering more flow_desc for more flows and adding > flow_desc to the actions as a comment. > > Signed-off-by: Jacob Tanenbaum > Suggested-by: Dumitru Ceara > Reported-at: https://issues.redhat.com/browse/FDP-307 > > --- > > v1: initial version > v2: correct commit message > make the flow_desc a char* > correct a typo in the ovn-sb.xml > correct the test > v3: rebase issue with NEWS file > v4: remove options:debug_drop_domain_id="1" from testing as we > do not depend on it > > merge > > diff --git a/NEWS b/NEWS > index 81c958f9a..04c441ada 100644 > --- a/NEWS > +++ b/NEWS > @@ -21,6 +21,8 @@ Post v24.03.0 > MAC addresses configured on the LSP with "unknown", are learnt via the > OVN native FDB. >- Add support for ovsdb-server `--config-file` option in ovn-ctl. > + - Added a new column in the southbound database "flow_desc" to provide > +human readable context to flows. > > OVN v24.03.0 - 01 Mar 2024 > -- > diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c > index b2c60b5de..e27558a32 100644 > --- a/northd/lflow-mgr.c > +++ b/northd/lflow-mgr.c > @@ -36,7 +36,7 @@ static void ovn_lflow_init(struct ovn_lflow *, struct > ovn_datapath *od, > uint16_t priority, char *match, > char *actions, char *io_port, > char *ctrl_meter, char *stage_hint, > - const char *where); > + const char *where, char *flow_desc); > static struct ovn_lflow *ovn_lflow_find(const struct hmap *lflows, > enum ovn_stage stage, > uint16_t priority, const char > *match, > @@ -52,7 +52,7 @@ static struct ovn_lflow *do_ovn_lflow_add( > const char *actions, const char *io_port, > const char *ctrl_meter, > const struct ovsdb_idl_row *stage_hint, > -const char *where); > +const char *where, const char *flow_desc); > > > static struct ovs_mutex *lflow_hash_lock(const struct hmap *lflow_table, > @@ -173,6 +173,7 @@ struct ovn_lflow { >* 'dpg_bitmap'. */ > struct ovn_dp_group *dpg;/* Link to unique Sb datapath group. */ > const char *where; > +char *flow_desc; > > struct uuid sb_uuid; /* SB DB row uuid, specified by northd. > */ > struct ovs_list referenced_by; /* List of struct lflow_ref_node. */ > @@ -659,7 +660,7 @@ lflow_table_add_lflow(struct lflow_table *lflow_table, >const char *match, const char *actions, >const char *io_port, const char *ctrl_meter, >const struct ovsdb_idl_row *stage_hint, > - const char *where, > + const char *where, const char *flow_desc, >struct lflow_ref *lflow_ref) > OVS_EXCLUDED(fake_hash_mutex) > { > @@ -679,7 +680,7 @@ lflow_table_add_lflow(struct lflow_table *lflow_table, > do_ovn_lflow_add(lflow_table, > od ? ods_size(od->datapaths) : dp_bitmap_len, > hash, stage, priority, match, actions, > - io_port, ctrl_meter, stage_hint, where); > + io_port, ctrl_meter, stage_hint, where, > flow_desc); > > if (lflow_ref) { > struct lflow_ref_node *lrn = > @@ -733,7 +734,7 @@ lflow_table_add_lflow_default_drop(struct lflow_table > *lflow_table, > { > lflow_table_add_lflow(lflow_table, od, NULL, 0, stage, 0, "1", >debug_drop_action(), NULL, NULL, NULL, > - where, lflow_ref); > + where, NULL, lflow_ref); > } > > struct ovn_dp_group * > @@ -856,7 +857,7 @@ static void > ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od, > size_t dp_bitmap_len, enum ovn_stage stage, uint16_t > priority, > char *match, char *actions, char *io_port, char >
Re: [ovs-dev] [PATCH net-next 1/2] selftests: openvswitch: fix action formatting
On Mon, 03 Jun 2024 15:00:03 -0400 Aaron Conole wrote: > I agree, this is an issue. BUT I think it might be better to just > filter by field type up front. See: > > https://github.com/apconole/linux-next-work/commit/7262107de7170d44b6dbf6c5ea6f7e6c0bb71d36#diff-3e72e7405c6bb4e9842bed5f63883ca930387086bb40d4034e92ed83a5decb4bR441 > > That version I think ends up being much easier to follow. If you want > to take it for your series, feel free. If you disagree, maybe there's > something I'm not considering about it. > > NOTE that version is just a bunch of independent changes that are > squashed together. I have a cleaner version. > > I can also bundle up the series I have so far and submit, but I didn't > want to do that until I got all the pmtu.sh support working. Maybe it > makes sense to send it now though. Simon, Jakub - wdyt? I'd say - hold onto the changes until pmtu.sh works, unless there's *any* reason for a particular patch to go in early, eg: - patch fixes existing bug - someone else needs a patch - ... - a patch which falls under some of the criteria above depends on the patch ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 2/2] python: ovsdb-idl: Use monitor_cond for _Server DB.
On 5/6/24 18:58, Terry Wilson wrote: > Unlike the C IDL code, the Python version still monitors the > _Server DB with "monitor" instead of "monitor_cond". This results > in receiving an entire Database row every time the "index" value > is updated which includes the 'schema' column. Using "monitor_cond" > will result in "update2" notifications which just include the > changed "index" value. > > Unlike the C IDL, the Python IDL requires a SchemaHelper object > to instanitate the IDL, leaving it to the user of the library to > call "get_schema" themselves. Since the Python IDL did not have > support for retrieving the schema automatically and did not have > a state for doing so, instead of transitioning on an error response > from retrieving the _Server schema to requesting the "data" schema, > this moves directly to monitoring the "data" DB. > > Signed-off-by: Terry Wilson > --- > python/ovs/db/idl.py | 28 ++-- > 1 file changed, 14 insertions(+), 14 deletions(-) > Thanks, Terry! Sorry for a long delay. I'm considering this as a bug fix, because the use of a plain monitor causes a significant traffic amplification with every small transactions and can potentially create serious issues in large scale clusters. Applied to main. Backporting to 2.17 is probably too much, so only ported to 3.3 for now to avoid this issue on a future LTS branch. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.
It improves the debugging experience if we can easily get a list of OpenFlow rules and groups that contribute to the creation of a datapath flow. The suggested workflow is: a. dump datapath flows (along with UUIDs), this also prints the core IDs (PMD IDs) when applicable. $ ovs-appctl dpctl/dump-flows -m flow-dump from pmd on cpu core: 7 ufid:7460db8f..., recirc_id(0), b. dump related OpenFlow rules and groups: $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7 cookie=0x12345678, table=0 priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1) cookie=0x0, table=1 priority=200,actions=group:1 group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2)) cookie=0x0, table=2 actions=output:1 The new command only shows rules and groups attached to ukeys that are in states UKEY_VISIBLE or UKEY_OPERATIONAL. That should be fine as all other ukeys should not be relevant for the use case presented above. For ukeys that don't have an xcache populated yet, the command goes ahead and populates one. In theory this is creates a slight overhead as those ukeys might not need an xcache until they see traffic (and get revalidated) but in practice the overhead should be minimal. This commit tries to mimic the output format of the ovs-ofctl dump-flows/dump-groups commands. For groups it actually uses ofputil_group/_bucket functions for formatting. For rules it's not that straightforward or clean as 'struct rule' uses its own versions of fields to describe the match, actions, and other relevant fields and 'ovs-ofctl dump-flows' operates on the equivalent 'struct ofputil_flow_stats'. Signed-off-by: Dumitru Ceara --- NEWS| 3 + include/openvswitch/ofp-group.h | 7 ++ lib/ofp-group.c | 110 +++- ofproto/ofproto-dpif-upcall.c | 87 + ofproto/ofproto-dpif.c | 33 ++ ofproto/ofproto-dpif.h | 4 ++ tests/ofproto-dpif.at | 39 +++ tests/ofproto-macros.at | 4 ++ 8 files changed, 243 insertions(+), 44 deletions(-) diff --git a/NEWS b/NEWS index 5ae0108d552b..1bc085f97045 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,9 @@ Post-v3.3.0 https://github.com/openvswitch/ovs.git - DPDK: * OVS validated with DPDK 23.11.1. + - ovs-appctl: + * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules and + groups that contributed to the creation of a specific datapath flow. v3.3.0 - 16 Feb 2024 diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h index cd7af0ebff9c..79fcb3a4c0d1 100644 --- a/include/openvswitch/ofp-group.h +++ b/include/openvswitch/ofp-group.h @@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct ovs_list *, bool ofputil_bucket_check_duplicate_id(const struct ovs_list *); struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *); struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *); +void ofputil_bucket_format(const struct ofputil_bucket *, + enum ofp11_group_type, enum ofp_version, + const struct ofputil_port_map *, + const struct ofputil_table_map *, + struct ds *); static inline bool ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket) @@ -88,6 +93,8 @@ struct ofputil_group_props { void ofputil_group_properties_destroy(struct ofputil_group_props *); void ofputil_group_properties_copy(struct ofputil_group_props *to, const struct ofputil_group_props *from); +void ofputil_group_properties_format(const struct ofputil_group_props *, + struct ds *); /* Protocol-independent group_mod. */ struct ofputil_group_mod { uint16_t command; /* One of OFPGC15_*. */ diff --git a/lib/ofp-group.c b/lib/ofp-group.c index 737f48047b10..8f7614c22f0e 100644 --- a/lib/ofp-group.c +++ b/lib/ofp-group.c @@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t *group_idp) return true; } -/* Appends to 's' a string representation of the OpenFlow group ID 'group_id'. - * Most groups' string representation is just the number, but for special - * groups, e.g. OFPG_ALL, it is the name, e.g. "ALL". */ +/* Appends to 's' a string representation of the OpenFlow group. 'group_id'. + * Most groups' string representation is just 'group_id=' followed by the ID, + * but for special groups, e.g. OFPG_ALL, the ID is replaced by the name, + * e.g. "ALL". */ void ofputil_format_group(uint32_t group_id, struct ds *s) { char name[MAX_GROUP_NAME_LEN + 1]; +ds_put_cstr(s, "group_id="); ofputil_group_to_string(group_id, name, sizeof name); ds_put_cstr(s, name); } @@ -297,7 +299,7 @@ ofputil_group_desc_request_format(struct ds *string, const stru
Re: [ovs-dev] [PATCH ovn v2] ovn-controller: Initialize bitmap to zero.
Like Ilya, I'm surprised that ASAN did not catch this. Having said that, Acked-by: Mark Michelson On 5/10/24 06:57, Ales Musil wrote: The bitmap used in the update_ct_zones was uninitialized, and it could contain any value besides 0. Use the bitmap_allocate() function instead, to allocate the bitmap in heap rather than stack, the allocate makes sure that the memory is properly zeroed. This was caught by valgrind: Conditional jump or move depends on uninitialised value(s) at 0x44074B: update_ct_zones (ovn-controller.c:812) by 0x440DC9: en_ct_zones_run (ovn-controller.c:2579) by 0x468BB7: engine_recompute (inc-proc-eng.c:415) by 0x46954C: engine_compute (inc-proc-eng.c:454) by 0x46954C: engine_run_node (inc-proc-eng.c:503) by 0x46954C: engine_run (inc-proc-eng.c:528) by 0x40AE9D: main (ovn-controller.c:5776) Uninitialised value was created by a stack allocation at 0x440313: update_ct_zones (ovn-controller.c:747) Fixes: f9cab11d5fab ("Allow explicit setting of the SNAT zone on a gateway router.") Signed-off-by: Ales Musil --- v2: Use bitmap_allocate() instead of array on stack. --- controller/ovn-controller.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 453dc62fd..8ee2da2fd 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -732,7 +732,7 @@ update_ct_zones(const struct sset *local_lports, const char *user; struct sset all_users = SSET_INITIALIZER(&all_users); struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones); -unsigned long unreq_snat_zones_map[BITMAP_N_LONGS(MAX_CT_ZONES)]; +unsigned long *unreq_snat_zones_map = bitmap_allocate(MAX_CT_ZONES); struct simap unreq_snat_zones = SIMAP_INITIALIZER(&unreq_snat_zones); const char *local_lport; @@ -843,6 +843,7 @@ update_ct_zones(const struct sset *local_lports, simap_destroy(&req_snat_zones); simap_destroy(&unreq_snat_zones); sset_destroy(&all_users); +free(unreq_snat_zones_map); } static void ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v5 2/2] northd: Add support for disabling vxlan mode.
Hi Vladislav, I have just one comment below. On 5/3/24 04:13, Vladislav Odintsov wrote: Commit [1] introduced a "VXLAN mode" concept. It brought a limitation for available tunnel IDs because of lack of space in VXLAN VNI. In VXLAN mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) and 2047 logical ports per datapath. Prior to this patch VXLAN mode was enabled automatically if at least one chassis had encap of VXLAN type. In scenarios where one want to use VXLAN only for HW VTEP (RAMP) switch, such limitation makes no sence. This patch adds support for explicit disabling of VXLAN mode via Northbound database. 1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 Acked-By: Ihar Hrachyshka Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") Signed-off-by: Vladislav Odintsov --- NEWS | 4 northd/en-global-config.c | 7 ++- northd/northd.c | 10 -- northd/northd.h | 3 ++- ovn-architecture.7.xml| 6 ++ ovn-nb.xml| 10 ++ tests/ovn-northd.at | 29 + 7 files changed, 65 insertions(+), 4 deletions(-) diff --git a/NEWS b/NEWS index 3b5e93dc9..007b27f3d 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,10 @@ Post v24.03.0 external-ids, the option is no longer needed as it became effectively "true" for all scenarios. - Added DHCPv4 relay support. + - Added new global config option NB_Global:options:disable_vxlan_mode to +extend available tunnel IDs space for datapaths from 4095 to 16711680 +when running in "VXLAN mode". For more details see man ovn-nb(5) for +mentioned option. OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 873649a89..f5e2a8154 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,7 +115,7 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -init_vxlan_mode(sbrec_chassis_table); +init_vxlan_mode(&nb->options, sbrec_chassis_table); char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); @@ -533,6 +533,11 @@ check_nb_options_out_of_sync(const struct nbrec_nb_global *nb, return true; } +if (config_out_of_sync(&nb->options, &config_data->nb_options, + "disable_vxlan_mode", false)) { +return true; +} + return false; } diff --git a/northd/northd.c b/northd/northd.c index 0e0ae24db..7bdffe531 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -886,8 +886,14 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } void -init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table) { +if (smap_get_bool(nb_options, "disable_vxlan_mode", false)) { +vxlan_mode = false; +return; +} + const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { @@ -17596,7 +17602,7 @@ ovnnb_db_run(struct northd_input *input_data, use_common_zone = smap_get_bool(input_data->nb_options, "use_common_zone", false); -init_vxlan_mode(input_data->sbrec_chassis_table); +init_vxlan_mode(input_data->nb_options, input_data->sbrec_chassis_table); build_datapaths(ovnsb_txn, input_data->nbrec_logical_switch_table, diff --git a/northd/northd.h b/northd/northd.h index be480003e..d0322e621 100644 --- a/northd/northd.h +++ b/northd/northd.h @@ -792,7 +792,8 @@ lr_has_multiple_gw_ports(const struct ovn_datapath *od) } void -init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table); +init_vxlan_mode(const struct smap *nb_options, +const struct sbrec_chassis_table *sbrec_chassis_table); uint32_t get_ovn_max_dp_key_local(void); diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml index 3ecb58933..f4eae340c 100644 --- a/ovn-architecture.7.xml +++ b/ovn-architecture.7.xml @@ -2920,4 +2920,10 @@ the future, gateways that do not support encapsulations with large amounts of metadata may continue to have a reduced feature set. + +VXLAN mode is recommended to be disabled if VXLAN encap at +hypervisors is needed only to support HW VTEP L2 Gateway functionality. +See man ovn-nb(5) for table NB_Global column +options key disable_vxlan_mode for more details. + diff --git a/ovn-nb.xml b/ovn-nb.xml index 5cb6ba640..84f1e07b6 100644 --- a/ovn-nb.xml +++ b/ovn-nb.xml @@ -381,6 +381,16 @@ of SB changes would be very noticeable.
Re: [ovs-dev] [PATCH ovn v5 1/2] northd: Make `vxlan_mode` a global variable.
Hi Vladislav, Generally speaking, I agree with this change. However, I think that setting a global variable from an incremental processing engine node runner feels wrong. I think that instead, the "vxlan_mode" variable you have introduced should be a field on struct ed_type_global_config. This way, the engine node is only modifying data local to itself. On 5/3/24 04:13, Vladislav Odintsov wrote: This simplifies code and subsequent commit to explicitely disable VXLAN mode is based on these changes. Also "VXLAN mode" term is introduced in ovn-architecture man page. Signed-off-by: Vladislav Odintsov --- northd/en-global-config.c | 4 +- northd/northd.c | 85 +-- northd/northd.h | 5 ++- ovn-architecture.7.xml| 10 ++--- 4 files changed, 47 insertions(+), 57 deletions(-) diff --git a/northd/en-global-config.c b/northd/en-global-config.c index 28c78a12c..873649a89 100644 --- a/northd/en-global-config.c +++ b/northd/en-global-config.c @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node , void *data) config_data->svc_monitor_mac); } -char *max_tunid = xasprintf("%d", -get_ovn_max_dp_key_local(sbrec_chassis_table)); +init_vxlan_mode(sbrec_chassis_table); +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local()); smap_replace(options, "max_tunid", max_tunid); free(max_tunid); diff --git a/northd/northd.c b/northd/northd.c index 133cddb69..0e0ae24db 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; */ static bool default_acl_drop; +/* If this option is 'true' northd will use limited 24-bit space for datapath + * and ports tunnel key allocation (12 bits for each instead of default 16). */ +static bool vxlan_mode; + #define MAX_OVN_TAGS 4096 @@ -881,40 +885,40 @@ join_datapaths(const struct nbrec_logical_switch_table *nbrec_ls_table, } } -static bool -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) +void +init_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) { const struct sbrec_chassis *chassis; SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { for (int i = 0; i < chassis->n_encaps; i++) { if (!strcmp(chassis->encaps[i]->type, "vxlan")) { -return true; +vxlan_mode = true; +return; } } } -return false; +vxlan_mode = false; } uint32_t -get_ovn_max_dp_key_local(const struct sbrec_chassis_table *sbrec_chassis_table) +get_ovn_max_dp_key_local(void) { -if (is_vxlan_mode(sbrec_chassis_table)) { -/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ +if (vxlan_mode) { +/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for VXLAN mode. */ return OVN_MAX_DP_VXLAN_KEY; } return OVN_MAX_DP_KEY - OVN_MAX_DP_GLOBAL_NUM; } static void -ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, - struct hmap *datapaths, struct hmap *dp_tnlids, +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, struct ovn_datapath *od, uint32_t *hint) { if (!od->tunnel_key) { od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", -OVN_MIN_DP_KEY_LOCAL, -get_ovn_max_dp_key_local(sbrec_ch_table), -hint); +OVN_MIN_DP_KEY_LOCAL, +get_ovn_max_dp_key_local(), +hint); if (!od->tunnel_key) { if (od->sb) { sbrec_datapath_binding_delete(od->sb); @@ -927,7 +931,6 @@ ovn_datapath_allocate_key(const struct sbrec_chassis_table *sbrec_ch_table, static void ovn_datapath_assign_requested_tnl_id( -const struct sbrec_chassis_table *sbrec_chassis_table, struct hmap *dp_tnlids, struct ovn_datapath *od) { const struct smap *other_config = (od->nbs @@ -936,8 +939,7 @@ ovn_datapath_assign_requested_tnl_id( uint32_t tunnel_key = smap_get_int(other_config, "requested-tnl-key", 0); if (tunnel_key) { const char *interconn_ts = smap_get(other_config, "interconn-ts"); -if (!interconn_ts && is_vxlan_mode(sbrec_chassis_table) && -tunnel_key >= 1 << 12) { +if (!interconn_ts && vxlan_mode && tunnel_key >= 1 << 12) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); VLOG_WARN_RL(&rl, "Tunnel key %"PRIu32" for datapath %s is " "incompatible with VXLAN", tunnel_key, @@ -985,7 +987,6 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, const struct nbrec_logi
Re: [ovs-dev] [PATCH ovn v5] controller: Allow br-int connection via other methods.
Thanks for keeping this fresh, Ales. Acked-by: Mark Michelson On 5/2/24 12:44, Ales Musil wrote: The br-int connection is hardcoded to use unix socket, which requires for the socket to be visible for ovn-controller. This is achievable in container by mounting the socket, but in turn the container requires additional privileges. Add option to vswitchd external-ids that allows to specify remote target for management bridge. This gives the user possibility to connect to management bridge in different manner than unix socket, defaulting to the unix socket when not specified. In addition, there is an option to specify inactivity probe for this connection, disabled by default. Reported-at: https://issues.redhat.com/browse/FDP-243 Signed-off-by: Ales Musil --- v4: Rebase on top of current main. v3: Rebase on top of current main. Fix the copy-paste error in ovn-controller documentation. v2: Rebase on top of current main. Make the probe interval accept milliseconds to be aligned with other probe intervals. Use external-ids instead of options for the ovn-controller. --- NEWS| 6 +++ controller/ofctrl.c | 10 + controller/ofctrl.h | 5 ++- controller/ovn-controller.8.xml | 15 controller/ovn-controller.c | 59 +++-- controller/pinctrl.c| 56 ++-- controller/pinctrl.h| 6 ++- controller/statctrl.c | 66 ++--- controller/statctrl.h | 3 +- include/ovn/features.h | 2 +- lib/features.c | 35 + lib/ovn-util.c | 26 + lib/ovn-util.h | 4 ++ lib/test-ovn-features.c | 6 +-- tests/ovn-controller.at | 45 ++ 15 files changed, 193 insertions(+), 151 deletions(-) diff --git a/NEWS b/NEWS index 3b5e93dc9..4e15f31c8 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,12 @@ Post v24.03.0 external-ids, the option is no longer needed as it became effectively "true" for all scenarios. - Added DHCPv4 relay support. + - Add "ovn-bridge-remote" config option to vswitchd external-ids, +that allows to specify connection method to management bridge for +ovn-controller, defaulting to the unix socket. + - Add "ovn-bridge-remote-probe-interval" config option to vswitchd +external-ids, that sets probe interval for integration bridge connection, +disabled by default. OVN v24.03.0 - 01 Mar 2024 -- diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 6a2564604..9d181a782 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -771,19 +771,13 @@ ofctrl_get_mf_field_id(void) * Returns 'true' if an OpenFlow reconnect happened; 'false' otherwise. */ bool -ofctrl_run(const struct ovsrec_bridge *br_int, +ofctrl_run(const char *conn_target, int probe_interval, const struct ovsrec_open_vswitch_table *ovs_table, struct shash *pending_ct_zones) { -char *target = xasprintf("unix:%s/%s.mgmt", ovs_rundir(), br_int->name); bool reconnected = false; -if (strcmp(target, rconn_get_target(swconn))) { -VLOG_INFO("%s: connecting to switch", target); -rconn_connect(swconn, target, target); -} -free(target); - +ovn_update_swconn_at(swconn, conn_target, probe_interval, "ofctrl"); rconn_run(swconn); if (!rconn_is_connected(swconn) || !pending_ct_zones) { diff --git a/controller/ofctrl.h b/controller/ofctrl.h index 502c73da6..7df0a24ea 100644 --- a/controller/ofctrl.h +++ b/controller/ofctrl.h @@ -50,8 +50,9 @@ struct ovn_desired_flow_table { /* Interface for OVN main loop. */ void ofctrl_init(struct ovn_extend_table *group_table, struct ovn_extend_table *meter_table); -bool ofctrl_run(const struct ovsrec_bridge *br_int, -const struct ovsrec_open_vswitch_table *, + +bool ofctrl_run(const char *conn_target, int probe_interval, +const struct ovsrec_open_vswitch_table *ovs_table, struct shash *pending_ct_zones); enum mf_field_id ofctrl_get_mf_field_id(void); void ofctrl_put(struct ovn_desired_flow_table *lflow_table, diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml index 85e7966d7..b6404a19d 100644 --- a/controller/ovn-controller.8.xml +++ b/controller/ovn-controller.8.xml @@ -378,6 +378,21 @@ cap for the exponential backoff used by ovn-controller to send GARPs packets. + external_ids:ovn-bridge-remote + + + Connection to the OVN management bridge in OvS. It defaults to + unix:br-int.mgmt when not specified. + + + external_ids:ovn-bridge-remote-probe-interval + + + The inactivity probe interval of the connection to the OVN management +
[ovs-dev] [PATCH ovn v4] Text respresntations for drop sampling.
Created a new column in the southbound database to hardcode a human readable description for flows. This first use is describing why the flow is dropping packets. The new column is called flow_desc and will create southbound database entries like this _uuid : 20f1897b-477e-47ae-a32c-c546d83ec097 actions : "sample(probability=65535,collector_set=123,obs_domain=1,obs_point=$cookie); /* drop */" controller_meter: [] external_ids: {source="northd.c:8721", stage-name=ls_in_l2_unknown} flow_desc : "No L2 destination" logical_datapath: [] logical_dp_group: ee3c3db5-98a2-4f34-8a84-409deae140a7 match : "outport == \"none\"" pipeline: ingress priority: 50 table_id: 27 tags: {} hash: 0 future work includes entering more flow_desc for more flows and adding flow_desc to the actions as a comment. Signed-off-by: Jacob Tanenbaum Suggested-by: Dumitru Ceara Reported-at: https://issues.redhat.com/browse/FDP-307 --- v1: initial version v2: correct commit message make the flow_desc a char* correct a typo in the ovn-sb.xml correct the test v3: rebase issue with NEWS file v4: remove options:debug_drop_domain_id="1" from testing as we do not depend on it merge diff --git a/NEWS b/NEWS index 81c958f9a..04c441ada 100644 --- a/NEWS +++ b/NEWS @@ -21,6 +21,8 @@ Post v24.03.0 MAC addresses configured on the LSP with "unknown", are learnt via the OVN native FDB. - Add support for ovsdb-server `--config-file` option in ovn-ctl. + - Added a new column in the southbound database "flow_desc" to provide +human readable context to flows. OVN v24.03.0 - 01 Mar 2024 -- diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c index b2c60b5de..e27558a32 100644 --- a/northd/lflow-mgr.c +++ b/northd/lflow-mgr.c @@ -36,7 +36,7 @@ static void ovn_lflow_init(struct ovn_lflow *, struct ovn_datapath *od, uint16_t priority, char *match, char *actions, char *io_port, char *ctrl_meter, char *stage_hint, - const char *where); + const char *where, char *flow_desc); static struct ovn_lflow *ovn_lflow_find(const struct hmap *lflows, enum ovn_stage stage, uint16_t priority, const char *match, @@ -52,7 +52,7 @@ static struct ovn_lflow *do_ovn_lflow_add( const char *actions, const char *io_port, const char *ctrl_meter, const struct ovsdb_idl_row *stage_hint, -const char *where); +const char *where, const char *flow_desc); static struct ovs_mutex *lflow_hash_lock(const struct hmap *lflow_table, @@ -173,6 +173,7 @@ struct ovn_lflow { * 'dpg_bitmap'. */ struct ovn_dp_group *dpg;/* Link to unique Sb datapath group. */ const char *where; +char *flow_desc; struct uuid sb_uuid; /* SB DB row uuid, specified by northd. */ struct ovs_list referenced_by; /* List of struct lflow_ref_node. */ @@ -659,7 +660,7 @@ lflow_table_add_lflow(struct lflow_table *lflow_table, const char *match, const char *actions, const char *io_port, const char *ctrl_meter, const struct ovsdb_idl_row *stage_hint, - const char *where, + const char *where, const char *flow_desc, struct lflow_ref *lflow_ref) OVS_EXCLUDED(fake_hash_mutex) { @@ -679,7 +680,7 @@ lflow_table_add_lflow(struct lflow_table *lflow_table, do_ovn_lflow_add(lflow_table, od ? ods_size(od->datapaths) : dp_bitmap_len, hash, stage, priority, match, actions, - io_port, ctrl_meter, stage_hint, where); + io_port, ctrl_meter, stage_hint, where, flow_desc); if (lflow_ref) { struct lflow_ref_node *lrn = @@ -733,7 +734,7 @@ lflow_table_add_lflow_default_drop(struct lflow_table *lflow_table, { lflow_table_add_lflow(lflow_table, od, NULL, 0, stage, 0, "1", debug_drop_action(), NULL, NULL, NULL, - where, lflow_ref); + where, NULL, lflow_ref); } struct ovn_dp_group * @@ -856,7 +857,7 @@ static void ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od, size_t dp_bitmap_len, enum ovn_stage stage, uint16_t priority, char *match, char *actions, char *io_port, char *ctrl_meter, - char *stage_hint, const char *where) + char *stage_hint, const char *where, char *flow_desc) { lflow->dpg_bitmap = bitmap_allocate(dp_bitmap_len); lflow->od = od; @@ -867,6 +868,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, s
Re: [ovs-dev] [PATCH ovn v3] text respresntations for drop sampling.
On Mon, Jun 3, 2024 at 2:52 AM Ales Musil wrote: > > > On Wed, May 29, 2024 at 2:25 AM Jacob Tanenbaum > wrote: > >> created a new column in the southbound database to hardcode a human >> readable >> description for flows. This first use is describing why the flow is >> dropping packets. >> The new column is called flow_desc and will create southbound database >> entries like this >> >> _uuid : 20f1897b-477e-47ae-a32c-c546d83ec097 >> actions : >> "sample(probability=65535,collector_set=123,obs_domain=1,obs_point=$cookie); >> /* drop */" >> controller_meter: [] >> external_ids: {source="northd.c:8721", >> stage-name=ls_in_l2_unknown} >> flow_desc : "No L2 destination" >> logical_datapath: [] >> logical_dp_group: ee3c3db5-98a2-4f34-8a84-409deae140a7 >> match : "outport == \"none\"" >> pipeline: ingress >> priority: 50 >> table_id: 27 >> tags: {} >> hash: 0 >> >> future work includes entering more flow_desc for more flows and adding >> flow_desc to the actions as a comment. >> >> Signed-off-by: Jacob Tanenbaum >> Suggested-by: Dumitru Ceara >> Reported-at: https://issues.redhat.com/browse/FDP-307 >> >> --- >> >> v1: initial version >> v2: correct commit message >> make the flow_desc a char* >> correct a typo in the ovn-sb.xml >> correct the test >> v3: rebase issue with NEWS file >> >> > Hi Jacob, > > thank you for the follow up. I have two minor comments, see below. > > >> diff --git a/NEWS b/NEWS >> index 81c958f9a..04c441ada 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -21,6 +21,8 @@ Post v24.03.0 >> MAC addresses configured on the LSP with "unknown", are learnt via >> the >> OVN native FDB. >>- Add support for ovsdb-server `--config-file` option in ovn-ctl. >> + - Added a new column in the southbound database "flow_desc" to provide >> +human readable context to flows. >> >> OVN v24.03.0 - 01 Mar 2024 >> -- >> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c >> index b2c60b5de..e27558a32 100644 >> --- a/northd/lflow-mgr.c >> +++ b/northd/lflow-mgr.c >> @@ -36,7 +36,7 @@ static void ovn_lflow_init(struct ovn_lflow *, struct >> ovn_datapath *od, >> uint16_t priority, char *match, >> char *actions, char *io_port, >> char *ctrl_meter, char *stage_hint, >> - const char *where); >> + const char *where, char *flow_desc); >> static struct ovn_lflow *ovn_lflow_find(const struct hmap *lflows, >> enum ovn_stage stage, >> uint16_t priority, const char >> *match, >> @@ -52,7 +52,7 @@ static struct ovn_lflow *do_ovn_lflow_add( >> const char *actions, const char *io_port, >> const char *ctrl_meter, >> const struct ovsdb_idl_row *stage_hint, >> -const char *where); >> +const char *where, const char *flow_desc); >> >> >> static struct ovs_mutex *lflow_hash_lock(const struct hmap *lflow_table, >> @@ -173,6 +173,7 @@ struct ovn_lflow { >>* 'dpg_bitmap'. */ >> struct ovn_dp_group *dpg;/* Link to unique Sb datapath group. */ >> const char *where; >> +char *flow_desc; >> >> struct uuid sb_uuid; /* SB DB row uuid, specified by northd. >> */ >> struct ovs_list referenced_by; /* List of struct lflow_ref_node. */ >> @@ -659,7 +660,7 @@ lflow_table_add_lflow(struct lflow_table *lflow_table, >>const char *match, const char *actions, >>const char *io_port, const char *ctrl_meter, >>const struct ovsdb_idl_row *stage_hint, >> - const char *where, >> + const char *where, const char *flow_desc, >>struct lflow_ref *lflow_ref) >> OVS_EXCLUDED(fake_hash_mutex) >> { >> @@ -679,7 +680,7 @@ lflow_table_add_lflow(struct lflow_table *lflow_table, >> do_ovn_lflow_add(lflow_table, >> od ? ods_size(od->datapaths) : dp_bitmap_len, >> hash, stage, priority, match, actions, >> - io_port, ctrl_meter, stage_hint, where); >> + io_port, ctrl_meter, stage_hint, where, >> flow_desc); >> >> if (lflow_ref) { >> struct lflow_ref_node *lrn = >> @@ -733,7 +734,7 @@ lflow_table_add_lflow_default_drop(struct lflow_table >> *lflow_table, >> { >> lflow_table_add_lflow(lflow_table, od, NULL, 0, stage, 0, "1", >>debug_drop_action(), NULL, NULL, NULL, >> - where, lflow_ref); >> + where, NULL, lflow_ref); >> > > The lflow_table_add_lflow_default_drop() is probably the best candidate to > receive the description as an arg
Re: [ovs-dev] [PATCH net-next 2/2] selftests: openvswitch: set value to nla flags
Adrian Moreno writes: > Netlink flags, although they don't have payload at the netlink level, > are represented as having a "True" value in pyroute2. > > Without it, trying to add a flow with a flag-type action (e.g: pop_vlan) > fails with the following traceback: > > Traceback (most recent call last): > File "[...]/ovs-dpctl.py", line 2498, in > sys.exit(main(sys.argv)) > ^^ > File "[...]/ovs-dpctl.py", line 2487, in main > ovsflow.add_flow(rep["dpifindex"], flow) > File "[...]/ovs-dpctl.py", line 2136, in add_flow > reply = self.nlm_request( > ^ > File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request > return tuple(self._genlm_request(*argv, **kwarg)) > ^^^ > File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in > nlm_request > return tuple(super().nlm_request(*argv, **kwarg)) >^^ > File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request > self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) > File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put > self.sendto_gate(msg, addr) > File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate > msg.encode() > File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode > offset = self.encode_nlas(offset) > > File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas > nla_instance.setvalue(cell[1]) > File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue > nlv.setvalue(nla_tuple[1]) > ~^^^ > IndexError: list index out of range > > Signed-off-by: Adrian Moreno > --- Acked-by: Aaron Conole I don't know which pyroute2 version I had used when I tested this previously, but even on my current system I get this error now. Thanks for the fix. > tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > index b76907ac0092..a2395c3f37a1 100644 > --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > @@ -537,7 +537,7 @@ class ovsactions(nla): > for flat_act in parse_flat_map: > if parse_starts_block(actstr, flat_act[0], False): > actstr = actstr[len(flat_act[0]):] > -self["attrs"].append([flat_act[1]]) > +self["attrs"].append([flat_act[1], True]) > actstr = actstr[strspn(actstr, ", ") :] > parsed = True ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next 1/2] selftests: openvswitch: fix action formatting
Adrian Moreno writes: > In the action formatting function ("dpstr"), the iteration is made over > the nla_map, so if there are more than one attribute from the same type > we only print the first one. > > Fix this by iterating over the actual attributes. > > Signed-off-by: Adrian Moreno > --- > .../selftests/net/openvswitch/ovs-dpctl.py| 48 +++ > 1 file changed, 27 insertions(+), 21 deletions(-) > > diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > index 1dd057afd3fb..b76907ac0092 100644 > --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > @@ -437,40 +437,46 @@ class ovsactions(nla): > def dpstr(self, more=False): > print_str = "" > > -for field in self.nla_map: > -if field[1] == "none" or self.get_attr(field[0]) is None: > +for attr_name, value in self["attrs"]: > +attr_desc = next(filter(lambda x: x[0] == attr_name, > self.nla_map), > + None) > +if not attr_desc: > +raise ValueError("Unknown attribute: %s" % attr) > + > +attr_type = attr_desc[1] > + > +if attr_type == "none": I agree, this is an issue. BUT I think it might be better to just filter by field type up front. See: https://github.com/apconole/linux-next-work/commit/7262107de7170d44b6dbf6c5ea6f7e6c0bb71d36#diff-3e72e7405c6bb4e9842bed5f63883ca930387086bb40d4034e92ed83a5decb4bR441 That version I think ends up being much easier to follow. If you want to take it for your series, feel free. If you disagree, maybe there's something I'm not considering about it. NOTE that version is just a bunch of independent changes that are squashed together. I have a cleaner version. I can also bundle up the series I have so far and submit, but I didn't want to do that until I got all the pmtu.sh support working. Maybe it makes sense to send it now though. Simon, Jakub - wdyt? > continue > if print_str != "": > print_str += "," > > -if field[1] == "uint32": > -if field[0] == "OVS_ACTION_ATTR_OUTPUT": > -print_str += "%d" % int(self.get_attr(field[0])) > -elif field[0] == "OVS_ACTION_ATTR_RECIRC": > -print_str += "recirc(0x%x)" % > int(self.get_attr(field[0])) > -elif field[0] == "OVS_ACTION_ATTR_TRUNC": > -print_str += "trunc(%d)" % int(self.get_attr(field[0])) > -elif field[0] == "OVS_ACTION_ATTR_DROP": > -print_str += "drop(%d)" % int(self.get_attr(field[0])) > -elif field[1] == "flag": > -if field[0] == "OVS_ACTION_ATTR_CT_CLEAR": > +if attr_type == "uint32": > +if attr_name == "OVS_ACTION_ATTR_OUTPUT": > +print_str += "%d" % int(value) > +elif attr_name == "OVS_ACTION_ATTR_RECIRC": > +print_str += "recirc(0x%x)" % int(value) > +elif attr_name == "OVS_ACTION_ATTR_TRUNC": > +print_str += "trunc(%d)" % int(value) > +elif attr_name == "OVS_ACTION_ATTR_DROP": > +print_str += "drop(%d)" % int(value) > +elif attr_type == "flag": > +if attr_name == "OVS_ACTION_ATTR_CT_CLEAR": > print_str += "ct_clear" > -elif field[0] == "OVS_ACTION_ATTR_POP_VLAN": > +elif attr_name == "OVS_ACTION_ATTR_POP_VLAN": > print_str += "pop_vlan" > -elif field[0] == "OVS_ACTION_ATTR_POP_ETH": > +elif attr_name == "OVS_ACTION_ATTR_POP_ETH": > print_str += "pop_eth" > -elif field[0] == "OVS_ACTION_ATTR_POP_NSH": > +elif attr_name == "OVS_ACTION_ATTR_POP_NSH": > print_str += "pop_nsh" > -elif field[0] == "OVS_ACTION_ATTR_POP_MPLS": > +elif attr_name == "OVS_ACTION_ATTR_POP_MPLS": > print_str += "pop_mpls" > else: > -datum = self.get_attr(field[0]) > -if field[0] == "OVS_ACTION_ATTR_CLONE": > +if attr_name == "OVS_ACTION_ATTR_CLONE": > print_str += "clone(" > -print_str += datum.dpstr(more) > +print_str += value.dpstr(more) > print_str += ")" > else: > -print_str += datum.dpstr(more) > +print_str += value.dpstr(more) > > return print_str ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample
The OVS_ACTION_ATTR_SAMPLE action is, in essence, observability-oriented. Apart from some corner case in which it's used a replacement of clone() for old kernels, it's really only used for sFlow, IPFIX and now, local emit_sample. With this in mind, it doesn't make much sense to report OVS_DROP_LAST_ACTION inside sample actions. For instance, if the flow: actions:sample(..,emit_sample(..)),2 triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely confusing for users since the packet did reach its destination. This patch makes internal action execution silently consume the skb instead of notifying a drop for this case. Unfortunately, this patch does not remove all potential sources of confusion since, if the sample action itself is the last action, e.g: actions:sample(..,emit_sample(..)) we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we aren't. Sadly, this case is difficult to solve without breaking the optimization by which the skb is not cloned on last sample actions. But, given explicit drop actions are now supported, OVS can just add one after the last sample() and rewrite the flow as: actions:sample(..,emit_sample(..)),drop Signed-off-by: Adrian Moreno --- net/openvswitch/actions.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 33f6d93ba5e4..54fc1abcff95 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos; static struct action_flow_keys __percpu *flow_keys; static DEFINE_PER_CPU(int, exec_actions_level); +static inline void ovs_drop_skb_last_action(struct sk_buff *skb) +{ + /* Do not emit packet drops inside sample(). */ + if (OVS_CB(skb)->probability) + consume_skb(skb); + else + ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION); +} + /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys' * space. Return NULL if out of key spaces. */ @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb, if ((arg->probability != U32_MAX) && (!arg->probability || get_random_u32() > arg->probability)) { if (last) - ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION); + ovs_drop_skb_last_action(skb); return 0; } @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, } } - ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION); + ovs_drop_skb_last_action(skb); return 0; } -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 9/9] selftests: openvswitch: add emit_sample test
Add a test to verify sampling packets via psample works. In order to do that, create a subcommand in ovs-dpctl.py to listen to on the psample multicast group and print samples. In order to also test simultaneous sFlow and psample actions and packet truncation, add missing parsing support for "userspace" and "trunc" actions. Signed-off-by: Adrian Moreno --- .../selftests/net/openvswitch/openvswitch.sh | 99 +++- .../selftests/net/openvswitch/ovs-dpctl.py| 112 +- 2 files changed, 204 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 5cae53543849..f6e0ae3f6424 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -20,7 +20,8 @@ tests=" nat_related_v4 ip4-nat-related: ICMP related matches work with SNAT netlink_checks ovsnl: validate netlink attrs and settings upcall_interfaces ovs: test the upcall interfaces - drop_reason drop: test drop reasons are emitted" + drop_reason drop: test drop reasons are emitted + emit_sample emit_sample: Sampling packets with psample" info() { [ $VERBOSE = 0 ] || echo $* @@ -170,6 +171,19 @@ ovs_drop_reason_count() return `echo "$perf_output" | grep "$pattern" | wc -l` } +ovs_test_flow_fails () { + ERR_MSG="Flow actions may not be safe on all matching packets" + + PRE_TEST=$(dmesg | grep -c "${ERR_MSG}") + ovs_add_flow $@ &> /dev/null $@ && return 1 + POST_TEST=$(dmesg | grep -c "${ERR_MSG}") + + if [ "$PRE_TEST" == "$POST_TEST" ]; then + return 1 + fi + return 0 +} + usage() { echo echo "$0 [OPTIONS] [TEST]..." @@ -184,6 +198,89 @@ usage() { exit 1 } + +# emit_sample test +# - use emit_sample to observe packets +test_emit_sample() { + sbx_add "test_emit_sample" || return $? + + # Add a datapath with per-vport dispatching. + ovs_add_dp "test_emit_sample" emit_sample -V 2:1 || return 1 + + info "create namespaces" + ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \ + client c0 c1 172.31.110.10/24 -u || return 1 + ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \ + server s0 s1 172.31.110.20/24 -u || return 1 + + # Check if emit_sample actions can be configured. + ovs_add_flow "test_emit_sample" emit_sample \ + 'in_port(1),eth(),eth_type(0x0806),arp()' 'emit_sample(group=1)' + if [ $? == 1 ]; then + info "no support for emit_sample - skipping" + ovs_exit_sig + return $ksft_skip + fi + + ovs_del_flows "test_emit_sample" emit_sample + + # Allow ARP + ovs_add_flow "test_emit_sample" emit_sample \ + 'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1 + ovs_add_flow "test_emit_sample" emit_sample \ + 'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1 + + # Test action verification. + OLDIFS=$IFS + IFS='*' + min_key='in_port(1),eth(),eth_type(0x0800),ipv4()' + for testcase in \ + "cookie to large"*"emit_sample(group=1,cookie=1615141312111009080706050403020100)" \ + "no group with cookie"*"emit_sample(cookie=abcd)" \ + "no group"*"sample()"; + do + set -- $testcase; + ovs_test_flow_fails "test_emit_sample" emit_sample $min_key $2 + if [ $? == 1 ]; then + info "failed - $1" + return 1 + fi + done + IFS=$OLDIFS + + # Sample first 14 bytes of all traffic. + ovs_add_flow "test_emit_sample" emit_sample \ + "in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()" "trunc(14),emit_sample(group=1,cookie=c0ffee),2" + + # Sample all traffic. In this case, use a sample() action with both + # emit_sample and an upcall emulating simultaneous local sampling and + # sFlow / IPFIX. + nlpid=$(grep -E "listening on upcall packet handler" $ovs_dir/s0.out | cut -d ":" -f 2 | tr -d ' ') + ovs_add_flow "test_emit_sample" emit_sample \ + "in_port(2),eth(),eth_type(0x0800),ipv4(src=172.31.110.20,proto=1),icmp()" "sample(sample=100%,actions(emit_sample(group=2,cookie=eeff0c),userspace(pid=${nlpid},userdata=eeff0c))),1" + + # Record emit_sample data. + python3 $ovs_base/ovs-dpctl.py psample >$ovs_dir/psample.out 2>$ovs_dir/psample.err & + pid=$! + on_exit "ovs_sbx test_emit_sample kill -TERM $pid 2>/dev/null" + + # Send a single ping. + sleep 1 + ovs_sbx "test_emit_sam
[ovs-dev] [PATCH net-next v2 8/9] selftests: openvswitch: add emit_sample action
Add sample and emit_sample action support to ovs-dpctl.py. Refactor common attribute parsing logic into an external function. Signed-off-by: Adrian Moreno --- .../selftests/net/openvswitch/ovs-dpctl.py| 162 +- 1 file changed, 161 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index a2395c3f37a1..f8b5362aac8c 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -8,6 +8,7 @@ import argparse import errno import ipaddress import logging +import math import multiprocessing import re import struct @@ -58,6 +59,7 @@ OVS_FLOW_CMD_DEL = 2 OVS_FLOW_CMD_GET = 3 OVS_FLOW_CMD_SET = 4 +UINT32_MAX = 0x def macstr(mac): outstr = ":".join(["%02X" % i for i in mac]) @@ -267,6 +269,75 @@ def parse_extract_field( return str_skipped, data +def parse_attrs(actstr, attr_desc): +"""Parses the given action string and returns a list of netlink +attributes based on a list of attribute descriptions. + +Each element in the attribute description list is a tuple such as: +(name, attr_name, parse_func) +where: +name: is the string representing the attribute +attr_name: is the name of the attribute as defined in the uAPI. +parse_func: is a callable accepting a string and returning either +a single object (the parsed attribute value) or a tuple of +two values (the parsed attribute value and the remaining string) + +Returns a list of attributes and the remaining string. +""" +def parse_attr(actstr, key, func): +actstr = actstr[len(key) :] + +if not func: +return None, actstr + +delim = actstr[0] +actstr = actstr[1:] + +if delim == "=": +pos = strcspn(actstr, ",)") +ret = func(actstr[:pos]) +else: +ret = func(actstr) + +if isinstance(ret, tuple): +(datum, actstr) = ret +else: +datum = ret +actstr = actstr[strcspn(actstr, ",)"):] + +if delim == "(": +if not actstr or actstr[0] != ")": +raise ValueError("Action contains unbalanced parentheses") + +actstr = actstr[1:] + +actstr = actstr[strspn(actstr, ", ") :] + +return datum, actstr + +attrs = [] +attr_desc = list(attr_desc) +while actstr and actstr[0] != ")" and attr_desc: +found = False +for i, (key, attr, func) in enumerate(attr_desc): +if actstr.startswith(key): +datum, actstr = parse_attr(actstr, key, func) +attrs.append([attr, datum]) +found = True +del attr_desc[i] + +if not found: +raise ValueError("Unknown attribute: '%s'" % actstr) + +actstr = actstr[strspn(actstr, ", ") :] + +if actstr[0] != ")": +raise ValueError("Action string contains extra garbage or has " + "unbalanced parenthesis: '%s'" % actstr) + +return attrs, actstr[1:] + + class ovs_dp_msg(genlmsg): # include the OVS version # We need a custom header rather than just being able to rely on @@ -285,7 +356,7 @@ class ovsactions(nla): ("OVS_ACTION_ATTR_SET", "none"), ("OVS_ACTION_ATTR_PUSH_VLAN", "none"), ("OVS_ACTION_ATTR_POP_VLAN", "flag"), -("OVS_ACTION_ATTR_SAMPLE", "none"), +("OVS_ACTION_ATTR_SAMPLE", "sample"), ("OVS_ACTION_ATTR_RECIRC", "uint32"), ("OVS_ACTION_ATTR_HASH", "none"), ("OVS_ACTION_ATTR_PUSH_MPLS", "none"), @@ -304,8 +375,85 @@ class ovsactions(nla): ("OVS_ACTION_ATTR_ADD_MPLS", "none"), ("OVS_ACTION_ATTR_DEC_TTL", "none"), ("OVS_ACTION_ATTR_DROP", "uint32"), +("OVS_ACTION_ATTR_EMIT_SAMPLE", "emit_sample"), ) +class emit_sample(nla): +nla_flags = NLA_F_NESTED + +nla_map = ( +("OVS_EMIT_SAMPLE_ATTR_UNSPEC", "none"), +("OVS_EMIT_SAMPLE_ATTR_GROUP", "uint32"), +("OVS_EMIT_SAMPLE_ATTR_COOKIE", "array(uint8)"), +) + +def dpstr(self, more=False): +args = "group=%d" % self.get_attr("OVS_EMIT_SAMPLE_ATTR_GROUP") + +cookie = self.get_attr("OVS_EMIT_SAMPLE_ATTR_COOKIE") +if cookie: +args += ",cookie(%s)" % \ +"".join(format(x, "02x") for x in cookie) + +return "emit_sample(%s)" % args + +def parse(self, actstr): +desc = ( +("group", "OVS_EMIT_SAMPLE_ATTR_GROUP", int), +("cookie", "OVS_EMIT_SAMPLE_ATTR_COOKIE", +lambda x: list(bytearray.fromhex(x))) +) + +attrs, actstr = parse_attrs(actstr, desc) + +for attr in attrs: +self[
[ovs-dev] [PATCH net-next v2 6/9] net: openvswitch: store sampling probability in cb.
The behavior of actions might not be the exact same if they are being executed inside a nested sample action. Store the probability of the parent sample action in the skb's cb area. Use the probability in emit_sample to pass it down to psample. Signed-off-by: Adrian Moreno --- include/uapi/linux/openvswitch.h | 3 ++- net/openvswitch/actions.c| 25 ++--- net/openvswitch/datapath.h | 3 +++ net/openvswitch/vport.c | 1 + 4 files changed, 28 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index a0e9dde0584a..9d675725fa2b 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -649,7 +649,8 @@ enum ovs_flow_attr { * Actions are passed as nested attributes. * * Executes the specified actions with the given probability on a per-packet - * basis. + * basis. Nested actions will be able to access the probability value of the + * parent @OVS_ACTION_ATTR_SAMPLE. */ enum ovs_sample_attr { OVS_SAMPLE_ATTR_UNSPEC, diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 3b4dba0ded59..33f6d93ba5e4 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff *skb, struct nlattr *sample_arg; int rem = nla_len(attr); const struct sample_arg *arg; + u32 init_probability; bool clone_flow_key; + int err; /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ sample_arg = nla_data(attr); arg = nla_data(sample_arg); actions = nla_next(sample_arg, &rem); + init_probability = OVS_CB(skb)->probability; if ((arg->probability != U32_MAX) && (!arg->probability || get_random_u32() > arg->probability)) { @@ -1062,9 +1065,21 @@ static int sample(struct datapath *dp, struct sk_buff *skb, return 0; } + if (init_probability) { + OVS_CB(skb)->probability = ((u64)OVS_CB(skb)->probability * + arg->probability / U32_MAX); + } else { + OVS_CB(skb)->probability = arg->probability; + } + clone_flow_key = !arg->exec; - return clone_execute(dp, skb, key, 0, actions, rem, last, -clone_flow_key); + err = clone_execute(dp, skb, key, 0, actions, rem, last, + clone_flow_key); + + if (!last) + OVS_CB(skb)->probability = init_probability; + + return err; } /* When 'last' is true, clone() should always consume the 'skb'. @@ -1313,6 +1328,7 @@ static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb, struct psample_metadata md = {}; struct vport *input_vport; const struct nlattr *a; + u32 rate; int rem; for (a = nla_data(attr), rem = nla_len(attr); rem > 0; @@ -1337,8 +1353,11 @@ static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb, md.in_ifindex = input_vport->dev->ifindex; md.trunc_size = skb->len - OVS_CB(skb)->cutlen; + md.rate_as_probability = 1; + + rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX; - psample_sample_packet(&psample_group, skb, 0, &md); + psample_sample_packet(&psample_group, skb, rate, &md); #endif return 0; diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 0cd29971a907..9ca6231ea647 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -115,12 +115,15 @@ struct datapath { * fragmented. * @acts_origlen: The netlink size of the flow actions applied to this skb. * @cutlen: The number of bytes from the packet end to be removed. + * @probability: The sampling probability that was applied to this skb; 0 means + * no sampling has occurred; U32_MAX means 100% probability. */ struct ovs_skb_cb { struct vport*input_vport; u16 mru; u16 acts_origlen; u32 cutlen; + u32 probability; }; #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb) diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c index 972ae01a70f7..8732f6e51ae5 100644 --- a/net/openvswitch/vport.c +++ b/net/openvswitch/vport.c @@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, OVS_CB(skb)->input_vport = vport; OVS_CB(skb)->mru = 0; OVS_CB(skb)->cutlen = 0; + OVS_CB(skb)->probability = 0; if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) { u32 mark; -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 4/9] net: psample: allow using rate as probability
Although not explicitly documented in the psample module itself, the definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample. Quoting tc-sample(8): "RATE of 100 will lead to an average of one sampled packet out of every 100 observed." With this semantics, the rates that we can express with an unsigned 32-bits number are very unevenly distributed and concentrated towards "sampling few packets". For example, we can express a probability of 2.32E-8% but we cannot express anything between 100% and 50%. For sampling applications that are capable of sampling a decent amount of packets, this sampling rate semantics is not very useful. Add a new flag to the uAPI that indicates that the sampling rate is expressed in scaled probability, this is: - 0 is 0% probability, no packets get sampled. - U32_MAX is 100% probability, all packets get sampled. Signed-off-by: Adrian Moreno --- include/net/psample.h | 3 ++- include/uapi/linux/psample.h | 4 include/uapi/linux/tc_act/tc_sample.h | 1 + net/psample/psample.c | 3 +++ 4 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/net/psample.h b/include/net/psample.h index 2ac71260a546..c52e9ebd88dd 100644 --- a/include/net/psample.h +++ b/include/net/psample.h @@ -24,7 +24,8 @@ struct psample_metadata { u8 out_tc_valid:1, out_tc_occ_valid:1, latency_valid:1, - unused:5; + rate_as_probability:1, + unused:4; const u8 *user_cookie; u32 user_cookie_len; }; diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h index e80637e1d97b..8b069e75beab 100644 --- a/include/uapi/linux/psample.h +++ b/include/uapi/linux/psample.h @@ -20,6 +20,10 @@ enum { PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ PSAMPLE_ATTR_PROTO, /* u16 */ PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ + PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in +* PSAMPLE_ATTR_SAMPLE_RATE as a +* probability scaled 0 - U32_MAX. +*/ __PSAMPLE_ATTR_MAX }; diff --git a/include/uapi/linux/tc_act/tc_sample.h b/include/uapi/linux/tc_act/tc_sample.h index fee1bcc20793..7ee0735e7b38 100644 --- a/include/uapi/linux/tc_act/tc_sample.h +++ b/include/uapi/linux/tc_act/tc_sample.h @@ -18,6 +18,7 @@ enum { TCA_SAMPLE_TRUNC_SIZE, TCA_SAMPLE_PSAMPLE_GROUP, TCA_SAMPLE_PAD, + TCA_SAMPLE_PROBABILITY, __TCA_SAMPLE_MAX }; #define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1) diff --git a/net/psample/psample.c b/net/psample/psample.c index 1c76f3e48dcd..f48b5b9cd409 100644 --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -497,6 +497,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, md->user_cookie)) goto error; + if (md->rate_as_probability) + nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY); + genlmsg_end(nl_skb, data); genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0, PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC); -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 5/9] net: openvswitch: add emit_sample action
Add support for a new action: emit_sample. This action accepts a u32 group id and a variable-length cookie and uses the psample multicast group to make the packet available for observability. The maximum length of the user-defined cookie is set to 16, same as tc_cookie, to discourage using cookies that will not be offloadable. Signed-off-by: Adrian Moreno --- Documentation/netlink/specs/ovs_flow.yaml | 17 include/uapi/linux/openvswitch.h | 25 net/openvswitch/actions.c | 50 +++ net/openvswitch/flow_netlink.c| 33 ++- 4 files changed, 124 insertions(+), 1 deletion(-) diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml index 4fdfc6b5cae9..a7ab5593a24f 100644 --- a/Documentation/netlink/specs/ovs_flow.yaml +++ b/Documentation/netlink/specs/ovs_flow.yaml @@ -727,6 +727,12 @@ attribute-sets: name: dec-ttl type: nest nested-attributes: dec-ttl-attrs + - +name: emit-sample +type: nest +nested-attributes: emit-sample-attrs +doc: | + Sends a packet sample to psample for external observation. - name: tunnel-key-attrs enum-name: ovs-tunnel-key-attr @@ -938,6 +944,17 @@ attribute-sets: - name: gbp type: u32 + - +name: emit-sample-attrs +enum-name: ovs-emit-sample-attr +name-prefix: ovs-emit-sample-attr- +attributes: + - +name: group +type: u32 + - +name: cookie +type: binary operations: name-prefix: ovs-flow-cmd- diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index efc82c318fa2..a0e9dde0584a 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -914,6 +914,30 @@ struct check_pkt_len_arg { }; #endif +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16 +/** + * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE + * action. + * + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of the + * sample. + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that contains + * user-defined metadata. The maximum length is 16 bytes. + * + * Sends the packet to the psample multicast group with the specified group and + * cookie. It is possible to combine this action with the + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being emitted. + */ +enum ovs_emit_sample_attr { + OVS_EMIT_SAMPLE_ATTR_UNPSEC, + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */ + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ + __OVS_EMIT_SAMPLE_ATTR_MAX +}; + +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1) + + /** * enum ovs_action_attr - Action types. * @@ -1004,6 +1028,7 @@ enum ovs_action_attr { OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ OVS_ACTION_ATTR_DROP, /* u32 error code. */ + OVS_ACTION_ATTR_EMIT_SAMPLE, /* Nested OVS_EMIT_SAMPLE_ATTR_*. */ __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted * from userspace. */ diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 964225580824..3b4dba0ded59 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -24,6 +24,11 @@ #include #include #include + +#if IS_ENABLED(CONFIG_PSAMPLE) +#include +#endif + #include #include "datapath.h" @@ -1299,6 +1304,46 @@ static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key) return 0; } +static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb, + const struct sw_flow_key *key, + const struct nlattr *attr) +{ +#if IS_ENABLED(CONFIG_PSAMPLE) + struct psample_group psample_group = {}; + struct psample_metadata md = {}; + struct vport *input_vport; + const struct nlattr *a; + int rem; + + for (a = nla_data(attr), rem = nla_len(attr); rem > 0; +a = nla_next(a, &rem)) { + switch (nla_type(a)) { + case OVS_EMIT_SAMPLE_ATTR_GROUP: + psample_group.group_num = nla_get_u32(a); + break; + + case OVS_EMIT_SAMPLE_ATTR_COOKIE: + md.user_cookie = nla_data(a); + md.user_cookie_len = nla_len(a); + break; + } + } + + psample_group.net = ovs_dp_get_net(dp); + + input_vport = ovs_vport_rcu(dp, key->phy.in_port); + if (!input_vport) + input_vport = ovs_vport_rcu(dp, OVSP_LOCAL); + + md.in_ifindex = input_vport->dev->ifindex; + md.trunc_size = skb->len - OVS_CB(skb)->cutlen; + + psa
[ovs-dev] [PATCH net-next v2 3/9] net: psample: skip packet copy if no listeners
If nobody is listening on the multicast group, generating the sample, which involves copying packet data, seems completely unnecessary. Return fast in this case. Signed-off-by: Adrian Moreno --- net/psample/psample.c | 4 1 file changed, 4 insertions(+) diff --git a/net/psample/psample.c b/net/psample/psample.c index b37488f426bc..1c76f3e48dcd 100644 --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, void *data; int ret; + if (!genl_has_listeners(&psample_nl_family, group->net, + PSAMPLE_NL_MCGRP_SAMPLE)) + return; + meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) + (out_ifindex ? nla_total_size(sizeof(u16)) : 0) + (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) + -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 2/9] net: sched: act_sample: add action cookie to sample
If the action has a user_cookie, pass it along to the sample so it can be easily identified. Signed-off-by: Adrian Moreno --- net/sched/act_sample.c | 12 1 file changed, 12 insertions(+) diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index a69b53d54039..5c3f86ec964a 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -165,9 +165,11 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) { + u8 cookie_data[TC_COOKIE_MAX_SIZE] = {}; struct tcf_sample *s = to_sample(a); struct psample_group *psample_group; struct psample_metadata md = {}; + struct tc_cookie *user_cookie; int retval; tcf_lastuse_update(&s->tcf_tm); @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev)) skb_push(skb, skb->mac_len); + rcu_read_lock(); + user_cookie = rcu_dereference(a->user_cookie); + if (user_cookie) { + memcpy(cookie_data, user_cookie->data, + user_cookie->len); + md.user_cookie = cookie_data; + md.user_cookie_len = user_cookie->len; + } + rcu_read_unlock(); + md.trunc_size = s->truncate ? s->trunc_size : skb->len; psample_sample_packet(psample_group, skb, s->rate, &md); -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 1/9] net: psample: add user cookie
Add a user cookie to the sample metadata so that sample emitters can provide more contextual information to samples. If present, send the user cookie in a new attribute: PSAMPLE_ATTR_USER_COOKIE. Signed-off-by: Adrian Moreno --- include/net/psample.h| 2 ++ include/uapi/linux/psample.h | 1 + net/psample/psample.c| 9 - 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/net/psample.h b/include/net/psample.h index 0509d2d6be67..2ac71260a546 100644 --- a/include/net/psample.h +++ b/include/net/psample.h @@ -25,6 +25,8 @@ struct psample_metadata { out_tc_occ_valid:1, latency_valid:1, unused:5; + const u8 *user_cookie; + u32 user_cookie_len; }; struct psample_group *psample_group_get(struct net *net, u32 group_num); diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h index e585db5bf2d2..e80637e1d97b 100644 --- a/include/uapi/linux/psample.h +++ b/include/uapi/linux/psample.h @@ -19,6 +19,7 @@ enum { PSAMPLE_ATTR_LATENCY, /* u64, nanoseconds */ PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ PSAMPLE_ATTR_PROTO, /* u16 */ + PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ __PSAMPLE_ATTR_MAX }; diff --git a/net/psample/psample.c b/net/psample/psample.c index a5d9b8446f77..b37488f426bc 100644 --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -386,7 +386,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, nla_total_size(sizeof(u32)) +/* group_num */ nla_total_size(sizeof(u32)) +/* seq */ nla_total_size_64bit(sizeof(u64)) + /* timestamp */ - nla_total_size(sizeof(u16)); /* protocol */ + nla_total_size(sizeof(u16)) +/* protocol */ + (md->user_cookie_len ? + nla_total_size(md->user_cookie_len) : 0); /* user cookie */ #ifdef CONFIG_INET tun_info = skb_tunnel_info(skb); @@ -486,6 +488,11 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, } #endif + if (md->user_cookie && md->user_cookie_len && + nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len, + md->user_cookie)) + goto error; + genlmsg_end(nl_skb, data); genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0, PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC); -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v2 0/9] net: openvswitch: Add sample multicasting.
** Background ** Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd's handler threads only to be forwarded to some third party application that will somehow process the sample and provide observability on the datapath. A particularly interesting use-case is controller-driven per-flow IPFIX sampling where the OpenFlow controller can add metadata to samples (via two 32bit integers) and this metadata is then available to the sample-collecting system for correlation. ** Problem ** The fact that sampled traffic share netlink sockets and handler thread time with upcalls, apart from being a performance bottleneck in the sample extraction itself, can severely compromise the datapath, yielding this solution unfit for highly loaded production systems. Users are left with little options other than guessing what sampling rate will be OK for their traffic pattern and system load and dealing with the lost accuracy. Looking at available infrastructure, an obvious candidated would be to use psample. However, it's current state does not help with the use-case at stake because sampled packets do not contain user-defined metadata. ** Proposal ** This series is an attempt to fix this situation by extending the existing psample infrastructure to carry a variable length user-defined cookie. The main existing user of psample is tc's act_sample. It is also extended to forward the action's cookie to psample. Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created. It accepts a group and an optional cookie and uses psample to multicast the packet and the metadata. -- v1 -> v2: - Create a new action ("emit_sample") rather than reuse existing "sample" one. - Add probability semantics to psample's sampling rate. - Store sampling probability in skb's cb area and use it in emit_sample. - Test combining "emit_sample" with "trunc" - Drop group_id filtering and tracepoint in psample. rfc_v2 -> v1: - Accomodate Ilya's comments. - Split OVS's attribute in two attributes and simplify internal handling of psample arguments. - Extend psample and tc with a user-defined cookie. - Add a tracepoint to psample to facilitate troubleshooting. rfc_v1 -> rfc_v2: - Use psample instead of a new OVS-only multicast group. - Extend psample and tc with a user-defined cookie. Adrian Moreno (9): net: psample: add user cookie net: sched: act_sample: add action cookie to sample net: psample: skip packet copy if no listeners net: psample: allow using rate as probability net: openvswitch: add emit_sample action net: openvswitch: store sampling probability in cb. net: openvswitch: do not notify drops inside sample selftests: openvswitch: add emit_sample action selftests: openvswitch: add emit_sample test Documentation/netlink/specs/ovs_flow.yaml | 17 ++ include/net/psample.h | 5 +- include/uapi/linux/openvswitch.h | 28 +- include/uapi/linux/psample.h | 5 + include/uapi/linux/tc_act/tc_sample.h | 1 + net/openvswitch/actions.c | 86 +- net/openvswitch/datapath.h| 3 + net/openvswitch/flow_netlink.c| 33 ++- net/openvswitch/vport.c | 1 + net/psample/psample.c | 16 +- net/sched/act_sample.c| 12 + .../selftests/net/openvswitch/openvswitch.sh | 99 ++- .../selftests/net/openvswitch/ovs-dpctl.py| 274 +- 13 files changed, 564 insertions(+), 16 deletions(-) -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] Do not reply on unicast arps for targets.
I've pushed a new revision, please review. The thread is `[PATCH v2 ovn] Do not reply on unicast arps for IPv4 targets` On Thu, May 23, 2024 at 1:07 PM Han Zhou wrote: > > > On Thu, May 23, 2024 at 1:51 PM Vasyl Saienko > wrote: > > > > Thanks for feedback, > > > > I'm not an ipv6 expert, I did some research. There is no ARP in ipv6, > it's replaced with Neighbor Discovery. The ARP resolution is done via > Neighbor Solicitation and Neighbour Advertisement messages. > > Neighbor Solicitation is an analog of ARP request and is sent to either > > > > solicited-node multicast address 33:33:ff:XX:XX:XX where XX:XX:XX is the > last 3 bytes of the target ipv6 address > > or target node address > > > > So in our case we can update the flows to match only the solicited-node > multicast address 33:33:ff:XX:XX:XX as follows. From > > > > table=??(ls_in_arp_rsp ), priority=100 , match=(nd_ns && ip6.dst > == {fd00::10, ff02::1:ff00:10} && nd.target == fd00::10 && inport == > "S1-vm1"), action=(next;) > > table=??(ls_in_arp_rsp ), priority=50 , match=(nd_ns && ip6.dst > == {fd00::10, ff02::1:ff00:10} && nd.target == fd00::10), action=(nd_na { > eth.src = 50:54:00:00:00:10; ip6.src = fd00::10; nd.target = fd00::10; > nd.tll = 50:54:00:00:00:10; outport = inport; flags.loopback = 1; output; > };) > > > > to > > > > table=??(ls_in_arp_rsp ), priority=100 , match=(nd_ns && eth.dst > == {33:33:ff:00:00:10, 33:33:ff:ff:00:10} && ip6.dst == {fd00::10, > ff02::1:ff00:10} && nd.target == fd00::10 && inport == "S1-vm1"), > action=(next;) > > table=??(ls_in_arp_rsp ), priority=50 , match=(nd_ns && eth.dst > == {33:33:ff:00:00:10, 33:33:ff:ff:00:10} && ip6.dst == {fd00::10, > ff02::1:ff00:10} && nd.target == fd00::10), action=(nd_na { eth.src = > 50:54:00:00:00:10; ip6.src = fd00::10; nd.target = fd00::10; nd.tll = > 50:54:00:00:00:10; outport = inport; flags.loopback = 1; output; };) > > > > Will this be okay? Maybe we can do this in a separate patch? > > > I am not an IPv6 expert either. I even forgot that IPv6 NS uses multicast > instead of broadcast MAC address, and was assuming we could simply append > the eth.dst == ff:ff:ff:ff:ff:ff to the IPv6 NS responder flows, which is > obviously wrong. Please ignore my feedback for the current patch and we can > address IPv6 in a separate patch if it is required. > > Thanks, > Han > > > On Wed, May 22, 2024 at 12:16 AM Han Zhou wrote: > >> > >> > >> > >> On Tue, May 21, 2024 at 9:56 PM Numan Siddique wrote: > >> > > >> > On Mon, May 20, 2024 at 1:47 AM Vasyl Saienko > wrote: > >> > > > >> > > Thanks for feedback Numan. > >> > > > >> > > On Wed, May 15, 2024 at 12:04 AM Numan Siddique > wrote: > >> > > > >> > > > On Mon, May 13, 2024 at 9:00 AM Dumitru Ceara > wrote: > >> > > > > > >> > > > > On 5/4/24 11:38, Vasyl Saienko wrote: > >> > > > > > Hello Numan > >> > > > > > > >> > > > > > Thanks for review and feedback. > >> > > > > > > >> > > > > > On Fri, May 3, 2024 at 7:14 PM Numan Siddique > wrote: > >> > > > > > > >> > > > > >> On Mon, Apr 22, 2024 at 2:54 AM Vasyl Saienko < > vsaie...@mirantis.com> > >> > > > > >> wrote: > >> > > > > >>> > >> > > > > >>> Reply only if target ethernet address is broadcast, if > >> > > > > >>> address is specified explicitly do noting to let target > >> > > > > >>> reply by itself. This technique allows to monitor target > >> > > > > >>> aliveness with arping. > >> > > > > >> > >> > > > > >> Thanks for the patch. > >> > > > > >> > >> > > > > >> This patch does change the behavior of OVN. Having ARP > responder > >> > > > > >> flows avoids tunnelling the request packet if the > destination is on a > >> > > > > >> different compute node. > >> > > > > >> But I don't think its a big deal. > >> > > > > >> > >> > > > > >> You are totally correct that the ARP responder allows > limiting ARP > >> > > > > > broadcast traffic between nodes. Normally ARP requests are > sent to > >> > > > > > broadcast ff:ff:ff:ff:ff:ff ethernet, as the protocol is > designed to > >> > > > > > identify ethernet addresses for known IP addresses. In this > case since > >> > > > > > traffic is broadcast it will be flooded among all nodes if ARP > >> > > > responder is > >> > > > > > not applied. At the same time the client may specify the > target > >> > > > > > ethernet address as unicast (in this case a broadcast storm > will not > >> > > > > > happen, as the destination ethernet address is unicast). > >> > > > > > > >> > > > > > In OpenStack we use unicast ARP requests as a way to monitor > VM > >> > > > aliveness > >> > > > > > from remote locations to make sure there are no issues with > >> > > > > > flows/underlying networking infra between nodes. We use ARPs > due to > >> > > > several > >> > > > > > reasons: > >> > > > > > 1. This protocol is mandatory and can't be disabled on the > target > >> > > > machine > >> > > > > > (which guarantees that the target VM will always reply to > ARPs if it is > >> > > > > > alive) > >> > > > > > 2. T
[ovs-dev] [PATCH net-next 2/2] selftests: openvswitch: set value to nla flags
Netlink flags, although they don't have payload at the netlink level, are represented as having a "True" value in pyroute2. Without it, trying to add a flow with a flag-type action (e.g: pop_vlan) fails with the following traceback: Traceback (most recent call last): File "[...]/ovs-dpctl.py", line 2498, in sys.exit(main(sys.argv)) ^^ File "[...]/ovs-dpctl.py", line 2487, in main ovsflow.add_flow(rep["dpifindex"], flow) File "[...]/ovs-dpctl.py", line 2136, in add_flow reply = self.nlm_request( ^ File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request return tuple(self._genlm_request(*argv, **kwarg)) ^^^ File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in nlm_request return tuple(super().nlm_request(*argv, **kwarg)) ^^ File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put self.sendto_gate(msg, addr) File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate msg.encode() File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode offset = self.encode_nlas(offset) File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas nla_instance.setvalue(cell[1]) File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue nlv.setvalue(nla_tuple[1]) ~^^^ IndexError: list index out of range Signed-off-by: Adrian Moreno --- tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index b76907ac0092..a2395c3f37a1 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -537,7 +537,7 @@ class ovsactions(nla): for flat_act in parse_flat_map: if parse_starts_block(actstr, flat_act[0], False): actstr = actstr[len(flat_act[0]):] -self["attrs"].append([flat_act[1]]) +self["attrs"].append([flat_act[1], True]) actstr = actstr[strspn(actstr, ", ") :] parsed = True -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next 1/2] selftests: openvswitch: fix action formatting
In the action formatting function ("dpstr"), the iteration is made over the nla_map, so if there are more than one attribute from the same type we only print the first one. Fix this by iterating over the actual attributes. Signed-off-by: Adrian Moreno --- .../selftests/net/openvswitch/ovs-dpctl.py| 48 +++ 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 1dd057afd3fb..b76907ac0092 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -437,40 +437,46 @@ class ovsactions(nla): def dpstr(self, more=False): print_str = "" -for field in self.nla_map: -if field[1] == "none" or self.get_attr(field[0]) is None: +for attr_name, value in self["attrs"]: +attr_desc = next(filter(lambda x: x[0] == attr_name, self.nla_map), + None) +if not attr_desc: +raise ValueError("Unknown attribute: %s" % attr) + +attr_type = attr_desc[1] + +if attr_type == "none": continue if print_str != "": print_str += "," -if field[1] == "uint32": -if field[0] == "OVS_ACTION_ATTR_OUTPUT": -print_str += "%d" % int(self.get_attr(field[0])) -elif field[0] == "OVS_ACTION_ATTR_RECIRC": -print_str += "recirc(0x%x)" % int(self.get_attr(field[0])) -elif field[0] == "OVS_ACTION_ATTR_TRUNC": -print_str += "trunc(%d)" % int(self.get_attr(field[0])) -elif field[0] == "OVS_ACTION_ATTR_DROP": -print_str += "drop(%d)" % int(self.get_attr(field[0])) -elif field[1] == "flag": -if field[0] == "OVS_ACTION_ATTR_CT_CLEAR": +if attr_type == "uint32": +if attr_name == "OVS_ACTION_ATTR_OUTPUT": +print_str += "%d" % int(value) +elif attr_name == "OVS_ACTION_ATTR_RECIRC": +print_str += "recirc(0x%x)" % int(value) +elif attr_name == "OVS_ACTION_ATTR_TRUNC": +print_str += "trunc(%d)" % int(value) +elif attr_name == "OVS_ACTION_ATTR_DROP": +print_str += "drop(%d)" % int(value) +elif attr_type == "flag": +if attr_name == "OVS_ACTION_ATTR_CT_CLEAR": print_str += "ct_clear" -elif field[0] == "OVS_ACTION_ATTR_POP_VLAN": +elif attr_name == "OVS_ACTION_ATTR_POP_VLAN": print_str += "pop_vlan" -elif field[0] == "OVS_ACTION_ATTR_POP_ETH": +elif attr_name == "OVS_ACTION_ATTR_POP_ETH": print_str += "pop_eth" -elif field[0] == "OVS_ACTION_ATTR_POP_NSH": +elif attr_name == "OVS_ACTION_ATTR_POP_NSH": print_str += "pop_nsh" -elif field[0] == "OVS_ACTION_ATTR_POP_MPLS": +elif attr_name == "OVS_ACTION_ATTR_POP_MPLS": print_str += "pop_mpls" else: -datum = self.get_attr(field[0]) -if field[0] == "OVS_ACTION_ATTR_CLONE": +if attr_name == "OVS_ACTION_ATTR_CLONE": print_str += "clone(" -print_str += datum.dpstr(more) +print_str += value.dpstr(more) print_str += ")" else: -print_str += datum.dpstr(more) +print_str += value.dpstr(more) return print_str -- 2.45.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().
On 6/3/24 06:20, Mike Pattrick wrote: > Currently all OVSDB database queries except for UUID lookups all result > in linear lookups over the entire table, even if an index is present. > > This patch modifies ovsdb_query() to attempt an index lookup first, if > possible. If no matching indexes are present then a linear index is > still conducted. > > Reported-at: https://issues.redhat.com/browse/FDP-590 > Signed-off-by: Mike Pattrick > --- > NEWS | 3 ++ > ovsdb/query.c| 102 +++ > ovsdb/row.h | 28 +++ > ovsdb/transaction.c | 27 --- > tests/ovsdb-execution.at | 34 - > tests/ovsdb-server.at| 2 +- > tests/ovsdb-tool.at | 2 +- > 7 files changed, 159 insertions(+), 39 deletions(-) Hi, Mike. Thanks for the patch. Besides what Simon asked, the patch has a few other issues: 1. Lookup is performed only on the committed index and it doesn't include rows that are in-flight in the current transaction. Unlike rows in a hash table, indexes are updated only after the whole transaction is committed. With this change we'll not be able to find newly added rows. Another thing related to this is that it is allowed to have duplicates within a transaction as long as they are removed before the transaction ends. So it is possible that multiple rows will satisfy the condition on indexed columns while the transaction is in-flight. Consider the following commands executed in a sandbox: # ovs-vsctl set-manager "tcp:my-first-target" # ovsdb-client transact unix:$(pwd)/sandbox/db.sock ' ["Open_vSwitch", {"op": "select", "table": "Manager", "columns": ["_uuid", "target"], "where": [["target", "==", "tcp:my-first-target"]]}, {"op": "insert", "table": "Manager", "uuid-name": "duplicate", "row": {"target": "tcp:my-first-target"}}, {"op": "select", "table": "Manager", "columns": ["_uuid", "target"], "where": [["target", "==", "tcp:my-first-target"]]}, {"op": "delete", "table": "Manager", "where":[["_uuid","==",["named-uuid","duplicate"]]]}, {"op": "select", "table": "Manager", "columns": ["_uuid", "target"], "where": [["target", "==", "tcp:my-first-target"]]}]' Transaction must succeed. The first selection should return 1 row, the second should return both duplicates and the third should again return one row. Ideally, implementation should not leak the transaction details to the query module, though I'm not sure if that is 100% achievable. 2. Taking above case into account, this change needs way more unit tests with different order of operations and complex data updates. 3. Since this is a performance-oriented change, please, include some performance numbers in the commit message as well, including impact on non-indexed lookups, if any. 4. There seems to be a lot of logic overlap with existing functions like ovsdb_condition_match_every_clause(), ovsdb_index_search() and ovsdb_row_hash_columns(). Can we re-use those instead? For example, by creating a row from the conditions before the lookup? What a performance impact will look like? Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().
On Mon, Jun 3, 2024 at 9:56 AM Simon Horman wrote: > > On Mon, Jun 03, 2024 at 12:20:36AM -0400, Mike Pattrick wrote: > > Currently all OVSDB database queries except for UUID lookups all result > > in linear lookups over the entire table, even if an index is present. > > > > This patch modifies ovsdb_query() to attempt an index lookup first, if > > possible. If no matching indexes are present then a linear index is > > still conducted. > > > > Reported-at: https://issues.redhat.com/browse/FDP-590 > > Signed-off-by: Mike Pattrick > > ... > > > diff --git a/ovsdb/query.c b/ovsdb/query.c > > index eebe56412..e3e50a034 100644 > > --- a/ovsdb/query.c > > +++ b/ovsdb/query.c > > @@ -21,32 +21,116 @@ > > #include "condition.h" > > #include "row.h" > > #include "table.h" > > +#include "transaction.h" > > + > > +static bool > > +ovsdb_query_index(struct ovsdb_table *table, > > + const struct ovsdb_condition *cnd, > > + const struct ovsdb_row **out) > > +{ > > +for (size_t idx = 0; idx < table->schema->n_indexes; idx++) { > > +const struct ovsdb_column_set *index = > > &table->schema->indexes[idx]; > > +struct hmap_node *node; > > +size_t matches = 0; > > +uint32_t hash = 0; > > + > > +if (index->n_columns != cnd->n_clauses) { > > +continue; > > +} > > + > > +/* The conditions may not be in the same order as the index. */ > > +for (size_t c = 0; c < cnd->n_clauses; c++) { > > +const struct ovsdb_clause *cnd_cls = &cnd->clauses[c]; > > + > > +if (cnd_cls->function != OVSDB_F_EQ) { > > +return false; > > +} > > + > > +for (size_t i = 0; i < index->n_columns; i++) { > > +const struct ovsdb_column *idx_col = index->columns[i]; > > + > > +if (cnd_cls->index == idx_col->index) { > > +hash = ovsdb_datum_hash(&cnd_cls->arg, &idx_col->type, > > +hash); > > +matches++; > > +break; > > +} > > +} > > + > > +/* If none of the indexed columns match, continue to the next > > + * index. */ > > +if (matches == c) { > > +break; > > +} > > +} > > + > > +if (matches != cnd->n_clauses) { > > +continue; > > +} > > + > > +for (node = hmap_first_with_hash(&table->indexes[idx], hash); node; > > + node = hmap_next_with_hash(node)) { > > +struct ovsdb_row *irow = ovsdb_row_from_index_node(node, table, > > + idx); > > + > > +for (size_t c = 0; c < cnd->n_clauses; c++) { > > +const struct ovsdb_clause *cnd_cls = &cnd->clauses[c]; > > + > > +if (!ovsdb_datum_equals(&cnd_cls->arg, > > +&irow->fields[cnd_cls->index], > > +&cnd_cls->column->type)) { > > +irow = NULL; > > +break; > > +} > > +} > > + > > +if (irow) { > > +*out = irow; > > +return true; > > +} > > +} > > + > > +/* In the case that there was a matching index but no matching > > row, the > > + * index check is still considered to be a success. */ > > +return true; > > Hi Mike, > > Maybe I misread it, but it seems that the code above implements: > 1. If a row is found, return true > 2. Otherwise, returns true > > If so, then is there a need 1? That's true, I could just break out of the loop. Github-ci also pointed out another issue with the patch that I'm working on. When I resubmit I'll correct this too. -M > > > +} > > +return false; > > +} > > ... > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 3/3] nsh: Add support to compose-packet and use it in system tests.
On Fri, May 31, 2024 at 11:45:12PM +0200, Ilya Maximets wrote: > OVS can parse NSH, but can't compose. Fix that and get rid of plain > hex NSH packets in system tests as they are hard to read or modify. > > Tcpdump calls modified to write actual pcaps instead of text output, > so ovs-pcap can be used while checking the results. > > While at it, replacing sleeps with more robust waiting for tcpdump > to start listening. I might have separated a) adding NSH compose support b) more robust tcpdump waiting, and c) using text rather than hex, into 3 patches. But I don't feel strongly about it. > M4 macros are better than shell variables, because we can see the > substitution result in the test log. So, using m4_define and m4_join > extensively. > > Signed-off-by: Ilya Maximets Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 2/3] tests: Convert ND, MPLS and CT sendpkt tests to compose-packet.
On Fri, May 31, 2024 at 11:45:11PM +0200, Ilya Maximets wrote: > These tests contain plain hex dumps that are hard to read and modify. > Replace with equivalent calls to ovs-ofctl compose-packet --bare and > ovs-pcap. > > Tcpdump calls modified to write actual pcaps instead of text output, > so ovs-pcap can be used while checking the results. > > While at it, replacing sleeps with more robust waiting for tcpdump > to start listening. I might have put that part in a separate patch, but I don't feel strongly about it. > > M4 macros are better than shell variables, because we can see the > substitution result in the test log. So, using m4_define and m4_join > extensively. > > Signed-off-by: Ilya Maximets Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/3] tests: sendpkt: Allow different input formats.
On Fri, May 31, 2024 at 11:45:10PM +0200, Ilya Maximets wrote: > We require python 3, so instead of manually parsing bytes on input we > can use built-in bytes.fromhex(). This function ignores whitespaces, > so we can use different input formats - the old style space-separated > bytes as well as pure hex strings provided by ovs-ofctl compose-packet > and ovs-pcap. > > Signed-off-by: Ilya Maximets Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 1/3] northd: Introduce ECMP_Nexthop table in SB db.
[...] > > --- a/ovn-sb.xml > > +++ b/ovn-sb.xml > > @@ -5095,4 +5095,30 @@ tcp.flags = RST; > >The set of variable values for a given chassis. > > > > > > + > > + > > + > > + > > +Nexthop IP address for this route. Nexthop IP address should be > the IP > > +address of a connected router port or the IP address of a > logical port > > +or can be set to discard for dropping packets which > match > > +the given route. > > + > > + > > + > > + > > + > > +Nexthop unique indetifier. Nexthop ID is used to track active > > +ecmp-symmetric-reply connections and flush stale ones. > > + > > + > > + > > + > > + Reserved for future use. > > + > > + > > + > > + See External IDs at the beginning of this document. > > + > > + > > > > Hi Lorenzo, I haven't reviewed all the patches of this series yet, but it > is really hard for me to understand this new SB table just by looking at > the documentation above. Each table should have some basic description for > the table itself before describing columns, i.e. the purpose of the table > and what does each row represent. Also in the description of the columns, > it is confusing what "this route" and "the given route" mean. It may be > clear to you since you are implementing the feature, but I think the SB > documentation needs to be clear for someone new to the feature. Hi Han, thx, I will fix it posting v2. I will rebase this patch ontop of FDP-600 https://patchwork.ozlabs.org/project/ovn/list/?series=405268 Regards, Lorenzo > > Thanks, > Han > > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index 89aed5adc..2160e8de7 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -6542,6 +6542,7 @@ check ovn-nbctl lsp-set-addresses public-lr0 router > > check ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public > > > > check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 > 1.0.0.1 192.168.0.10 > > +check_row_count ECMP_Nexthop 1 > > > > ovn-sbctl dump-flows lr0 > lr0flows > > > > @@ -6553,6 +6554,7 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows > | ovn_strip_lflows], [0], [dn > > ]) > > > > check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 > 1.0.0.1 192.168.0.20 > > +check_row_count ECMP_Nexthop 2 > > > > ovn-sbctl dump-flows lr0 > lr0flows > > AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | > ovn_strip_lflows], [0], [dnl > > @@ -6589,6 +6591,7 @@ AT_CHECK([grep -e "lr_in_arp_resolve.*ecmp" > lr0flows | ovn_strip_lflows], [0], [ > > > > # add ecmp route with wrong nexthop > > check ovn-nbctl --wait=sb --ecmp-symmetric-reply lr-route-add lr0 > 1.0.0.1 192.168.1.20 > > +check_row_count ECMP_Nexthop 3 > > > > ovn-sbctl dump-flows lr0 > lr0flows > > AT_CHECK([grep -e "lr_in_ip_routing.*select" lr0flows | > ovn_strip_lflows], [0], [dnl > > @@ -6603,6 +6606,7 @@ AT_CHECK([grep -e "lr_in_ip_routing_ecmp" lr0flows > | sed 's/192\.168\.0\..0/192. > > > > check ovn-nbctl lr-route-del lr0 > > wait_row_count nb:Logical_Router_Static_Route 0 > > +check_row_count ECMP_Nexthop 0 > > > > check ovn-nbctl --wait=sb lr-route-add lr0 1.0.0.0/24 192.168.0.10 > > ovn-sbctl dump-flows lr0 > lr0flows > > -- > > 2.44.0 > > > > ___ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] controller: Send RARP/GARP for VIF post link state is up.
Hi team, Can someone please review these changes? Thanks, Shibir Basak From: Shibir Basak Date: Monday, 27 May 2024 at 11:55 PM To: d...@openvswitch.org Cc: Shibir Basak , Naveen Yerramneni Subject: [PATCH ovn] controller: Send RARP/GARP for VIF post link state is up. Currently, GARP/RARP broadcast is sent for VIFs (part of logical switch with localnet port) after iface-id is set. This fix is to avoid packet loss during migration if iface-id is set even before the VM migration is completed. Signed-off-by: Shibir Basak Acked-by: Naveen Yerramneni --- controller/ovn-controller.c | 1 + controller/pinctrl.c| 4 2 files changed, 5 insertions(+) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 6b38f113d..982378a50 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1128,6 +1128,7 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl) ovsdb_idl_add_table(ovs_idl, &ovsrec_table_queue); ovsdb_idl_add_column(ovs_idl, &ovsrec_queue_col_other_config); ovsdb_idl_add_column(ovs_idl, &ovsrec_queue_col_external_ids); +ovsdb_idl_add_column(ovs_idl, &ovsrec_interface_col_link_state); chassis_register_ovs_idl(ovs_idl); encaps_register_ovs_idl(ovs_idl); diff --git a/controller/pinctrl.c b/controller/pinctrl.c index 6a2c3dc68..b5d3162b8 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -6375,6 +6375,10 @@ get_localnet_vifs_l3gwports( if (!pb || pb->chassis != chassis) { continue; } +if (!iface_rec->link_state || +strcmp(iface_rec->link_state, "up")) { +continue; +} struct local_datapath *ld = get_local_datapath(local_datapaths, pb->datapath->tunnel_key); -- 2.22.3 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 2/2] ipf: Handle common case of ipf defragmentation.
On Thu, May 16, 2024 at 11:38:32AM -0400, Mike Pattrick wrote: > When conntrack is reassembling packet fragments, the same reassembly > context can be shared across multiple threads handling different packets > simultaneously. Once a full packet is assembled, it is added to a packet > batch for processing, in the case where there are multiple different pmd > threads accessing conntrack simultaneously, there is a race condition > where the reassembled packet may be added to an arbitrary batch even if > the current batch is available. > > When this happens, the packet may be handled incorrectly as it is > inserted into a random openflow execution pipeline, instead of the > pipeline for that packets flow. > > This change makes a best effort attempt to try to add the defragmented > packet to the current batch. directly. This should succeed most of the > time. > > Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.") > Reported-at: https://issues.redhat.com/browse/FDP-560 > Signed-off-by: Mike Pattrick Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 1/2] ipf: Only add fragments to batch of same dl_type.
On Thu, May 16, 2024 at 11:38:31AM -0400, Mike Pattrick wrote: > When conntrack is reassembling packet fragments, the same reassembly > context can be shared across multiple threads handling different packets > simultaneously. Once a full packet is assembled, it is added to a packet > batch for processing, this is most likely the batch that added it in the > first place, but that isn't a guarantee. > > The packets in these batches should be segregated by network protocol > version (ipv4 vs ipv6) for conntrack defragmentation to function > appropriately. However, there are conditions where we would add a > reassembled packet of one type to a batch of another. > > This change introduces checks to make sure that reassembled or expired > fragments are only added to packet batches of the same type. > > Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.") > Reported-at: https://issues.redhat.com/browse/FDP-560 > Signed-off-by: Mike Pattrick Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] tests: Fix non-portable plus match in python vlog test.
On 6/3/24 13:14, Eelco Chaudron wrote: > > > On 3 Jun 2024, at 13:12, Ilya Maximets wrote: > >> '\+' as a one-or-more match is a GNU extension and it doesn't work >> in BSD sed. This makes the python vlog test to fail on FreeBSD 14 >> that recently got python 3.11 in CirrusCI: >> >> | --- - 2024-06-03 10:42:26.363566000 + >> | +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/2541/stdout >> | @@ -7,31 +7,37 @@ >> | Traceback (most recent call last): >> | File , line , in main >> | assert fail >> | + >> >> Remove the '\+' match to make the line removal work. It doesn't do >> much for us as we would remove the same lines either way. >> >> This change makes CirruCI green again. >> >> Fixes: 9185793e7543 ("tests: Fix compatibility issue with Python 3.13 in >> vlog.at.") >> Signed-off-by: Ilya Maximets > > The change looks good to me (I did not test it). > > Acked-by: Eelco Chaudron Thanks! I retested this and applied to all branches down to 2.17. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().
On Mon, Jun 03, 2024 at 12:20:36AM -0400, Mike Pattrick wrote: > Currently all OVSDB database queries except for UUID lookups all result > in linear lookups over the entire table, even if an index is present. > > This patch modifies ovsdb_query() to attempt an index lookup first, if > possible. If no matching indexes are present then a linear index is > still conducted. > > Reported-at: https://issues.redhat.com/browse/FDP-590 > Signed-off-by: Mike Pattrick ... > diff --git a/ovsdb/query.c b/ovsdb/query.c > index eebe56412..e3e50a034 100644 > --- a/ovsdb/query.c > +++ b/ovsdb/query.c > @@ -21,32 +21,116 @@ > #include "condition.h" > #include "row.h" > #include "table.h" > +#include "transaction.h" > + > +static bool > +ovsdb_query_index(struct ovsdb_table *table, > + const struct ovsdb_condition *cnd, > + const struct ovsdb_row **out) > +{ > +for (size_t idx = 0; idx < table->schema->n_indexes; idx++) { > +const struct ovsdb_column_set *index = &table->schema->indexes[idx]; > +struct hmap_node *node; > +size_t matches = 0; > +uint32_t hash = 0; > + > +if (index->n_columns != cnd->n_clauses) { > +continue; > +} > + > +/* The conditions may not be in the same order as the index. */ > +for (size_t c = 0; c < cnd->n_clauses; c++) { > +const struct ovsdb_clause *cnd_cls = &cnd->clauses[c]; > + > +if (cnd_cls->function != OVSDB_F_EQ) { > +return false; > +} > + > +for (size_t i = 0; i < index->n_columns; i++) { > +const struct ovsdb_column *idx_col = index->columns[i]; > + > +if (cnd_cls->index == idx_col->index) { > +hash = ovsdb_datum_hash(&cnd_cls->arg, &idx_col->type, > +hash); > +matches++; > +break; > +} > +} > + > +/* If none of the indexed columns match, continue to the next > + * index. */ > +if (matches == c) { > +break; > +} > +} > + > +if (matches != cnd->n_clauses) { > +continue; > +} > + > +for (node = hmap_first_with_hash(&table->indexes[idx], hash); node; > + node = hmap_next_with_hash(node)) { > +struct ovsdb_row *irow = ovsdb_row_from_index_node(node, table, > + idx); > + > +for (size_t c = 0; c < cnd->n_clauses; c++) { > +const struct ovsdb_clause *cnd_cls = &cnd->clauses[c]; > + > +if (!ovsdb_datum_equals(&cnd_cls->arg, > +&irow->fields[cnd_cls->index], > +&cnd_cls->column->type)) { > +irow = NULL; > +break; > +} > +} > + > +if (irow) { > +*out = irow; > +return true; > +} > +} > + > +/* In the case that there was a matching index but no matching row, > the > + * index check is still considered to be a success. */ > +return true; Hi Mike, Maybe I misread it, but it seems that the code above implements: 1. If a row is found, return true 2. Otherwise, returns true If so, then is there a need 1? > +} > +return false; > +} ... ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.
On 3 Jun 2024, at 10:07, Roi Dayan wrote: > On 03/06/2024 10:18, Roi Dayan wrote: >> >> >> On 30/05/2024 18:48, Eelco Chaudron wrote: >>> >>> >>> On 23 May 2024, at 12:46, Roi Dayan via dev wrote: >>> It is observed in some environments that there are much more ukeys than actual DP flows. For example: $ ovs-appctl upcall/show system@ovs-system: flows : (current 7) (avg 6) (max 117) (limit 2125) offloaded flows : 525 dump duration : 1063ms ufid enabled : true 23: (keys 3612) 24: (keys 3625) 25: (keys 3485) The revalidator threads are busy revalidating the stale ukeys leading to high CPU and long dump duration. There are some possible situations that may result in stale ukeys that have no corresponding DP flows. In revalidator, push_dp_ops() doesn't check error if the op type is not DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload case, where the old flow is deleted first and then the new one is created. If the creation fails, the ukey will be stale (no corresponding DP flow). This patch adds a warning in such case. >>> >>> Not sure I understand, this behavior should be captured by the >>> UKEY_INCONSISTENT state. >> >> Hi Eelco, >> >> thanks for reviewing. >> We started the patch on older branch as we didn't rebase for some time >> and got a little later on sending it. >> I see the addition now for UKEY_INCOSISTENT in the following patch: >> >> cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors. >> >> Looks like it handles the same situation we tried to handle in this patch. >> >>> Another possible scenario is in handle_upcalls() if a PUT operation did not succeed and op->error attribute was not set correctly it can lead to stale ukey in operational state. >>> >>> >>> Guess we need to figure out these cases, as these are the root cause of >>> your problem. >> >> right. maybe. but this could help keep the system alive/clean while logging >> the >> bad situation instead of having increase in those stale/inconsistent ukeys. >> I understand if it's not nice to handle it like this. >> > > Hi Eelco, > > I remember now one of the reproduction scenarios we did is do some traffic > to get rules added using TC and then delete those from tc command line > and it got to stale ukeys. > The revalidator dump thread not seeing the rules so not updating anything > and sweep over the ukeys skipping them as well. > This is why we checked against the timing stats of the ukey. > > I remember I tested on the upstream code and not our development branch > and it reproduces. I didn't notice if the commit adding UKEY_INCONSISTENT > existed but it handles errors from adding flows so I dont think it matters. > > I'll try to check and verify again but I think it's still here. > So while cases getting dop.error handled with UKEY_INCONSISTENT, > this case I think is not. I think you are right this case is not covered by the UKEY_INCONSISTENT test below. See my suggestion on fixing this in revalidate_ukey(). Maybe you could also add a test case for this in the offload test suite. //Eelco This patch adds checks in the sweep phase for such ukeys and move them to DELETE so that they can be cleared eventually. Co-authored-by: Han Zhou Signed-off-by: Han Zhou Signed-off-by: Roi Dayan --- ofproto/ofproto-dpif-upcall.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 83609ec62b63..e9520ebdf910 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow); COVERAGE_DEFINE(dumped_new_flow); COVERAGE_DEFINE(handler_duplicate_upcall); COVERAGE_DEFINE(revalidate_missed_dp_flow); +COVERAGE_DEFINE(revalidate_missed_dp_flow_del); COVERAGE_DEFINE(ukey_dp_change); COVERAGE_DEFINE(ukey_invalid_stat_reset); COVERAGE_DEFINE(ukey_replace_contention); @@ -278,6 +279,7 @@ enum flow_del_reason { FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */ FDR_FLOW_IDLE, /* Flow idle timeout. */ FDR_FLOW_LIMIT, /* Kill all flows condition reached. */ +FDR_FLOW_STALE, /* Flow stale detected. */ >>> >>> Please also update the associated script, see above comment. >>> FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */ FDR_NO_OFPROTO, /* Bridge not found. */ FDR_PURGE, /* User requested flow deletion. */ @@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) >>> >>> >>> The (op->dop.error) condition below will never be reached, as it’s >>> explicitly checked above. So the error message will never happen. The >>> condition you ex
Re: [ovs-dev] [PATCH v3 0/8] Coverity fixes.
On 29 May 2024, at 12:53, Eelco Chaudron wrote: > This series addresses several high-priority Coverity issues. > > Delta from v2 -> v3: > - Cleaned up error message on patch 8. > > Delta from v1 -> v2: > - Add cover letter. > - Split first patch and update commit subject. > - Added ds_destroy() to context_done(). > - Changed invalid gso_type handling by returning an error + log message. > - Changed sFlowRcvrTimeout to be using uint32_t to avoid TIME_T_MAKE. > > Eelco Chaudron (8): > netdev-linux: Fix possible int overflow in tc_add_matchall_policer(). > cfm: Fix possible integer overflow in tc_add_matchall_policer(). > sflow: Replace libc's random() function with the OVS's random_range(). > sflow: Use uint32_t instead of time_t for tick handling in the poller. > sflow: Fix check for disabled receive time. > ofproto-dpif: Define age as time_t in ofproto_unixctl_fdb_add(). > db-ctl-base: Initialize the output variable in the ctx structure. > netdev-linux: Fix uninitialized gso_type case. > > lib/cfm.c| 2 +- > lib/db-ctl-base.c| 2 ++ > lib/netdev-linux.c | 8 ++-- > lib/sflow_api.h | 10 +- > lib/sflow_poller.c | 3 ++- > lib/sflow_receiver.c | 7 --- > ofproto/ofproto-dpif-sflow.c | 2 +- > ofproto/ofproto-dpif.c | 2 +- > 8 files changed, 22 insertions(+), 14 deletions(-) Series applied to main. Thank Ilya, Paolo, and Mike for the reviews. Cheers, Eelco ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] tests: Fix non-portable plus match in python vlog test.
On 3 Jun 2024, at 13:12, Ilya Maximets wrote: > '\+' as a one-or-more match is a GNU extension and it doesn't work > in BSD sed. This makes the python vlog test to fail on FreeBSD 14 > that recently got python 3.11 in CirrusCI: > > | --- - 2024-06-03 10:42:26.363566000 + > | +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/2541/stdout > | @@ -7,31 +7,37 @@ > | Traceback (most recent call last): > | File , line , in main > | assert fail > | + > > Remove the '\+' match to make the line removal work. It doesn't do > much for us as we would remove the same lines either way. > > This change makes CirruCI green again. > > Fixes: 9185793e7543 ("tests: Fix compatibility issue with Python 3.13 in > vlog.at.") > Signed-off-by: Ilya Maximets The change looks good to me (I did not test it). Acked-by: Eelco Chaudron > --- > tests/vlog.at | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tests/vlog.at b/tests/vlog.at > index efe91479a..2768c0740 100644 > --- a/tests/vlog.at > +++ b/tests/vlog.at > @@ -8,7 +8,7 @@ AT_CHECK([$PYTHON3 $srcdir/test-vlog.py --log-file log_file \ > > AT_CHECK([sed -e 's/.*-.*-.*T..:..:..Z |//' \ > -e 's/File ".*", line [[0-9]][[0-9]]*,/File , line ,/' \ > --e '/\^\+/d' \ > +-e '/\^/d' \ > stderr_log], [0], [dnl >0 | module_0 | EMER | emergency >1 | module_0 | ERR | error > -- > 2.45.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 8/8] netdev-linux: Fix uninitialized gso_type case.
On 31 May 2024, at 13:37, Ilya Maximets wrote: > On 5/29/24 12:53, Eelco Chaudron wrote: >> This patch fixes an uninitialized gso_type case in >> netdev_linux_prepend_vnet_hdr() by returning an error. >> >> Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.") >> Signed-off-by: Eelco Chaudron >> --- >> lib/netdev-linux.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c >> index eb0c5c624..dc67e1268 100644 >> --- a/lib/netdev-linux.c >> +++ b/lib/netdev-linux.c >> @@ -7167,6 +7167,10 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, >> int mtu) >> vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; >> } else if (dp_packet_hwol_tx_ipv6(b)) { >> vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6; >> +} else { >> +VLOG_ERR_RL(&rl, "Unknown gso_type for TSO packet. Flags: >> %"PRIx64, > > I'd suggest adding the # qualifier to the format string, i.e. %#"PRIx64. > Can be fixed on commit, I suppose. Thanks for the review! I’ll fix on commit, as I totally missed the # sign :(. > Acked-by: Ilya Maximets ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] tests: Fix non-portable plus match in python vlog test.
'\+' as a one-or-more match is a GNU extension and it doesn't work in BSD sed. This makes the python vlog test to fail on FreeBSD 14 that recently got python 3.11 in CirrusCI: | --- - 2024-06-03 10:42:26.363566000 + | +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/2541/stdout | @@ -7,31 +7,37 @@ | Traceback (most recent call last): | File , line , in main | assert fail | + Remove the '\+' match to make the line removal work. It doesn't do much for us as we would remove the same lines either way. This change makes CirruCI green again. Fixes: 9185793e7543 ("tests: Fix compatibility issue with Python 3.13 in vlog.at.") Signed-off-by: Ilya Maximets --- tests/vlog.at | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/vlog.at b/tests/vlog.at index efe91479a..2768c0740 100644 --- a/tests/vlog.at +++ b/tests/vlog.at @@ -8,7 +8,7 @@ AT_CHECK([$PYTHON3 $srcdir/test-vlog.py --log-file log_file \ AT_CHECK([sed -e 's/.*-.*-.*T..:..:..Z |//' \ -e 's/File ".*", line [[0-9]][[0-9]]*,/File , line ,/' \ --e '/\^\+/d' \ +-e '/\^/d' \ stderr_log], [0], [dnl 0 | module_0 | EMER | emergency 1 | module_0 | ERR | error -- 2.45.0 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] utilties: Allow ovn-detrace to run on ovs-ofctl dump-flows output.
Bleep bloop. Greetings Ales Musil, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. checkpatch: WARNING: Line lacks whitespace around operator #189 FILE: utilities/ovn-debug.c:105: uuid-to-cookie UUID\n\ Lines checked: 229, Warnings: 1, Errors: 0 Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn] utilties: Allow ovn-detrace to run on ovs-ofctl dump-flows output.
The ovs-ofctl dump-flows output is slightly different from oproto/trace cookie 0xXXX vs. cookie=0xXXX. Update the regex that it also matches on the equals case. This allows us to run ovn-detrace against the ovs-ofctl dump-flows output. Also provide simple, partially hardcoded test case for ovn-detrace. Signed-off-by: Ales Musil --- tests/automake.mk | 3 +- tests/ovn-util.at | 108 tests/testsuite.at | 1 + utilities/ovn-debug.c | 18 ++ utilities/ovn_detrace.py.in | 5 +- 5 files changed, 132 insertions(+), 3 deletions(-) create mode 100644 tests/ovn-util.at diff --git a/tests/automake.mk b/tests/automake.mk index 1fdc89835..3899c9e80 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -45,7 +45,8 @@ TESTSUITE_AT = \ tests/ovn-lflow-cache.at \ tests/ovn-lflow-conj-ids.at \ tests/ovn-ipsec.at \ - tests/ovn-vif-plug.at + tests/ovn-vif-plug.at \ + tests/ovn-util.at SYSTEM_DPDK_TESTSUITE_AT = \ tests/system-dpdk-testsuite.at \ diff --git a/tests/ovn-util.at b/tests/ovn-util.at new file mode 100644 index 0..fd3282548 --- /dev/null +++ b/tests/ovn-util.at @@ -0,0 +1,108 @@ +AT_SETUP([ovn-detrace - simple scenario]) +AT_SKIP_IF([test $HAVE_SCAPY = no]) +ovn_start +net_add n1 + +sim_add hv1 +as hv1 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.1 +ovs-vsctl -- add-port br-int vm0 -- \ +set interface vm0 external-ids:iface-id=vm0 + +ovs-vsctl -- add-port br-int vm1 -- \ +set interface vm1 external-ids:iface-id=vm1 + +ovn-nbctl ls-add ls \ +-- set logical_switch ls other-config:requested-tnl-key=1 + +ovn-nbctl lsp-add ls vm0 \ +-- lsp-set-addresses vm0 "f0:00:00:01:01:00 192.168.1.10" \ +-- set logical_switch_port vm0 options:requested-tnl-key=10 +ovn-nbctl lsp-add ls vm1 \ +-- lsp-set-addresses vm1 "f0:00:00:01:01:01 192.168.1.11" \ +-- set logical_switch_port vm1 options:requested-tnl-key=11 + +# Allow some time for ovn-northd and ovn-controller to catch up. +wait_for_ports_up +check ovn-nbctl --wait=hv sync + +ingress_table=$(ovn-debug lflow-stage-to-ltable ls_in_check_port_sec) +egress_table=$(ovn-debug lflow-stage-to-ltable ls_out_apply_port_sec) +dp_uuid=$(fetch_column datapath _uuid external_ids:name=ls) +pb_vm0=$(ovn-debug uuid-to-cookie $(fetch_column port_binding _uuid \ +logical_port=vm0)) +pb_vm1=$(ovn-debug uuid-to-cookie $(fetch_column port_binding _uuid \ +logical_port=vm1)) +ingress=$(ovn-debug uuid-to-cookie $(fetch_column logical_flow _uuid \ +table_id=$ingress_table pipeline=ingress match="1")) +egress=$(ovn-debug uuid-to-cookie $(fetch_column logical_flow _uuid \ +table_id=$egress_table pipeline=egress match="1")) + +cat << EOF > trace +0. in_port=1, priority 100, cookie $pb_vm0 +set_field:0x4/0x->reg13 +set_field:0x1->reg11 +set_field:0x1->reg12 +set_field:0x1->metadata +set_field:0x1->reg14 +set_field:0/0x->reg13 +resubmit(,??) +8. metadata=0x1, priority 50, cookie $ingress +set_field:0/0x1000->reg10 +resubmit(,??) +51. metadata=0x1, priority 0, cookie $egress +resubmit(,??) +65. reg15=0x2,metadata=0x1, priority 100, cookie $pb_vm1 +output:2 +EOF + +AT_CHECK_UNQUOTED([cat trace | $PYTHON $top_srcdir/utilities/ovn_detrace.py.in], [0], [dnl +0. in_port=1, priority 100, cookie $pb_vm0 +set_field:0x4/0x->reg13 +set_field:0x1->reg11 +set_field:0x1->reg12 +set_field:0x1->metadata +set_field:0x1->reg14 +set_field:0/0x->reg13 +resubmit(,??) + * Logical datapath: "ls" ($dp_uuid) + * Port Binding: logical_port "vm0", tunnel_key 10, chassis-name "hv1", chassis-str "hv1" +8. metadata=0x1, priority 50, cookie $ingress +set_field:0/0x1000->reg10 +resubmit(,??) + * Logical datapaths: + * "ls" ($dp_uuid) [[ingress]] + * Logical flow: table=$ingress_table (ls_in_check_port_sec), priority=50, match=(1), actions=(reg0[[15]] = check_in_port_sec(); next;) +51. metadata=0x1, priority 0, cookie $egress +resubmit(,??) + * Logical datapaths: + * "ls" ($dp_uuid) [[egress]] + * Logical flow: table=$egress_table (ls_out_apply_port_sec), priority=0, match=(1), actions=(output;) +65. reg15=0x2,metadata=0x1, priority 100, cookie $pb_vm1 +output:2 + * Logical datapath: "ls" ($dp_uuid) + * Port Binding: logical_port "vm1", tunnel_key 11, chassis-name "hv1", chassis-str "hv1" + +]) + +ovs-ofctl dump-flows br-int table=$(ovn-debug lflow-stage-to-oftable ls_in_check_port_sec),cookie=$ingress/0x >> flows +ovs-ofctl dump-flows br-int table=$(ovn-debug lflow-stage-to-oftable ls_out_apply_port_sec),cookie=$egress/0x >> flows + +AT_CHECK_UNQUOTED([cat flows | awk '{print $1, $7, $8}' | grep -v "NXST_FLOW" | \ + sed -e "s/resubmit(,[[0-9]]\+)/resubmit(,??)/g" | \ + $PYTHON $top_srcdir/utilities/ovn_detrace.py.in], [0], [dnl +cookie=$ingress, priority=50,metadata=0x1 actions=load:0->NXM_NX_REG10[[12]],resubmit(,??),move:NXM_NX
Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.
On 03/06/2024 10:18, Roi Dayan wrote: > > > On 30/05/2024 18:48, Eelco Chaudron wrote: >> >> >> On 23 May 2024, at 12:46, Roi Dayan via dev wrote: >> >>> It is observed in some environments that there are much more ukeys than >>> actual DP flows. For example: >>> >>> $ ovs-appctl upcall/show >>> system@ovs-system: >>> flows : (current 7) (avg 6) (max 117) (limit 2125) >>> offloaded flows : 525 >>> dump duration : 1063ms >>> ufid enabled : true >>> >>> 23: (keys 3612) >>> 24: (keys 3625) >>> 25: (keys 3485) >>> >>> The revalidator threads are busy revalidating the stale ukeys leading to >>> high CPU and long dump duration. >>> >>> There are some possible situations that may result in stale ukeys that >>> have no corresponding DP flows. >>> >>> In revalidator, push_dp_ops() doesn't check error if the op type is not >>> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload >>> case, where the old flow is deleted first and then the new one is >>> created. If the creation fails, the ukey will be stale (no corresponding >>> DP flow). This patch adds a warning in such case. >> >> Not sure I understand, this behavior should be captured by the >> UKEY_INCONSISTENT state. > > Hi Eelco, > > thanks for reviewing. > We started the patch on older branch as we didn't rebase for some time > and got a little later on sending it. > I see the addition now for UKEY_INCOSISTENT in the following patch: > > cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors. > > Looks like it handles the same situation we tried to handle in this patch. > >> >>> Another possible scenario is in handle_upcalls() if a PUT operation did >>> not succeed and op->error attribute was not set correctly it can lead to >>> stale ukey in operational state. >> >> >> Guess we need to figure out these cases, as these are the root cause of your >> problem. > > right. maybe. but this could help keep the system alive/clean while logging > the > bad situation instead of having increase in those stale/inconsistent ukeys. > I understand if it's not nice to handle it like this. > Hi Eelco, I remember now one of the reproduction scenarios we did is do some traffic to get rules added using TC and then delete those from tc command line and it got to stale ukeys. The revalidator dump thread not seeing the rules so not updating anything and sweep over the ukeys skipping them as well. This is why we checked against the timing stats of the ukey. I remember I tested on the upstream code and not our development branch and it reproduces. I didn't notice if the commit adding UKEY_INCONSISTENT existed but it handles errors from adding flows so I dont think it matters. I'll try to check and verify again but I think it's still here. So while cases getting dop.error handled with UKEY_INCONSISTENT, this case I think is not. Thanks, Roi >> >>> >>> This patch adds checks in the sweep phase for such ukeys and move them >>> to DELETE so that they can be cleared eventually. >>> >>> Co-authored-by: Han Zhou >>> Signed-off-by: Han Zhou >>> Signed-off-by: Roi Dayan >>> --- >>> ofproto/ofproto-dpif-upcall.c | 15 +++ >>> 1 file changed, 15 insertions(+) >>> >>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >>> index 83609ec62b63..e9520ebdf910 100644 >>> --- a/ofproto/ofproto-dpif-upcall.c >>> +++ b/ofproto/ofproto-dpif-upcall.c >>> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow); >>> COVERAGE_DEFINE(dumped_new_flow); >>> COVERAGE_DEFINE(handler_duplicate_upcall); >>> COVERAGE_DEFINE(revalidate_missed_dp_flow); >>> +COVERAGE_DEFINE(revalidate_missed_dp_flow_del); >>> COVERAGE_DEFINE(ukey_dp_change); >>> COVERAGE_DEFINE(ukey_invalid_stat_reset); >>> COVERAGE_DEFINE(ukey_replace_contention); >>> @@ -278,6 +279,7 @@ enum flow_del_reason { >>> FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */ >>> FDR_FLOW_IDLE, /* Flow idle timeout. */ >>> FDR_FLOW_LIMIT, /* Kill all flows condition reached. */ >>> +FDR_FLOW_STALE, /* Flow stale detected. */ >> >> Please also update the associated script, see above comment. >> >>> FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */ >>> FDR_NO_OFPROTO, /* Bridge not found. */ >>> FDR_PURGE, /* User requested flow deletion. */ >>> @@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op >>> *ops, size_t n_ops) >> >> >> The (op->dop.error) condition below will never be reached, as it’s >> explicitly checked above. So the error message will never happen. The >> condition you explain is already covered by UKEY_INCONSISTENT. Not sure if >> we need a log message for this either at warn level, maybe debug, especially >> as you say it can happen often. >> > > right. as replied above. I didn't notice this when I did the cherry-pick as > git didn't > show me a conflict in this area at all. > This commit was done before the change so we did hit
Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.
On 30/05/2024 18:48, Eelco Chaudron wrote: > > > On 23 May 2024, at 12:46, Roi Dayan via dev wrote: > >> It is observed in some environments that there are much more ukeys than >> actual DP flows. For example: >> >> $ ovs-appctl upcall/show >> system@ovs-system: >> flows : (current 7) (avg 6) (max 117) (limit 2125) >> offloaded flows : 525 >> dump duration : 1063ms >> ufid enabled : true >> >> 23: (keys 3612) >> 24: (keys 3625) >> 25: (keys 3485) >> >> The revalidator threads are busy revalidating the stale ukeys leading to >> high CPU and long dump duration. >> >> There are some possible situations that may result in stale ukeys that >> have no corresponding DP flows. >> >> In revalidator, push_dp_ops() doesn't check error if the op type is not >> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload >> case, where the old flow is deleted first and then the new one is >> created. If the creation fails, the ukey will be stale (no corresponding >> DP flow). This patch adds a warning in such case. > > Not sure I understand, this behavior should be captured by the > UKEY_INCONSISTENT state. Hi Eelco, thanks for reviewing. We started the patch on older branch as we didn't rebase for some time and got a little later on sending it. I see the addition now for UKEY_INCOSISTENT in the following patch: cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors. Looks like it handles the same situation we tried to handle in this patch. > >> Another possible scenario is in handle_upcalls() if a PUT operation did >> not succeed and op->error attribute was not set correctly it can lead to >> stale ukey in operational state. > > > Guess we need to figure out these cases, as these are the root cause of your > problem. right. maybe. but this could help keep the system alive/clean while logging the bad situation instead of having increase in those stale/inconsistent ukeys. I understand if it's not nice to handle it like this. > >> >> This patch adds checks in the sweep phase for such ukeys and move them >> to DELETE so that they can be cleared eventually. >> >> Co-authored-by: Han Zhou >> Signed-off-by: Han Zhou >> Signed-off-by: Roi Dayan >> --- >> ofproto/ofproto-dpif-upcall.c | 15 +++ >> 1 file changed, 15 insertions(+) >> >> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >> index 83609ec62b63..e9520ebdf910 100644 >> --- a/ofproto/ofproto-dpif-upcall.c >> +++ b/ofproto/ofproto-dpif-upcall.c >> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow); >> COVERAGE_DEFINE(dumped_new_flow); >> COVERAGE_DEFINE(handler_duplicate_upcall); >> COVERAGE_DEFINE(revalidate_missed_dp_flow); >> +COVERAGE_DEFINE(revalidate_missed_dp_flow_del); >> COVERAGE_DEFINE(ukey_dp_change); >> COVERAGE_DEFINE(ukey_invalid_stat_reset); >> COVERAGE_DEFINE(ukey_replace_contention); >> @@ -278,6 +279,7 @@ enum flow_del_reason { >> FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */ >> FDR_FLOW_IDLE, /* Flow idle timeout. */ >> FDR_FLOW_LIMIT, /* Kill all flows condition reached. */ >> +FDR_FLOW_STALE, /* Flow stale detected. */ > > Please also update the associated script, see above comment. > >> FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */ >> FDR_NO_OFPROTO, /* Bridge not found. */ >> FDR_PURGE, /* User requested flow deletion. */ >> @@ -2557,6 +2559,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, >> size_t n_ops) > > > The (op->dop.error) condition below will never be reached, as it’s explicitly > checked above. So the error message will never happen. The condition you > explain is already covered by UKEY_INCONSISTENT. Not sure if we need a log > message for this either at warn level, maybe debug, especially as you say it > can happen often. > right. as replied above. I didn't notice this when I did the cherry-pick as git didn't show me a conflict in this area at all. This commit was done before the change so we did hit this spot. we got little off checking the commit and sending it on time for review. sorry about that. cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors. I think we should redo our checks with the commit adding the UKEY_INCONSISTENT state and see our result again. thanks for the review. >> if (op->dop.type != DPIF_OP_FLOW_DEL) { >> /* Only deleted flows need their stats pushed. */ >> +if (op->dop.error) { >> +VLOG_WARN_RL(&rl, "push_dp_ops: error %d in op type %d, >> ukey " >> + "%p", op->dop.error, op->dop.type, op->ukey); >> +} >> continue; >> } >> >> @@ -3027,6 +3033,15 @@ revalidator_sweep__(struct revalidator *revalidator, >> bool purge) >> del_reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL; >> } else if (!seq_mismatch) { >>