Re: [ovs-dev] [PATCH v2 net-next 0/7] selftests: net: Switch pmtu.sh to use the internal ovs script.

2024-06-21 Thread Jakub Kicinski
On Thu, 20 Jun 2024 08:55:54 -0400 Aaron Conole wrote:
> This series enhances the ovs-dpctl utility to provide support for set()
> and tunnel() flow specifiers, better ipv6 handling support, and the
> ability to add tunnel vports, and LWT interfaces.  Finally, it modifies
> the pmtu.sh script to call the ovs-dpctl.py utility rather than the
> typical OVS userspace utilities.

Thanks for the work! 

Looks like the series no longer applies because of other changes
to the kernel config. Before it stopped applying we got some runs,
here's what I see:

https://netdev-3.bots.linux.dev/vmksft-net/results/648440/3-pmtu-sh/stdout

# Cannot find device "ovs_br0"
# TEST: IPv4, OVS vxlan4: PMTU exceptions [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS vxlan4: PMTU exceptions - nexthop objects   [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS vxlan4: PMTU exceptions [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS vxlan4: PMTU exceptions - nexthop objects   [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS vxlan6: PMTU exceptions [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS vxlan6: PMTU exceptions - nexthop objects   [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS vxlan6: PMTU exceptions [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS vxlan6: PMTU exceptions - nexthop objects   [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS geneve4: PMTU exceptions[FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS geneve4: PMTU exceptions - nexthop objects  [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS geneve4: PMTU exceptions[FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS geneve4: PMTU exceptions - nexthop objects  [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS geneve6: PMTU exceptions[FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv4, OVS geneve6: PMTU exceptions - nexthop objects  [FAIL]
# Cannot find device "ovs_br0"
# TEST: IPv6, OVS geneve6: PMTU exceptions[FAIL]
# Cannot find device "ovs_br0"

Any idea why? Looks like kernel config did include OVS, perhaps we need
explicit modprobe now? I don't see any more details in the logs.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 4/4] controller, northd: Add support for CT zone limits.

2024-06-21 Thread Lorenzo Bianconi
> Add support for limiting the CT zone usage per Ls, LR or LSP.
> When the limit is configured on logical switch it will also implicitly
> set limits for all ports in that logical switch. The port configuration
> can be overwritten individually and has priority over the whole logical
> switch configuration.
> 
> The value 0 means unlimited, when the value is not specified it is
> derived from OvS default CT limit specified for given OvS datapath.

Hi Ales,

I think overall the patch is fine, it needs to be rebased since it has trivial
conflicts to address. Just a nit inline.

Regards,
Lorenzo

> 
> Reported-at: https://bugzilla.redhat.com/2189924
> Signed-off-by: Ales Musil 
> ---
> v3: Rebase on top of latest main.
> ---
>  NEWS|   3 +
>  controller/ct-zone.c| 170 
>  controller/ct-zone.h|  12 ++-
>  controller/ovn-controller.c |  25 +-
>  lib/ovn-util.c  |  17 
>  lib/ovn-util.h  |   3 +
>  northd/northd.c |   8 ++
>  ovn-nb.xml  |  29 ++
>  tests/ovn-controller.at |  99 +
>  9 files changed, 345 insertions(+), 21 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 3bdc55172..22c1797e6 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -31,6 +31,9 @@ Post v24.03.0
>  has been renamed to "options:ic-route-denylist" in order to comply with
>  inclusive language guidelines. The previous name is still recognized to
>  aid with backwards compatibility.
> +  - Add support for CT zone limit that can be specified per LR
> +(options:ct-zone-limit), LS (other_config:ct-zone-limit) or LSP
> +(options:ct-zone-limit).
>  
>  OVN v24.03.0 - 01 Mar 2024
>  --
> diff --git a/controller/ct-zone.c b/controller/ct-zone.c
> index 95faec2f1..820ec061e 100644
> --- a/controller/ct-zone.c
> +++ b/controller/ct-zone.c
> @@ -34,6 +34,17 @@ static bool ct_zone_assign_unused(struct ct_zone_ctx *ctx,
>  static bool ct_zone_remove(struct ct_zone_ctx *ctx, const char *name);
>  static void ct_zone_add(struct ct_zone_ctx *ctx, const char *name,
>  uint16_t zone, bool set_pending);
> +static void ct_zone_limits_sync_per_dp(struct ct_zone_ctx *ctx,
> +   const struct sbrec_datapath_binding 
> *dp,
> +   const char *name,
> +   struct ovsdb_idl_index *pb_by_dp);
> +static void ct_zone_limit_sync(struct ct_zone_ctx *ctx, const char *name,
> +   int64_t limit);
> +static int64_t ct_zone_get_dp_limit(const struct sbrec_datapath_binding *dp);
> +static int64_t ct_zone_get_pb_limit(const struct sbrec_port_binding *pb);
> +static int64_t ct_zone_limit_normalize(int64_t limit);
> +static struct ovsrec_ct_zone *
> +ct_zone_find_ovsrec(const struct ovsrec_datapath *dp, uint16_t zone_id);
>  
>  void
>  ct_zones_restore(struct ct_zone_ctx *ctx,
> @@ -196,11 +207,14 @@ ct_zones_update(const struct sset *local_lports,
>  
>  void
>  ct_zones_commit(const struct ovsrec_bridge *br_int,
> +const struct ovsrec_datapath *ovs_dp,
> +struct ovsdb_idl_txn *ovs_idl_txn,
>  struct shash *pending_ct_zones)
>  {
>  struct shash_node *iter;
>  SHASH_FOR_EACH (iter, pending_ct_zones) {
>  struct ct_zone_pending_entry *ctzpe = iter->data;
> +struct ct_zone *ct_zone = >ct_zone;
>  
>  /* The transaction is open, so any pending entries in the
>   * CT_ZONE_DB_QUEUED must be sent and any in CT_ZONE_DB_QUEUED
> @@ -212,7 +226,7 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
>  
>  char *user_str = xasprintf("ct-zone-%s", iter->name);
>  if (ctzpe->add) {
> -char *zone_str = xasprintf("%"PRIu16, ctzpe->ct_zone.zone);
> +char *zone_str = xasprintf("%"PRIu16, ct_zone->zone);
>  struct smap_node *node =
>  smap_get_node(_int->external_ids, user_str);
>  if (!node || strcmp(node->value, zone_str)) {
> @@ -227,6 +241,19 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
>  }
>  free(user_str);
>  
> +struct ovsrec_ct_zone *ovs_zone =
> +ct_zone_find_ovsrec(ovs_dp, ct_zone->zone);
> +if ((!ctzpe->add || ct_zone->limit < 0) && ovs_zone) {
> +ovsrec_datapath_update_ct_zones_delkey(ovs_dp, ct_zone->zone);
> +} else if (ctzpe->add && ct_zone->limit >= 0) {
> +if (!ovs_zone) {
> +ovs_zone = ovsrec_ct_zone_insert(ovs_idl_txn);
> +ovsrec_datapath_update_ct_zones_setkey(ovs_dp, ct_zone->zone,
> +   ovs_zone);
> +}
> +ovsrec_ct_zone_set_limit(ovs_zone, _zone->limit, 1);
> +}
> +
>  ctzpe->state = CT_ZONE_DB_SENT;
>  }
>  }
> @@ -247,8 +274,19 @@ 

Re: [ovs-dev] [PATCH v3] ofp-prop: Fix unaligned 128 bit access.

2024-06-21 Thread Ilya Maximets
On 6/20/24 07:21, Ales Musil wrote:
> On Wed, Jun 19, 2024 at 3:19 PM Mike Pattrick  wrote:
> 
>> When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
>> ct-flush" test will yield the following undefined behavior flagged by
>> UBSan. This problem is caused by the fact that 128bit property put/parse
>> functions weren't adding appropriate padding before writing or reading
>> the value.
>>
>> This patch uses get_32aligned_* functions to copy the bytes as they are
>> aligned.
>>
>> lib/ofp-prop.c:277:14: runtime error: load of misaligned address
>> 0x6060687c for type 'union ovs_be128', which requires 8 byte
>> alignment
>> 0x6060687c: note: pointer points here
>>   00 05 00 14 00 00 00 00  00 00 00 00 00 00 00 00  00 ff ab 00
>>   ^
>> 0: in ofpprop_parse_u128 lib/ofp-prop.c:277
>> 1: in ofp_ct_match_decode lib/ofp-ct.c:525
>> 2: in ofp_print_nxt_ct_flush lib/ofp-print.c:959
>> 3: in ofp_to_string__ lib/ofp-print.c:1206
>> 4: in ofp_to_string lib/ofp-print.c:1264
>> 5: in ofp_print lib/ofp-print.c:1308
>> 6: in ofctl_ofp_print utilities/ovs-ofctl.c:4899
>> 7: in ovs_cmdl_run_command__ lib/command-line.c:247
>> 8: in ovs_cmdl_run_command lib/command-line.c:278
>> 9: in main utilities/ovs-ofctl.c:186
>>
>> Signed-off-by: Mike Pattrick 
>> ---
>> v2: removed memcpy
>> v3: fixed checkpatch
>> ---
>>  lib/ofp-prop.c | 17 ++---
>>  1 file changed, 10 insertions(+), 7 deletions(-)
>>
> Looks good to me, thanks.
> 
> Acked-by: Ales Musil 
> 

Thanks, Mike and Ales!  I added the missing Fixes tag and applied
the change.  Also backported to 3.3.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 3/4] controller: Prepare structure around CT zone limiting.

2024-06-21 Thread Lorenzo Bianconi
> In order to be able to store CT limits for specified zone, store the
> zone inside separate struct instead of simap. This allows to add
> the addition of limit without changing the whole infrastructure again.
> 
> This is a preparation step for the CT zone limits.
> 
> Signed-off-by: Ales Musil 
> ---
> v3: Rebase on top of latest main.
> v2: Fix NULL ptr deref.
> ---
>  controller/ct-zone.c| 171 +---
>  controller/ct-zone.h|  13 ++-
>  controller/ofctrl.c |   2 +-
>  controller/ovn-controller.c |  15 ++--
>  controller/physical.c   |  17 ++--
>  controller/physical.h   |   2 +-
>  6 files changed, 128 insertions(+), 92 deletions(-)
> 
> diff --git a/controller/ct-zone.c b/controller/ct-zone.c
> index e4f66a52a..95faec2f1 100644
> --- a/controller/ct-zone.c
> +++ b/controller/ct-zone.c

[...]
> @@ -163,26 +167,12 @@ ct_zones_update(const struct sset *local_lports,
>  bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
>  }
>  
> -struct simap_node *node = simap_find(>current,
> - snat_req_node->name);
> -if (node) {
> -if (node->data != snat_req_node->data) {
> -/* Zone request has changed for this node. delete old entry 
> and
> - * create new one*/
> -ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
> -snat_req_node->data, true,
> -snat_req_node->name);
> -bitmap_set0(ctx->bitmap, node->data);
> -}
> -bitmap_set1(ctx->bitmap, snat_req_node->data);
> -node->data = snat_req_node->data;
> -} else {
> -ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
> -snat_req_node->data, true,
> -snat_req_node->name);
> -bitmap_set1(ctx->bitmap, snat_req_node->data);
> -simap_put(>current, snat_req_node->name, 
> snat_req_node->data);
> +struct ct_zone *ct_zone = shash_find_data(>current,
> +  snat_req_node->name);
> +if (node && ct_zone->zone != snat_req_node->data) {
> +ct_zone_remove(ctx, snat_req_node->name);
>  }
> +ct_zone_add(ctx, snat_req_node->name, snat_req_node->data, true);

IIUC we will perform the same lookup twice here (here and in 
ct_zone_remove/ct_zone_add). Can we rearrange ct_zone_add and ct_zone_remove 
signatures to avoid it?

>  }
>  
>  /* xxx This is wasteful to assign a zone to each port--even if no
> @@ -191,7 +181,7 @@ ct_zones_update(const struct sset *local_lports,
>  /* Assign a unique zone id for each logical port and two zones
>   * to a gateway router. */
>  SSET_FOR_EACH (user, _users) {
> -if (simap_contains(>current, user)) {
> +if (shash_find(>current, user)) {
>  continue;
>  }
>  
> @@ -222,7 +212,7 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
>  
>  char *user_str = xasprintf("ct-zone-%s", iter->name);
>  if (ctzpe->add) {
> -char *zone_str = xasprintf("%"PRId32, ctzpe->zone);
> +char *zone_str = xasprintf("%"PRIu16, ctzpe->ct_zone.zone);
>  struct smap_node *node =
>  smap_get_node(_int->external_ids, user_str);
>  if (!node || strcmp(node->value, zone_str)) {
> @@ -281,12 +271,17 @@ ct_zone_handle_dp_update(struct ct_zone_ctx *ctx,
>   * or not.  If so, then fall back to full recompute of
>   * ct_zone engine. */
>  char *snat_dp_zone_key = alloc_nat_zone_key(name, "snat");
> -struct simap_node *simap_node =
> -simap_find(>current, snat_dp_zone_key);
> +struct shash_node *node = shash_find(>current, snat_dp_zone_key);
>  free(snat_dp_zone_key);
> -if (!simap_node || simap_node->data != req_snat_zone) {
> -/* There is no entry yet or the requested snat zone has changed.
> - * Trigger full recompute of ct_zones engine. */
> +
> +if (!node) {
> +return false;
> +}
> +
> +struct ct_zone *ct_zone = node->data;
> +if (ct_zone->zone != req_snat_zone) {
> +/* The requested snat zone has changed. Trigger full recompute of
> + * ct_zones engine. */
>  return false;
>  }
>  
> @@ -298,17 +293,24 @@ bool
>  ct_zone_handle_port_update(struct ct_zone_ctx *ctx, const char *name,
> bool updated, int *scan_start)
>  {
> -struct simap_node *ct_zone = simap_find(>current, name);
> -if (updated && !ct_zone) {
> +struct shash_node *node = shash_find(>current, name);
> +if (updated && !node) {
>  ct_zone_assign_unused(ctx, name, scan_start);
>  return true;
> -} else if (!updated && ct_zone_remove(ctx, ct_zone)) {
> +} else if (!updated && node && 

Re: [ovs-dev] [PATCH ovn v3 2/4] controller: Further encapsulate the CT zone handling.

2024-06-21 Thread Lorenzo Bianconi
Hi Ales,

just a nit inline. Other than that:

Acked-by: Lorenzo Bianconi 

> Move more code into the new ct-zone module and encapsulate
> functionality that is strictly related to CT zone handling.
> 
> Signed-off-by: Ales Musil 
> ---
> v3: Rebase on top of latest main.
> ---
>  controller/ct-zone.c| 156 +---
>  controller/ct-zone.h|   8 +-
>  controller/ovn-controller.c |  49 ++-
>  3 files changed, 118 insertions(+), 95 deletions(-)
> 
> diff --git a/controller/ct-zone.c b/controller/ct-zone.c
> index 3e37fedb6..e4f66a52a 100644
> --- a/controller/ct-zone.c
> +++ b/controller/ct-zone.c
> @@ -27,6 +27,11 @@ ct_zone_restore(const struct sbrec_datapath_binding_table 
> *dp_table,
>  static void ct_zone_add_pending(struct shash *pending_ct_zones,
>  enum ct_zone_pending_state state,
>  int zone, bool add, const char *name);
> +static int ct_zone_get_snat(const struct sbrec_datapath_binding *dp);
> +static bool ct_zone_assign_unused(struct ct_zone_ctx *ctx,
> +  const char *zone_name, int *scan_start);
> +static bool ct_zone_remove(struct ct_zone_ctx *ctx,
> +   struct simap_node *ct_zone);
>  
>  void
>  ct_zones_restore(struct ct_zone_ctx *ctx,
> @@ -82,47 +87,6 @@ ct_zones_restore(struct ct_zone_ctx *ctx,
>  }
>  }
>  
> -bool
> -ct_zone_assign_unused(struct ct_zone_ctx *ctx, const char *zone_name,
> -  int *scan_start)
> -{
> -/* We assume that there are 64K zones and that we own them all. */
> -int zone = bitmap_scan(ctx->bitmap, 0, *scan_start, MAX_CT_ZONES + 1);
> -if (zone == MAX_CT_ZONES + 1) {
> -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> -VLOG_WARN_RL(, "exhausted all ct zones");
> -return false;
> -}
> -
> -*scan_start = zone + 1;
> -
> -ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
> -zone, true, zone_name);
> -
> -bitmap_set1(ctx->bitmap, zone);
> -simap_put(>current, zone_name, zone);
> -return true;
> -}
> -
> -bool
> -ct_zone_remove(struct ct_zone_ctx *ctx, const char *name)
> -{
> -struct simap_node *ct_zone = simap_find(>current, name);
> -if (!ct_zone) {
> -return false;
> -}
> -
> -VLOG_DBG("removing ct zone %"PRId32" for '%s'", ct_zone->data,
> - ct_zone->name);
> -
> -ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
> -ct_zone->data, false, ct_zone->name);
> -bitmap_set0(ctx->bitmap, ct_zone->data);
> -simap_delete(>current, ct_zone);
> -
> -return true;
> -}
> -
>  void
>  ct_zones_update(const struct sset *local_lports,
>  const struct hmap *local_datapaths, struct ct_zone_ctx *ctx)
> @@ -170,7 +134,7 @@ ct_zones_update(const struct sset *local_lports,
>  /* Delete zones that do not exist in above sset. */
>  SIMAP_FOR_EACH_SAFE (ct_zone, >current) {
>  if (!sset_contains(_users, ct_zone->name)) {
> -ct_zone_remove(ctx, ct_zone->name);
> +ct_zone_remove(ctx, ct_zone);
>  } else if (!simap_find(_snat_zones, ct_zone->name)) {
>  bitmap_set1(unreq_snat_zones_map, ct_zone->data);
>  simap_put(_snat_zones, ct_zone->name, ct_zone->data);
> @@ -277,12 +241,6 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
>  }
>  }
>  
> -int
> -ct_zone_get_snat(const struct sbrec_datapath_binding *dp)
> -{
> -return smap_get_int(>external_ids, "snat-ct-zone", -1);
> -}
> -
>  void
>  ct_zones_pending_clear_commited(struct shash *pending)
>  {
> @@ -296,6 +254,108 @@ ct_zones_pending_clear_commited(struct shash *pending)
>  }
>  }
>  
> +/* Returns "true" when there is no need for full recompute. */
> +bool
> +ct_zone_handle_dp_update(struct ct_zone_ctx *ctx,
> + const struct sbrec_datapath_binding *dp)
> +{
> +int req_snat_zone = ct_zone_get_snat(dp);
> +if (req_snat_zone == -1) {
> +/* datapath snat ct zone is not set.  This condition will also hit
> + * when CMS clears the snat-ct-zone for the logical router.
> + * In this case there is no harm in using the previosly specified
> + * snat ct zone for this datapath.  Also it is hard to know
> + * if this option was cleared or if this option is never set. */
> +return true;
> +}
> +
> +const char *name = smap_get(>external_ids, "name");
> +if (!name) {
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +VLOG_ERR_RL(, "Missing name for datapath '"UUID_FMT"' skipping"
> +"zone check.", UUID_ARGS(>header_.uuid));
> +return true;
> +}
> +
> +/* Check if the requested snat zone has changed for the datapath
> + * or not.  If so, then fall back to full recompute of
> + * ct_zone engine. */
> +char 

Re: [ovs-dev] [Patch] ovsdb-client: Document "--timeout" option in help.

2024-06-21 Thread Simon Horman
On Wed, Jun 19, 2024 at 11:54:37AM +0100, Simon Horman wrote:
> On Fri, Jun 07, 2024 at 01:58:42PM +0200, martin.kal...@canonical.com wrote:
> > I made a silly typo in the commit message s/infomration/information.
> > Would it be possible to fix it when the commit is applied, or is it
> > worth posting v2?
> 
> Thanks Martin,
> 
> I think we can handle that when applying.

Thanks again Martin,

Applied with the typo fixed.

- ovsdb-client: Document "--timeout" option in help.
  https://github.com/openvswitch/ovs/commit/24907bd1bc1a

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 1/4] controller: Move CT zone handling into separate module.

2024-06-21 Thread Lorenzo Bianconi
> Move the CT zone handling specific bits into its own module. This
> allows for easier changes done within the module and separates the
> logic that is unrelated from ovn-controller.
> 
> Signed-off-by: Ales Musil 
> ---
> v3: Rebase on top of latest main.
> ---

Acked-by: Lorenzo Bianconi 

>  controller/automake.mk  |   4 +-
>  controller/ct-zone.c| 378 ++
>  controller/ct-zone.h|  74 +++
>  controller/ofctrl.c |   3 +-
>  controller/ovn-controller.c | 393 +++-
>  controller/ovn-controller.h |  21 +-
>  controller/pinctrl.c|   2 +-
>  tests/ovn.at|   4 +-
>  8 files changed, 486 insertions(+), 393 deletions(-)
>  create mode 100644 controller/ct-zone.c
>  create mode 100644 controller/ct-zone.h
> 
> diff --git a/controller/automake.mk b/controller/automake.mk
> index 1b1b3aeb1..ed93cfb3c 100644
> --- a/controller/automake.mk
> +++ b/controller/automake.mk
> @@ -47,7 +47,9 @@ controller_ovn_controller_SOURCES = \
>   controller/mac-cache.h \
>   controller/mac-cache.c \
>   controller/statctrl.h \
> - controller/statctrl.c
> + controller/statctrl.c \
> + controller/ct-zone.h \
> + controller/ct-zone.c
>  
>  controller_ovn_controller_LDADD = lib/libovn.la 
> $(OVS_LIBDIR)/libopenvswitch.la
>  man_MANS += controller/ovn-controller.8
> diff --git a/controller/ct-zone.c b/controller/ct-zone.c
> new file mode 100644
> index 0..3e37fedb6
> --- /dev/null
> +++ b/controller/ct-zone.c
> @@ -0,0 +1,378 @@
> +/* Copyright (c) 2024, Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +
> +#include "ct-zone.h"
> +#include "local_data.h"
> +#include "openvswitch/vlog.h"
> +
> +VLOG_DEFINE_THIS_MODULE(ct_zone);
> +
> +static void
> +ct_zone_restore(const struct sbrec_datapath_binding_table *dp_table,
> +struct ct_zone_ctx *ctx, const char *name, int zone);
> +static void ct_zone_add_pending(struct shash *pending_ct_zones,
> +enum ct_zone_pending_state state,
> +int zone, bool add, const char *name);
> +
> +void
> +ct_zones_restore(struct ct_zone_ctx *ctx,
> + const struct ovsrec_open_vswitch_table *ovs_table,
> + const struct sbrec_datapath_binding_table *dp_table,
> + const struct ovsrec_bridge *br_int)
> +{
> +memset(ctx->bitmap, 0, sizeof ctx->bitmap);
> +bitmap_set1(ctx->bitmap, 0); /* Zone 0 is reserved. */
> +
> +struct shash_node *pending_node;
> +SHASH_FOR_EACH (pending_node, >pending) {
> +struct ct_zone_pending_entry *ctpe = pending_node->data;
> +
> +if (ctpe->add) {
> +ct_zone_restore(dp_table, ctx, pending_node->name, ctpe->zone);
> +}
> +}
> +
> +const struct ovsrec_open_vswitch *cfg;
> +cfg = ovsrec_open_vswitch_table_first(ovs_table);
> +if (!cfg) {
> +return;
> +}
> +
> +if (!br_int) {
> +/* If the integration bridge hasn't been defined, assume that
> + * any existing ct-zone definitions aren't valid. */
> +return;
> +}
> +
> +struct smap_node *node;
> +SMAP_FOR_EACH (node, _int->external_ids) {
> +if (strncmp(node->key, "ct-zone-", 8)) {
> +continue;
> +}
> +
> +const char *user = node->key + 8;
> +if (!user[0]) {
> +continue;
> +}
> +
> +if (shash_find(>pending, user)) {
> +continue;
> +}
> +
> +unsigned int zone;
> +if (!str_to_uint(node->value, 10, )) {
> +continue;
> +}
> +
> +ct_zone_restore(dp_table, ctx, user, zone);
> +}
> +}
> +
> +bool
> +ct_zone_assign_unused(struct ct_zone_ctx *ctx, const char *zone_name,
> +  int *scan_start)
> +{
> +/* We assume that there are 64K zones and that we own them all. */
> +int zone = bitmap_scan(ctx->bitmap, 0, *scan_start, MAX_CT_ZONES + 1);
> +if (zone == MAX_CT_ZONES + 1) {
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> +VLOG_WARN_RL(, "exhausted all ct zones");
> +return false;
> +}
> +
> +*scan_start = zone + 1;
> +
> +ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
> +zone, true, zone_name);
> +
> +bitmap_set1(ctx->bitmap, zone);

Re: [ovs-dev] [Patch v2] ovsdb-client: Add "COLUMN" arg to help for 'dump'.

2024-06-21 Thread Simon Horman
On Fri, Jun 07, 2024 at 11:30:55AM +0200, Martin Kalcok wrote:
> Help text for 'ovsdb-client dump' does not mention that it's capable
> of dumping a specific column's contents if the user supplies the
> column's name as a fourth positional argument.
> 
> Signed-off-by: Martin Kalcok 

Thanks Martin,

Applied.

- ovsdb-client: Add "COLUMN" arg to help for 'dump'.
  https://github.com/openvswitch/ovs/commit/8b405f45d5f1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.

2024-06-21 Thread David Marchand
Hi Mike,

On Thu, Jun 13, 2024 at 4:54 AM Mike Pattrick  wrote:
>
> When sending packets that are flagged as requiring segmentation to an
> interface that doens't support this feature, send the packet to the TSO
> software fallback instead of dropping it.
>
> Signed-off-by: Mike Pattrick 

I started testing this patch (still not finished), I have some
questions and comments.

>
> ---
> v2:
>  - Fixed udp tunnel length
>  - Added test that UDP headers are correct
>  - Split inner and outer ip_id into different counters
>  - Set tunnel flags in reset_tcp_seg
>
> Signed-off-by: Mike Pattrick 
> ---
>  lib/dp-packet-gso.c | 89 +
>  lib/dp-packet.h | 34 
>  lib/netdev-native-tnl.c |  8 
>  lib/netdev.c| 37 +++--
>  tests/system-traffic.at | 87 
>  5 files changed, 216 insertions(+), 39 deletions(-)
>
> diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
> index 847685ad9..dc43ad662 100644
> --- a/lib/dp-packet-gso.c
> +++ b/lib/dp-packet-gso.c
> @@ -47,6 +47,8 @@ dp_packet_gso_seg_new(const struct dp_packet *p, size_t 
> hdr_len,
>  seg->l2_5_ofs = p->l2_5_ofs;
>  seg->l3_ofs = p->l3_ofs;
>  seg->l4_ofs = p->l4_ofs;
> +seg->inner_l3_ofs = p->inner_l3_ofs;
> +seg->inner_l4_ofs = p->inner_l4_ofs;
>
>  /* The protocol headers remain the same, so preserve hash and mark. */
>  *dp_packet_rss_ptr(seg) = *dp_packet_rss_ptr(p);
> @@ -71,7 +73,12 @@ dp_packet_gso_nr_segs(struct dp_packet *p)
>  const char *data_tail;
>  const char *data_pos;
>
> -data_pos = dp_packet_get_tcp_payload(p);
> +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> +dp_packet_hwol_is_tunnel_geneve(p)) {
> +data_pos = dp_packet_get_inner_tcp_payload(p);
> +} else {
> +data_pos = dp_packet_get_tcp_payload(p);
> +}
>  data_tail = (char *) dp_packet_tail(p) - dp_packet_l2_pad_size(p);
>
>  return DIV_ROUND_UP(data_tail - data_pos, segsz);
> @@ -89,14 +96,19 @@ dp_packet_gso(struct dp_packet *p, struct dp_packet_batch 
> **batches)
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>  struct dp_packet_batch *curr_batch = *batches;
>  struct tcp_header *tcp_hdr;
> +struct udp_header *tnl_hdr;
>  struct ip_header *ip_hdr;
> +uint16_t inner_ip_id = 0;
> +uint16_t outer_ip_id = 0;
>  struct dp_packet *seg;
> +const char *data_pos;
>  uint16_t tcp_offset;
>  uint16_t tso_segsz;
>  uint32_t tcp_seq;
> -uint16_t ip_id;
> +bool outer_ipv4;
>  int hdr_len;
>  int seg_len;
> +bool tnl;
>
>  tso_segsz = dp_packet_get_tso_segsz(p);
>  if (!tso_segsz) {
> @@ -105,20 +117,38 @@ dp_packet_gso(struct dp_packet *p, struct 
> dp_packet_batch **batches)
>  return false;
>  }
>
> -tcp_hdr = dp_packet_l4(p);
> -tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> -tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
> -hdr_len = ((char *) dp_packet_l4(p) - (char *) dp_packet_eth(p))
> -  + tcp_offset * 4;
> -ip_id = 0;
> -if (dp_packet_hwol_is_ipv4(p)) {
> +if (dp_packet_hwol_is_tunnel_vxlan(p) ||
> +dp_packet_hwol_is_tunnel_geneve(p)) {
> +data_pos =  dp_packet_get_inner_tcp_payload(p);
> +outer_ipv4 = dp_packet_hwol_is_outer_ipv4(p);
> +tcp_hdr = dp_packet_inner_l4(p);
> +ip_hdr = dp_packet_inner_l3(p);
> +tnl = true;
> +
> +if (outer_ipv4) {
> +outer_ip_id = ntohs(((struct ip_header *) 
> dp_packet_l3(p))->ip_id);
> +}
> +if (dp_packet_hwol_is_ipv4(p)) {
> +inner_ip_id = ntohs(ip_hdr->ip_id);
> +}
> +} else {
> +data_pos = dp_packet_get_tcp_payload(p);
> +outer_ipv4 = dp_packet_hwol_is_ipv4(p);
> +tcp_hdr = dp_packet_l4(p);
>  ip_hdr = dp_packet_l3(p);
> -ip_id = ntohs(ip_hdr->ip_id);
> +tnl = false;
> +
> +if (outer_ipv4) {
> +outer_ip_id = ntohs(ip_hdr->ip_id);
> +}
>  }
>
> +tcp_offset = TCP_OFFSET(tcp_hdr->tcp_ctl);
> +tcp_seq = ntohl(get_16aligned_be32(_hdr->tcp_seq));
> +hdr_len = ((char *) tcp_hdr - (char *) dp_packet_eth(p))
> +  + tcp_offset * 4;
>  const char *data_tail = (char *) dp_packet_tail(p)
>  - dp_packet_l2_pad_size(p);
> -const char *data_pos = dp_packet_get_tcp_payload(p);
>  int n_segs = dp_packet_gso_nr_segs(p);
>
>  for (int i = 0; i < n_segs; i++) {
> @@ -130,14 +160,35 @@ dp_packet_gso(struct dp_packet *p, struct 
> dp_packet_batch **batches)
>  seg = dp_packet_gso_seg_new(p, hdr_len, data_pos, seg_len);
>  data_pos += seg_len;
>
> +if (tnl) {
> +/* Update tunnel inner L3 header. */
> +if (dp_packet_hwol_is_ipv4(seg)) {
> +ip_hdr = dp_packet_inner_l3(seg);
> +

Re: [ovs-dev] [PATCH v3] netdev-offload-dpdk: Support offload of set dscp action.

2024-06-21 Thread Simon Horman
On Thu, Jun 20, 2024 at 03:35:20PM +0800, Sunyang Wu via dev wrote:
> Add the "set dscp action" parsing function,
> so that the "set dscp action" can be offloaded.
> 
> Signed-off-by: Sunyang Wu 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] utilties: Allow ovn-detrace to run on ovs-ofctl dump-flows output.

2024-06-21 Thread Dumitru Ceara
On 6/3/24 10:23, Ales Musil wrote:
> The ovs-ofctl dump-flows output is slightly different from oproto/trace
> cookie 0xXXX vs. cookie=0xXXX. Update the regex that it also matches
> on the equals case. This allows us to run ovn-detrace against the
> ovs-ofctl dump-flows output.
> 
> Also provide simple, partially hardcoded test case for ovn-detrace.
> 
> Signed-off-by: Ales Musil 
> ---

Looks good to me, thanks!

Acked-by: Dumitru Ceara 

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v5] tunnel: Allow UDP zero checksum with IPv6 tunnels.

2024-06-21 Thread Ilya Maximets
On 3/27/24 16:55, Mike Pattrick wrote:
> This patch adopts the proposed RFC 6935 by allowing null UDP checksums
> even if the tunnel protocol is IPv6. This is already supported by Linux
> through the udp6zerocsumtx tunnel option. It is disabled by default and
> IPv6 tunnels are flagged as requiring a checksum, but this patch enables
> the user to set csum=false on IPv6 tunnels.
> 
> Signed-off-by: Mike Pattrick 
> ---
> v2: Changed documentation, and added a NEWS item
> v3: NEWS file merge conflict
> v4: Better comments, new test
> v5: Addressed identified nit's
> ---
>  NEWS  |  4 
>  lib/netdev-native-tnl.c   |  2 +-
>  lib/netdev-vport.c| 17 +++--
>  lib/netdev.h  | 18 +-
>  ofproto/tunnel.c  | 10 --
>  tests/tunnel-push-pop-ipv6.at |  9 +
>  tests/tunnel-push-pop.at  |  7 +++
>  tests/tunnel.at   |  2 +-
>  vswitchd/vswitch.xml  | 12 +---
>  9 files changed, 71 insertions(+), 10 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index c9e4064e6..6c8c4a2dc 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -4,6 +4,10 @@ Post-v3.3.0
>   * Conntrack now supports 'random' flag for selecting ports in a range
> while natting and 'persistent' flag for selection of the IP address
> from a range.
> + * IPv6 UDP tunnel encapsulation including Geneve and VXLAN will now
> +   honour the csum option.  Configuring the interface with
> +   "options:csum=false" now has the same effect as the udp6zerocsumtx
> +   option has with Linux kernel UDP tunnels.
>  
>  
>  v3.3.0 - 16 Feb 2024
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index dee9ab344..e8258bc4e 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -424,7 +424,7 @@ udp_build_header(const struct netdev_tunnel_config 
> *tnl_cfg,
>  udp = netdev_tnl_ip_build_header(data, params, IPPROTO_UDP, 0);
>  udp->udp_dst = tnl_cfg->dst_port;
>  
> -if (params->is_ipv6 || params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
> +if (params->flow->tunnel.flags & FLOW_TNL_F_CSUM) {
>  /* Write a value in now to mark that we should compute the checksum
>   * later. 0x is handy because it is transparent to the
>   * calculation. */
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 60caa02fb..234a4ebe1 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -702,7 +702,9 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
> *args, char **errp)
>  tnl_cfg.dst_port = htons(atoi(node->value));
>  } else if (!strcmp(node->key, "csum") && has_csum) {
>  if (!strcmp(node->value, "true")) {
> -tnl_cfg.csum = true;
> +tnl_cfg.csum = NETDEV_TNL_CSUM_ENABLED;
> +} else if (!strcmp(node->value, "false")) {
> +tnl_cfg.csum = NETDEV_TNL_CSUM_DISABLED;
>  }
>  } else if (!strcmp(node->key, "seq") && has_seq) {
>  if (!strcmp(node->value, "true")) {
> @@ -850,6 +852,15 @@ set_tunnel_config(struct netdev *dev_, const struct smap 
> *args, char **errp)
>  }
>  }
>  
> +/* The default csum state for GRE is special as it does have an optional
> + * checksum but the default configuration isn't correlated with IP 
> version
> + * like UDP tunnels are.  Likewise, tunnels with no checksum at all must 
> be
> + * in this state. */
> +if (tnl_cfg.csum == NETDEV_TNL_CSUM_DEFAULT &&
> +(!has_csum || strstr(type, "gre"))) {
> +tnl_cfg.csum = NETDEV_TNL_DEFAULT_NO_CSUM;
> +}
> +
>  enum tunnel_layers layers = tunnel_supported_layers(type, _cfg);
>  const char *full_type = (strcmp(type, "vxlan") ? type
>   : (tnl_cfg.exts & (1 << OVS_VXLAN_EXT_GPE)
> @@ -1026,8 +1037,10 @@ get_tunnel_config(const struct netdev *dev, struct 
> smap *args)
>  }
>  }
>  
> -if (tnl_cfg->csum) {
> +if (tnl_cfg->csum == NETDEV_TNL_CSUM_ENABLED) {
>  smap_add(args, "csum", "true");
> +} else if (tnl_cfg->csum == NETDEV_TNL_CSUM_DISABLED) {
> +smap_add(args, "csum", "false");
>  }
>  
>  if (tnl_cfg->set_seq) {
> diff --git a/lib/netdev.h b/lib/netdev.h
> index 67a8486bd..5d253157c 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -111,6 +111,22 @@ enum netdev_srv6_flowlabel {
>  SRV6_FLOWLABEL_COMPUTE,
>  };
>  
> +enum netdev_tnl_csum {
> +/* Default value for UDP tunnels if no configurations is present.  
> Enforce
> + * checksum calculation in IPv6 tunnels, disable in IPv4 tunnels. */
> +NETDEV_TNL_CSUM_DEFAULT = 0,
> +
> +/* Checksum explicitly to be calculated. */
> +NETDEV_TNL_CSUM_ENABLED,
> +
> +/* Checksum calculation explicitly disabled. */
> +NETDEV_TNL_CSUM_DISABLED,
> +
> +/* A value for when there is no checksum or the default value is no
> + 

Re: [ovs-dev] [RFC v2 0/9] Introduce local sampling with NXAST_SAMPLE action

2024-06-21 Thread Eelco Chaudron


On 21 Jun 2024, at 10:00, Adrián Moreno wrote:

> On Thu, Jun 20, 2024 at 02:20:53PM GMT, Eelco Chaudron wrote:
>>
>>
>> On 5 Jun 2024, at 22:23, Adrian Moreno wrote:
>>
>>> (Was: Add psample support to NXAST_SAMPLE action)
>>>
>>> This is the userspace counterpart of the work being done in the kernel
>>> [1] which is still not merged (hence the RFC state). There, a new
>>> datapath action is added, called "emit_sample". Being a specific
>>> action, it promises a more efficient way of making packet samples
>>> available to observability applications.
>>>
>>> From the PoV of ovs-vswitchd, this new action is used to implement
>>> "local sampling". Local sampling (or lsample for short) is configured
>>> in a similar way as current per-flow IPFIX sampling, i.e: using the
>>> Flow_Sample_Collector_Set table and the NXAST_SAMPLE action.
>>>
>>> However, instead of sending the sample to an external IPFIX collector
>>> though the network, the sample is emitted using the new action and
>>> made available to locally running sample collector.
>>>
>>> The specific way emit_sample sends the sample (and the way the local
>>> collector shall collect it) is datapath-specific.
>>> Currently, currently only the Linux kernel datapath implements it using
>>> the psample netlink multicast group.
>>>
>>> ~~ Configuration ~~
>>> Local sampling is configured via a new column in the
>>> Flow_Sample_Collector_Set (FSCS) table called "local_sample_group".
>>> Configuring this value is orthogonal to also associating the FSCS
>>> entry to an entry in the IPFIX table.
>>>
>>> Once that entry in the OVSDB is configured, NXAST_SAMPLE actions coming
>>> from the controller will be translated into the following odp action:
>>>
>>>sample(sample={P}%, actions(emit_sample(group={G},cookie={C})))
>>>
>>> Where:
>>> P: Is the sampling probability from NXAST_SAMPLE
>>> G: Is the group id in the FSCS entry whose "id" matches the one in
>>> the NXAST_SAMPLE.
>>> C: Is a 64bit cookie result of concatenating the obs_domain and
>>> obs_point from the NXAST_SAMPLE, i.e:
>>> "obs_domain << 32 | obs_point"
>>> Notes:
>>> - The parent sample action might be omitted if the probability is
>>>   100% and there is no IPFIX sampling that requires the use of a
>>>   meter.
>>>
>>> ~~ Internal implementation: dpif-lsample ~~
>>> Internally, a new object called "dpif-lsample" is introduced to track
>>> the configured local sampling exporters and track statistics based on
>>> odp flow stats (using xcache).
>>> It exposes the list of configured exporters and their statistics on a
>>> new unixctl command called "lsample/show".
>>>
>>> ~~ Testing ~~
>>> The series includes an test utility program than can be executed by
>>> running "tests/ovstest test-psample". This utility listens
>>> to packets multicasted by the psample module and prints them (also
>>> printing the obs_domain and obs_point ids).
>>>
>>> ~~ HW Offload ~~
>>> tc offload is not being introduced in this series as existing sample
>>> or userspace actions are not currently offloadable. Also some
>>> improvements need to be implemented in tc for it to be feasible.
>>>
>>> [1]
>>> https://patchwork.kernel.org/project/netdevbpf/cover/20240603185647.2310748-1-amore...@redhat.com/
>>
>> Hi Adrian,
>>
>> I did not finish reviewing, but I was doing some testing, and I noticed I’m 
>> not getting any warning/error when configuring all of this without Datapath 
>> support.
>>
>> Are you planning to add some user feedback when trying to enable this 
>> without the kernel other than the “Datapath does not support emit_sample” 
>> which the user might not understand relates to this feature?
>>
>
> I added the following code in bridge.c when it configures local
> sampling:
>
>
> +ret = ofproto_set_local_sample(br->ofproto, opts_array, n_opts);
> +
> +if (ret == ENOTSUP) {
> +if (n_opts) {
> +VLOG_WARN_RL(,
> + "bridge %s: ignoring local sampling configuration: "
> + "not supported by this datapath",
> + br->name);
> +}
> +} else if (ret) {
> +VLOG_ERR_RL(, "bridge %s: error configuring local sampling: %s",
> +br->name, ovs_strerror(ret));
> +}
>
> I believe it should warn the user the local sampling configuration will
> not be applied.
>
> Is that not working?

You are right, I did not notice these log messages :( Doing too much stuff in 
parallel...

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 10/10] selftests: openvswitch: add emit_sample test

2024-06-21 Thread Adrian Moreno
Add a test to verify sampling packets via psample works.

In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/openvswitch.sh  | 110 +-
 .../selftests/net/openvswitch/ovs-dpctl.py|  73 +++-
 2 files changed, 177 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 15bca0708717..a07530ff13ca 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -20,7 +20,8 @@ tests="
nat_related_v4  ip4-nat-related: ICMP related 
matches work with SNAT
netlink_checks  ovsnl: validate netlink attrs 
and settings
upcall_interfaces   ovs: test the upcall interfaces
-   drop_reason drop: test drop reasons are 
emitted"
+   drop_reason drop: test drop reasons are 
emitted
+   emit_sample emit_sample: Sampling packets 
with psample"
 
 info() {
 [ $VERBOSE = 0 ] || echo $*
@@ -102,12 +103,21 @@ ovs_netns_spawn_daemon() {
shift
netns=$1
shift
-   info "spawning cmd: $*"
-   ip netns exec $netns $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
+   if [ "$netns" == "_default" ]; then
+   $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
+   else
+   ip netns exec $netns $*  >> $ovs_dir/stdout  2>> 
$ovs_dir/stderr &
+   fi
pid=$!
ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
 }
 
+ovs_spawn_daemon() {
+   sbx=$1
+   shift
+   ovs_netns_spawn_daemon $sbx "_default" $*
+}
+
 ovs_add_netns_and_veths () {
info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
ovs_sbx "$1" ip netns add "$3" || return 1
@@ -170,6 +180,19 @@ ovs_drop_reason_count()
return `echo "$perf_output" | grep "$pattern" | wc -l`
 }
 
+ovs_test_flow_fails () {
+   ERR_MSG="Flow actions may not be safe on all matching packets"
+
+   PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+   ovs_add_flow $@ &> /dev/null $@ && return 1
+   POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+   if [ "$PRE_TEST" == "$POST_TEST" ]; then
+   return 1
+   fi
+   return 0
+}
+
 usage() {
echo
echo "$0 [OPTIONS] [TEST]..."
@@ -184,6 +207,87 @@ usage() {
exit 1
 }
 
+
+# emit_sample test
+# - use emit_sample to observe packets
+test_emit_sample() {
+   sbx_add "test_emit_sample" || return $?
+
+   # Add a datapath with per-vport dispatching.
+   ovs_add_dp "test_emit_sample" emit_sample -V 2:1 || return 1
+
+   info "create namespaces"
+   ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
+   client c0 c1 172.31.110.10/24 -u || return 1
+   ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
+   server s0 s1 172.31.110.20/24 -u || return 1
+
+   # Check if emit_sample actions can be configured.
+   ovs_add_flow "test_emit_sample" emit_sample \
+   'in_port(1),eth(),eth_type(0x0806),arp()' 'emit_sample(group=1)'
+   if [ $? == 1 ]; then
+   info "no support for emit_sample - skipping"
+   ovs_exit_sig
+   return $ksft_skip
+   fi
+
+   ovs_del_flows "test_emit_sample" emit_sample
+
+   # Allow ARP
+   ovs_add_flow "test_emit_sample" emit_sample \
+   'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
+   ovs_add_flow "test_emit_sample" emit_sample \
+   'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
+
+   # Test action verification.
+   OLDIFS=$IFS
+   IFS='*'
+   min_key='in_port(1),eth(),eth_type(0x0800),ipv4()'
+   for testcase in \
+   "cookie to 
large"*"emit_sample(group=1,cookie=1615141312111009080706050403020100)" \
+   "no group with cookie"*"emit_sample(cookie=abcd)" \
+   "no group"*"sample()";
+   do
+   set -- $testcase;
+   ovs_test_flow_fails "test_emit_sample" emit_sample $min_key $2
+   if [ $? == 1 ]; then
+   info "failed - $1"
+   return 1
+   fi
+   done
+   IFS=$OLDIFS
+
+   # Sample first 14 bytes of all traffic.
+   ovs_add_flow "test_emit_sample" emit_sample \
+   
"in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()" 
"trunc(14),emit_sample(group=1,cookie=c0ffee),2"
+
+   # Sample all traffic. In this case, use a sample() action with both
+   # emit_sample and an upcall emulating simultaneous local sampling and
+   # sFlow / IPFIX.
+   nlpid=$(grep -E "listening on upcall packet 

[ovs-dev] [PATCH net-next v4 09/10] selftests: openvswitch: parse trunc action

2024-06-21 Thread Adrian Moreno
The trunc action was supported decode-able but not parse-able. Add
support for parsing the action string.

Signed-off-by: Adrian Moreno 
---
 .../testing/selftests/net/openvswitch/ovs-dpctl.py  | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 071309289cff..a3c26ddac42f 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -817,6 +817,19 @@ class ovsactions(nla):
 self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact])
 parsed = True
 
+elif parse_starts_block(actstr, "trunc(", False):
+parencount += 1
+actstr, val = parse_extract_field(
+actstr,
+"trunc(",
+r"([0-9]+)",
+int,
+False,
+None,
+)
+self["attrs"].append(["OVS_ACTION_ATTR_TRUNC", val])
+parsed = True
+
 actstr = actstr[strspn(actstr, ", ") :]
 while parencount > 0:
 parencount -= 1
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 06/10] net: openvswitch: store sampling probability in cb.

2024-06-21 Thread Adrian Moreno
When a packet sample is observed, the sampling rate that was used is
important to estimate the real frequency of such event.

Store the probability of the parent sample action in the skb's cb area
and use it in emit_sample to pass it down to psample.

Signed-off-by: Adrian Moreno 
---
 include/uapi/linux/openvswitch.h |  3 ++-
 net/openvswitch/actions.c| 20 +---
 net/openvswitch/datapath.h   |  3 +++
 net/openvswitch/vport.c  |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 8cfa1b3f6b06..823f22be1fcc 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -649,7 +649,8 @@ enum ovs_flow_attr {
  * Actions are passed as nested attributes.
  *
  * Executes the specified actions with the given probability on a per-packet
- * basis.
+ * basis. Nested actions will be able to access the probability value of the
+ * parent @OVS_ACTION_ATTR_SAMPLE.
  */
 enum ovs_sample_attr {
OVS_SAMPLE_ATTR_UNSPEC,
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1f555cbba312..85a2ff91379b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff 
*skb,
struct nlattr *sample_arg;
int rem = nla_len(attr);
const struct sample_arg *arg;
+   u32 init_probability;
bool clone_flow_key;
+   int err;
 
/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
sample_arg = nla_data(attr);
arg = nla_data(sample_arg);
actions = nla_next(sample_arg, );
+   init_probability = OVS_CB(skb)->probability;
 
if ((arg->probability != U32_MAX) &&
(!arg->probability || get_random_u32() > arg->probability)) {
@@ -1062,9 +1065,16 @@ static int sample(struct datapath *dp, struct sk_buff 
*skb,
return 0;
}
 
+   OVS_CB(skb)->probability = arg->probability;
+
clone_flow_key = !arg->exec;
-   return clone_execute(dp, skb, key, 0, actions, rem, last,
-clone_flow_key);
+   err = clone_execute(dp, skb, key, 0, actions, rem, last,
+   clone_flow_key);
+
+   if (!last)
+   OVS_CB(skb)->probability = init_probability;
+
+   return err;
 }
 
 /* When 'last' is true, clone() should always consume the 'skb'.
@@ -1312,6 +1322,7 @@ static void execute_emit_sample(struct datapath *dp, 
struct sk_buff *skb,
struct psample_group psample_group = {};
struct psample_metadata md = {};
const struct nlattr *a;
+   u32 rate;
int rem;
 
nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) {
@@ -1330,8 +1341,11 @@ static void execute_emit_sample(struct datapath *dp, 
struct sk_buff *skb,
psample_group.net = ovs_dp_get_net(dp);
md.in_ifindex = OVS_CB(skb)->input_vport->dev->ifindex;
md.trunc_size = skb->len - OVS_CB(skb)->cutlen;
+   md.rate_as_probability = 1;
+
+   rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX;
 
-   psample_sample_packet(_group, skb, 0, );
+   psample_sample_packet(_group, skb, rate, );
 #endif
 }
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 0cd29971a907..9ca6231ea647 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -115,12 +115,15 @@ struct datapath {
  * fragmented.
  * @acts_origlen: The netlink size of the flow actions applied to this skb.
  * @cutlen: The number of bytes from the packet end to be removed.
+ * @probability: The sampling probability that was applied to this skb; 0 means
+ * no sampling has occurred; U32_MAX means 100% probability.
  */
 struct ovs_skb_cb {
struct vport*input_vport;
u16 mru;
u16 acts_origlen;
u32 cutlen;
+   u32 probability;
 };
 #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 972ae01a70f7..8732f6e51ae5 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb,
OVS_CB(skb)->input_vport = vport;
OVS_CB(skb)->mru = 0;
OVS_CB(skb)->cutlen = 0;
+   OVS_CB(skb)->probability = 0;
if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) {
u32 mark;
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 08/10] selftests: openvswitch: add userspace parsing

2024-06-21 Thread Adrian Moreno
The userspace action lacks parsing support plus it contains a bug in the
name of one of its attributes.

This patch makes userspace action work.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index ddf4d999317c..071309289cff 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -575,13 +575,27 @@ class ovsactions(nla):
 print_str += "userdata="
 for f in self.get_attr("OVS_USERSPACE_ATTR_USERDATA"):
 print_str += "%x." % f
-if self.get_attr("OVS_USERSPACE_ATTR_TUN_PORT") is not None:
+if self.get_attr("OVS_USERSPACE_ATTR_EGRESS_TUN_PORT") is not None:
 print_str += "egress_tun_port=%d" % self.get_attr(
-"OVS_USERSPACE_ATTR_TUN_PORT"
+"OVS_USERSPACE_ATTR_EGRESS_TUN_PORT"
 )
 print_str += ")"
 return print_str
 
+def parse(self, actstr):
+attrs_desc = (
+("pid", "OVS_USERSPACE_ATTR_PID", int),
+("userdata", "OVS_USERSPACE_ATTR_USERDATA",
+lambda x: list(bytearray.fromhex(x))),
+("egress_tun_port", "OVS_USERSPACE_ATTR_EGRESS_TUN_PORT", int)
+)
+
+attrs, actstr = parse_attrs(actstr, attrs_desc)
+for attr in attrs:
+self["attrs"].append(attr)
+
+return actstr
+
 def dpstr(self, more=False):
 print_str = ""
 
@@ -797,6 +811,12 @@ class ovsactions(nla):
 self["attrs"].append(["OVS_ACTION_ATTR_EMIT_SAMPLE", emitact])
 parsed = True
 
+elif parse_starts_block(actstr, "userspace(", False):
+uact = self.userspace()
+actstr = uact.parse(actstr[len("userspace(") : ])
+self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact])
+parsed = True
+
 actstr = actstr[strspn(actstr, ", ") :]
 while parencount > 0:
 parencount -= 1
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 07/10] selftests: openvswitch: add emit_sample action

2024-06-21 Thread Adrian Moreno
Add sample and emit_sample action support to ovs-dpctl.py.

Refactor common attribute parsing logic into an external function.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 162 +-
 1 file changed, 161 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 9f8dec2f6539..ddf4d999317c 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -8,6 +8,7 @@ import argparse
 import errno
 import ipaddress
 import logging
+import math
 import multiprocessing
 import re
 import struct
@@ -58,6 +59,7 @@ OVS_FLOW_CMD_DEL = 2
 OVS_FLOW_CMD_GET = 3
 OVS_FLOW_CMD_SET = 4
 
+UINT32_MAX = 0x
 
 def macstr(mac):
 outstr = ":".join(["%02X" % i for i in mac])
@@ -267,6 +269,75 @@ def parse_extract_field(
 return str_skipped, data
 
 
+def parse_attrs(actstr, attr_desc):
+"""Parses the given action string and returns a list of netlink
+attributes based on a list of attribute descriptions.
+
+Each element in the attribute description list is a tuple such as:
+(name, attr_name, parse_func)
+where:
+name: is the string representing the attribute
+attr_name: is the name of the attribute as defined in the uAPI.
+parse_func: is a callable accepting a string and returning either
+a single object (the parsed attribute value) or a tuple of
+two values (the parsed attribute value and the remaining string)
+
+Returns a list of attributes and the remaining string.
+"""
+def parse_attr(actstr, key, func):
+actstr = actstr[len(key) :]
+
+if not func:
+return None, actstr
+
+delim = actstr[0]
+actstr = actstr[1:]
+
+if delim == "=":
+pos = strcspn(actstr, ",)")
+ret = func(actstr[:pos])
+else:
+ret = func(actstr)
+
+if isinstance(ret, tuple):
+(datum, actstr) = ret
+else:
+datum = ret
+actstr = actstr[strcspn(actstr, ",)"):]
+
+if delim == "(":
+if not actstr or actstr[0] != ")":
+raise ValueError("Action contains unbalanced parentheses")
+
+actstr = actstr[1:]
+
+actstr = actstr[strspn(actstr, ", ") :]
+
+return datum, actstr
+
+attrs = []
+attr_desc = list(attr_desc)
+while actstr and actstr[0] != ")" and attr_desc:
+found = False
+for i, (key, attr, func) in enumerate(attr_desc):
+if actstr.startswith(key):
+datum, actstr = parse_attr(actstr, key, func)
+attrs.append([attr, datum])
+found = True
+del attr_desc[i]
+
+if not found:
+raise ValueError("Unknown attribute: '%s'" % actstr)
+
+actstr = actstr[strspn(actstr, ", ") :]
+
+if actstr[0] != ")":
+raise ValueError("Action string contains extra garbage or has "
+ "unbalanced parenthesis: '%s'" % actstr)
+
+return attrs, actstr[1:]
+
+
 class ovs_dp_msg(genlmsg):
 # include the OVS version
 # We need a custom header rather than just being able to rely on
@@ -285,7 +356,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_SET", "none"),
 ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
 ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
-("OVS_ACTION_ATTR_SAMPLE", "none"),
+("OVS_ACTION_ATTR_SAMPLE", "sample"),
 ("OVS_ACTION_ATTR_RECIRC", "uint32"),
 ("OVS_ACTION_ATTR_HASH", "none"),
 ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
@@ -304,8 +375,85 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_ADD_MPLS", "none"),
 ("OVS_ACTION_ATTR_DEC_TTL", "none"),
 ("OVS_ACTION_ATTR_DROP", "uint32"),
+("OVS_ACTION_ATTR_EMIT_SAMPLE", "emit_sample"),
 )
 
+class emit_sample(nla):
+nla_flags = NLA_F_NESTED
+
+nla_map = (
+("OVS_EMIT_SAMPLE_ATTR_UNSPEC", "none"),
+("OVS_EMIT_SAMPLE_ATTR_GROUP", "uint32"),
+("OVS_EMIT_SAMPLE_ATTR_COOKIE", "array(uint8)"),
+)
+
+def dpstr(self, more=False):
+args = "group=%d" % self.get_attr("OVS_EMIT_SAMPLE_ATTR_GROUP")
+
+cookie = self.get_attr("OVS_EMIT_SAMPLE_ATTR_COOKIE")
+if cookie:
+args += ",cookie(%s)" % \
+"".join(format(x, "02x") for x in cookie)
+
+return "emit_sample(%s)" % args
+
+def parse(self, actstr):
+desc = (
+("group", "OVS_EMIT_SAMPLE_ATTR_GROUP", int),
+("cookie", "OVS_EMIT_SAMPLE_ATTR_COOKIE",
+lambda x: list(bytearray.fromhex(x)))
+)
+
+attrs, actstr = parse_attrs(actstr, desc)
+
+for attr in attrs:
+

[ovs-dev] [PATCH net-next v4 05/10] net: openvswitch: add emit_sample action

2024-06-21 Thread Adrian Moreno
Add support for a new action: emit_sample.

This action accepts a u32 group id and a variable-length cookie and uses
the psample multicast group to make the packet available for
observability.

The maximum length of the user-defined cookie is set to 16, same as
tc_cookie, to discourage using cookies that will not be offloadable.

Signed-off-by: Adrian Moreno 
---
 Documentation/netlink/specs/ovs_flow.yaml | 17 +
 include/uapi/linux/openvswitch.h  | 28 ++
 net/openvswitch/Kconfig   |  1 +
 net/openvswitch/actions.c | 45 +++
 net/openvswitch/flow_netlink.c| 33 -
 5 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
b/Documentation/netlink/specs/ovs_flow.yaml
index 4fdfc6b5cae9..a7ab5593a24f 100644
--- a/Documentation/netlink/specs/ovs_flow.yaml
+++ b/Documentation/netlink/specs/ovs_flow.yaml
@@ -727,6 +727,12 @@ attribute-sets:
 name: dec-ttl
 type: nest
 nested-attributes: dec-ttl-attrs
+  -
+name: emit-sample
+type: nest
+nested-attributes: emit-sample-attrs
+doc: |
+  Sends a packet sample to psample for external observation.
   -
 name: tunnel-key-attrs
 enum-name: ovs-tunnel-key-attr
@@ -938,6 +944,17 @@ attribute-sets:
   -
 name: gbp
 type: u32
+  -
+name: emit-sample-attrs
+enum-name: ovs-emit-sample-attr
+name-prefix: ovs-emit-sample-attr-
+attributes:
+  -
+name: group
+type: u32
+  -
+name: cookie
+type: binary
 
 operations:
   name-prefix: ovs-flow-cmd-
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index efc82c318fa2..8cfa1b3f6b06 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -914,6 +914,31 @@ struct check_pkt_len_arg {
 };
 #endif
 
+#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
+/**
+ * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE
+ * action.
+ *
+ * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
+ * sample.
+ * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that contains
+ * user-defined metadata. The maximum length is OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE
+ * bytes.
+ *
+ * Sends the packet to the psample multicast group with the specified group and
+ * cookie. It is possible to combine this action with the
+ * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being emitted.
+ */
+enum ovs_emit_sample_attr {
+   OVS_EMIT_SAMPLE_ATTR_GROUP = 1, /* u32 number. */
+   OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
+
+   /* private: */
+   __OVS_EMIT_SAMPLE_ATTR_MAX
+};
+
+#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -966,6 +991,8 @@ struct check_pkt_len_arg {
  * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS
  * argument.
  * @OVS_ACTION_ATTR_DROP: Explicit drop action.
+ * @OVS_ACTION_ATTR_EMIT_SAMPLE: Send a sample of the packet to external
+ * observers via psample.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -1004,6 +1031,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
OVS_ACTION_ATTR_DROP, /* u32 error code. */
+   OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 29a7081858cd..2535f3f9f462 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -10,6 +10,7 @@ config OPENVSWITCH
   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
 (!NF_NAT || NF_NAT) && \
 (!NETFILTER_CONNCOUNT || 
NETFILTER_CONNCOUNT)))
+   depends on PSAMPLE || !PSAMPLE
select LIBCRC32C
select MPLS
select NET_MPLS_GSO
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 964225580824..1f555cbba312 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 #include 
+
+#if IS_ENABLED(CONFIG_PSAMPLE)
+#include 
+#endif
+
 #include 
 
 #include "datapath.h"
@@ -1299,6 +1304,37 @@ static int execute_dec_ttl(struct sk_buff *skb, struct 
sw_flow_key *key)
return 0;
 }
 
+static void execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
+   const struct sw_flow_key *key,
+   

[ovs-dev] [PATCH net-next v4 04/10] net: psample: allow using rate as probability

2024-06-21 Thread Adrian Moreno
Although not explicitly documented in the psample module itself, the
definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.

Quoting tc-sample(8):
"RATE of 100 will lead to an average of one sampled packet out of every
100 observed."

With this semantics, the rates that we can express with an unsigned
32-bits number are very unevenly distributed and concentrated towards
"sampling few packets".
For example, we can express a probability of 2.32E-8% but we
cannot express anything between 100% and 50%.

For sampling applications that are capable of sampling a decent
amount of packets, this sampling rate semantics is not very useful.

Add a new flag to the uAPI that indicates that the sampling rate is
expressed in scaled probability, this is:
- 0 is 0% probability, no packets get sampled.
- U32_MAX is 100% probability, all packets get sampled.

Signed-off-by: Adrian Moreno 
---
 include/net/psample.h |  3 ++-
 include/uapi/linux/psample.h  | 10 +-
 include/uapi/linux/tc_act/tc_sample.h |  1 +
 net/psample/psample.c |  3 +++
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/psample.h b/include/net/psample.h
index 2ac71260a546..c52e9ebd88dd 100644
--- a/include/net/psample.h
+++ b/include/net/psample.h
@@ -24,7 +24,8 @@ struct psample_metadata {
u8 out_tc_valid:1,
   out_tc_occ_valid:1,
   latency_valid:1,
-  unused:5;
+  rate_as_probability:1,
+  unused:4;
const u8 *user_cookie;
u32 user_cookie_len;
 };
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
index e80637e1d97b..b765f0e81f20 100644
--- a/include/uapi/linux/psample.h
+++ b/include/uapi/linux/psample.h
@@ -8,7 +8,11 @@ enum {
PSAMPLE_ATTR_ORIGSIZE,
PSAMPLE_ATTR_SAMPLE_GROUP,
PSAMPLE_ATTR_GROUP_SEQ,
-   PSAMPLE_ATTR_SAMPLE_RATE,
+   PSAMPLE_ATTR_SAMPLE_RATE,   /* u32, ratio between observed and
+* sampled packets or scaled probability
+* if PSAMPLE_ATTR_SAMPLE_PROBABILITY
+* is set.
+*/
PSAMPLE_ATTR_DATA,
PSAMPLE_ATTR_GROUP_REFCOUNT,
PSAMPLE_ATTR_TUNNEL,
@@ -20,6 +24,10 @@ enum {
PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
PSAMPLE_ATTR_PROTO, /* u16 */
PSAMPLE_ATTR_USER_COOKIE,   /* binary, user provided data */
+   PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in
+* PSAMPLE_ATTR_SAMPLE_RATE as a
+* probability scaled 0 - U32_MAX.
+*/
 
__PSAMPLE_ATTR_MAX
 };
diff --git a/include/uapi/linux/tc_act/tc_sample.h 
b/include/uapi/linux/tc_act/tc_sample.h
index fee1bcc20793..7ee0735e7b38 100644
--- a/include/uapi/linux/tc_act/tc_sample.h
+++ b/include/uapi/linux/tc_act/tc_sample.h
@@ -18,6 +18,7 @@ enum {
TCA_SAMPLE_TRUNC_SIZE,
TCA_SAMPLE_PSAMPLE_GROUP,
TCA_SAMPLE_PAD,
+   TCA_SAMPLE_PROBABILITY,
__TCA_SAMPLE_MAX
 };
 #define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
diff --git a/net/psample/psample.c b/net/psample/psample.c
index 1c76f3e48dcd..f48b5b9cd409 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -497,6 +497,9 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
md->user_cookie))
goto error;
 
+   if (md->rate_as_probability)
+   nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY);
+
genlmsg_end(nl_skb, data);
genlmsg_multicast_netns(_nl_family, group->net, nl_skb, 0,
PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 03/10] net: psample: skip packet copy if no listeners

2024-06-21 Thread Adrian Moreno
If nobody is listening on the multicast group, generating the sample,
which involves copying packet data, seems completely unnecessary.

Return fast in this case.

Reviewed-by: Simon Horman 
Signed-off-by: Adrian Moreno 
---
 net/psample/psample.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/psample/psample.c b/net/psample/psample.c
index b37488f426bc..1c76f3e48dcd 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
void *data;
int ret;
 
+   if (!genl_has_listeners(_nl_family, group->net,
+   PSAMPLE_NL_MCGRP_SAMPLE))
+   return;
+
meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 02/10] net: sched: act_sample: add action cookie to sample

2024-06-21 Thread Adrian Moreno
If the action has a user_cookie, pass it along to the sample so it can
be easily identified.

Signed-off-by: Adrian Moreno 
---
 net/sched/act_sample.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index a69b53d54039..2ceb4d141b71 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -167,7 +167,9 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
 {
struct tcf_sample *s = to_sample(a);
struct psample_group *psample_group;
+   u8 cookie_data[TC_COOKIE_MAX_SIZE];
struct psample_metadata md = {};
+   struct tc_cookie *user_cookie;
int retval;
 
tcf_lastuse_update(>tcf_tm);
@@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
skb_push(skb, skb->mac_len);
 
+   rcu_read_lock();
+   user_cookie = rcu_dereference(a->user_cookie);
+   if (user_cookie) {
+   memcpy(cookie_data, user_cookie->data,
+  user_cookie->len);
+   md.user_cookie = cookie_data;
+   md.user_cookie_len = user_cookie->len;
+   }
+   rcu_read_unlock();
+
md.trunc_size = s->truncate ? s->trunc_size : skb->len;
psample_sample_packet(psample_group, skb, s->rate, );
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 01/10] net: psample: add user cookie

2024-06-21 Thread Adrian Moreno
Add a user cookie to the sample metadata so that sample emitters can
provide more contextual information to samples.

If present, send the user cookie in a new attribute:
PSAMPLE_ATTR_USER_COOKIE.

Reviewed-by: Simon Horman 
Signed-off-by: Adrian Moreno 
---
 include/net/psample.h| 2 ++
 include/uapi/linux/psample.h | 1 +
 net/psample/psample.c| 9 -
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/psample.h b/include/net/psample.h
index 0509d2d6be67..2ac71260a546 100644
--- a/include/net/psample.h
+++ b/include/net/psample.h
@@ -25,6 +25,8 @@ struct psample_metadata {
   out_tc_occ_valid:1,
   latency_valid:1,
   unused:5;
+   const u8 *user_cookie;
+   u32 user_cookie_len;
 };
 
 struct psample_group *psample_group_get(struct net *net, u32 group_num);
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
index e585db5bf2d2..e80637e1d97b 100644
--- a/include/uapi/linux/psample.h
+++ b/include/uapi/linux/psample.h
@@ -19,6 +19,7 @@ enum {
PSAMPLE_ATTR_LATENCY,   /* u64, nanoseconds */
PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
PSAMPLE_ATTR_PROTO, /* u16 */
+   PSAMPLE_ATTR_USER_COOKIE,   /* binary, user provided data */
 
__PSAMPLE_ATTR_MAX
 };
diff --git a/net/psample/psample.c b/net/psample/psample.c
index a5d9b8446f77..b37488f426bc 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -386,7 +386,9 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
   nla_total_size(sizeof(u32)) +/* group_num */
   nla_total_size(sizeof(u32)) +/* seq */
   nla_total_size_64bit(sizeof(u64)) +  /* timestamp */
-  nla_total_size(sizeof(u16)); /* protocol */
+  nla_total_size(sizeof(u16)) +/* protocol */
+  (md->user_cookie_len ?
+   nla_total_size(md->user_cookie_len) : 0); /* user cookie */
 
 #ifdef CONFIG_INET
tun_info = skb_tunnel_info(skb);
@@ -486,6 +488,11 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
}
 #endif
 
+   if (md->user_cookie && md->user_cookie_len &&
+   nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len,
+   md->user_cookie))
+   goto error;
+
genlmsg_end(nl_skb, data);
genlmsg_multicast_netns(_nl_family, group->net, nl_skb, 0,
PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v4 00/10] net: openvswitch: Add sample multicasting.

2024-06-21 Thread Adrian Moreno
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic share netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
yielding this solution unfit for highly loaded production systems.

Users are left with little options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidated would be
to use psample. However, it's current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.

--
v3 -> v4:
- Rebased.
- Addressed Jakub's comment on private and unused nla attributes.

v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.

v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
  "sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.

rfc_v2 -> v1:
- Accomodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.

rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.


Adrian Moreno (10):
  net: psample: add user cookie
  net: sched: act_sample: add action cookie to sample
  net: psample: skip packet copy if no listeners
  net: psample: allow using rate as probability
  net: openvswitch: add emit_sample action
  net: openvswitch: store sampling probability in cb.
  selftests: openvswitch: add emit_sample action
  selftests: openvswitch: add userspace parsing
  selftests: openvswitch: parse trunc action
  selftests: openvswitch: add emit_sample test

 Documentation/netlink/specs/ovs_flow.yaml |  17 ++
 include/net/psample.h |   5 +-
 include/uapi/linux/openvswitch.h  |  31 +-
 include/uapi/linux/psample.h  |  11 +-
 include/uapi/linux/tc_act/tc_sample.h |   1 +
 net/openvswitch/Kconfig   |   1 +
 net/openvswitch/actions.c |  63 +++-
 net/openvswitch/datapath.h|   3 +
 net/openvswitch/flow_netlink.c|  33 ++-
 net/openvswitch/vport.c   |   1 +
 net/psample/psample.c |  16 +-
 net/sched/act_sample.c|  12 +
 .../selftests/net/openvswitch/openvswitch.sh  | 110 ++-
 .../selftests/net/openvswitch/ovs-dpctl.py| 272 +-
 14 files changed, 560 insertions(+), 16 deletions(-)

-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] openvswitch: get related ct labels from its master if it is not confirmed

2024-06-21 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller :

On Wed, 19 Jun 2024 18:08:56 -0400 you wrote:
> Ilya found a failure in running check-kernel tests with at_groups=144
> (144: conntrack - FTP SNAT orig tuple) in OVS repo. After his further
> investigation, the root cause is that the labels sent to userspace
> for related ct are incorrect.
> 
> The labels for unconfirmed related ct should use its master's labels.
> However, the changes made in commit 8c8b73320805 ("openvswitch: set
> IPS_CONFIRMED in tmpl status only when commit is set in conntrack")
> led to getting labels from this related ct.
> 
> [...]

Here is the summary with links:
  - [net] openvswitch: get related ct labels from its master if it is not 
confirmed
https://git.kernel.org/netdev/net/c/a23ac973f67f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVN technical community meeting - June 24th

2024-06-21 Thread Dumitru Ceara
Hi everyone,

Just a small reminder for anyone interested in joining the next OVN
technical meeting.  It's scheduled to happen next week on Monday June
24th at 3PM UTC.

Meeting details:
Date/Time: Monday, June 24th, 3PM UTC
Link: meet.google.com/zns-gqsd-jdn
Agenda: 
https://docs.google.com/document/d/1dG4GwcYOSs4uArPGtOoaP5tH4KCto-GH_C3tIXSnZZ8/edit?usp=meetingnotes

Please feel free to add any items you'd like to discuss to the agenda.
I added some of the things I'm aware of there already.

Best regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 0/9] Introduce local sampling with NXAST_SAMPLE action

2024-06-21 Thread Adrián Moreno
On Thu, Jun 20, 2024 at 02:20:53PM GMT, Eelco Chaudron wrote:
>
>
> On 5 Jun 2024, at 22:23, Adrian Moreno wrote:
>
> > (Was: Add psample support to NXAST_SAMPLE action)
> >
> > This is the userspace counterpart of the work being done in the kernel
> > [1] which is still not merged (hence the RFC state). There, a new
> > datapath action is added, called "emit_sample". Being a specific
> > action, it promises a more efficient way of making packet samples
> > available to observability applications.
> >
> > From the PoV of ovs-vswitchd, this new action is used to implement
> > "local sampling". Local sampling (or lsample for short) is configured
> > in a similar way as current per-flow IPFIX sampling, i.e: using the
> > Flow_Sample_Collector_Set table and the NXAST_SAMPLE action.
> >
> > However, instead of sending the sample to an external IPFIX collector
> > though the network, the sample is emitted using the new action and
> > made available to locally running sample collector.
> >
> > The specific way emit_sample sends the sample (and the way the local
> > collector shall collect it) is datapath-specific.
> > Currently, currently only the Linux kernel datapath implements it using
> > the psample netlink multicast group.
> >
> > ~~ Configuration ~~
> > Local sampling is configured via a new column in the
> > Flow_Sample_Collector_Set (FSCS) table called "local_sample_group".
> > Configuring this value is orthogonal to also associating the FSCS
> > entry to an entry in the IPFIX table.
> >
> > Once that entry in the OVSDB is configured, NXAST_SAMPLE actions coming
> > from the controller will be translated into the following odp action:
> >
> >sample(sample={P}%, actions(emit_sample(group={G},cookie={C})))
> >
> > Where:
> > P: Is the sampling probability from NXAST_SAMPLE
> > G: Is the group id in the FSCS entry whose "id" matches the one in
> > the NXAST_SAMPLE.
> > C: Is a 64bit cookie result of concatenating the obs_domain and
> > obs_point from the NXAST_SAMPLE, i.e:
> > "obs_domain << 32 | obs_point"
> > Notes:
> > - The parent sample action might be omitted if the probability is
> >   100% and there is no IPFIX sampling that requires the use of a
> >   meter.
> >
> > ~~ Internal implementation: dpif-lsample ~~
> > Internally, a new object called "dpif-lsample" is introduced to track
> > the configured local sampling exporters and track statistics based on
> > odp flow stats (using xcache).
> > It exposes the list of configured exporters and their statistics on a
> > new unixctl command called "lsample/show".
> >
> > ~~ Testing ~~
> > The series includes an test utility program than can be executed by
> > running "tests/ovstest test-psample". This utility listens
> > to packets multicasted by the psample module and prints them (also
> > printing the obs_domain and obs_point ids).
> >
> > ~~ HW Offload ~~
> > tc offload is not being introduced in this series as existing sample
> > or userspace actions are not currently offloadable. Also some
> > improvements need to be implemented in tc for it to be feasible.
> >
> > [1]
> > https://patchwork.kernel.org/project/netdevbpf/cover/20240603185647.2310748-1-amore...@redhat.com/
>
> Hi Adrian,
>
> I did not finish reviewing, but I was doing some testing, and I noticed I’m 
> not getting any warning/error when configuring all of this without Datapath 
> support.
>
> Are you planning to add some user feedback when trying to enable this without 
> the kernel other than the “Datapath does not support emit_sample” which the 
> user might not understand relates to this feature?
>

I added the following code in bridge.c when it configures local
sampling:


+ret = ofproto_set_local_sample(br->ofproto, opts_array, n_opts);
+
+if (ret == ENOTSUP) {
+if (n_opts) {
+VLOG_WARN_RL(,
+ "bridge %s: ignoring local sampling configuration: "
+ "not supported by this datapath",
+ br->name);
+}
+} else if (ret) {
+VLOG_ERR_RL(, "bridge %s: error configuring local sampling: %s",
+br->name, ovs_strerror(ret));
+}

I believe it should warn the user the local sampling configuration will
not be applied.

Is that not working?

> Cheers,
>
> Eelco
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v2] odp-execute: Check IPv4 checksum offload flag in AVX.

2024-06-21 Thread Eelco Chaudron


On 20 Jun 2024, at 20:48, Mike Pattrick wrote:

> On Mon, Jun 17, 2024 at 10:08 AM Emma Finn  wrote:
>>
>> The AVX implementation for IPv4 action did not check whether
>> the IPv4 checksum offload flag has been set and was incorrectly
>> calculating checksums in software. Adding a check to skip AVX
>> checksum calculation when offload flags are set.
>>
>> Signed-off-by: Emma Finn 
>> Reported-by: Eelco Chaudron 
>
> This brings things inline with the scalar odp-execute.
>
> Fixes: 5d11c47d3ebe ("userspace: Enable IP checksum offloading by default.")
> Acked-by: Mike Pattrick 

Thanks Mike for the review! I’ll wait till next week, and if I get no more 
comments/feedback I’ll apply both AVX512 patches with the suggested 
modifications.

Cheers,

Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev