Re: [ovs-dev] [PATCH v4] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.

2024-06-19 Thread Ales Musil
On Tue, Jun 18, 2024 at 9:41 PM Dumitru Ceara  wrote:

> It improves the debugging experience if we can easily get a list of
> OpenFlow rules and groups that contribute to the creation of a datapath
> flow.
>
> The suggested workflow is:
> a. dump datapath flows (along with UUIDs), this also prints the core IDs
> (PMD IDs) when applicable.
>
>   $ ovs-appctl dpctl/dump-flows -m
>   flow-dump from pmd on cpu core: 7
>   ufid:7460db8f..., recirc_id(0), 
>
> b. dump related OpenFlow rules and groups:
>   $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7
>   cookie=0x12345678, table=0
> priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1)
>   cookie=0x0, table=1 priority=200,actions=group:1
>
> group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2))
>   cookie=0x0, table=2 actions=output:1
>
> The new command only shows rules and groups attached to ukeys that are
> in states UKEY_VISIBLE or UKEY_OPERATIONAL.  That should be fine as all
> other ukeys should not be relevant for the use case presented above.
>
> For ukeys that don't have an xcache populated yet, the command goes
> ahead and populates one.  In theory this is creates a slight overhead as
> those ukeys might not need an xcache until they see traffic (and get
> revalidated) but in practice the overhead should be minimal.
>
> This commit tries to mimic the output format of the ovs-ofctl
> dump-flows/dump-groups commands.  For groups it actually uses
> ofputil_group/_bucket functions for formatting.  For rules it uses
> flow_stats_ds() (the original function was exported and renamed to
> ofproto_rule_stats_ds()).
>
> Signed-off-by: Dumitru Ceara 
> ---
> V4:
> - Addressed Mike's comment:
>   - use predictable port IDs.
> V3:
> - Addressed Mike's comments:
>   - use ds_put_char()
>   - make the test less flaky
> V2:
> - Addressed Adrian's comments:
>   - check return value of populate_xcache()
>   - use flow_stats_ds() (renamed to ofproto_rule_stats_ds()) instead of
> custom printing
>   - move ukey->state check to caller
>   - handle case when group bucket is not available
>   - update test case to cover the point above
> ---
>  NEWS|   3 +
>  include/openvswitch/ofp-group.h |   7 ++
>  lib/ofp-group.c | 110 +++-
>  ofproto/ofproto-dpif-upcall.c   |  85 
>  ofproto/ofproto-dpif.c  |  24 +++
>  ofproto/ofproto-dpif.h  |   2 +
>  ofproto/ofproto-provider.h  |   1 +
>  ofproto/ofproto.c   |  85 
>  tests/ofproto-dpif.at   |  47 ++
>  tests/ofproto-macros.at |   9 ++-
>  10 files changed, 285 insertions(+), 88 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 5ae0108d552b..1bc085f97045 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,9 @@ Post-v3.3.0
>   https://github.com/openvswitch/ovs.git
> - DPDK:
>   * OVS validated with DPDK 23.11.1.
> +   - ovs-appctl:
> + * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules
> and
> +   groups that contributed to the creation of a specific datapath
> flow.
>
>
>  v3.3.0 - 16 Feb 2024
> diff --git a/include/openvswitch/ofp-group.h
> b/include/openvswitch/ofp-group.h
> index cd7af0ebff9c..79fcb3a4c0d1 100644
> --- a/include/openvswitch/ofp-group.h
> +++ b/include/openvswitch/ofp-group.h
> @@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct
> ovs_list *,
>  bool ofputil_bucket_check_duplicate_id(const struct ovs_list *);
>  struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *);
>  struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *);
> +void ofputil_bucket_format(const struct ofputil_bucket *,
> +   enum ofp11_group_type, enum ofp_version,
> +   const struct ofputil_port_map *,
> +   const struct ofputil_table_map *,
> +   struct ds *);
>
>  static inline bool
>  ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket)
> @@ -88,6 +93,8 @@ struct ofputil_group_props {
>  void ofputil_group_properties_destroy(struct ofputil_group_props *);
>  void ofputil_group_properties_copy(struct ofputil_group_props *to,
> const struct ofputil_group_props
> *from);
> +void ofputil_group_properties_format(const struct ofputil_group_props *,
> + struct ds *);
>  /* Protocol-independent group_mod. */
>  struct ofputil_group_mod {
>  uint16_t command; /* One of OFPGC15_*. */
> diff --git a/lib/ofp-group.c b/lib/ofp-group.c
> index 737f48047b10..0be453d15203 100644
> --- a/lib/ofp-group.c
> +++ b/lib/ofp-group.c
> @@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t
> *group_idp)
>  return true;
>  }
>
> -/* Appends to 's' a string representation of the OpenFlow group ID
> 'group_id'.
> - * Most groups' strin

Re: [ovs-dev] [PATCH v3] ofp-prop: Fix unaligned 128 bit access.

2024-06-19 Thread Ales Musil
On Wed, Jun 19, 2024 at 3:19 PM Mike Pattrick  wrote:

> When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
> ct-flush" test will yield the following undefined behavior flagged by
> UBSan. This problem is caused by the fact that 128bit property put/parse
> functions weren't adding appropriate padding before writing or reading
> the value.
>
> This patch uses get_32aligned_* functions to copy the bytes as they are
> aligned.
>
> lib/ofp-prop.c:277:14: runtime error: load of misaligned address
> 0x6060687c for type 'union ovs_be128', which requires 8 byte
> alignment
> 0x6060687c: note: pointer points here
>   00 05 00 14 00 00 00 00  00 00 00 00 00 00 00 00  00 ff ab 00
>   ^
> 0: in ofpprop_parse_u128 lib/ofp-prop.c:277
> 1: in ofp_ct_match_decode lib/ofp-ct.c:525
> 2: in ofp_print_nxt_ct_flush lib/ofp-print.c:959
> 3: in ofp_to_string__ lib/ofp-print.c:1206
> 4: in ofp_to_string lib/ofp-print.c:1264
> 5: in ofp_print lib/ofp-print.c:1308
> 6: in ofctl_ofp_print utilities/ovs-ofctl.c:4899
> 7: in ovs_cmdl_run_command__ lib/command-line.c:247
> 8: in ovs_cmdl_run_command lib/command-line.c:278
> 9: in main utilities/ovs-ofctl.c:186
>
> Signed-off-by: Mike Pattrick 
> ---
> v2: removed memcpy
> v3: fixed checkpatch
> ---
>  lib/ofp-prop.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c
> index 0a685750c..0e54543bd 100644
> --- a/lib/ofp-prop.c
> +++ b/lib/ofp-prop.c
> @@ -21,6 +21,7 @@
>  #include "openvswitch/ofp-errors.h"
>  #include "openvswitch/ofp-prop.h"
>  #include "openvswitch/vlog.h"
> +#include "unaligned.h"
>  #include "util.h"
>  #include "uuid.h"
>
> @@ -190,11 +191,12 @@ ofpprop_parse_be64(const struct ofpbuf *property,
> ovs_be64 *value)
>  enum ofperr
>  ofpprop_parse_be128(const struct ofpbuf *property, ovs_be128 *value)
>  {
> -ovs_be128 *p = property->msg;
> +ovs_32aligned_be128 *p = property->msg;
> +
>  if (ofpbuf_msgsize(property) != sizeof *p) {
>  return OFPERR_OFPBPC_BAD_LEN;
>  }
> -*value = *p;
> +*value = get_32aligned_be128(p);
>  return 0;
>  }
>
> @@ -270,12 +272,13 @@ ofpprop_parse_u64(const struct ofpbuf *property,
> uint64_t *value)
>  enum ofperr
>  ofpprop_parse_u128(const struct ofpbuf *property, ovs_u128 *value)
>  {
> -ovs_be128 *p = property->msg;
> -if (ofpbuf_msgsize(property) != sizeof *p) {
> -return OFPERR_OFPBPC_BAD_LEN;
> +enum ofperr error = ofpprop_parse_be128(property, (ovs_be128 *)
> value);
> +
> +if (!error) {
> +*value = ntoh128(*(ovs_be128 *) value);
>  }
> -*value = ntoh128(*p);
> -return 0;
> +
> +return error;
>  }
>
>  /* Attempts to parse 'property' as a property containing a UUID.  If
> --
> 2.39.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Looks good to me, thanks.

Acked-by: Ales Musil 

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.com

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v3 05/10] net: openvswitch: add emit_sample action

2024-06-19 Thread Jakub Kicinski
On Wed, 19 Jun 2024 23:00:06 +0200 Adrian Moreno wrote:
> + OVS_EMIT_SAMPLE_ATTR_UNPSEC,

Are you using this one? Looking closely I presume not, since it's
misspelled ;) You can assign = 1 to GROUP, no need to name the value
for 0.

> + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
> + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
> + __OVS_EMIT_SAMPLE_ATTR_MAX

kdoc is complaining that __OVS_EMIT_SAMPLE_ATTR_MAX is not documented.
You can add:

/* private: */

before, take a look at include/uapi/linux/netdev.h for example.
-- 
pw-bot: cr
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Xin Long
On Wed, Jun 19, 2024 at 5:20 PM Florian Westphal  wrote:
>
> Xin Long  wrote:
> > Got it, I will fix this in ovs.
>
> Thanks!
>
> I don't want to throw more work at you, but since you are
> already working on ovs+conntrack:
>
> ovs_ct_init() is bad, as this runs unconditionally.
>
> modprobe openvswitch -> conntrack becomes active in all
> existing and future namespaces.
>
> Conntrack is slow, we should avoid this behaviour.
>
> ovs_ct_limit_init() should be called only when the feature is
> configured (the problematic call is nf_conncount_init, as that
> turns on connection tracking, the kmalloc etc is fine).
understand.

>
> Likewise, nf_connlabels_get() should only be called when
> labels are added/configured, not at the start.
>
> I resolved this for tc in 70f06c115bcc but it seems i never
> got around to fix it for ovs.
I will take a look.

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net] openvswitch: get related ct labels from its master if it is not confirmed

2024-06-19 Thread Xin Long
Ilya found a failure in running check-kernel tests with at_groups=144
(144: conntrack - FTP SNAT orig tuple) in OVS repo. After his further
investigation, the root cause is that the labels sent to userspace
for related ct are incorrect.

The labels for unconfirmed related ct should use its master's labels.
However, the changes made in commit 8c8b73320805 ("openvswitch: set
IPS_CONFIRMED in tmpl status only when commit is set in conntrack")
led to getting labels from this related ct.

So fix it in ovs_ct_get_labels() by changing to copy labels from its
master ct if it is a unconfirmed related ct. Note that there is no
fix needed for ct->mark, as it was already copied from its master
ct for related ct in init_conntrack().

Fixes: 8c8b73320805 ("openvswitch: set IPS_CONFIRMED in tmpl status only when 
commit is set in conntrack")
Reported-by: Ilya Maximets 
Signed-off-by: Xin Long 
---
 net/openvswitch/conntrack.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 331730fd3580..920e802ff01e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -167,8 +167,13 @@ static u32 ovs_ct_get_mark(const struct nf_conn *ct)
 static void ovs_ct_get_labels(const struct nf_conn *ct,
  struct ovs_key_ct_labels *labels)
 {
-   struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
+   struct nf_conn_labels *cl = NULL;
 
+   if (ct) {
+   if (ct->master && !nf_ct_is_confirmed(ct))
+   ct = ct->master;
+   cl = nf_ct_labels_find(ct);
+   }
if (cl)
memcpy(labels, cl->bits, OVS_CT_LABELS_LEN);
else
-- 
2.43.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Florian Westphal
Xin Long  wrote:
> Got it, I will fix this in ovs.

Thanks!

I don't want to throw more work at you, but since you are
already working on ovs+conntrack:

ovs_ct_init() is bad, as this runs unconditionally.

modprobe openvswitch -> conntrack becomes active in all
existing and future namespaces.

Conntrack is slow, we should avoid this behaviour.

ovs_ct_limit_init() should be called only when the feature is
configured (the problematic call is nf_conncount_init, as that
turns on connection tracking, the kmalloc etc is fine).

Likewise, nf_connlabels_get() should only be called when
labels are added/configured, not at the start.

I resolved this for tc in 70f06c115bcc but it seems i never
got around to fix it for ovs.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 10/10] selftests: openvswitch: add emit_sample test

2024-06-19 Thread Adrian Moreno
Add a test to verify sampling packets via psample works.

In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/openvswitch.sh  | 110 +-
 .../selftests/net/openvswitch/ovs-dpctl.py|  73 +++-
 2 files changed, 177 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 5cae53543849..4e5a1f252065 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -20,7 +20,8 @@ tests="
nat_related_v4  ip4-nat-related: ICMP related 
matches work with SNAT
netlink_checks  ovsnl: validate netlink attrs 
and settings
upcall_interfaces   ovs: test the upcall interfaces
-   drop_reason drop: test drop reasons are 
emitted"
+   drop_reason drop: test drop reasons are 
emitted
+   emit_sample emit_sample: Sampling packets 
with psample"
 
 info() {
 [ $VERBOSE = 0 ] || echo $*
@@ -102,12 +103,21 @@ ovs_netns_spawn_daemon() {
shift
netns=$1
shift
-   info "spawning cmd: $*"
-   ip netns exec $netns $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
+   if [ "$netns" == "_default" ]; then
+   $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
+   else
+   ip netns exec $netns $*  >> $ovs_dir/stdout  2>> 
$ovs_dir/stderr &
+   fi
pid=$!
ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
 }
 
+ovs_spawn_daemon() {
+   sbx=$1
+   shift
+   ovs_netns_spawn_daemon $sbx "_default" $*
+}
+
 ovs_add_netns_and_veths () {
info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
ovs_sbx "$1" ip netns add "$3" || return 1
@@ -170,6 +180,19 @@ ovs_drop_reason_count()
return `echo "$perf_output" | grep "$pattern" | wc -l`
 }
 
+ovs_test_flow_fails () {
+   ERR_MSG="Flow actions may not be safe on all matching packets"
+
+   PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+   ovs_add_flow $@ &> /dev/null $@ && return 1
+   POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+   if [ "$PRE_TEST" == "$POST_TEST" ]; then
+   return 1
+   fi
+   return 0
+}
+
 usage() {
echo
echo "$0 [OPTIONS] [TEST]..."
@@ -184,6 +207,87 @@ usage() {
exit 1
 }
 
+
+# emit_sample test
+# - use emit_sample to observe packets
+test_emit_sample() {
+   sbx_add "test_emit_sample" || return $?
+
+   # Add a datapath with per-vport dispatching.
+   ovs_add_dp "test_emit_sample" emit_sample -V 2:1 || return 1
+
+   info "create namespaces"
+   ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
+   client c0 c1 172.31.110.10/24 -u || return 1
+   ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
+   server s0 s1 172.31.110.20/24 -u || return 1
+
+   # Check if emit_sample actions can be configured.
+   ovs_add_flow "test_emit_sample" emit_sample \
+   'in_port(1),eth(),eth_type(0x0806),arp()' 'emit_sample(group=1)'
+   if [ $? == 1 ]; then
+   info "no support for emit_sample - skipping"
+   ovs_exit_sig
+   return $ksft_skip
+   fi
+
+   ovs_del_flows "test_emit_sample" emit_sample
+
+   # Allow ARP
+   ovs_add_flow "test_emit_sample" emit_sample \
+   'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
+   ovs_add_flow "test_emit_sample" emit_sample \
+   'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
+
+   # Test action verification.
+   OLDIFS=$IFS
+   IFS='*'
+   min_key='in_port(1),eth(),eth_type(0x0800),ipv4()'
+   for testcase in \
+   "cookie to 
large"*"emit_sample(group=1,cookie=1615141312111009080706050403020100)" \
+   "no group with cookie"*"emit_sample(cookie=abcd)" \
+   "no group"*"sample()";
+   do
+   set -- $testcase;
+   ovs_test_flow_fails "test_emit_sample" emit_sample $min_key $2
+   if [ $? == 1 ]; then
+   info "failed - $1"
+   return 1
+   fi
+   done
+   IFS=$OLDIFS
+
+   # Sample first 14 bytes of all traffic.
+   ovs_add_flow "test_emit_sample" emit_sample \
+   
"in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()" 
"trunc(14),emit_sample(group=1,cookie=c0ffee),2"
+
+   # Sample all traffic. In this case, use a sample() action with both
+   # emit_sample and an upcall emulating simultaneous local sampling and
+   # sFlow / IPFIX.
+   nlpid=$(grep -E "listening on upcall packet handler

[ovs-dev] [PATCH net-next v3 09/10] selftests: openvswitch: parse trunc action

2024-06-19 Thread Adrian Moreno
The trunc action was supported decode-able but not parse-able. Add
support for parsing the action string.

Signed-off-by: Adrian Moreno 
---
 .../testing/selftests/net/openvswitch/ovs-dpctl.py  | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index ecb9a857af66..6636833abe53 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -817,6 +817,19 @@ class ovsactions(nla):
 self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact])
 parsed = True
 
+elif parse_starts_block(actstr, "trunc(", False):
+parencount += 1
+actstr, val = parse_extract_field(
+actstr,
+"trunc(",
+r"([0-9]+)",
+int,
+False,
+None,
+)
+self["attrs"].append(["OVS_ACTION_ATTR_TRUNC", val])
+parsed = True
+
 actstr = actstr[strspn(actstr, ", ") :]
 while parencount > 0:
 parencount -= 1
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 08/10] selftests: openvswitch: add userspace parsing

2024-06-19 Thread Adrian Moreno
The userspace action lacks parsing support plus it contains a bug in the
name of one of its attributes.

This patch makes userspace action work.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 24 +--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index beb2499b7126..ecb9a857af66 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -575,13 +575,27 @@ class ovsactions(nla):
 print_str += "userdata="
 for f in self.get_attr("OVS_USERSPACE_ATTR_USERDATA"):
 print_str += "%x." % f
-if self.get_attr("OVS_USERSPACE_ATTR_TUN_PORT") is not None:
+if self.get_attr("OVS_USERSPACE_ATTR_EGRESS_TUN_PORT") is not None:
 print_str += "egress_tun_port=%d" % self.get_attr(
-"OVS_USERSPACE_ATTR_TUN_PORT"
+"OVS_USERSPACE_ATTR_EGRESS_TUN_PORT"
 )
 print_str += ")"
 return print_str
 
+def parse(self, actstr):
+attrs_desc = (
+("pid", "OVS_USERSPACE_ATTR_PID", int),
+("userdata", "OVS_USERSPACE_ATTR_USERDATA",
+lambda x: list(bytearray.fromhex(x))),
+("egress_tun_port", "OVS_USERSPACE_ATTR_EGRESS_TUN_PORT", int)
+)
+
+attrs, actstr = parse_attrs(actstr, attrs_desc)
+for attr in attrs:
+self["attrs"].append(attr)
+
+return actstr
+
 def dpstr(self, more=False):
 print_str = ""
 
@@ -797,6 +811,12 @@ class ovsactions(nla):
 self["attrs"].append(["OVS_ACTION_ATTR_EMIT_SAMPLE", emitact])
 parsed = True
 
+elif parse_starts_block(actstr, "userspace(", False):
+uact = self.userspace()
+actstr = uact.parse(actstr[len("userspace(") : ])
+self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact])
+parsed = True
+
 actstr = actstr[strspn(actstr, ", ") :]
 while parencount > 0:
 parencount -= 1
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 07/10] selftests: openvswitch: add emit_sample action

2024-06-19 Thread Adrian Moreno
Add sample and emit_sample action support to ovs-dpctl.py.

Refactor common attribute parsing logic into an external function.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 162 +-
 1 file changed, 161 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1dd057afd3fb..beb2499b7126 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -8,6 +8,7 @@ import argparse
 import errno
 import ipaddress
 import logging
+import math
 import multiprocessing
 import re
 import struct
@@ -58,6 +59,7 @@ OVS_FLOW_CMD_DEL = 2
 OVS_FLOW_CMD_GET = 3
 OVS_FLOW_CMD_SET = 4
 
+UINT32_MAX = 0x
 
 def macstr(mac):
 outstr = ":".join(["%02X" % i for i in mac])
@@ -267,6 +269,75 @@ def parse_extract_field(
 return str_skipped, data
 
 
+def parse_attrs(actstr, attr_desc):
+"""Parses the given action string and returns a list of netlink
+attributes based on a list of attribute descriptions.
+
+Each element in the attribute description list is a tuple such as:
+(name, attr_name, parse_func)
+where:
+name: is the string representing the attribute
+attr_name: is the name of the attribute as defined in the uAPI.
+parse_func: is a callable accepting a string and returning either
+a single object (the parsed attribute value) or a tuple of
+two values (the parsed attribute value and the remaining string)
+
+Returns a list of attributes and the remaining string.
+"""
+def parse_attr(actstr, key, func):
+actstr = actstr[len(key) :]
+
+if not func:
+return None, actstr
+
+delim = actstr[0]
+actstr = actstr[1:]
+
+if delim == "=":
+pos = strcspn(actstr, ",)")
+ret = func(actstr[:pos])
+else:
+ret = func(actstr)
+
+if isinstance(ret, tuple):
+(datum, actstr) = ret
+else:
+datum = ret
+actstr = actstr[strcspn(actstr, ",)"):]
+
+if delim == "(":
+if not actstr or actstr[0] != ")":
+raise ValueError("Action contains unbalanced parentheses")
+
+actstr = actstr[1:]
+
+actstr = actstr[strspn(actstr, ", ") :]
+
+return datum, actstr
+
+attrs = []
+attr_desc = list(attr_desc)
+while actstr and actstr[0] != ")" and attr_desc:
+found = False
+for i, (key, attr, func) in enumerate(attr_desc):
+if actstr.startswith(key):
+datum, actstr = parse_attr(actstr, key, func)
+attrs.append([attr, datum])
+found = True
+del attr_desc[i]
+
+if not found:
+raise ValueError("Unknown attribute: '%s'" % actstr)
+
+actstr = actstr[strspn(actstr, ", ") :]
+
+if actstr[0] != ")":
+raise ValueError("Action string contains extra garbage or has "
+ "unbalanced parenthesis: '%s'" % actstr)
+
+return attrs, actstr[1:]
+
+
 class ovs_dp_msg(genlmsg):
 # include the OVS version
 # We need a custom header rather than just being able to rely on
@@ -285,7 +356,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_SET", "none"),
 ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
 ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
-("OVS_ACTION_ATTR_SAMPLE", "none"),
+("OVS_ACTION_ATTR_SAMPLE", "sample"),
 ("OVS_ACTION_ATTR_RECIRC", "uint32"),
 ("OVS_ACTION_ATTR_HASH", "none"),
 ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
@@ -304,8 +375,85 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_ADD_MPLS", "none"),
 ("OVS_ACTION_ATTR_DEC_TTL", "none"),
 ("OVS_ACTION_ATTR_DROP", "uint32"),
+("OVS_ACTION_ATTR_EMIT_SAMPLE", "emit_sample"),
 )
 
+class emit_sample(nla):
+nla_flags = NLA_F_NESTED
+
+nla_map = (
+("OVS_EMIT_SAMPLE_ATTR_UNSPEC", "none"),
+("OVS_EMIT_SAMPLE_ATTR_GROUP", "uint32"),
+("OVS_EMIT_SAMPLE_ATTR_COOKIE", "array(uint8)"),
+)
+
+def dpstr(self, more=False):
+args = "group=%d" % self.get_attr("OVS_EMIT_SAMPLE_ATTR_GROUP")
+
+cookie = self.get_attr("OVS_EMIT_SAMPLE_ATTR_COOKIE")
+if cookie:
+args += ",cookie(%s)" % \
+"".join(format(x, "02x") for x in cookie)
+
+return "emit_sample(%s)" % args
+
+def parse(self, actstr):
+desc = (
+("group", "OVS_EMIT_SAMPLE_ATTR_GROUP", int),
+("cookie", "OVS_EMIT_SAMPLE_ATTR_COOKIE",
+lambda x: list(bytearray.fromhex(x)))
+)
+
+attrs, actstr = parse_attrs(actstr, desc)
+
+for attr in attrs:
+self[

[ovs-dev] [PATCH net-next v3 06/10] net: openvswitch: store sampling probability in cb.

2024-06-19 Thread Adrian Moreno
When a packet sample is observed, the sampling rate that was used is
important to estimate the real frequency of such event.

Store the probability of the parent sample action in the skb's cb area
and use it in emit_sample to pass it down to psample.

Signed-off-by: Adrian Moreno 
---
 include/uapi/linux/openvswitch.h |  3 ++-
 net/openvswitch/actions.c| 20 +---
 net/openvswitch/datapath.h   |  3 +++
 net/openvswitch/vport.c  |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index b91b4c3cc230..3e7257a714d2 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -649,7 +649,8 @@ enum ovs_flow_attr {
  * Actions are passed as nested attributes.
  *
  * Executes the specified actions with the given probability on a per-packet
- * basis.
+ * basis. Nested actions will be able to access the probability value of the
+ * parent @OVS_ACTION_ATTR_SAMPLE.
  */
 enum ovs_sample_attr {
OVS_SAMPLE_ATTR_UNSPEC,
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1f555cbba312..85a2ff91379b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff 
*skb,
struct nlattr *sample_arg;
int rem = nla_len(attr);
const struct sample_arg *arg;
+   u32 init_probability;
bool clone_flow_key;
+   int err;
 
/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
sample_arg = nla_data(attr);
arg = nla_data(sample_arg);
actions = nla_next(sample_arg, &rem);
+   init_probability = OVS_CB(skb)->probability;
 
if ((arg->probability != U32_MAX) &&
(!arg->probability || get_random_u32() > arg->probability)) {
@@ -1062,9 +1065,16 @@ static int sample(struct datapath *dp, struct sk_buff 
*skb,
return 0;
}
 
+   OVS_CB(skb)->probability = arg->probability;
+
clone_flow_key = !arg->exec;
-   return clone_execute(dp, skb, key, 0, actions, rem, last,
-clone_flow_key);
+   err = clone_execute(dp, skb, key, 0, actions, rem, last,
+   clone_flow_key);
+
+   if (!last)
+   OVS_CB(skb)->probability = init_probability;
+
+   return err;
 }
 
 /* When 'last' is true, clone() should always consume the 'skb'.
@@ -1312,6 +1322,7 @@ static void execute_emit_sample(struct datapath *dp, 
struct sk_buff *skb,
struct psample_group psample_group = {};
struct psample_metadata md = {};
const struct nlattr *a;
+   u32 rate;
int rem;
 
nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) {
@@ -1330,8 +1341,11 @@ static void execute_emit_sample(struct datapath *dp, 
struct sk_buff *skb,
psample_group.net = ovs_dp_get_net(dp);
md.in_ifindex = OVS_CB(skb)->input_vport->dev->ifindex;
md.trunc_size = skb->len - OVS_CB(skb)->cutlen;
+   md.rate_as_probability = 1;
+
+   rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX;
 
-   psample_sample_packet(&psample_group, skb, 0, &md);
+   psample_sample_packet(&psample_group, skb, rate, &md);
 #endif
 }
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 0cd29971a907..9ca6231ea647 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -115,12 +115,15 @@ struct datapath {
  * fragmented.
  * @acts_origlen: The netlink size of the flow actions applied to this skb.
  * @cutlen: The number of bytes from the packet end to be removed.
+ * @probability: The sampling probability that was applied to this skb; 0 means
+ * no sampling has occurred; U32_MAX means 100% probability.
  */
 struct ovs_skb_cb {
struct vport*input_vport;
u16 mru;
u16 acts_origlen;
u32 cutlen;
+   u32 probability;
 };
 #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 972ae01a70f7..8732f6e51ae5 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
*skb,
OVS_CB(skb)->input_vport = vport;
OVS_CB(skb)->mru = 0;
OVS_CB(skb)->cutlen = 0;
+   OVS_CB(skb)->probability = 0;
if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) {
u32 mark;
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 05/10] net: openvswitch: add emit_sample action

2024-06-19 Thread Adrian Moreno
Add support for a new action: emit_sample.

This action accepts a u32 group id and a variable-length cookie and uses
the psample multicast group to make the packet available for
observability.

The maximum length of the user-defined cookie is set to 16, same as
tc_cookie, to discourage using cookies that will not be offloadable.

Signed-off-by: Adrian Moreno 
---
 Documentation/netlink/specs/ovs_flow.yaml | 17 +
 include/uapi/linux/openvswitch.h  | 27 ++
 net/openvswitch/Kconfig   |  1 +
 net/openvswitch/actions.c | 45 +++
 net/openvswitch/flow_netlink.c| 33 -
 5 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
b/Documentation/netlink/specs/ovs_flow.yaml
index 4fdfc6b5cae9..a7ab5593a24f 100644
--- a/Documentation/netlink/specs/ovs_flow.yaml
+++ b/Documentation/netlink/specs/ovs_flow.yaml
@@ -727,6 +727,12 @@ attribute-sets:
 name: dec-ttl
 type: nest
 nested-attributes: dec-ttl-attrs
+  -
+name: emit-sample
+type: nest
+nested-attributes: emit-sample-attrs
+doc: |
+  Sends a packet sample to psample for external observation.
   -
 name: tunnel-key-attrs
 enum-name: ovs-tunnel-key-attr
@@ -938,6 +944,17 @@ attribute-sets:
   -
 name: gbp
 type: u32
+  -
+name: emit-sample-attrs
+enum-name: ovs-emit-sample-attr
+name-prefix: ovs-emit-sample-attr-
+attributes:
+  -
+name: group
+type: u32
+  -
+name: cookie
+type: binary
 
 operations:
   name-prefix: ovs-flow-cmd-
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index efc82c318fa2..b91b4c3cc230 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -914,6 +914,30 @@ struct check_pkt_len_arg {
 };
 #endif
 
+#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
+/**
+ * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE
+ * action.
+ *
+ * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
+ * sample.
+ * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that contains
+ * user-defined metadata. The maximum length is OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE
+ * bytes.
+ *
+ * Sends the packet to the psample multicast group with the specified group and
+ * cookie. It is possible to combine this action with the
+ * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being emitted.
+ */
+enum ovs_emit_sample_attr {
+   OVS_EMIT_SAMPLE_ATTR_UNPSEC,
+   OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
+   OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
+   __OVS_EMIT_SAMPLE_ATTR_MAX
+};
+
+#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
+
 /**
  * enum ovs_action_attr - Action types.
  *
@@ -966,6 +990,8 @@ struct check_pkt_len_arg {
  * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS
  * argument.
  * @OVS_ACTION_ATTR_DROP: Explicit drop action.
+ * @OVS_ACTION_ATTR_EMIT_SAMPLE: Send a sample of the packet to external
+ * observers via psample.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -1004,6 +1030,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
OVS_ACTION_ATTR_DROP, /* u32 error code. */
+   OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 29a7081858cd..2535f3f9f462 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -10,6 +10,7 @@ config OPENVSWITCH
   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
 (!NF_NAT || NF_NAT) && \
 (!NETFILTER_CONNCOUNT || 
NETFILTER_CONNCOUNT)))
+   depends on PSAMPLE || !PSAMPLE
select LIBCRC32C
select MPLS
select NET_MPLS_GSO
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 964225580824..1f555cbba312 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 #include 
+
+#if IS_ENABLED(CONFIG_PSAMPLE)
+#include 
+#endif
+
 #include 
 
 #include "datapath.h"
@@ -1299,6 +1304,37 @@ static int execute_dec_ttl(struct sk_buff *skb, struct 
sw_flow_key *key)
return 0;
 }
 
+static void execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
+   const struct sw_flow_key *key,
+  

[ovs-dev] [PATCH net-next v3 04/10] net: psample: allow using rate as probability

2024-06-19 Thread Adrian Moreno
Although not explicitly documented in the psample module itself, the
definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.

Quoting tc-sample(8):
"RATE of 100 will lead to an average of one sampled packet out of every
100 observed."

With this semantics, the rates that we can express with an unsigned
32-bits number are very unevenly distributed and concentrated towards
"sampling few packets".
For example, we can express a probability of 2.32E-8% but we
cannot express anything between 100% and 50%.

For sampling applications that are capable of sampling a decent
amount of packets, this sampling rate semantics is not very useful.

Add a new flag to the uAPI that indicates that the sampling rate is
expressed in scaled probability, this is:
- 0 is 0% probability, no packets get sampled.
- U32_MAX is 100% probability, all packets get sampled.

Signed-off-by: Adrian Moreno 
---
 include/net/psample.h |  3 ++-
 include/uapi/linux/psample.h  | 10 +-
 include/uapi/linux/tc_act/tc_sample.h |  1 +
 net/psample/psample.c |  3 +++
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/psample.h b/include/net/psample.h
index 2ac71260a546..c52e9ebd88dd 100644
--- a/include/net/psample.h
+++ b/include/net/psample.h
@@ -24,7 +24,8 @@ struct psample_metadata {
u8 out_tc_valid:1,
   out_tc_occ_valid:1,
   latency_valid:1,
-  unused:5;
+  rate_as_probability:1,
+  unused:4;
const u8 *user_cookie;
u32 user_cookie_len;
 };
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
index e80637e1d97b..b765f0e81f20 100644
--- a/include/uapi/linux/psample.h
+++ b/include/uapi/linux/psample.h
@@ -8,7 +8,11 @@ enum {
PSAMPLE_ATTR_ORIGSIZE,
PSAMPLE_ATTR_SAMPLE_GROUP,
PSAMPLE_ATTR_GROUP_SEQ,
-   PSAMPLE_ATTR_SAMPLE_RATE,
+   PSAMPLE_ATTR_SAMPLE_RATE,   /* u32, ratio between observed and
+* sampled packets or scaled probability
+* if PSAMPLE_ATTR_SAMPLE_PROBABILITY
+* is set.
+*/
PSAMPLE_ATTR_DATA,
PSAMPLE_ATTR_GROUP_REFCOUNT,
PSAMPLE_ATTR_TUNNEL,
@@ -20,6 +24,10 @@ enum {
PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
PSAMPLE_ATTR_PROTO, /* u16 */
PSAMPLE_ATTR_USER_COOKIE,   /* binary, user provided data */
+   PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in
+* PSAMPLE_ATTR_SAMPLE_RATE as a
+* probability scaled 0 - U32_MAX.
+*/
 
__PSAMPLE_ATTR_MAX
 };
diff --git a/include/uapi/linux/tc_act/tc_sample.h 
b/include/uapi/linux/tc_act/tc_sample.h
index fee1bcc20793..7ee0735e7b38 100644
--- a/include/uapi/linux/tc_act/tc_sample.h
+++ b/include/uapi/linux/tc_act/tc_sample.h
@@ -18,6 +18,7 @@ enum {
TCA_SAMPLE_TRUNC_SIZE,
TCA_SAMPLE_PSAMPLE_GROUP,
TCA_SAMPLE_PAD,
+   TCA_SAMPLE_PROBABILITY,
__TCA_SAMPLE_MAX
 };
 #define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
diff --git a/net/psample/psample.c b/net/psample/psample.c
index 1c76f3e48dcd..f48b5b9cd409 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -497,6 +497,9 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
md->user_cookie))
goto error;
 
+   if (md->rate_as_probability)
+   nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY);
+
genlmsg_end(nl_skb, data);
genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 03/10] net: psample: skip packet copy if no listeners

2024-06-19 Thread Adrian Moreno
If nobody is listening on the multicast group, generating the sample,
which involves copying packet data, seems completely unnecessary.

Return fast in this case.

Reviewed-by: Simon Horman 
Signed-off-by: Adrian Moreno 
---
 net/psample/psample.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/psample/psample.c b/net/psample/psample.c
index b37488f426bc..1c76f3e48dcd 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
void *data;
int ret;
 
+   if (!genl_has_listeners(&psample_nl_family, group->net,
+   PSAMPLE_NL_MCGRP_SAMPLE))
+   return;
+
meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
   (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
   (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 02/10] net: sched: act_sample: add action cookie to sample

2024-06-19 Thread Adrian Moreno
If the action has a user_cookie, pass it along to the sample so it can
be easily identified.

Signed-off-by: Adrian Moreno 
---
 net/sched/act_sample.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index a69b53d54039..2ceb4d141b71 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -167,7 +167,9 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
 {
struct tcf_sample *s = to_sample(a);
struct psample_group *psample_group;
+   u8 cookie_data[TC_COOKIE_MAX_SIZE];
struct psample_metadata md = {};
+   struct tc_cookie *user_cookie;
int retval;
 
tcf_lastuse_update(&s->tcf_tm);
@@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
skb_push(skb, skb->mac_len);
 
+   rcu_read_lock();
+   user_cookie = rcu_dereference(a->user_cookie);
+   if (user_cookie) {
+   memcpy(cookie_data, user_cookie->data,
+  user_cookie->len);
+   md.user_cookie = cookie_data;
+   md.user_cookie_len = user_cookie->len;
+   }
+   rcu_read_unlock();
+
md.trunc_size = s->truncate ? s->trunc_size : skb->len;
psample_sample_packet(psample_group, skb, s->rate, &md);
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 01/10] net: psample: add user cookie

2024-06-19 Thread Adrian Moreno
Add a user cookie to the sample metadata so that sample emitters can
provide more contextual information to samples.

If present, send the user cookie in a new attribute:
PSAMPLE_ATTR_USER_COOKIE.

Reviewed-by: Simon Horman 
Signed-off-by: Adrian Moreno 
---
 include/net/psample.h| 2 ++
 include/uapi/linux/psample.h | 1 +
 net/psample/psample.c| 9 -
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/psample.h b/include/net/psample.h
index 0509d2d6be67..2ac71260a546 100644
--- a/include/net/psample.h
+++ b/include/net/psample.h
@@ -25,6 +25,8 @@ struct psample_metadata {
   out_tc_occ_valid:1,
   latency_valid:1,
   unused:5;
+   const u8 *user_cookie;
+   u32 user_cookie_len;
 };
 
 struct psample_group *psample_group_get(struct net *net, u32 group_num);
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
index e585db5bf2d2..e80637e1d97b 100644
--- a/include/uapi/linux/psample.h
+++ b/include/uapi/linux/psample.h
@@ -19,6 +19,7 @@ enum {
PSAMPLE_ATTR_LATENCY,   /* u64, nanoseconds */
PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
PSAMPLE_ATTR_PROTO, /* u16 */
+   PSAMPLE_ATTR_USER_COOKIE,   /* binary, user provided data */
 
__PSAMPLE_ATTR_MAX
 };
diff --git a/net/psample/psample.c b/net/psample/psample.c
index a5d9b8446f77..b37488f426bc 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -386,7 +386,9 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
   nla_total_size(sizeof(u32)) +/* group_num */
   nla_total_size(sizeof(u32)) +/* seq */
   nla_total_size_64bit(sizeof(u64)) +  /* timestamp */
-  nla_total_size(sizeof(u16)); /* protocol */
+  nla_total_size(sizeof(u16)) +/* protocol */
+  (md->user_cookie_len ?
+   nla_total_size(md->user_cookie_len) : 0); /* user cookie */
 
 #ifdef CONFIG_INET
tun_info = skb_tunnel_info(skb);
@@ -486,6 +488,11 @@ void psample_sample_packet(struct psample_group *group, 
struct sk_buff *skb,
}
 #endif
 
+   if (md->user_cookie && md->user_cookie_len &&
+   nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len,
+   md->user_cookie))
+   goto error;
+
genlmsg_end(nl_skb, data);
genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next v3 00/10] net: openvswitch: Add sample multicasting.

2024-06-19 Thread Adrian Moreno
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic share netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
yielding this solution unfit for highly loaded production systems.

Users are left with little options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidated would be
to use psample. However, it's current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.

--
v2 -> v3:
- Addressed comments from Simon, Aaron and Ilya.
- Dropped probability propagation in nested sample actions.
- Dropped patch v2's 7/9 in favor of a userspace implementation and
consume skb if emit_sample is the last action, same as we do with
userspace.
- Split ovs-dpctl.py features in independent patches.

v1 -> v2:
- Create a new action ("emit_sample") rather than reuse existing
  "sample" one.
- Add probability semantics to psample's sampling rate.
- Store sampling probability in skb's cb area and use it in emit_sample.
- Test combining "emit_sample" with "trunc"
- Drop group_id filtering and tracepoint in psample.

rfc_v2 -> v1:
- Accomodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.

rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.

Adrian Moreno (10):
  net: psample: add user cookie
  net: sched: act_sample: add action cookie to sample
  net: psample: skip packet copy if no listeners
  net: psample: allow using rate as probability
  net: openvswitch: add emit_sample action
  net: openvswitch: store sampling probability in cb.
  selftests: openvswitch: add emit_sample action
  selftests: openvswitch: add userspace parsing
  selftests: openvswitch: parse trunc action
  selftests: openvswitch: add emit_sample test

 Documentation/netlink/specs/ovs_flow.yaml |  17 ++
 include/net/psample.h |   5 +-
 include/uapi/linux/openvswitch.h  |  30 +-
 include/uapi/linux/psample.h  |  11 +-
 include/uapi/linux/tc_act/tc_sample.h |   1 +
 net/openvswitch/Kconfig   |   1 +
 net/openvswitch/actions.c |  63 +++-
 net/openvswitch/datapath.h|   3 +
 net/openvswitch/flow_netlink.c|  33 ++-
 net/openvswitch/vport.c   |   1 +
 net/psample/psample.c |  16 +-
 net/sched/act_sample.c|  12 +
 .../selftests/net/openvswitch/openvswitch.sh  | 110 ++-
 .../selftests/net/openvswitch/ovs-dpctl.py| 272 +-
 14 files changed, 559 insertions(+), 16 deletions(-)

-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-19 Thread Ilya Maximets
On 6/19/24 22:40, Adrián Moreno wrote:
> On Wed, Jun 19, 2024 at 08:21:02PM GMT, Ilya Maximets wrote:
>> On 6/19/24 08:35, Adrián Moreno wrote:
>>> On Tue, Jun 18, 2024 at 05:44:05PM GMT, Ilya Maximets wrote:
 On 6/18/24 12:50, Adrián Moreno wrote:
> On Tue, Jun 18, 2024 at 12:22:23PM GMT, Ilya Maximets wrote:
>> On 6/18/24 09:00, Adrián Moreno wrote:
>>> On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
 On 6/17/24 13:55, Ilya Maximets wrote:
> On 6/3/24 20:56, Adrian Moreno wrote:
>> The OVS_ACTION_ATTR_SAMPLE action is, in essence,
>> observability-oriented.
>>
>> Apart from some corner case in which it's used a replacement of 
>> clone()
>> for old kernels, it's really only used for sFlow, IPFIX and now,
>> local emit_sample.
>>
>> With this in mind, it doesn't make much sense to report
>> OVS_DROP_LAST_ACTION inside sample actions.
>>
>> For instance, if the flow:
>>
>>   actions:sample(..,emit_sample(..)),2
>>
>> triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
>> confusing for users since the packet did reach its destination.
>>
>> This patch makes internal action execution silently consume the skb
>> instead of notifying a drop for this case.
>>
>> Unfortunately, this patch does not remove all potential sources of
>> confusion since, if the sample action itself is the last action, e.g:
>>
>> actions:sample(..,emit_sample(..))
>>
>> we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we 
>> aren't.
>>
>> Sadly, this case is difficult to solve without breaking the
>> optimization by which the skb is not cloned on last sample actions.
>> But, given explicit drop actions are now supported, OVS can just add 
>> one
>> after the last sample() and rewrite the flow as:
>>
>> actions:sample(..,emit_sample(..)),drop
>>
>> Signed-off-by: Adrian Moreno 
>> ---
>>  net/openvswitch/actions.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 33f6d93ba5e4..54fc1abcff95 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
>>  static struct action_flow_keys __percpu *flow_keys;
>>  static DEFINE_PER_CPU(int, exec_actions_level);
>>
>> +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
>> +{
>> +/* Do not emit packet drops inside sample(). */
>> +if (OVS_CB(skb)->probability)
>> +consume_skb(skb);
>> +else
>> +ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +}
>> +
>>  /* Make a clone of the 'key', using the pre-allocated percpu 
>> 'flow_keys'
>>   * space. Return NULL if out of key spaces.
>>   */
>> @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
>> sk_buff *skb,
>>  if ((arg->probability != U32_MAX) &&
>>  (!arg->probability || get_random_u32() > arg->probability)) 
>> {
>>  if (last)
>> -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +ovs_drop_skb_last_action(skb);

 Always consuming the skb at this point makes sense, since having 
 smaple()
 as a last action is a reasonable thing to have.  But this looks more 
 like
 a fix for the original drop reason patch set.

>>>
>>> I don't think consuming the skb at this point makes sense. It was very
>>> intentionally changed to a drop since a very common use-case for
>>> sampling is drop-sampling, i.e: replacing an empty action list (that
>>> triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
>>> that replacement should not have any effect on the number of
>>> OVS_DROP_LAST_ACTION being reported as the packets are being treated in
>>> the same way (only observed in one case).
>>>
>>>
>>  return 0;
>>  }
>>
>> @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath 
>> *dp, struct sk_buff *skb,
>>  }
>>  }
>>
>> -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +ovs_drop_skb_last_action(skb);
>
> I don't think I agree with this one.  If we have a sample() action 
> with
> a lot of different actions inside and we reac

Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Florian Westphal
Xin Long  wrote:
> > master connection only if it is not yet confirmed.  Users may commit 
> > different
> > labels for the related connection.  This should be more in line with the
> > previous behavior.
> >
> > What do you think?
> >
> You're right.
> Also, I noticed the related ct->mark is set to master ct->mark in
> init_conntrack() as well as secmark when creating the related ct.
> 
> Hi, Florian,
> 
> Any reason why the labels are not set to master ct's in there?

The intent was to have lables be set only via ctnetlink (userspace)
or ruleset.

The original use case was for tagging connections based on
observed behaviour/properties at a later time, not at start of flow.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Xin Long
On Wed, Jun 19, 2024 at 4:20 PM Florian Westphal  wrote:
>
> Xin Long  wrote:
> > > master connection only if it is not yet confirmed.  Users may commit 
> > > different
> > > labels for the related connection.  This should be more in line with the
> > > previous behavior.
> > >
> > > What do you think?
> > >
> > You're right.
> > Also, I noticed the related ct->mark is set to master ct->mark in
> > init_conntrack() as well as secmark when creating the related ct.
> >
> > Hi, Florian,
> >
> > Any reason why the labels are not set to master ct's in there?
>
> The intent was to have lables be set only via ctnetlink (userspace)
> or ruleset.
>
> The original use case was for tagging connections based on
> observed behaviour/properties at a later time, not at start of flow.
Got it, I will fix this in ovs.

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-19 Thread Adrián Moreno
On Wed, Jun 19, 2024 at 08:21:02PM GMT, Ilya Maximets wrote:
> On 6/19/24 08:35, Adrián Moreno wrote:
> > On Tue, Jun 18, 2024 at 05:44:05PM GMT, Ilya Maximets wrote:
> >> On 6/18/24 12:50, Adrián Moreno wrote:
> >>> On Tue, Jun 18, 2024 at 12:22:23PM GMT, Ilya Maximets wrote:
>  On 6/18/24 09:00, Adrián Moreno wrote:
> > On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
> >> On 6/17/24 13:55, Ilya Maximets wrote:
> >>> On 6/3/24 20:56, Adrian Moreno wrote:
>  The OVS_ACTION_ATTR_SAMPLE action is, in essence,
>  observability-oriented.
> 
>  Apart from some corner case in which it's used a replacement of 
>  clone()
>  for old kernels, it's really only used for sFlow, IPFIX and now,
>  local emit_sample.
> 
>  With this in mind, it doesn't make much sense to report
>  OVS_DROP_LAST_ACTION inside sample actions.
> 
>  For instance, if the flow:
> 
>    actions:sample(..,emit_sample(..)),2
> 
>  triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
>  confusing for users since the packet did reach its destination.
> 
>  This patch makes internal action execution silently consume the skb
>  instead of notifying a drop for this case.
> 
>  Unfortunately, this patch does not remove all potential sources of
>  confusion since, if the sample action itself is the last action, e.g:
> 
>  actions:sample(..,emit_sample(..))
> 
>  we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we 
>  aren't.
> 
>  Sadly, this case is difficult to solve without breaking the
>  optimization by which the skb is not cloned on last sample actions.
>  But, given explicit drop actions are now supported, OVS can just add 
>  one
>  after the last sample() and rewrite the flow as:
> 
>  actions:sample(..,emit_sample(..)),drop
> 
>  Signed-off-by: Adrian Moreno 
>  ---
>   net/openvswitch/actions.c | 13 +++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
> 
>  diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>  index 33f6d93ba5e4..54fc1abcff95 100644
>  --- a/net/openvswitch/actions.c
>  +++ b/net/openvswitch/actions.c
>  @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
>   static struct action_flow_keys __percpu *flow_keys;
>   static DEFINE_PER_CPU(int, exec_actions_level);
> 
>  +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
>  +{
>  +/* Do not emit packet drops inside sample(). */
>  +if (OVS_CB(skb)->probability)
>  +consume_skb(skb);
>  +else
>  +ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>  +}
>  +
>   /* Make a clone of the 'key', using the pre-allocated percpu 
>  'flow_keys'
>    * space. Return NULL if out of key spaces.
>    */
>  @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
>  sk_buff *skb,
>   if ((arg->probability != U32_MAX) &&
>   (!arg->probability || get_random_u32() > arg->probability)) 
>  {
>   if (last)
>  -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>  +ovs_drop_skb_last_action(skb);
> >>
> >> Always consuming the skb at this point makes sense, since having 
> >> smaple()
> >> as a last action is a reasonable thing to have.  But this looks more 
> >> like
> >> a fix for the original drop reason patch set.
> >>
> >
> > I don't think consuming the skb at this point makes sense. It was very
> > intentionally changed to a drop since a very common use-case for
> > sampling is drop-sampling, i.e: replacing an empty action list (that
> > triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
> > that replacement should not have any effect on the number of
> > OVS_DROP_LAST_ACTION being reported as the packets are being treated in
> > the same way (only observed in one case).
> >
> >
>   return 0;
>   }
> 
>  @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath 
>  *dp, struct sk_buff *skb,
>   }
>   }
> 
>  -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>  +ovs_drop_skb_last_action(skb);
> >>>
> >>> I don't think I agree with this one.  If we have a sample() action 
> >>> with
> >>> a lot of different actions inside and we reached the end while the 
> >>> last
> >>

Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Xin Long
On Wed, Jun 19, 2024 at 1:30 PM Ilya Maximets  wrote:
>
> On 6/19/24 16:07, Xin Long wrote:
> > On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets  wrote:
> >>
> >> On 6/18/24 17:50, Ilya Maximets wrote:
> >>> On 6/18/24 16:58, Xin Long wrote:
>  On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets  wrote:
> >
> > On 6/17/24 22:10, Ilya Maximets wrote:
> >> On 7/16/23 23:09, Xin Long wrote:
> >>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be 
> >>> removed
> >>> from the hashtable when lookup, we can simplify the exp processing 
> >>> code a
> >>> lot in openvswitch conntrack.
> >>>
> >>> Signed-off-by: Xin Long 
> >>> ---
> >>>  net/openvswitch/conntrack.c | 78 
> >>> +
> >>>  1 file changed, 10 insertions(+), 68 deletions(-)
> >>>
> >>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> >>> index 331730fd3580..fa955e892210 100644
> >>> --- a/net/openvswitch/conntrack.c
> >>> +++ b/net/openvswitch/conntrack.c
> >>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net 
> >>> *net, struct sw_flow_key *key,
> >>>  return 0;
> >>>  }
> >>>
> >>> -static struct nf_conntrack_expect *
> >>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone 
> >>> *zone,
> >>> -   u16 proto, const struct sk_buff *skb)
> >>> -{
> >>> -struct nf_conntrack_tuple tuple;
> >>> -struct nf_conntrack_expect *exp;
> >>> -
> >>> -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
> >>> &tuple))
> >>> -return NULL;
> >>> -
> >>> -exp = __nf_ct_expect_find(net, zone, &tuple);
> >>> -if (exp) {
> >>> -struct nf_conntrack_tuple_hash *h;
> >>> -
> >>> -/* Delete existing conntrack entry, if it clashes with 
> >>> the
> >>> - * expectation.  This can happen since conntrack ALGs do 
> >>> not
> >>> - * check for clashes between (new) expectations and 
> >>> existing
> >>> - * conntrack entries.  nf_conntrack_in() will check the
> >>> - * expectations only if a conntrack entry can not be 
> >>> found,
> >>> - * which can lead to OVS finding the expectation (here) 
> >>> in the
> >>> - * init direction, but which will not be removed by the
> >>> - * nf_conntrack_in() call, if a matching conntrack entry 
> >>> is
> >>> - * found instead.  In this case all init direction 
> >>> packets
> >>> - * would be reported as new related packets, while reply
> >>> - * direction packets would be reported as un-related
> >>> - * established packets.
> >>> - */
> >>> -h = nf_conntrack_find_get(net, zone, &tuple);
> >>> -if (h) {
> >>> -struct nf_conn *ct = 
> >>> nf_ct_tuplehash_to_ctrack(h);
> >>> -
> >>> -nf_ct_delete(ct, 0, 0);
> >>> -nf_ct_put(ct);
> >>> -}
> >>> -}
> >>> -
> >>> -return exp;
> >>> -}
> >>> -
> >>>  /* This replicates logic from nf_conntrack_core.c that is not 
> >>> exported. */
> >>>  static enum ip_conntrack_info
> >>>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
> >>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, 
> >>> struct sw_flow_key *key,
> >>>   const struct ovs_conntrack_info *info,
> >>>   struct sk_buff *skb)
> >>>  {
> >>> -struct nf_conntrack_expect *exp;
> >>> -
> >>> -/* If we pass an expected packet through nf_conntrack_in() the
> >>> - * expectation is typically removed, but the packet could still 
> >>> be
> >>> - * lost in upcall processing.  To prevent this from happening we
> >>> - * perform an explicit expectation lookup.  Expected connections 
> >>> are
> >>> - * always new, and will be passed through conntrack only when 
> >>> they are
> >>> - * committed, as it is OK to remove the expectation at that time.
> >>> - */
> >>> -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
> >>> -if (exp) {
> >>> -u8 state;
> >>> -
> >>> -/* NOTE: New connections are NATted and Helped only when
> >>> - * committed, so we are not calling into NAT here.
> >>> - */
> >>> -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | 
> >>> OVS_CS_F_RELATED;
> >>> -__ovs_ct_update_key(key, state, &info->zone, 
> >>> exp->master);
> >>
> >> Hi, Xin, others.
> >>
> >> Unfortunately, it seems like removal of 

Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-19 Thread Ilya Maximets
On 6/19/24 08:35, Adrián Moreno wrote:
> On Tue, Jun 18, 2024 at 05:44:05PM GMT, Ilya Maximets wrote:
>> On 6/18/24 12:50, Adrián Moreno wrote:
>>> On Tue, Jun 18, 2024 at 12:22:23PM GMT, Ilya Maximets wrote:
 On 6/18/24 09:00, Adrián Moreno wrote:
> On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
>> On 6/17/24 13:55, Ilya Maximets wrote:
>>> On 6/3/24 20:56, Adrian Moreno wrote:
 The OVS_ACTION_ATTR_SAMPLE action is, in essence,
 observability-oriented.

 Apart from some corner case in which it's used a replacement of clone()
 for old kernels, it's really only used for sFlow, IPFIX and now,
 local emit_sample.

 With this in mind, it doesn't make much sense to report
 OVS_DROP_LAST_ACTION inside sample actions.

 For instance, if the flow:

   actions:sample(..,emit_sample(..)),2

 triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
 confusing for users since the packet did reach its destination.

 This patch makes internal action execution silently consume the skb
 instead of notifying a drop for this case.

 Unfortunately, this patch does not remove all potential sources of
 confusion since, if the sample action itself is the last action, e.g:

 actions:sample(..,emit_sample(..))

 we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we 
 aren't.

 Sadly, this case is difficult to solve without breaking the
 optimization by which the skb is not cloned on last sample actions.
 But, given explicit drop actions are now supported, OVS can just add 
 one
 after the last sample() and rewrite the flow as:

 actions:sample(..,emit_sample(..)),drop

 Signed-off-by: Adrian Moreno 
 ---
  net/openvswitch/actions.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)

 diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
 index 33f6d93ba5e4..54fc1abcff95 100644
 --- a/net/openvswitch/actions.c
 +++ b/net/openvswitch/actions.c
 @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
  static struct action_flow_keys __percpu *flow_keys;
  static DEFINE_PER_CPU(int, exec_actions_level);

 +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
 +{
 +  /* Do not emit packet drops inside sample(). */
 +  if (OVS_CB(skb)->probability)
 +  consume_skb(skb);
 +  else
 +  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
 +}
 +
  /* Make a clone of the 'key', using the pre-allocated percpu 
 'flow_keys'
   * space. Return NULL if out of key spaces.
   */
 @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
 sk_buff *skb,
if ((arg->probability != U32_MAX) &&
(!arg->probability || get_random_u32() > arg->probability)) 
 {
if (last)
 -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
 +  ovs_drop_skb_last_action(skb);
>>
>> Always consuming the skb at this point makes sense, since having smaple()
>> as a last action is a reasonable thing to have.  But this looks more like
>> a fix for the original drop reason patch set.
>>
>
> I don't think consuming the skb at this point makes sense. It was very
> intentionally changed to a drop since a very common use-case for
> sampling is drop-sampling, i.e: replacing an empty action list (that
> triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
> that replacement should not have any effect on the number of
> OVS_DROP_LAST_ACTION being reported as the packets are being treated in
> the same way (only observed in one case).
>
>
return 0;
}

 @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath 
 *dp, struct sk_buff *skb,
}
}

 -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
 +  ovs_drop_skb_last_action(skb);
>>>
>>> I don't think I agree with this one.  If we have a sample() action with
>>> a lot of different actions inside and we reached the end while the last
>>> action didn't consume the skb, then we should report that.  E.g.
>>> "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
>>> cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
>>>
>
> What is the use case for such action list? H

Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Ilya Maximets
On 6/19/24 16:07, Xin Long wrote:
> On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets  wrote:
>>
>> On 6/18/24 17:50, Ilya Maximets wrote:
>>> On 6/18/24 16:58, Xin Long wrote:
 On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets  wrote:
>
> On 6/17/24 22:10, Ilya Maximets wrote:
>> On 7/16/23 23:09, Xin Long wrote:
>>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be 
>>> removed
>>> from the hashtable when lookup, we can simplify the exp processing code 
>>> a
>>> lot in openvswitch conntrack.
>>>
>>> Signed-off-by: Xin Long 
>>> ---
>>>  net/openvswitch/conntrack.c | 78 +
>>>  1 file changed, 10 insertions(+), 68 deletions(-)
>>>
>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>>> index 331730fd3580..fa955e892210 100644
>>> --- a/net/openvswitch/conntrack.c
>>> +++ b/net/openvswitch/conntrack.c
>>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net 
>>> *net, struct sw_flow_key *key,
>>>  return 0;
>>>  }
>>>
>>> -static struct nf_conntrack_expect *
>>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone 
>>> *zone,
>>> -   u16 proto, const struct sk_buff *skb)
>>> -{
>>> -struct nf_conntrack_tuple tuple;
>>> -struct nf_conntrack_expect *exp;
>>> -
>>> -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
>>> &tuple))
>>> -return NULL;
>>> -
>>> -exp = __nf_ct_expect_find(net, zone, &tuple);
>>> -if (exp) {
>>> -struct nf_conntrack_tuple_hash *h;
>>> -
>>> -/* Delete existing conntrack entry, if it clashes with the
>>> - * expectation.  This can happen since conntrack ALGs do 
>>> not
>>> - * check for clashes between (new) expectations and 
>>> existing
>>> - * conntrack entries.  nf_conntrack_in() will check the
>>> - * expectations only if a conntrack entry can not be found,
>>> - * which can lead to OVS finding the expectation (here) in 
>>> the
>>> - * init direction, but which will not be removed by the
>>> - * nf_conntrack_in() call, if a matching conntrack entry is
>>> - * found instead.  In this case all init direction packets
>>> - * would be reported as new related packets, while reply
>>> - * direction packets would be reported as un-related
>>> - * established packets.
>>> - */
>>> -h = nf_conntrack_find_get(net, zone, &tuple);
>>> -if (h) {
>>> -struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>>> -
>>> -nf_ct_delete(ct, 0, 0);
>>> -nf_ct_put(ct);
>>> -}
>>> -}
>>> -
>>> -return exp;
>>> -}
>>> -
>>>  /* This replicates logic from nf_conntrack_core.c that is not 
>>> exported. */
>>>  static enum ip_conntrack_info
>>>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
>>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
>>> sw_flow_key *key,
>>>   const struct ovs_conntrack_info *info,
>>>   struct sk_buff *skb)
>>>  {
>>> -struct nf_conntrack_expect *exp;
>>> -
>>> -/* If we pass an expected packet through nf_conntrack_in() the
>>> - * expectation is typically removed, but the packet could still be
>>> - * lost in upcall processing.  To prevent this from happening we
>>> - * perform an explicit expectation lookup.  Expected connections 
>>> are
>>> - * always new, and will be passed through conntrack only when they 
>>> are
>>> - * committed, as it is OK to remove the expectation at that time.
>>> - */
>>> -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
>>> -if (exp) {
>>> -u8 state;
>>> -
>>> -/* NOTE: New connections are NATted and Helped only when
>>> - * committed, so we are not calling into NAT here.
>>> - */
>>> -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
>>> -__ovs_ct_update_key(key, state, &info->zone, exp->master);
>>
>> Hi, Xin, others.
>>
>> Unfortunately, it seems like removal of this code broke the expected 
>> behavior.
>> OVS in userspace expects that SYN packet of a new related FTP connection 
>> will
>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk 
>> and not
>> new.  This is a problem because we need to commit this connection with 
>> the label
>> and we do that for +new pack

Re: [ovs-dev] [PATCH v4] netdev-dpdk: Use LSC interrupt mode.

2024-06-19 Thread Kevin Traynor
On 19/06/2024 17:00, David Marchand wrote:
> Querying link status may get delayed for an undeterministic (long) time
> with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
> kernel API and getting stuck on the kernel RTNL lock while some other
> operation is in progress under this lock.
> 
> One impact for long link status query is that it is called under the bond
> lock taken in write mode periodically in bond_run().
> In parallel, datapath threads may block requesting to read bonding related
> info (like for example in bond_check_admissibility()).
> 
> The LSC interrupt mode is available with many DPDK drivers and is used by
> default with testpmd.
> 
> It seems safe enough to switch on this feature by default in OVS.
> We keep the per interface option to disable this feature in case of an
> unforeseen bug.
> 
> Signed-off-by: David Marchand 
> Reviewed-by: Robin Jarry 
> Acked-by: Mike Pattrick 
> ---

LGTM

Acked-by: Kevin Traynor 

Re backporting. I'm not so keen on changing a default when someone
upgrades from 3.3.1 -> 3.3.2 etc. The feature is already available so
anyone who needs it can enable it.

Perhaps a separate small 'known issue' type of doc patch for phy.rst
could be backported to guide a user if they want to enable it ?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4] netdev-dpdk: Use LSC interrupt mode.

2024-06-19 Thread Maxime Coquelin




On 6/19/24 18:00, David Marchand wrote:

Querying link status may get delayed for an undeterministic (long) time
with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
kernel API and getting stuck on the kernel RTNL lock while some other
operation is in progress under this lock.

One impact for long link status query is that it is called under the bond
lock taken in write mode periodically in bond_run().
In parallel, datapath threads may block requesting to read bonding related
info (like for example in bond_check_admissibility()).

The LSC interrupt mode is available with many DPDK drivers and is used by
default with testpmd.

It seems safe enough to switch on this feature by default in OVS.
We keep the per interface option to disable this feature in case of an
unforeseen bug.

Signed-off-by: David Marchand 
Reviewed-by: Robin Jarry 
Acked-by: Mike Pattrick 
---
Changes since v3:
- updated logging in case of error,

Changes since v2:
- fixed typo in NEWS,

Changes since v1:
- (early) fail when interrupt lsc is requested by user but not supported
   by the driver,
- otherwise, log a debug message if user did not request interrupt mode,

---
  Documentation/topics/dpdk/phy.rst |  4 ++--
  NEWS  |  3 +++
  lib/netdev-dpdk.c | 13 -
  vswitchd/vswitch.xml  |  8 
  4 files changed, 21 insertions(+), 7 deletions(-)



Thanks for doing the patch, I agree lsc interrupt should be used by
default.

Acked-by: Maxime Coquelin 

Thanks,
Maxime

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] netdev-dpdk: Use LSC interrupt mode.

2024-06-19 Thread David Marchand
Querying link status may get delayed for an undeterministic (long) time
with mlx5 ports. This is a consequence of the mlx5 driver calling ethtool
kernel API and getting stuck on the kernel RTNL lock while some other
operation is in progress under this lock.

One impact for long link status query is that it is called under the bond
lock taken in write mode periodically in bond_run().
In parallel, datapath threads may block requesting to read bonding related
info (like for example in bond_check_admissibility()).

The LSC interrupt mode is available with many DPDK drivers and is used by
default with testpmd.

It seems safe enough to switch on this feature by default in OVS.
We keep the per interface option to disable this feature in case of an
unforeseen bug.

Signed-off-by: David Marchand 
Reviewed-by: Robin Jarry 
Acked-by: Mike Pattrick 
---
Changes since v3:
- updated logging in case of error,

Changes since v2:
- fixed typo in NEWS,

Changes since v1:
- (early) fail when interrupt lsc is requested by user but not supported
  by the driver,
- otherwise, log a debug message if user did not request interrupt mode,

---
 Documentation/topics/dpdk/phy.rst |  4 ++--
 NEWS  |  3 +++
 lib/netdev-dpdk.c | 13 -
 vswitchd/vswitch.xml  |  8 
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/Documentation/topics/dpdk/phy.rst 
b/Documentation/topics/dpdk/phy.rst
index efd168cba8..eefc25613d 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -546,8 +546,8 @@ the firmware every time to fulfil this request.
 
 Note that not all PMD drivers support LSC interrupts.
 
-The default configuration is polling mode. To set interrupt mode, option
-``dpdk-lsc-interrupt`` has to be set to ``true``.
+The default configuration is interrupt mode. To set polling mode, option
+``dpdk-lsc-interrupt`` has to be set to ``false``.
 
 Command to set interrupt mode for a specific interface::
 $ ovs-vsctl set interface  options:dpdk-lsc-interrupt=true
diff --git a/NEWS b/NEWS
index 5ae0108d55..d05f2d0f89 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@ Post-v3.3.0
  https://github.com/openvswitch/ovs.git
- DPDK:
  * OVS validated with DPDK 23.11.1.
+ * Link status changes are now handled via interrupt mode if the DPDK
+   driver supports it.  It is possible to revert to polling mode by setting
+   per interface 'options:dpdk-lsc-interrupt' to 'false'.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0fa37d5145..c42144b091 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
struct smap *args,
 }
 }
 
-lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
+lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
+if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
+if (smap_get(args, "dpdk-lsc-interrupt")) {
+VLOG_WARN_BUF(errp, "'%s': link status interrupt is not "
+  "supported.", netdev_get_name(netdev));
+err = EINVAL;
+goto out;
+}
+VLOG_DBG("interface '%s': not enabling link status interrupt.",
+ netdev_get_name(netdev));
+lsc_interrupt_mode = false;
+}
 if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
 dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
 netdev_request_reconfigure(netdev);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 8a1b607d71..e3afb78a4e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4647,12 +4647,12 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
type=patch options:peer=p1 \
   
 
-  Set this value to true to configure interrupt mode for
-  Link State Change (LSC) detection instead of poll mode for the DPDK
-  interface.
+  Set this value to false to configure poll mode for
+  Link State Change (LSC) detection instead of interrupt mode for the
+  DPDK interface.
 
 
-  If this value is not set, poll mode is configured.
+  If this value is not set, interrupt mode is configured.
 
 
   This parameter has an effect only on netdev dpdk interfaces.
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Xin Long
On Wed, Jun 19, 2024 at 8:58 AM Ilya Maximets  wrote:
>
> On 6/18/24 17:50, Ilya Maximets wrote:
> > On 6/18/24 16:58, Xin Long wrote:
> >> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets  wrote:
> >>>
> >>> On 6/17/24 22:10, Ilya Maximets wrote:
>  On 7/16/23 23:09, Xin Long wrote:
> > By not setting IPS_CONFIRMED in tmpl that allows the exp not to be 
> > removed
> > from the hashtable when lookup, we can simplify the exp processing code 
> > a
> > lot in openvswitch conntrack.
> >
> > Signed-off-by: Xin Long 
> > ---
> >  net/openvswitch/conntrack.c | 78 +
> >  1 file changed, 10 insertions(+), 68 deletions(-)
> >
> > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > index 331730fd3580..fa955e892210 100644
> > --- a/net/openvswitch/conntrack.c
> > +++ b/net/openvswitch/conntrack.c
> > @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net 
> > *net, struct sw_flow_key *key,
> >  return 0;
> >  }
> >
> > -static struct nf_conntrack_expect *
> > -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone 
> > *zone,
> > -   u16 proto, const struct sk_buff *skb)
> > -{
> > -struct nf_conntrack_tuple tuple;
> > -struct nf_conntrack_expect *exp;
> > -
> > -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
> > &tuple))
> > -return NULL;
> > -
> > -exp = __nf_ct_expect_find(net, zone, &tuple);
> > -if (exp) {
> > -struct nf_conntrack_tuple_hash *h;
> > -
> > -/* Delete existing conntrack entry, if it clashes with the
> > - * expectation.  This can happen since conntrack ALGs do 
> > not
> > - * check for clashes between (new) expectations and 
> > existing
> > - * conntrack entries.  nf_conntrack_in() will check the
> > - * expectations only if a conntrack entry can not be found,
> > - * which can lead to OVS finding the expectation (here) in 
> > the
> > - * init direction, but which will not be removed by the
> > - * nf_conntrack_in() call, if a matching conntrack entry is
> > - * found instead.  In this case all init direction packets
> > - * would be reported as new related packets, while reply
> > - * direction packets would be reported as un-related
> > - * established packets.
> > - */
> > -h = nf_conntrack_find_get(net, zone, &tuple);
> > -if (h) {
> > -struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
> > -
> > -nf_ct_delete(ct, 0, 0);
> > -nf_ct_put(ct);
> > -}
> > -}
> > -
> > -return exp;
> > -}
> > -
> >  /* This replicates logic from nf_conntrack_core.c that is not 
> > exported. */
> >  static enum ip_conntrack_info
> >  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
> > @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
> > sw_flow_key *key,
> >   const struct ovs_conntrack_info *info,
> >   struct sk_buff *skb)
> >  {
> > -struct nf_conntrack_expect *exp;
> > -
> > -/* If we pass an expected packet through nf_conntrack_in() the
> > - * expectation is typically removed, but the packet could still be
> > - * lost in upcall processing.  To prevent this from happening we
> > - * perform an explicit expectation lookup.  Expected connections 
> > are
> > - * always new, and will be passed through conntrack only when they 
> > are
> > - * committed, as it is OK to remove the expectation at that time.
> > - */
> > -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
> > -if (exp) {
> > -u8 state;
> > -
> > -/* NOTE: New connections are NATted and Helped only when
> > - * committed, so we are not calling into NAT here.
> > - */
> > -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
> > -__ovs_ct_update_key(key, state, &info->zone, exp->master);
> 
>  Hi, Xin, others.
> 
>  Unfortunately, it seems like removal of this code broke the expected 
>  behavior.
>  OVS in userspace expects that SYN packet of a new related FTP connection 
>  will
>  get +new+rel+trk flags, but after this patch we're only getting +rel+trk 
>  and not
>  new.  This is a problem because we need to commit this connection with 
>  the label
>  and we do that for +new packets.  If we can't get +new packet we'l

Re: [ovs-dev] [PATCH v4] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.

2024-06-19 Thread Mike Pattrick
On Tue, Jun 18, 2024 at 3:41 PM Dumitru Ceara  wrote:
>
> It improves the debugging experience if we can easily get a list of
> OpenFlow rules and groups that contribute to the creation of a datapath
> flow.
>
> The suggested workflow is:
> a. dump datapath flows (along with UUIDs), this also prints the core IDs
> (PMD IDs) when applicable.
>
>   $ ovs-appctl dpctl/dump-flows -m
>   flow-dump from pmd on cpu core: 7
>   ufid:7460db8f..., recirc_id(0), 
>
> b. dump related OpenFlow rules and groups:
>   $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7
>   cookie=0x12345678, table=0 
> priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1)
>   cookie=0x0, table=1 priority=200,actions=group:1
>   group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2))
>   cookie=0x0, table=2 actions=output:1
>
> The new command only shows rules and groups attached to ukeys that are
> in states UKEY_VISIBLE or UKEY_OPERATIONAL.  That should be fine as all
> other ukeys should not be relevant for the use case presented above.
>
> For ukeys that don't have an xcache populated yet, the command goes
> ahead and populates one.  In theory this is creates a slight overhead as
> those ukeys might not need an xcache until they see traffic (and get
> revalidated) but in practice the overhead should be minimal.
>
> This commit tries to mimic the output format of the ovs-ofctl
> dump-flows/dump-groups commands.  For groups it actually uses
> ofputil_group/_bucket functions for formatting.  For rules it uses
> flow_stats_ds() (the original function was exported and renamed to
> ofproto_rule_stats_ds()).
>
> Signed-off-by: Dumitru Ceara 
> ---

Acked-by: Mike Pattrick 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] ofp-prop: Fix unaligned 128 bit access.

2024-06-19 Thread Mike Pattrick
When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
ct-flush" test will yield the following undefined behavior flagged by
UBSan. This problem is caused by the fact that 128bit property put/parse
functions weren't adding appropriate padding before writing or reading
the value.

This patch uses get_32aligned_* functions to copy the bytes as they are
aligned.

lib/ofp-prop.c:277:14: runtime error: load of misaligned address
0x6060687c for type 'union ovs_be128', which requires 8 byte
alignment
0x6060687c: note: pointer points here
  00 05 00 14 00 00 00 00  00 00 00 00 00 00 00 00  00 ff ab 00
  ^
0: in ofpprop_parse_u128 lib/ofp-prop.c:277
1: in ofp_ct_match_decode lib/ofp-ct.c:525
2: in ofp_print_nxt_ct_flush lib/ofp-print.c:959
3: in ofp_to_string__ lib/ofp-print.c:1206
4: in ofp_to_string lib/ofp-print.c:1264
5: in ofp_print lib/ofp-print.c:1308
6: in ofctl_ofp_print utilities/ovs-ofctl.c:4899
7: in ovs_cmdl_run_command__ lib/command-line.c:247
8: in ovs_cmdl_run_command lib/command-line.c:278
9: in main utilities/ovs-ofctl.c:186

Signed-off-by: Mike Pattrick 
---
v2: removed memcpy
v3: fixed checkpatch
---
 lib/ofp-prop.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c
index 0a685750c..0e54543bd 100644
--- a/lib/ofp-prop.c
+++ b/lib/ofp-prop.c
@@ -21,6 +21,7 @@
 #include "openvswitch/ofp-errors.h"
 #include "openvswitch/ofp-prop.h"
 #include "openvswitch/vlog.h"
+#include "unaligned.h"
 #include "util.h"
 #include "uuid.h"
 
@@ -190,11 +191,12 @@ ofpprop_parse_be64(const struct ofpbuf *property, 
ovs_be64 *value)
 enum ofperr
 ofpprop_parse_be128(const struct ofpbuf *property, ovs_be128 *value)
 {
-ovs_be128 *p = property->msg;
+ovs_32aligned_be128 *p = property->msg;
+
 if (ofpbuf_msgsize(property) != sizeof *p) {
 return OFPERR_OFPBPC_BAD_LEN;
 }
-*value = *p;
+*value = get_32aligned_be128(p);
 return 0;
 }
 
@@ -270,12 +272,13 @@ ofpprop_parse_u64(const struct ofpbuf *property, uint64_t 
*value)
 enum ofperr
 ofpprop_parse_u128(const struct ofpbuf *property, ovs_u128 *value)
 {
-ovs_be128 *p = property->msg;
-if (ofpbuf_msgsize(property) != sizeof *p) {
-return OFPERR_OFPBPC_BAD_LEN;
+enum ofperr error = ofpprop_parse_be128(property, (ovs_be128 *) value);
+
+if (!error) {
+*value = ntoh128(*(ovs_be128 *) value);
 }
-*value = ntoh128(*p);
-return 0;
+
+return error;
 }
 
 /* Attempts to parse 'property' as a property containing a UUID.  If
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ofp-prop: Fix unaligned 128 bit access.

2024-06-19 Thread 0-day Robot
Bleep bloop.  Greetings Mike Pattrick, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Inappropriate spacing around cast
#71 FILE: lib/ofp-prop.c:275:
enum ofperr error = ofpprop_parse_be128(property, (ovs_be128 *)value);

ERROR: Inappropriate spacing around cast
#74 FILE: lib/ofp-prop.c:278:
*value = ntoh128(*(ovs_be128 *)value);

Lines checked: 86, Warnings: 0, Errors: 2


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-19 Thread Ilya Maximets
On 6/18/24 17:50, Ilya Maximets wrote:
> On 6/18/24 16:58, Xin Long wrote:
>> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets  wrote:
>>>
>>> On 6/17/24 22:10, Ilya Maximets wrote:
 On 7/16/23 23:09, Xin Long wrote:
> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
> from the hashtable when lookup, we can simplify the exp processing code a
> lot in openvswitch conntrack.
>
> Signed-off-by: Xin Long 
> ---
>  net/openvswitch/conntrack.c | 78 +
>  1 file changed, 10 insertions(+), 68 deletions(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 331730fd3580..fa955e892210 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, 
> struct sw_flow_key *key,
>  return 0;
>  }
>
> -static struct nf_conntrack_expect *
> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
> -   u16 proto, const struct sk_buff *skb)
> -{
> -struct nf_conntrack_tuple tuple;
> -struct nf_conntrack_expect *exp;
> -
> -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
> &tuple))
> -return NULL;
> -
> -exp = __nf_ct_expect_find(net, zone, &tuple);
> -if (exp) {
> -struct nf_conntrack_tuple_hash *h;
> -
> -/* Delete existing conntrack entry, if it clashes with the
> - * expectation.  This can happen since conntrack ALGs do not
> - * check for clashes between (new) expectations and existing
> - * conntrack entries.  nf_conntrack_in() will check the
> - * expectations only if a conntrack entry can not be found,
> - * which can lead to OVS finding the expectation (here) in 
> the
> - * init direction, but which will not be removed by the
> - * nf_conntrack_in() call, if a matching conntrack entry is
> - * found instead.  In this case all init direction packets
> - * would be reported as new related packets, while reply
> - * direction packets would be reported as un-related
> - * established packets.
> - */
> -h = nf_conntrack_find_get(net, zone, &tuple);
> -if (h) {
> -struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
> -
> -nf_ct_delete(ct, 0, 0);
> -nf_ct_put(ct);
> -}
> -}
> -
> -return exp;
> -}
> -
>  /* This replicates logic from nf_conntrack_core.c that is not exported. 
> */
>  static enum ip_conntrack_info
>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
>   const struct ovs_conntrack_info *info,
>   struct sk_buff *skb)
>  {
> -struct nf_conntrack_expect *exp;
> -
> -/* If we pass an expected packet through nf_conntrack_in() the
> - * expectation is typically removed, but the packet could still be
> - * lost in upcall processing.  To prevent this from happening we
> - * perform an explicit expectation lookup.  Expected connections are
> - * always new, and will be passed through conntrack only when they 
> are
> - * committed, as it is OK to remove the expectation at that time.
> - */
> -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
> -if (exp) {
> -u8 state;
> -
> -/* NOTE: New connections are NATted and Helped only when
> - * committed, so we are not calling into NAT here.
> - */
> -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
> -__ovs_ct_update_key(key, state, &info->zone, exp->master);

 Hi, Xin, others.

 Unfortunately, it seems like removal of this code broke the expected 
 behavior.
 OVS in userspace expects that SYN packet of a new related FTP connection 
 will
 get +new+rel+trk flags, but after this patch we're only getting +rel+trk 
 and not
 new.  This is a problem because we need to commit this connection with the 
 label
 and we do that for +new packets.  If we can't get +new packet we'll have 
 to commit
 every single +rel+trk packet, which doesn't make a lot of sense.  And it's 
 a
 significant behavior change regardless.
>>>
>>> Interestingly enough I see +new+rel+trk packets in cases without SNAT,
>>> but we can only get +rel+trk in cases with SNAT.  So, this may be just
>>> a gene

[ovs-dev] [PATCH v3] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-19 Thread Mike Pattrick
Currently all OVSDB database queries except for UUID lookups all result
in linear lookups over the entire table, even if an index is present.

This patch modifies ovsdb_query() to attempt an index lookup first, if
possible. If no matching indexes are present then a linear index is
still conducted.

To test this, I set up an ovsdb database with a variable number of rows
and timed the average of how long ovsdb-client took to query a single
row. The first two tests involved a linear scan that didn't match any
rows, so there was no overhead associated with sending or encoding
output. The post-patch linear scan was a worst case scenario where the
table did have an appropriate index but the conditions made its usage
impossible. The indexed lookup test was for a matching row, which did
also include overhead associated with a match. The results are included
in the table below.

Rows   | 100k | 200k | 300k | 400k | 500k
---+--+--+--+--+-
Pre-patch linear scan  |  9ms | 24ms | 37ms | 49ms | 61ms
Post-patch linear scan |  9ms | 24ms | 38ms | 49ms | 61ms
Indexed lookup |  3ms |  3ms |  3ms |  3ms |  3ms

I also tested the performance of ovsdb_query() by wrapping it in a loop
and measuring the time it took to perform 1000 linear scans on 1, 10,
100k, and 200k rows. This test showed that the new index checking code
did not slow down worst case lookups to a statistically detectable
degree.

Reported-at: https://issues.redhat.com/browse/FDP-590
Signed-off-by: Mike Pattrick 

---

v3:
 - Included txn in index code
 - Added benchmarks
 - Refactored code
 - Added more tests
 - Now a mock row is created to perform the search with standard
 functions
Signed-off-by: Mike Pattrick 
---
 ovsdb/execution.c|  20 +++--
 ovsdb/query.c| 174 +++
 ovsdb/query.h|   6 +-
 ovsdb/rbac.c |  15 ++--
 ovsdb/rbac.h |  10 ++-
 ovsdb/row.h  |  28 +++
 ovsdb/transaction.c  |  29 +--
 ovsdb/transaction.h  |   5 ++
 tests/ovsdb-execution.at | 108 +++-
 tests/ovsdb-macros.at|  10 +++
 tests/ovsdb-query.at |  18 ++--
 tests/ovsdb-server.at|   2 +-
 tests/ovsdb-tool.at  |   2 +-
 tests/test-ovsdb.c   |  15 +++-
 14 files changed, 363 insertions(+), 79 deletions(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index f4cc9e802..212839bca 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -459,7 +459,7 @@ ovsdb_execute_select(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 if (!error) {
 struct ovsdb_row_set rows = OVSDB_ROW_SET_INITIALIZER;
 
-ovsdb_query_distinct(table, &condition, &columns, &rows);
+ovsdb_query_distinct(table, &condition, &columns, &rows, x->txn);
 ovsdb_row_set_sort(&rows, &sort);
 json_object_put(result, "rows",
 ovsdb_row_set_to_json(&rows, &columns));
@@ -545,8 +545,8 @@ ovsdb_execute_update(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 ur.row = row;
 ur.columns = &columns;
 if (ovsdb_rbac_update(x->db, table, &columns, &condition, x->role,
-  x->id)) {
-ovsdb_query(table, &condition, update_row_cb, &ur);
+  x->id, x->txn)) {
+ovsdb_query(table, &condition, update_row_cb, &ur, x->txn);
 } else {
 error = ovsdb_perm_error("RBAC rules for client \"%s\" role "
  "\"%s\" prohibit modification of "
@@ -626,7 +626,7 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 json_integer_create(hmap_count(&table->rows)));
 } else {
 size_t row_count = 0;
-ovsdb_query(table, &condition, count_row_cb, &row_count);
+ovsdb_query(table, &condition, count_row_cb, &row_count, x->txn);
 json_object_put(result, "count",
 json_integer_create(row_count));
 }
@@ -636,8 +636,8 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 mr.mutations = &mutations;
 mr.error = &error;
 if (ovsdb_rbac_mutate(x->db, table, &mutations, &condition, x->role,
-  x->id)) {
-ovsdb_query(table, &condition, mutate_row_cb, &mr);
+  x->id, x->txn)) {
+ovsdb_query(table, &condition, mutate_row_cb, &mr, x->txn);
 } else {
 error = ovsdb_perm_error("RBAC rules for client \"%s\" role "
  "\"%s\" prohibit mutate operation on "
@@ -693,8 +693,9 @@ ovsdb_execute_delete(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 dr.table = table;
 dr.txn = x->txn;
 
-if (ovsdb_rbac_delete(x->db, table, &condition, x->role, x->id)) {
-ovsdb_quer

[ovs-dev] [PATCH v2] ofp-prop: Fix unaligned 128 bit access.

2024-06-19 Thread Mike Pattrick
When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
ct-flush" test will yield the following undefined behavior flagged by
UBSan. This problem is caused by the fact that 128bit property put/parse
functions weren't adding appropriate padding before writing or reading
the value.

This patch uses get_32aligned_* functions to copy the bytes as they are
aligned.

lib/ofp-prop.c:277:14: runtime error: load of misaligned address
0x6060687c for type 'union ovs_be128', which requires 8 byte
alignment
0x6060687c: note: pointer points here
  00 05 00 14 00 00 00 00  00 00 00 00 00 00 00 00  00 ff ab 00
  ^
0: in ofpprop_parse_u128 lib/ofp-prop.c:277
1: in ofp_ct_match_decode lib/ofp-ct.c:525
2: in ofp_print_nxt_ct_flush lib/ofp-print.c:959
3: in ofp_to_string__ lib/ofp-print.c:1206
4: in ofp_to_string lib/ofp-print.c:1264
5: in ofp_print lib/ofp-print.c:1308
6: in ofctl_ofp_print utilities/ovs-ofctl.c:4899
7: in ovs_cmdl_run_command__ lib/command-line.c:247
8: in ovs_cmdl_run_command lib/command-line.c:278
9: in main utilities/ovs-ofctl.c:186

Signed-off-by: Mike Pattrick 
---
v2: removed memcpy
---
 lib/ofp-prop.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c
index 0a685750c..842ecebea 100644
--- a/lib/ofp-prop.c
+++ b/lib/ofp-prop.c
@@ -21,6 +21,7 @@
 #include "openvswitch/ofp-errors.h"
 #include "openvswitch/ofp-prop.h"
 #include "openvswitch/vlog.h"
+#include "unaligned.h"
 #include "util.h"
 #include "uuid.h"
 
@@ -190,11 +191,12 @@ ofpprop_parse_be64(const struct ofpbuf *property, 
ovs_be64 *value)
 enum ofperr
 ofpprop_parse_be128(const struct ofpbuf *property, ovs_be128 *value)
 {
-ovs_be128 *p = property->msg;
+ovs_32aligned_be128 *p = property->msg;
+
 if (ofpbuf_msgsize(property) != sizeof *p) {
 return OFPERR_OFPBPC_BAD_LEN;
 }
-*value = *p;
+*value = get_32aligned_be128(p);
 return 0;
 }
 
@@ -270,12 +272,13 @@ ofpprop_parse_u64(const struct ofpbuf *property, uint64_t 
*value)
 enum ofperr
 ofpprop_parse_u128(const struct ofpbuf *property, ovs_u128 *value)
 {
-ovs_be128 *p = property->msg;
-if (ofpbuf_msgsize(property) != sizeof *p) {
-return OFPERR_OFPBPC_BAD_LEN;
+enum ofperr error = ofpprop_parse_be128(property, (ovs_be128 *)value);
+
+if (!error) {
+*value = ntoh128(*(ovs_be128 *)value);
 }
-*value = ntoh128(*p);
-return 0;
+
+return error;
 }
 
 /* Attempts to parse 'property' as a property containing a UUID.  If
-- 
2.39.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-19 Thread Simon Horman
On Wed, Jun 19, 2024 at 08:35:20AM -0400, Mike Pattrick wrote:
> On Wed, Jun 19, 2024 at 8:32 AM Simon Horman  wrote:
> >
> > On Mon, Jun 17, 2024 at 01:11:28PM -0400, Mike Pattrick wrote:
> > > On Mon, Jun 3, 2024 at 2:01 PM Ilya Maximets  wrote:
> > > >
> > > > On 6/3/24 06:20, Mike Pattrick wrote:
> > > > > Currently all OVSDB database queries except for UUID lookups all 
> > > > > result
> > > > > in linear lookups over the entire table, even if an index is present.
> > > > >
> > > > > This patch modifies ovsdb_query() to attempt an index lookup first, if
> > > > > possible. If no matching indexes are present then a linear index is
> > > > > still conducted.
> > > > >
> > > > > Reported-at: https://issues.redhat.com/browse/FDP-590
> > > > > Signed-off-by: Mike Pattrick 
> > > > > ---
> > > > >  NEWS |   3 ++
> > > > >  ovsdb/query.c| 102 
> > > > > +++
> > > > >  ovsdb/row.h  |  28 +++
> > > > >  ovsdb/transaction.c  |  27 ---
> > > > >  tests/ovsdb-execution.at |  34 -
> > > > >  tests/ovsdb-server.at|   2 +-
> > > > >  tests/ovsdb-tool.at  |   2 +-
> > > > >  7 files changed, 159 insertions(+), 39 deletions(-)
> > > >
> > > > Hi, Mike.  Thanks for the patch.
> > > >
> > > > Besides what Simon asked, the patch has a few other issues:
> > > >
> > > > 1. Lookup is performed only on the committed index and it doesn't 
> > > > include
> > > >rows that are in-flight in the current transaction.
> > > >
> > > >Unlike rows in a hash table, indexes are updated only after the whole
> > > >transaction is committed.  With this change we'll not be able to find
> > > >newly added rows.
> > > >
> > > >Another thing related to this is that it is allowed to have 
> > > > duplicates
> > > >within a transaction as long as they are removed before the 
> > > > transaction
> > > >ends.  So it is possible that multiple rows will satisfy the 
> > > > condition
> > > >on indexed columns while the transaction is in-flight.
> > > >
> > > >Consider the following commands executed in a sandbox:
> > > >
> > > ># ovs-vsctl set-manager "tcp:my-first-target"
> > > ># ovsdb-client transact unix:$(pwd)/sandbox/db.sock '
> > > >["Open_vSwitch",
> > > > {"op": "select",
> > > >  "table": "Manager",
> > > >  "columns": ["_uuid", "target"],
> > > >  "where": [["target", "==", "tcp:my-first-target"]]},
> > > > {"op": "insert",
> > > >  "table": "Manager",
> > > >  "uuid-name": "duplicate",
> > > >  "row": {"target": "tcp:my-first-target"}},
> > > > {"op": "select",
> > > >  "table": "Manager",
> > > >  "columns": ["_uuid", "target"],
> > > >  "where": [["target", "==", "tcp:my-first-target"]]},
> > > > {"op": "delete",
> > > >  "table": "Manager",
> > > >  "where":[["_uuid","==",["named-uuid","duplicate"]]]},
> > > > {"op": "select",
> > > >  "table": "Manager",
> > > >  "columns": ["_uuid", "target"],
> > > >  "where": [["target", "==", "tcp:my-first-target"]]}]'
> > > >
> > > >Transaction must succeed.  The first selection should return 1 row,
> > > >the second should return both duplicates and the third should again
> > > >return one row.
> > >
> > > This is a good point, I hadn't anticipated this use-case but it does
> > > have a large impact on this change. After working through a few
> > > implementations, I wasn't able to find a solution that wasn't overly
> > > complex. For the next version, I've instead opted to exclude indexed
> > > lookups from transactions that modify the associated row.
> > >
> > > The next version should address this and the other feedback.
> >
> > Hi Mike,
> >
> > My inbox is confusing me somewhat,
> > are you planing a v3 of this patch?
> 
> I seem to have sent the v3 as v2 by mistake. I will resubmit.

Thanks, I think that would help :)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-19 Thread Mike Pattrick
On Wed, Jun 19, 2024 at 8:32 AM Simon Horman  wrote:
>
> On Mon, Jun 17, 2024 at 01:11:28PM -0400, Mike Pattrick wrote:
> > On Mon, Jun 3, 2024 at 2:01 PM Ilya Maximets  wrote:
> > >
> > > On 6/3/24 06:20, Mike Pattrick wrote:
> > > > Currently all OVSDB database queries except for UUID lookups all result
> > > > in linear lookups over the entire table, even if an index is present.
> > > >
> > > > This patch modifies ovsdb_query() to attempt an index lookup first, if
> > > > possible. If no matching indexes are present then a linear index is
> > > > still conducted.
> > > >
> > > > Reported-at: https://issues.redhat.com/browse/FDP-590
> > > > Signed-off-by: Mike Pattrick 
> > > > ---
> > > >  NEWS |   3 ++
> > > >  ovsdb/query.c| 102 +++
> > > >  ovsdb/row.h  |  28 +++
> > > >  ovsdb/transaction.c  |  27 ---
> > > >  tests/ovsdb-execution.at |  34 -
> > > >  tests/ovsdb-server.at|   2 +-
> > > >  tests/ovsdb-tool.at  |   2 +-
> > > >  7 files changed, 159 insertions(+), 39 deletions(-)
> > >
> > > Hi, Mike.  Thanks for the patch.
> > >
> > > Besides what Simon asked, the patch has a few other issues:
> > >
> > > 1. Lookup is performed only on the committed index and it doesn't include
> > >rows that are in-flight in the current transaction.
> > >
> > >Unlike rows in a hash table, indexes are updated only after the whole
> > >transaction is committed.  With this change we'll not be able to find
> > >newly added rows.
> > >
> > >Another thing related to this is that it is allowed to have duplicates
> > >within a transaction as long as they are removed before the transaction
> > >ends.  So it is possible that multiple rows will satisfy the condition
> > >on indexed columns while the transaction is in-flight.
> > >
> > >Consider the following commands executed in a sandbox:
> > >
> > ># ovs-vsctl set-manager "tcp:my-first-target"
> > ># ovsdb-client transact unix:$(pwd)/sandbox/db.sock '
> > >["Open_vSwitch",
> > > {"op": "select",
> > >  "table": "Manager",
> > >  "columns": ["_uuid", "target"],
> > >  "where": [["target", "==", "tcp:my-first-target"]]},
> > > {"op": "insert",
> > >  "table": "Manager",
> > >  "uuid-name": "duplicate",
> > >  "row": {"target": "tcp:my-first-target"}},
> > > {"op": "select",
> > >  "table": "Manager",
> > >  "columns": ["_uuid", "target"],
> > >  "where": [["target", "==", "tcp:my-first-target"]]},
> > > {"op": "delete",
> > >  "table": "Manager",
> > >  "where":[["_uuid","==",["named-uuid","duplicate"]]]},
> > > {"op": "select",
> > >  "table": "Manager",
> > >  "columns": ["_uuid", "target"],
> > >  "where": [["target", "==", "tcp:my-first-target"]]}]'
> > >
> > >Transaction must succeed.  The first selection should return 1 row,
> > >the second should return both duplicates and the third should again
> > >return one row.
> >
> > This is a good point, I hadn't anticipated this use-case but it does
> > have a large impact on this change. After working through a few
> > implementations, I wasn't able to find a solution that wasn't overly
> > complex. For the next version, I've instead opted to exclude indexed
> > lookups from transactions that modify the associated row.
> >
> > The next version should address this and the other feedback.
>
> Hi Mike,
>
> My inbox is confusing me somewhat,
> are you planing a v3 of this patch?

I seem to have sent the v3 as v2 by mistake. I will resubmit.

-M

>
> ...
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-19 Thread Simon Horman
On Mon, Jun 17, 2024 at 01:11:28PM -0400, Mike Pattrick wrote:
> On Mon, Jun 3, 2024 at 2:01 PM Ilya Maximets  wrote:
> >
> > On 6/3/24 06:20, Mike Pattrick wrote:
> > > Currently all OVSDB database queries except for UUID lookups all result
> > > in linear lookups over the entire table, even if an index is present.
> > >
> > > This patch modifies ovsdb_query() to attempt an index lookup first, if
> > > possible. If no matching indexes are present then a linear index is
> > > still conducted.
> > >
> > > Reported-at: https://issues.redhat.com/browse/FDP-590
> > > Signed-off-by: Mike Pattrick 
> > > ---
> > >  NEWS |   3 ++
> > >  ovsdb/query.c| 102 +++
> > >  ovsdb/row.h  |  28 +++
> > >  ovsdb/transaction.c  |  27 ---
> > >  tests/ovsdb-execution.at |  34 -
> > >  tests/ovsdb-server.at|   2 +-
> > >  tests/ovsdb-tool.at  |   2 +-
> > >  7 files changed, 159 insertions(+), 39 deletions(-)
> >
> > Hi, Mike.  Thanks for the patch.
> >
> > Besides what Simon asked, the patch has a few other issues:
> >
> > 1. Lookup is performed only on the committed index and it doesn't include
> >rows that are in-flight in the current transaction.
> >
> >Unlike rows in a hash table, indexes are updated only after the whole
> >transaction is committed.  With this change we'll not be able to find
> >newly added rows.
> >
> >Another thing related to this is that it is allowed to have duplicates
> >within a transaction as long as they are removed before the transaction
> >ends.  So it is possible that multiple rows will satisfy the condition
> >on indexed columns while the transaction is in-flight.
> >
> >Consider the following commands executed in a sandbox:
> >
> ># ovs-vsctl set-manager "tcp:my-first-target"
> ># ovsdb-client transact unix:$(pwd)/sandbox/db.sock '
> >["Open_vSwitch",
> > {"op": "select",
> >  "table": "Manager",
> >  "columns": ["_uuid", "target"],
> >  "where": [["target", "==", "tcp:my-first-target"]]},
> > {"op": "insert",
> >  "table": "Manager",
> >  "uuid-name": "duplicate",
> >  "row": {"target": "tcp:my-first-target"}},
> > {"op": "select",
> >  "table": "Manager",
> >  "columns": ["_uuid", "target"],
> >  "where": [["target", "==", "tcp:my-first-target"]]},
> > {"op": "delete",
> >  "table": "Manager",
> >  "where":[["_uuid","==",["named-uuid","duplicate"]]]},
> > {"op": "select",
> >  "table": "Manager",
> >  "columns": ["_uuid", "target"],
> >  "where": [["target", "==", "tcp:my-first-target"]]}]'
> >
> >Transaction must succeed.  The first selection should return 1 row,
> >the second should return both duplicates and the third should again
> >return one row.
> 
> This is a good point, I hadn't anticipated this use-case but it does
> have a large impact on this change. After working through a few
> implementations, I wasn't able to find a solution that wasn't overly
> complex. For the next version, I've instead opted to exclude indexed
> lookups from transactions that modify the associated row.
> 
> The next version should address this and the other feedback.

Hi Mike,

My inbox is confusing me somewhat,
are you planing a v3 of this patch?

...
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] selftests: openvswitch: Set value to nla flags.

2024-06-19 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller :

On Tue, 18 Jun 2024 09:29:21 +0200 you wrote:
> Netlink flags, although they don't have payload at the netlink level,
> are represented as having "True" as value in pyroute2.
> 
> Without it, trying to add a flow with a flag-type action (e.g: pop_vlan)
> fails with the following traceback:
> 
> Traceback (most recent call last):
>   File "[...]/ovs-dpctl.py", line 2498, in 
> sys.exit(main(sys.argv))
>  ^^
>   File "[...]/ovs-dpctl.py", line 2487, in main
> ovsflow.add_flow(rep["dpifindex"], flow)
>   File "[...]/ovs-dpctl.py", line 2136, in add_flow
> reply = self.nlm_request(
> ^
>   File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request
> return tuple(self._genlm_request(*argv, **kwarg))
>  ^^^
>   File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in
> nlm_request
> return tuple(super().nlm_request(*argv, **kwarg))
>^^
>   File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request
> self.put(msg, msg_type, msg_flags, msg_seq=msg_seq)
>   File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put
> self.sendto_gate(msg, addr)
>   File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate
> msg.encode()
>   File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode
> offset = self.encode_nlas(offset)
>  
>   File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas
> nla_instance.setvalue(cell[1])
>   File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue
> nlv.setvalue(nla_tuple[1])
>  ~^^^
> IndexError: list index out of range
> 
> [...]

Here is the summary with links:
  - [v2] selftests: openvswitch: Set value to nla flags.
https://git.kernel.org/netdev/net/c/a876349d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] vswitchd: Only lock pages that are faulted in.

2024-06-19 Thread Simon Horman
On Fri, Jun 14, 2024 at 02:22:47PM +0200, Ilya Maximets wrote:
> The main purpose of locking the memory is to ensure that OVS can keep
> doing what it did before in case of increased memory pressure, e.g.,
> during VM ingest / migration.  Fulfilling this requirement can be
> achieved without locking all the allocated memory, but only the pages
> already accessed in the past (faulted in).  Processing of the new
> traffic involves new memory allocations.  Latency on these operations
> can't be guaranteed by the locking.  The main difference would be
> the pre-faulting of the stack memory.  However, in order to revalidate
> or process upcalls on the same traffic, the same amount of stack is
> likely needed, so all the necessary memory will already be faulted in.
> 
> Switch 'mlockall' to MCL_ONFAULT to avoid consuming unnecessarily
> large amounts of RAM on systems with high core counts.  For example,
> in a densely populated OVN cluster this saves about 650 MB of RAM per
> node on a system with 64 cores.  This equates to 320 GB of allocated
> but unused RAM in a 500 node cluster.
> 
> This also makes OVS better suited by default for small systems with
> limited amount of memory.
> 
> The MCL_ONFAULT flag was introduced in Linux kernel 4.4 and wasn't
> available at the time of '--mlockall' introduction, but we can use it
> now.  Falling back to an old way of locking in case we're running on
> an older kernel just in case.
> 
> Only locking the faulted in pages also makes locking compatible with
> vhost post-copy live migration by default, because we'll no longer
> pre-fault all the guest's memory.  Post-copy relies on userfaultfd
> to work on shared huge pages, which is only available in 4.11+ kernels.
> So, technically, it should not be possible for MCL_ONFAULT to fail and
> the call without it to succeed.  But keeping the check just in case
> for now.
> 
> Signed-off-by: Ilya Maximets 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] dpdk: Check other_config:dpdk-extra for '--lcores'.

2024-06-19 Thread Kevin Traynor
On 13/06/2024 12:25, Eelco Chaudron wrote:
> 
> 
> On 13 Jun 2024, at 11:05, Kevin Traynor wrote:
> 
>> Currently dpdk lcore args for DPDK EAL init can be generated or
>> they can be written directly by the user through dpdk-extra.
>>
>> If dpdk-extra does not contain '-l' or '-c', a '-l' argument with
>> a core list for DPDK EAL init will be generated.
>>
>> The '--lcores' argument should also be checked, as if it is used in
>> dpdk-extra, currently a '-l' is still generated and that causes
>> DPDK EAL init to fail:
>>
>> |9|dpdk|INFO|EAL ARGS: ovs-vswitchd --lcores 0@18 --in-memory -l 0.
>> |00012|dpdk|ERR|EAL: Option -l is ignored, because (--lcore) is set!
>>
>> Add check for '--lcores' in dpdk-extra config and don't generate '-l'
>> if it is detected:
>>
>> |9|dpdk|INFO|EAL ARGS: ovs-vswitchd --lcores 0@8 --in-memory.
>>
>> Fixes: 543342a41cbc ("DPDK: add support for v2.0.0")
>> Signed-off-by: Kevin Traynor 
>> Reviewed-by: David Marchand 
> 
> Thanks for the patch Kevin, the change looks good to me.
> 
> Acked-by: Eelco Chaudron 
> 

Thanks David and Eelco. Applied and backported as far as oldest
maintained branch (2.17).

Interestingly, if it was to be applied to *all* branches, it would go
back to OVS 2.4!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] checkpatch: Don't warn on pointer to pointer.

2024-06-19 Thread Simon Horman
On Mon, Jun 10, 2024 at 10:56:10AM -0400, Aaron Conole wrote:
> Aaron Conole  writes:
> 
> > Adrian Moreno  writes:
> >
> >> Current regexp used to check whitespaces around operators does not
> >> consider that there can be more than one "*" together to express pointer
> >> to pointer.
> >>
> >> As a result, false positive warnings are raised when the
> >> patch contains a simple list of pointers, e.g: "char **errrp").
> >>
> >> Fix the regexp to allow more than one consecutive "+" characters.
> >>
> >> Signed-off-by: Adrian Moreno 
> >> ---
> >
> > I'm not quite sure about the functionality of this patch.  For example,
> > this seems to pass just fine:
> >
> >  list = * *bar;
> >
> > BUT I think the coding style shouldn't allow it.  There's a question of
> > how much / where we want to get the errors (after all, it's the same
> > state we are in today of accepting these kinds of lines even when the
> > checkpatch script gets it wrong).  Obviously, the line above is a pretty
> > extreme case, but I think there must be a better regex / matching
> > criteria we can do for the asterisk rather than modifying this existing
> > one.
> >
> > Maybe Ilya / Eelco feel different about it or have opinions.
> 
> I just saw that this was applied already.

Sorry for not waiting longer before applying.
I agree that the concern you have raised is valid.
Could we prepare a follow-up patch?

...
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [Patch] ovsdb-client: Document "--timeout" option in help.

2024-06-19 Thread Simon Horman
On Fri, Jun 07, 2024 at 01:58:42PM +0200, martin.kal...@canonical.com wrote:
> I made a silly typo in the commit message s/infomration/information.
> Would it be possible to fix it when the commit is applied, or is it
> worth posting v2?

Thanks Martin,

I think we can handle that when applying.

Acked-by: Simon Horman 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] selftests: openvswitch: Use bash as interpreter

2024-06-19 Thread David Laight
From: Simon Horman
> Sent: 17 June 2024 11:34
...
> > sidenote: I like very much the idea to use the least powerful tool, like
> > sh vs bash, awk vs gawk, but it breaks when we forget what is outside of
> > the scope of the former/standard.
> > Perhaps for shell, we could convert all the selftests at once?
> 
> Thanks,
> 
> Now that you mention it, I have the same feelings.
> 
> Do we ever expect to use the minimal tools, when other
> parts of the test suite depend on the enhanced ones?

Certainly trying to avoid bash-isms seems like a good idea.
Especially in scripts where it isn't really that hard.
OTOH avoiding posix features (so the script will run on a traditional SYSV 
/bin/sh)
is probably excessive.

I'd use "${foo%"${foo#?}"}" to get the first character without bash-isms.
But, IIRC, one version of ash/dash made a 'pig's breakfast' of the nested
pattern match.
(Most syntax highlighters don't get the quoting right either...)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-23.09] northd: Remove unused ovn_datapath_is_transit_switch().

2024-06-19 Thread Dumitru Ceara
On 6/18/24 22:47, Numan Siddique wrote:
> On Tue, Jun 18, 2024 at 3:01 PM Dumitru Ceara  wrote:
>>
>> Fixes: 5f0809be5657 ("Revert "northd: Don't skip transit switch LSP when 
>> creating mcast groups."")
>> Signed-off-by: Dumitru Ceara 
> 
> Acked-by: Numan Siddique 
> 
> I think it's better to update the commit message that it is for
> branch-23.09 only. wdyt ?
> 

Actually v2 was merged and it has a bit of a more detailed commit log:

https://patchwork.ozlabs.org/project/ovn/patch/20240618193656.247302-1-dce...@redhat.com/

Thanks for the review!

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev