Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-17 Thread Ilya Maximets
On 7/16/23 23:09, Xin Long wrote:
> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
> from the hashtable when lookup, we can simplify the exp processing code a
> lot in openvswitch conntrack.
> 
> Signed-off-by: Xin Long 
> ---
>  net/openvswitch/conntrack.c | 78 +
>  1 file changed, 10 insertions(+), 68 deletions(-)
> 
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 331730fd3580..fa955e892210 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, 
> struct sw_flow_key *key,
>   return 0;
>  }
>  
> -static struct nf_conntrack_expect *
> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
> -u16 proto, const struct sk_buff *skb)
> -{
> - struct nf_conntrack_tuple tuple;
> - struct nf_conntrack_expect *exp;
> -
> - if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
> ))
> - return NULL;
> -
> - exp = __nf_ct_expect_find(net, zone, );
> - if (exp) {
> - struct nf_conntrack_tuple_hash *h;
> -
> - /* Delete existing conntrack entry, if it clashes with the
> -  * expectation.  This can happen since conntrack ALGs do not
> -  * check for clashes between (new) expectations and existing
> -  * conntrack entries.  nf_conntrack_in() will check the
> -  * expectations only if a conntrack entry can not be found,
> -  * which can lead to OVS finding the expectation (here) in the
> -  * init direction, but which will not be removed by the
> -  * nf_conntrack_in() call, if a matching conntrack entry is
> -  * found instead.  In this case all init direction packets
> -  * would be reported as new related packets, while reply
> -  * direction packets would be reported as un-related
> -  * established packets.
> -  */
> - h = nf_conntrack_find_get(net, zone, );
> - if (h) {
> - struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
> -
> - nf_ct_delete(ct, 0, 0);
> - nf_ct_put(ct);
> - }
> - }
> -
> - return exp;
> -}
> -
>  /* This replicates logic from nf_conntrack_core.c that is not exported. */
>  static enum ip_conntrack_info
>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
>const struct ovs_conntrack_info *info,
>struct sk_buff *skb)
>  {
> - struct nf_conntrack_expect *exp;
> -
> - /* If we pass an expected packet through nf_conntrack_in() the
> -  * expectation is typically removed, but the packet could still be
> -  * lost in upcall processing.  To prevent this from happening we
> -  * perform an explicit expectation lookup.  Expected connections are
> -  * always new, and will be passed through conntrack only when they are
> -  * committed, as it is OK to remove the expectation at that time.
> -  */
> - exp = ovs_ct_expect_find(net, >zone, info->family, skb);
> - if (exp) {
> - u8 state;
> -
> - /* NOTE: New connections are NATted and Helped only when
> -  * committed, so we are not calling into NAT here.
> -  */
> - state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
> - __ovs_ct_update_key(key, state, >zone, exp->master);

Hi, Xin, others.

Unfortunately, it seems like removal of this code broke the expected behavior.
OVS in userspace expects that SYN packet of a new related FTP connection will
get +new+rel+trk flags, but after this patch we're only getting +rel+trk and not
new.  This is a problem because we need to commit this connection with the label
and we do that for +new packets.  If we can't get +new packet we'll have to 
commit
every single +rel+trk packet, which doesn't make a lot of sense.  And it's a
significant behavior change regardless.

Could you, please, take a look?

The issue can be reproduced by running check-kernel tests in OVS repo.
'FTP SNAT orig tuple' tests fail 100% of the time.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 6/7] selftests: net: Use the provided dpctl rather than the vswitchd for tests.

2024-06-17 Thread Stefano Brivio
On Mon, 17 Jun 2024 14:02:17 -0400
Aaron Conole  wrote:

> The current pmtu test infrastucture requires an installed copy of the
> ovs-vswitchd userspace.  This means that any automated or constrained
> environments may not have the requisite tools to run the tests.  However,
> the pmtu tests don't require any special classifier processing.  Indeed
> they are only using the vswitchd in the most basic mode - as a NORMAL
> switch.
> 
> However, the ovs-dpctl kernel utility can now program all the needed basic
> flows to allow traffic to traverse the tunnels and provide support for at
> least testing some basic pmtu scenarios.  More complicated flow pipelines
> can be added to the internal ovs test infrastructure, but that is work for
> the future.  For now, enable the most common cases - wide mega flows with
> no other prerequisites.
> 
> Enhance the pmtu testing to try testing using the internal utility, first.
> As a fallback, if the internal utility isn't running, then try with the
> ovs-vswitchd userspace tools.

Oh, nice, it looks saner than I thought. :)

> Signed-off-by: Aaron Conole 
> ---
>  tools/testing/selftests/net/pmtu.sh | 145 +++-
>  1 file changed, 123 insertions(+), 22 deletions(-)

Reviewed-by: Stefano Brivio 

-- 
Stefano

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] controller: Send RARP/GARP for VIF post link state is up.

2024-06-17 Thread Mark Michelson

Thanks Shibir,

I merged this to main, branch-24.03, and branch-23.09.

On 5/27/24 14:24, Shibir Basak wrote:

Currently, GARP/RARP broadcast is sent for VIFs (part of logical
switch with localnet port) after iface-id is set.
This fix is to avoid packet loss during migration if iface-id
is set even before the VM migration is completed.

Signed-off-by: Shibir Basak 
Acked-by: Naveen Yerramneni 
---
  controller/ovn-controller.c | 1 +
  controller/pinctrl.c| 4 
  2 files changed, 5 insertions(+)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 6b38f113d..982378a50 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1128,6 +1128,7 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
  ovsdb_idl_add_table(ovs_idl, _table_queue);
  ovsdb_idl_add_column(ovs_idl, _queue_col_other_config);
  ovsdb_idl_add_column(ovs_idl, _queue_col_external_ids);
+ovsdb_idl_add_column(ovs_idl, _interface_col_link_state);
  
  chassis_register_ovs_idl(ovs_idl);

  encaps_register_ovs_idl(ovs_idl);
diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 6a2c3dc68..b5d3162b8 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -6375,6 +6375,10 @@ get_localnet_vifs_l3gwports(
  if (!pb || pb->chassis != chassis) {
  continue;
  }
+if (!iface_rec->link_state ||
+strcmp(iface_rec->link_state, "up")) {
+continue;
+}
  struct local_datapath *ld
  = get_local_datapath(local_datapaths,
   pb->datapath->tunnel_key);


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 0/3] Arbitrary match for NAT

2024-06-17 Thread Mark Michelson

I pushed this series to main.

I also added a fourth commit that adds a note to NEWS about conditional NAT.

On 5/29/24 11:56, Ales Musil wrote:

This series adds the ability to have extra match per NAT, this allows
the CMS to have more fine-grained control over the NAT action. At the
same time it allows to have "duplicate" NATs e.g. multiple SNATs for
the same logical_ip as well as multiple DNATs for the same external_ip.

There is also priority in addition to the match which controls the
evaluation order of the NAT with match, as the priority can be used
only in combination with match.

Ales Musil (3):
   nothd: Unify the priority calculation for NAT flows.
   nb: Add support for match and priority in NAT.
   northd: Use the NAT match column.

  northd/northd.c   |  97 +++---
  northd/ovn-northd.8.xml   |  31 +
  ovn-nb.ovsschema  |   8 +-
  ovn-nb.xml|  15 +++
  tests/ovn-nbctl.at| 220 +-
  tests/ovn-northd.at   |  79 +++
  tests/system-ovn.at   | 272 ++
  utilities/ovn-nbctl.8.xml |  14 +-
  utilities/ovn-nbctl.c | 189 --
  9 files changed, 736 insertions(+), 189 deletions(-)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 7/7] selftests: net: add config for openvswitch

2024-06-17 Thread Aaron Conole
The pmtu testing will require that the OVS module is installed,
so do that.

Reviewed-by: Simon Horman 
Tested-by: Simon Horman 
Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/config | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/net/config 
b/tools/testing/selftests/net/config
index 04de7a6ba6f3..d85fb2d1f132 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -101,3 +101,8 @@ CONFIG_NETFILTER_XT_MATCH_POLICY=m
 CONFIG_CRYPTO_ARIA=y
 CONFIG_XFRM_INTERFACE=m
 CONFIG_XFRM_USER=m
+CONFIG_OPENVSWITCH=m
+CONFIG_OPENVSWITCH_GRE=m
+CONFIG_OPENVSWITCH_VXLAN=m
+CONFIG_OPENVSWITCH_GENEVE=m
+CONFIG_NF_CONNTRACK_OVS=y
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 6/7] selftests: net: Use the provided dpctl rather than the vswitchd for tests.

2024-06-17 Thread Aaron Conole
The current pmtu test infrastucture requires an installed copy of the
ovs-vswitchd userspace.  This means that any automated or constrained
environments may not have the requisite tools to run the tests.  However,
the pmtu tests don't require any special classifier processing.  Indeed
they are only using the vswitchd in the most basic mode - as a NORMAL
switch.

However, the ovs-dpctl kernel utility can now program all the needed basic
flows to allow traffic to traverse the tunnels and provide support for at
least testing some basic pmtu scenarios.  More complicated flow pipelines
can be added to the internal ovs test infrastructure, but that is work for
the future.  For now, enable the most common cases - wide mega flows with
no other prerequisites.

Enhance the pmtu testing to try testing using the internal utility, first.
As a fallback, if the internal utility isn't running, then try with the
ovs-vswitchd userspace tools.

Signed-off-by: Aaron Conole 
---
 tools/testing/selftests/net/pmtu.sh | 145 +++-
 1 file changed, 123 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh 
b/tools/testing/selftests/net/pmtu.sh
index cfc84958025a..51ccb9bed069 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -842,25 +842,97 @@ setup_bridge() {
run_cmd ${ns_a} ip link set veth_A-C master br0
 }
 
+setup_ovs_via_internal_utility() {
+   type="${1}"
+   a_addr="${2}"
+   b_addr="${3}"
+   dport="${4}"
+
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-if ovs_br0 ${type}_a -t 
${type} || return 1
+
+   ports=$(python3 ./openvswitch/ovs-dpctl.py show)
+   br0_port=$(echo "$ports" | grep -E "\sovs_br0" | sed -e 's@port @@' | 
cut -d: -f1 | xargs)
+   type_a_port=$(echo "$ports" | grep ${type}_a | sed -e 's@port @@' | cut 
-d: -f1 | xargs)
+   veth_a_port=$(echo "$ports" | grep veth_A | sed -e 's@port @@' | cut 
-d: -f1 | xargs)
+
+   v4_a_tun="${prefix4}.${a_r1}.1"
+   v4_b_tun="${prefix4}.${b_r1}.1"
+
+   v6_a_tun="${prefix6}:${a_r1}::1"
+   v6_b_tun="${prefix6}:${b_r1}::1"
+
+   if [ "${v4_a_tun}" = "${a_addr}" ]; then
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x0800),ipv4()" \
+   
"set(tunnel(tun_id=1,dst=${v4_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x86dd),ipv6()" \
+   
"set(tunnel(tun_id=1,dst=${v4_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),tunnel(tun_id=1,src=${v4_b_tun},dst=${v4_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x0800),ipv4()"
 \
+   "${veth_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),tunnel(tun_id=1,src=${v4_b_tun},dst=${v4_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x86dd),ipv6()"
 \
+   "${veth_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),tunnel(tun_id=1,src=${v4_b_tun},dst=${v4_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x0806),arp()"
 \
+   "${veth_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x0806),arp(sip=${veth4_c_addr},tip=${tunnel4_b_addr})"
 \
+   
"set(tunnel(tun_id=1,dst=${v4_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
+   else
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x0800),ipv4()" \
+   
"set(tunnel(tun_id=1,ipv6_dst=${v6_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x86dd),ipv6()" \
+   
"set(tunnel(tun_id=1,ipv6_dst=${v6_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),tunnel(tun_id=1,ipv6_src=${v6_b_tun},ipv6_dst=${v6_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x0800),ipv4()"
 \
+   "${veth_a_port}"
+   run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
+   
"recirc_id(0),tunnel(tun_id=1,ipv6_src=${v6_b_tun},ipv6_dst=${v6_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x86dd),ipv6()"
 \
+   "${veth_a_port}"
+

[ovs-dev] [PATCH net-next 5/7] selftests: openvswitch: Support implicit ipv6 arguments.

2024-06-17 Thread Aaron Conole
The current iteration of IPv6 support requires explicit fields to be set
in addition to not properly support the actual IPv6 addresses properly.
With this change, make it so that the ipv6() bare option is usable to
create wildcarded flows to match broad swaths of ipv6 traffic.

Reviewed-by: Simon Horman 
Tested-by: Simon Horman 
Signed-off-by: Aaron Conole 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 42 ---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 2f16df2fb16b..2062e7e6e99e 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -200,6 +200,18 @@ def convert_ipv4(data):
 
 return int(ipaddress.IPv4Address(ip)), int(ipaddress.IPv4Address(mask))
 
+def convert_ipv6(data):
+ip, _, mask = data.partition('/')
+
+if not ip:
+ip = mask = 0
+elif not mask:
+mask = ':::::::'
+elif mask.isdigit():
+mask = ipaddress.IPv6Network("::/" + mask).hostmask
+
+return ipaddress.IPv6Address(ip).packed, ipaddress.IPv6Address(mask).packed
+
 def convert_int(size):
 def convert_int_sized(data):
 value, _, mask = data.partition('/')
@@ -941,21 +953,21 @@ class ovskey(nla):
 "src",
 "src",
 lambda x: str(ipaddress.IPv6Address(x)),
-lambda x: int.from_bytes(x, "big"),
-lambda x: ipaddress.IPv6Address(x),
+lambda x: ipaddress.IPv6Address(x).packed if x else 0,
+convert_ipv6,
 ),
 (
 "dst",
 "dst",
 lambda x: str(ipaddress.IPv6Address(x)),
-lambda x: int.from_bytes(x, "big"),
-lambda x: ipaddress.IPv6Address(x),
+lambda x: ipaddress.IPv6Address(x).packed if x else 0,
+convert_ipv6,
 ),
-("label", "label", "%d", int),
-("proto", "proto", "%d", int),
-("tclass", "tclass", "%d", int),
-("hlimit", "hlimit", "%d", int),
-("frag", "frag", "%d", int),
+("label", "label", "%d", lambda x: int(x) if x else 0),
+("proto", "proto", "%d", lambda x: int(x) if x else 0),
+("tclass", "tclass", "%d", lambda x: int(x) if x else 0),
+("hlimit", "hlimit", "%d", lambda x: int(x) if x else 0),
+("frag", "frag", "%d", lambda x: int(x) if x else 0),
 )
 
 def __init__(
@@ -1152,8 +1164,8 @@ class ovskey(nla):
 (
 "target",
 "target",
-lambda x: str(ipaddress.IPv6Address(x)),
-lambda x: int.from_bytes(x, "big"),
+lambda x: ipaddress.IPv6Address(x).packed,
+convert_ipv6,
 ),
 ("sll", "sll", macstr, lambda x: int.from_bytes(x, "big")),
 ("tll", "tll", macstr, lambda x: int.from_bytes(x, "big")),
@@ -1237,14 +1249,14 @@ class ovskey(nla):
 (
 "src",
 "src",
-lambda x: str(ipaddress.IPv6Address(x)),
-lambda x: int.from_bytes(x, "big", convertmac),
+lambda x: ipaddress.IPv6Address(x).packed,
+convert_ipv6,
 ),
 (
 "dst",
 "dst",
-lambda x: str(ipaddress.IPv6Address(x)),
-lambda x: int.from_bytes(x, "big"),
+lambda x: ipaddress.IPv6Address(x).packed,
+convert_ipv6,
 ),
 ("tp_src", "tp_src", "%d", int),
 ("tp_dst", "tp_dst", "%d", int),
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 4/7] selftests: openvswitch: Add support for tunnel() key.

2024-06-17 Thread Aaron Conole
This will be used when setting details about the tunnel to use as
transport.  There is a difference between the ODP format between tunnel():
the 'key' flag is not actually a flag field, so we don't support it in the
same way that the vswitchd userspace supports displaying it.

Signed-off-by: Aaron Conole 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 167 +-
 1 file changed, 166 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 4c235ff07aeb..2f16df2fb16b 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -709,7 +709,7 @@ class ovskey(nla):
 ("OVS_KEY_ATTR_ARP", "ovs_key_arp"),
 ("OVS_KEY_ATTR_ND", "ovs_key_nd"),
 ("OVS_KEY_ATTR_SKB_MARK", "uint32"),
-("OVS_KEY_ATTR_TUNNEL", "none"),
+("OVS_KEY_ATTR_TUNNEL", "ovs_key_tunnel"),
 ("OVS_KEY_ATTR_SCTP", "ovs_key_sctp"),
 ("OVS_KEY_ATTR_TCP_FLAGS", "be16"),
 ("OVS_KEY_ATTR_DP_HASH", "uint32"),
@@ -1269,6 +1269,163 @@ class ovskey(nla):
 init=init,
 )
 
+class ovs_key_tunnel(nla):
+nla_flags = NLA_F_NESTED
+
+nla_map = (
+("OVS_TUNNEL_KEY_ATTR_ID", "be64"),
+("OVS_TUNNEL_KEY_ATTR_IPV4_SRC", "ipaddr"),
+("OVS_TUNNEL_KEY_ATTR_IPV4_DST", "ipaddr"),
+("OVS_TUNNEL_KEY_ATTR_TOS", "uint8"),
+("OVS_TUNNEL_KEY_ATTR_TTL", "uint8"),
+("OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT", "flag"),
+("OVS_TUNNEL_KEY_ATTR_CSUM", "flag"),
+("OVS_TUNNEL_KEY_ATTR_OAM", "flag"),
+("OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS", "array(uint32)"),
+("OVS_TUNNEL_KEY_ATTR_TP_SRC", "be16"),
+("OVS_TUNNEL_KEY_ATTR_TP_DST", "be16"),
+("OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS", "none"),
+("OVS_TUNNEL_KEY_ATTR_IPV6_SRC", "ipaddr"),
+("OVS_TUNNEL_KEY_ATTR_IPV6_DST", "ipaddr"),
+("OVS_TUNNEL_KEY_ATTR_PAD", "none"),
+("OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS", "none"),
+("OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE", "flag"),
+)
+
+def parse(self, flowstr, mask=None):
+if not flowstr.startswith("tunnel("):
+return None, None
+
+k = ovskey.ovs_key_tunnel()
+if mask is not None:
+mask = ovskey.ovs_key_tunnel()
+
+flowstr = flowstr[len("tunnel("):]
+
+v6_address = None
+
+fields = [
+("tun_id=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_ID",
+ 0x, None, None),
+
+("src=", r"([0-9a-fA-F\.]+)", str,
+ "OVS_TUNNEL_KEY_ATTR_IPV4_SRC", "255.255.255.255", "0.0.0.0",
+ False),
+("dst=", r"([0-9a-fA-F\.]+)", str,
+ "OVS_TUNNEL_KEY_ATTR_IPV4_DST", "255.255.255.255", "0.0.0.0",
+ False),
+
+("ipv6_src=", r"([0-9a-fA-F:]+)", str,
+ "OVS_TUNNEL_KEY_ATTR_IPV6_SRC",
+ ":::::::", "::", True),
+("ipv6_dst=", r"([0-9a-fA-F:]+)", str,
+ "OVS_TUNNEL_KEY_ATTR_IPV6_DST",
+ ":::::::", "::", True),
+
+("tos=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TOS", 255, 0,
+ None),
+("ttl=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TTL", 255, 0,
+ None),
+
+("tp_src=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TP_SRC",
+ 65535, 0, None),
+("tp_dst=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TP_DST",
+ 65535, 0, None),
+]
+
+forced_include = ["OVS_TUNNEL_KEY_ATTR_TTL"]
+
+for prefix, regex, typ, attr_name, mask_val, default_val, v46_flag 
in fields:
+flowstr, value = parse_extract_field(flowstr, prefix, regex, 
typ, False)
+if not attr_name:
+raise Exception("Bad list value in tunnel fields")
+
+if value is None and attr_name in forced_include:
+value = default_val
+mask_val = default_val
+
+if value is not None:
+if v46_flag is not None:
+if v6_address is None:
+v6_address = v46_flag
+if v46_flag != v6_address:
+raise ValueError("Cannot mix v6 and v4 addresses")
+k["attrs"].append([attr_name, value])
+if mask is not None:
+mask["attrs"].append([attr_name, mask_val])
+else:
+if v46_flag is not None:
+if v6_address is None or v46_flag != v6_address:
+   

[ovs-dev] [PATCH net-next 3/7] selftests: openvswitch: Add set() and set_masked() support.

2024-06-17 Thread Aaron Conole
These will be used in upcoming commits to set specific attributes for
interacting with tunnels.  Since set() will use the key parsing routine, we
also make sure to prepend it with an open paren, for the action parsing to
properly understand it.

Reviewed-by: Simon Horman 
Tested-by: Simon Horman 
Signed-off-by: Aaron Conole 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 37 +--
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 4db20b38b481..4c235ff07aeb 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -284,7 +284,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_UNSPEC", "none"),
 ("OVS_ACTION_ATTR_OUTPUT", "uint32"),
 ("OVS_ACTION_ATTR_USERSPACE", "userspace"),
-("OVS_ACTION_ATTR_SET", "none"),
+("OVS_ACTION_ATTR_SET", "ovskey"),
 ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
 ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
 ("OVS_ACTION_ATTR_SAMPLE", "none"),
@@ -292,7 +292,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_HASH", "none"),
 ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
 ("OVS_ACTION_ATTR_POP_MPLS", "flag"),
-("OVS_ACTION_ATTR_SET_MASKED", "none"),
+("OVS_ACTION_ATTR_SET_MASKED", "ovskey"),
 ("OVS_ACTION_ATTR_CT", "ctact"),
 ("OVS_ACTION_ATTR_TRUNC", "uint32"),
 ("OVS_ACTION_ATTR_PUSH_ETH", "none"),
@@ -469,6 +469,18 @@ class ovsactions(nla):
 print_str += "clone("
 print_str += datum.dpstr(more)
 print_str += ")"
+elif field[0] == "OVS_ACTION_ATTR_SET" or \
+ field[0] == "OVS_ACTION_ATTR_SET_MASKED":
+print_str += "set"
+field = datum
+mask = None
+if field[0] == "OVS_ACTION_ATTR_SET_MASKED":
+print_str += "_masked"
+field = datum[0]
+mask = datum[1]
+print_str += "("
+print_str += field.dpstr(mask, more)
+print_str += ")"
 else:
 try:
 print_str += datum.dpstr(more)
@@ -547,6 +559,25 @@ class ovsactions(nla):
 self["attrs"].append(("OVS_ACTION_ATTR_CLONE", subacts))
 actstr = actstr[parsedLen:]
 parsed = True
+elif parse_starts_block(actstr, "set(", False):
+parencount += 1
+k = ovskey()
+actstr = actstr[len("set("):]
+actstr = k.parse(actstr, None)
+self["attrs"].append(("OVS_ACTION_ATTR_SET", k))
+if not actstr.startswith(")"):
+actstr = ")" + actstr
+parsed = True
+elif parse_starts_block(actstr, "set_masked(", False):
+parencount += 1
+k = ovskey()
+m = ovskey()
+actstr = actstr[len("set_masked("):]
+actstr = k.parse(actstr, m)
+self["attrs"].append(("OVS_ACTION_ATTR_SET_MASKED", [k, m]))
+if not actstr.startswith(")"):
+actstr = ")" + actstr
+parsed = True
 elif parse_starts_block(actstr, "ct(", False):
 parencount += 1
 actstr = actstr[len("ct(") :]
@@ -1312,7 +1343,7 @@ class ovskey(nla):
 mask["attrs"].append([field[0], m])
 self["attrs"].append([field[0], k])
 
-flowstr = flowstr[strspn(flowstr, "),") :]
+flowstr = flowstr[strspn(flowstr, "), ") :]
 
 return flowstr
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 1/7] selftests: openvswitch: Support explicit tunnel port creation.

2024-06-17 Thread Aaron Conole
The OVS module can operate in conjunction with various types of
tunnel ports.  These are created as either explicit tunnel vport
types, OR by creating a tunnel interface which acts as an anchor
for the lightweight tunnel support.

This patch adds the ability to add tunnel ports to an OVS
datapath for testing various scenarios with tunnel ports.  With
this addition, the vswitch "plumbing" will at least be able to
push packets around using the tunnel vports.  Future patches
will add support for setting required tunnel metadata for lwts
in the datapath.  The end goal will be to push packets via these
tunnels, and will be used in an upcoming commit for testing the
path MTU.

Reviewed-by: Simon Horman 
Tested-by: Simon Horman 
Signed-off-by: Aaron Conole 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 81 +--
 1 file changed, 75 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1dd057afd3fb..8f92215303a3 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -10,6 +10,7 @@ import ipaddress
 import logging
 import multiprocessing
 import re
+import socket
 import struct
 import sys
 import time
@@ -29,6 +30,7 @@ try:
 from pyroute2.netlink.exceptions import NetlinkError
 from pyroute2.netlink.generic import GenericNetlinkSocket
 import pyroute2
+import pyroute2.iproute
 
 except ModuleNotFoundError:
 print("Need to install the python pyroute2 package >= 0.6.")
@@ -1617,7 +1619,7 @@ class OvsVport(GenericNetlinkSocket):
 ("OVS_VPORT_ATTR_PORT_NO", "uint32"),
 ("OVS_VPORT_ATTR_TYPE", "uint32"),
 ("OVS_VPORT_ATTR_NAME", "asciiz"),
-("OVS_VPORT_ATTR_OPTIONS", "none"),
+("OVS_VPORT_ATTR_OPTIONS", "vportopts"),
 ("OVS_VPORT_ATTR_UPCALL_PID", "array(uint32)"),
 ("OVS_VPORT_ATTR_STATS", "vportstats"),
 ("OVS_VPORT_ATTR_PAD", "none"),
@@ -1625,6 +1627,13 @@ class OvsVport(GenericNetlinkSocket):
 ("OVS_VPORT_ATTR_NETNSID", "uint32"),
 )
 
+class vportopts(nla):
+nla_map = (
+("OVS_TUNNEL_ATTR_UNSPEC", "none"),
+("OVS_TUNNEL_ATTR_DST_PORT", "uint16"),
+("OVS_TUNNEL_ATTR_EXTENSION", "none"),
+)
+
 class vportstats(nla):
 fields = (
 ("rx_packets", "=Q"),
@@ -1693,7 +1702,7 @@ class OvsVport(GenericNetlinkSocket):
 raise ne
 return reply
 
-def attach(self, dpindex, vport_ifname, ptype):
+def attach(self, dpindex, vport_ifname, ptype, dport, lwt):
 msg = OvsVport.ovs_vport_msg()
 
 msg["cmd"] = OVS_VPORT_CMD_NEW
@@ -1702,12 +1711,43 @@ class OvsVport(GenericNetlinkSocket):
 msg["dpifindex"] = dpindex
 port_type = OvsVport.str_to_type(ptype)
 
-msg["attrs"].append(["OVS_VPORT_ATTR_TYPE", port_type])
 msg["attrs"].append(["OVS_VPORT_ATTR_NAME", vport_ifname])
 msg["attrs"].append(
 ["OVS_VPORT_ATTR_UPCALL_PID", [self.upcall_packet.epid]]
 )
 
+TUNNEL_DEFAULTS = [("geneve", 6081),
+   ("vxlan", 4789)]
+
+for tnl in TUNNEL_DEFAULTS:
+if ptype == tnl[0]:
+if not dport:
+dport = tnl[1]
+
+if not lwt:
+vportopt = OvsVport.ovs_vport_msg.vportopts()
+vportopt["attrs"].append(
+["OVS_TUNNEL_ATTR_DST_PORT", socket.htons(dport)]
+)
+msg["attrs"].append(
+["OVS_VPORT_ATTR_OPTIONS", vportopt]
+)
+else:
+port_type = OvsVport.OVS_VPORT_TYPE_NETDEV
+ipr = pyroute2.iproute.IPRoute()
+
+if tnl[0] == "geneve":
+ipr.link("add", ifname=vport_ifname, kind=tnl[0],
+ geneve_port=dport,
+ geneve_collect_metadata=True,
+ geneve_udp_zero_csum6_rx=1)
+elif tnl[0] == "vxlan":
+ipr.link("add", ifname=vport_ifname, kind=tnl[0],
+ vxlan_learning=0, vxlan_collect_metadata=1,
+ vxlan_udp_zero_csum6_rx=1, vxlan_port=dport)
+break
+msg["attrs"].append(["OVS_VPORT_ATTR_TYPE", port_type])
+
 try:
 reply = self.nlm_request(
 msg, msg_type=self.prid, msg_flags=NLM_F_REQUEST | NLM_F_ACK
@@ -2053,12 +2093,19 @@ def print_ovsdp_full(dp_lookup_rep, ifindex, ndb=NDB(), 
vpl=OvsVport()):
 for iface in ndb.interfaces:
 rep = vpl.info(iface.ifname, ifindex)
 if rep is not None:
+opts 

[ovs-dev] [PATCH net-next 2/7] selftests: openvswitch: Refactor actions parsing.

2024-06-17 Thread Aaron Conole
Until recently, the ovs-dpctl utility was used with a limited actions set
and didn't need to have support for multiple similar actions.  However,
when adding support for tunnels, it will be important to support multiple
set() actions in a single flow.  When printing these actions, the existing
code will be unable to print all of the sets - it will only print the
first.

Refactor this code to be easier to read and support multiple actions of the
same type in an action list.

Reviewed-by: Simon Horman 
Tested-by: Simon Horman 
Signed-off-by: Aaron Conole 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 45 ++-
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 8f92215303a3..4db20b38b481 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -439,32 +439,30 @@ class ovsactions(nla):
 def dpstr(self, more=False):
 print_str = ""
 
-for field in self.nla_map:
+for field in self["attrs"]:
 if field[1] == "none" or self.get_attr(field[0]) is None:
 continue
 if print_str != "":
 print_str += ","
 
-if field[1] == "uint32":
-if field[0] == "OVS_ACTION_ATTR_OUTPUT":
-print_str += "%d" % int(self.get_attr(field[0]))
-elif field[0] == "OVS_ACTION_ATTR_RECIRC":
-print_str += "recirc(0x%x)" % int(self.get_attr(field[0]))
-elif field[0] == "OVS_ACTION_ATTR_TRUNC":
-print_str += "trunc(%d)" % int(self.get_attr(field[0]))
-elif field[0] == "OVS_ACTION_ATTR_DROP":
-print_str += "drop(%d)" % int(self.get_attr(field[0]))
-elif field[1] == "flag":
-if field[0] == "OVS_ACTION_ATTR_CT_CLEAR":
-print_str += "ct_clear"
-elif field[0] == "OVS_ACTION_ATTR_POP_VLAN":
-print_str += "pop_vlan"
-elif field[0] == "OVS_ACTION_ATTR_POP_ETH":
-print_str += "pop_eth"
-elif field[0] == "OVS_ACTION_ATTR_POP_NSH":
-print_str += "pop_nsh"
-elif field[0] == "OVS_ACTION_ATTR_POP_MPLS":
-print_str += "pop_mpls"
+if field[0] == "OVS_ACTION_ATTR_OUTPUT":
+print_str += "%d" % int(self.get_attr(field[0]))
+elif field[0] == "OVS_ACTION_ATTR_RECIRC":
+print_str += "recirc(0x%x)" % int(self.get_attr(field[0]))
+elif field[0] == "OVS_ACTION_ATTR_TRUNC":
+print_str += "trunc(%d)" % int(self.get_attr(field[0]))
+elif field[0] == "OVS_ACTION_ATTR_DROP":
+print_str += "drop(%d)" % int(self.get_attr(field[0]))
+elif field[0] == "OVS_ACTION_ATTR_CT_CLEAR":
+print_str += "ct_clear"
+elif field[0] == "OVS_ACTION_ATTR_POP_VLAN":
+print_str += "pop_vlan"
+elif field[0] == "OVS_ACTION_ATTR_POP_ETH":
+print_str += "pop_eth"
+elif field[0] == "OVS_ACTION_ATTR_POP_NSH":
+print_str += "pop_nsh"
+elif field[0] == "OVS_ACTION_ATTR_POP_MPLS":
+print_str += "pop_mpls"
 else:
 datum = self.get_attr(field[0])
 if field[0] == "OVS_ACTION_ATTR_CLONE":
@@ -472,7 +470,10 @@ class ovsactions(nla):
 print_str += datum.dpstr(more)
 print_str += ")"
 else:
-print_str += datum.dpstr(more)
+try:
+print_str += datum.dpstr(more)
+except:
+print_str += "{ATTR: %s not decoded}" % field[0]
 
 return print_str
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 0/7] selftests: net: Switch pmtu.sh to use the internal ovs script.

2024-06-17 Thread Aaron Conole
Currently, if a user wants to run pmtu.sh and cover all the provided test
cases, they need to install the Open vSwitch userspace utilities.  This
dependency is difficult for users as well as CI environments, because the
userspace build and setup may require lots of support and devel packages
to be installed, system setup to be correct, and things like permissions
and selinux policies to be properly configured.

The kernel selftest suite includes an ovs-dpctl.py utility which can
interact with the openvswitch module directly.  This lets developers and
CI environments run without needing too many extra dependencies - just
the pyroute2 python package.

This series enhances the ovs-dpctl utility to provide support for set()
and tunnel() flow specifiers, better ipv6 handling support, and the
ability to add tunnel vports, and LWT interfaces.  Finally, it modifies
the pmtu.sh script to call the ovs-dpctl.py utility rather than the
typical OVS userspace utilities.

Aaron Conole (7):
  selftests: openvswitch: Support explicit tunnel port creation.
  selftests: openvswitch: Refactor actions parsing.
  selftests: openvswitch: Add set() and set_masked() support.
  selftests: openvswitch: Add support for tunnel() key.
  selftests: openvswitch: Support implicit ipv6 arguments.
  selftests: net: Use the provided dpctl rather than the vswitchd for
tests.
  selftests: net: add config for openvswitch

 tools/testing/selftests/net/config|   5 +
 .../selftests/net/openvswitch/ovs-dpctl.py| 372 +++---
 tools/testing/selftests/net/pmtu.sh   | 145 +--
 3 files changed, 453 insertions(+), 69 deletions(-)

-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC net-next 4/7] selftests: openvswitch: Add support for tunnel() key.

2024-06-17 Thread Aaron Conole
Simon Horman  writes:

> On Thu, Jun 13, 2024 at 02:13:30PM -0400, Aaron Conole wrote:
>> This will be used when setting details about the tunnel to use as
>> transport.  There is a difference between the ODP format between tunnel():
>> the 'key' flag is not actually a flag field, so we don't support it in the
>> same way that the vswitchd userspace supports displaying it.
>> 
>> Signed-off-by: Aaron Conole 
>
> ...
>
>> @@ -1265,6 +1265,165 @@ class ovskey(nla):
>>  init=init,
>>  )
>>  
>> +class ovs_key_tunnel(nla):
>> +nla_flags = NLA_F_NESTED
>> +
>> +nla_map = (
>> +("OVS_TUNNEL_KEY_ATTR_ID", "be64"),
>> +("OVS_TUNNEL_KEY_ATTR_IPV4_SRC", "ipaddr"),
>> +("OVS_TUNNEL_KEY_ATTR_IPV4_DST", "ipaddr"),
>> +("OVS_TUNNEL_KEY_ATTR_TOS", "uint8"),
>> +("OVS_TUNNEL_KEY_ATTR_TTL", "uint8"),
>> +("OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT", "flag"),
>> +("OVS_TUNNEL_KEY_ATTR_CSUM", "flag"),
>> +("OVS_TUNNEL_KEY_ATTR_OAM", "flag"),
>> +("OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS", "array(uint32)"),
>> +("OVS_TUNNEL_KEY_ATTR_TP_SRC", "be16"),
>> +("OVS_TUNNEL_KEY_ATTR_TP_DST", "be16"),
>> +("OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS", "none"),
>> +("OVS_TUNNEL_KEY_ATTR_IPV6_SRC", "ipaddr"),
>> +("OVS_TUNNEL_KEY_ATTR_IPV6_DST", "ipaddr"),
>> +("OVS_TUNNEL_KEY_ATTR_PAD", "none"),
>> +("OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS", "none"),
>> +("OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE", "flag"),
>> +)
>> +
>> +def parse(self, flowstr, mask=None):
>> +if not flowstr.startswith("tunnel("):
>> +return None, None
>> +
>> +k = ovskey.ovs_key_tunnel()
>> +if mask is not None:
>> +mask = ovskey.ovs_key_tunnel()
>> +
>> +flowstr = flowstr[len("tunnel("):]
>> +
>> +v6_address = None
>> +
>> +fields = [
>> +("tun_id=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_ID",
>> + 0x, None, None),
>> +
>> +("src=", r"([0-9a-fA-F\.]+)", str,
>> + "OVS_TUNNEL_KEY_ATTR_IPV4_SRC", "255.255.255.255", 
>> "0.0.0.0",
>> + False),
>> +("dst=", r"([0-9a-fA-F\.]+)", str,
>> + "OVS_TUNNEL_KEY_ATTR_IPV4_DST", "255.255.255.255", 
>> "0.0.0.0",
>> + False),
>> +
>> +("ipv6_src=", r"([0-9a-fA-F:]+)", str,
>> + "OVS_TUNNEL_KEY_ATTR_IPV6_SRC",
>> + ":::::::", "::", True),
>> +("ipv6_dst=", r"([0-9a-fA-F:]+)", str,
>> + "OVS_TUNNEL_KEY_ATTR_IPV6_DST",
>> + ":::::::", "::", True),
>> +
>> +("tos=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TOS", 255, 0,
>> + None),
>> +("ttl=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TTL", 255, 0,
>> + None),
>> +
>> +("tp_src=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TP_SRC",
>> + 65535, 0, None),
>> +("tp_dst=", r"(\d+)", int, "OVS_TUNNEL_KEY_ATTR_TP_DST",
>> + 65535, 0, None),
>> +]
>> +
>> +forced_include = ["OVS_TUNNEL_KEY_ATTR_TTL"]
>> +
>> +for prefix, regex, typ, attr_name, mask_val, default_val, 
>> v46_flag in fields:
>> +flowstr, value = parse_extract_field(flowstr, prefix, 
>> regex, typ, False)
>> +if not attr_name:
>> +raise Exception("Bad list value in tunnel fields")
>> +
>> +if value is None and attr_name in forced_include:
>> +value = default_val
>> +mask_val = default_val
>> +
>> +if value is not None:
>> +if v6_address is None and v46_flag is not None:
>> +v6_address = v46_flag
>
> By my reading, at this point v6_address will only be None if v46_flag is
> not None.  IF so, the condition below seems excessive.

Agreed - thanks for the suggestions.

>> +if v6_address is not None and v46_flag is not None \
>> +   and v46_flag != v6_address:
>> +raise ValueError("Cannot mix v6 and v4 addresses")
>
> I wonder if we can instead express this as (completely untested!):
>
> if v46_flag is not None:
> if v6_address is None:
> v6_address = v46_flag
> if v46_flag != v6_address:
> raise ValueError("Cannot mix v6 and v4 addresses")
>
>> +k["attrs"].append([attr_name, value])
>> +if mask is not None:
>> +mask["attrs"].append([attr_name, 

Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-17 Thread Mike Pattrick
On Mon, Jun 17, 2024 at 12:54 PM Mike Pattrick  wrote:
>
> Currently all OVSDB database queries except for UUID lookups all result
> in linear lookups over the entire table, even if an index is present.
>
> This patch modifies ovsdb_query() to attempt an index lookup first, if
> possible. If no matching indexes are present then a linear index is
> still conducted.
>
> To test this, I set up an ovsdb database with a variable number of rows
> and timed the average of how long ovsdb-client took to query a single
> row. The first two tests involved a linear scan that didn't match any
> rows, so there was no overhead associated with sending or encoding
> output. The post-patch linear scan was a worst case scenario where the
> table did have an appropriate index but the conditions made its usage
> impossible. The indexed lookup test was for a matching row, which did
> also include overhead associated with a match. The results are included
> in the table below.
>
> Rows   | 100k | 200k | 300k | 400k | 500k
> ---+--+--+--+--+-
> Pre-patch linear scan  |  9ms | 24ms | 37ms | 49ms | 61ms
> Post-patch linear scan |  9ms | 24ms | 38ms | 49ms | 61ms
> Indexed lookup |  3ms |  3ms |  3ms |  3ms |  3ms
>
> I also tested the performance of ovsdb_query() by wrapping it in a loop
> and measuring the time it took to perform 1000 linear scans on 1, 10,
> 100k, and 200k rows. This test showed that the new index checking code
> did not slow down worst case lookups to a statistically detectable
> degree.
>
> Reported-at: https://issues.redhat.com/browse/FDP-590
> Signed-off-by: Mike Pattrick 
>
> ---
>
> v2:
>  - Included txn in index code
>  - Added benchmarks
>  - Refactored code
>  - Added more tests
>  - Now a mock row is created to perform the search with standard
>  functions
> Signed-off-by: Mike Pattrick 

Recheck-request: github-robot

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-17 Thread Mike Pattrick
On Mon, Jun 3, 2024 at 2:01 PM Ilya Maximets  wrote:
>
> On 6/3/24 06:20, Mike Pattrick wrote:
> > Currently all OVSDB database queries except for UUID lookups all result
> > in linear lookups over the entire table, even if an index is present.
> >
> > This patch modifies ovsdb_query() to attempt an index lookup first, if
> > possible. If no matching indexes are present then a linear index is
> > still conducted.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-590
> > Signed-off-by: Mike Pattrick 
> > ---
> >  NEWS |   3 ++
> >  ovsdb/query.c| 102 +++
> >  ovsdb/row.h  |  28 +++
> >  ovsdb/transaction.c  |  27 ---
> >  tests/ovsdb-execution.at |  34 -
> >  tests/ovsdb-server.at|   2 +-
> >  tests/ovsdb-tool.at  |   2 +-
> >  7 files changed, 159 insertions(+), 39 deletions(-)
>
> Hi, Mike.  Thanks for the patch.
>
> Besides what Simon asked, the patch has a few other issues:
>
> 1. Lookup is performed only on the committed index and it doesn't include
>rows that are in-flight in the current transaction.
>
>Unlike rows in a hash table, indexes are updated only after the whole
>transaction is committed.  With this change we'll not be able to find
>newly added rows.
>
>Another thing related to this is that it is allowed to have duplicates
>within a transaction as long as they are removed before the transaction
>ends.  So it is possible that multiple rows will satisfy the condition
>on indexed columns while the transaction is in-flight.
>
>Consider the following commands executed in a sandbox:
>
># ovs-vsctl set-manager "tcp:my-first-target"
># ovsdb-client transact unix:$(pwd)/sandbox/db.sock '
>["Open_vSwitch",
> {"op": "select",
>  "table": "Manager",
>  "columns": ["_uuid", "target"],
>  "where": [["target", "==", "tcp:my-first-target"]]},
> {"op": "insert",
>  "table": "Manager",
>  "uuid-name": "duplicate",
>  "row": {"target": "tcp:my-first-target"}},
> {"op": "select",
>  "table": "Manager",
>  "columns": ["_uuid", "target"],
>  "where": [["target", "==", "tcp:my-first-target"]]},
> {"op": "delete",
>  "table": "Manager",
>  "where":[["_uuid","==",["named-uuid","duplicate"]]]},
> {"op": "select",
>  "table": "Manager",
>  "columns": ["_uuid", "target"],
>  "where": [["target", "==", "tcp:my-first-target"]]}]'
>
>Transaction must succeed.  The first selection should return 1 row,
>the second should return both duplicates and the third should again
>return one row.

This is a good point, I hadn't anticipated this use-case but it does
have a large impact on this change. After working through a few
implementations, I wasn't able to find a solution that wasn't overly
complex. For the next version, I've instead opted to exclude indexed
lookups from transactions that modify the associated row.

The next version should address this and the other feedback.

Cheers,
M

>
>Ideally, implementation should not leak the transaction details to
>the query module, though I'm not sure if that is 100% achievable.
>
> 2. Taking above case into account, this change needs way more unit tests
>with different order of operations and complex data updates.
>
> 3. Since this is a performance-oriented change, please, include some
>performance numbers in the commit message as well, including impact
>on non-indexed lookups, if any.
>
> 4. There seems to be a lot of logic overlap with existing functions like
>ovsdb_condition_match_every_clause(), ovsdb_index_search() and
>ovsdb_row_hash_columns().  Can we re-use those instead?  For example,
>by creating a row from the conditions before the lookup?  What a
>performance impact will look like?
>
> Best regards, Ilya Maximets.
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC net-next 3/7] selftests: openvswitch: Add set() and set_masked() support.

2024-06-17 Thread Aaron Conole
Adrián Moreno  writes:

> On Thu, Jun 13, 2024 at 02:13:29PM GMT, Aaron Conole wrote:
>> These will be used in upcoming commits to set specific attributes for
>> interacting with tunnels.  Since set() will use the key parsing routine, we
>> also make sure to prepend it with an open paren, for the action parsing to
>> properly understand it.
>>
>> Signed-off-by: Aaron Conole 
>> ---
>>  .../selftests/net/openvswitch/ovs-dpctl.py| 39 +--
>>  1 file changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
>> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> index 73768f3af6e5..fee64c31d4d4 100644
>> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> @@ -284,7 +284,7 @@ class ovsactions(nla):
>>  ("OVS_ACTION_ATTR_UNSPEC", "none"),
>>  ("OVS_ACTION_ATTR_OUTPUT", "uint32"),
>>  ("OVS_ACTION_ATTR_USERSPACE", "userspace"),
>> -("OVS_ACTION_ATTR_SET", "none"),
>> +("OVS_ACTION_ATTR_SET", "ovskey"),
>>  ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
>>  ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
>>  ("OVS_ACTION_ATTR_SAMPLE", "none"),
>> @@ -292,7 +292,7 @@ class ovsactions(nla):
>>  ("OVS_ACTION_ATTR_HASH", "none"),
>>  ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
>>  ("OVS_ACTION_ATTR_POP_MPLS", "flag"),
>> -("OVS_ACTION_ATTR_SET_MASKED", "none"),
>> +("OVS_ACTION_ATTR_SET_MASKED", "ovskey"),
>>  ("OVS_ACTION_ATTR_CT", "ctact"),
>>  ("OVS_ACTION_ATTR_TRUNC", "uint32"),
>>  ("OVS_ACTION_ATTR_PUSH_ETH", "none"),
>> @@ -469,6 +469,14 @@ class ovsactions(nla):
>>  print_str += "clone("
>>  print_str += datum.dpstr(more)
>>  print_str += ")"
>> +elif field[0] == "OVS_ACTION_ATTR_SET" or \
>> + field[0] == "OVS_ACTION_ATTR_SET_MASKED":
>> +print_str += "set"
>> +if field[0] == "OVS_ACTION_ATTR_SET_MASKED":
>> +print_str += "_masked"
>> +print_str += "("
>> +print_str += datum.dpstr(more)
>> +print_str += ")"
>>  else:
>>  try:
>>  print_str += datum.dpstr(more)
>> @@ -547,6 +555,25 @@ class ovsactions(nla):
>>  self["attrs"].append(("OVS_ACTION_ATTR_CLONE", subacts))
>>  actstr = actstr[parsedLen:]
>>  parsed = True
>> +elif parse_starts_block(actstr, "set(", False):
>> +parencount += 1
>> +k = ovskey()
>> +actstr = actstr[len("set("):]
>> +actstr = k.parse(actstr, None)
>> +self["attrs"].append(("OVS_ACTION_ATTR_SET", k))
>> +if not actstr.startswith(")"):
>> +actstr = ")" + actstr
>> +parsed = True
>> +elif parse_starts_block(actstr, "set_masked(", False):
>> +parencount += 1
>> +k = ovskey()
>> +m = ovskey()
>> +actstr = actstr[len("set_masked("):]
>> +actstr = k.parse(actstr, m)
>> +self["attrs"].append(("OVS_ACTION_ATTR_SET_MASKED", [k, m]))
>> +if not actstr.startswith(")"):
>> +actstr = ")" + actstr
>> +parsed = True
>>  elif parse_starts_block(actstr, "ct(", False):
>>  parencount += 1
>>  actstr = actstr[len("ct(") :]
>> @@ -1312,7 +1339,7 @@ class ovskey(nla):
>>  mask["attrs"].append([field[0], m])
>>  self["attrs"].append([field[0], k])
>>
>> -flowstr = flowstr[strspn(flowstr, "),") :]
>> +flowstr = flowstr[strspn(flowstr, "), ") :]
>>
>>  return flowstr
>>
>> @@ -1898,7 +1925,11 @@ class OvsFlow(GenericNetlinkSocket):
>>  ):
>>  print_str += "drop"
>>  else:
>> -print_str += actsmsg.dpstr(more)
>> +if type(actsmsg) == "list":
>
> nit: I belive the recommended way of comparing types is using
> "isinstance":
>
> https://www.flake8rules.com/rules/E721.html
>
> Also, I don't see what can make actmsg be a list. It should always be an
> instance of "ovsactions", right?

Yes, you're right.  This was some debug code that I was messing with and
it made it into this submission.  I've dropped it :)  Thanks for the review!

>
>> +for act in actsmsg:
>> +print_str += act.dpstr(more)
>> +else:
>> +print_str += actsmsg.dpstr(more)
>>
>>  return print_str
>>
>> --
>> 2.45.1
>>

___
dev mailing list
d...@openvswitch.org

Re: [ovs-dev] [RFC net-next 5/7] selftests: openvswitch: Support implicit ipv6 arguments.

2024-06-17 Thread Aaron Conole
Simon Horman  writes:

> On Thu, Jun 13, 2024 at 02:13:31PM -0400, Aaron Conole wrote:
>> The current iteration of IPv6 support requires explicit fields to be set
>> in addition to not properly support the actual IPv6 addresses properly.
>> With this change, make it so that the ipv6() bare option is usable to
>> create wildcarded flows to match broad swaths of ipv6 traffic.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>>  .../selftests/net/openvswitch/ovs-dpctl.py| 43 ---
>>  1 file changed, 28 insertions(+), 15 deletions(-)
>> 
>> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
>> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> index 5545e5cab1d6..2577a06c58cf 100644
>> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
>> @@ -200,6 +200,19 @@ def convert_ipv4(data):
>>  
>>  return int(ipaddress.IPv4Address(ip)), int(ipaddress.IPv4Address(mask))
>>  
>> +def convert_ipv6(data):
>> +ip, _, mask = data.partition('/')
>> +
>> +if not ip:
>> +ip = mask = 0
>> +elif not mask:
>> +mask = ':::::::'
>> +elif mask.isdigit():
>> +mask = ipaddress.IPv6Network("::/" + mask).hostmask
>> +
>> +return ipaddress.IPv6Address(ip).packed, 
>> ipaddress.IPv6Address(mask).packed
>> +
>> +
>
> nit: Perhaps one blank line is enough

Sure - dropped.

>>  def convert_int(size):
>>  def convert_int_sized(data):
>>  value, _, mask = data.partition('/')
>
> ...
>
> The nit above notwithstanding, this patch looks good to me.
>
> Reviewed-by: Simon Horman 
> Tested-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC net-next 1/7] selftests: openvswitch: Support explicit tunnel port creation.

2024-06-17 Thread Aaron Conole
Simon Horman  writes:

> On Thu, Jun 13, 2024 at 02:13:27PM -0400, Aaron Conole wrote:
>> The OVS module can operate in conjunction with various types of
>> tunnel ports.  These are created as either explicit tunnel vport
>> types, OR by creating a tunnel interface which acts as an anchor
>> for the lightweight tunnel support.
>> 
>> This patch adds the ability to add tunnel ports to an OVS
>> datapath for testing various scenarios with tunnel ports.  With
>> this addition, the vswitch "plumbing" will at least be able to
>> push packets around using the tunnel vports.  Future patches
>> will add support for setting required tunnel metadata for lwts
>> in the datapath.  The end goal will be to push packets via these
>> tunnels, and will be used in an upcoming commit for testing the
>> path MTU.
>> 
>> Signed-off-by: Aaron Conole 
>
> ...
>
>> @@ -1702,12 +1711,43 @@ class OvsVport(GenericNetlinkSocket):
>>  msg["dpifindex"] = dpindex
>>  port_type = OvsVport.str_to_type(ptype)
>>  
>> -msg["attrs"].append(["OVS_VPORT_ATTR_TYPE", port_type])
>>  msg["attrs"].append(["OVS_VPORT_ATTR_NAME", vport_ifname])
>>  msg["attrs"].append(
>>  ["OVS_VPORT_ATTR_UPCALL_PID", [self.upcall_packet.epid]]
>>  )
>>  
>> +TUNNEL_DEFAULTS = [("geneve", 6081),
>> +   ("vxlan", 4798)]
>
> Hi Aaron,
>
> It is corrected as part of another patch in this series, but
> the correct port for vxlan is 4789 (i.e. 89 rather than 98).
>
> With that fixed, feel free to add:

Thanks Simon!  Done.

> Reviewed-by: Simon Horman 
> Tested-by: Simon Horman 
>
> ..

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] ovsdb: Use table indexes if available for ovsdb_query().

2024-06-17 Thread Mike Pattrick
Currently all OVSDB database queries except for UUID lookups all result
in linear lookups over the entire table, even if an index is present.

This patch modifies ovsdb_query() to attempt an index lookup first, if
possible. If no matching indexes are present then a linear index is
still conducted.

To test this, I set up an ovsdb database with a variable number of rows
and timed the average of how long ovsdb-client took to query a single
row. The first two tests involved a linear scan that didn't match any
rows, so there was no overhead associated with sending or encoding
output. The post-patch linear scan was a worst case scenario where the
table did have an appropriate index but the conditions made its usage
impossible. The indexed lookup test was for a matching row, which did
also include overhead associated with a match. The results are included
in the table below.

Rows   | 100k | 200k | 300k | 400k | 500k
---+--+--+--+--+-
Pre-patch linear scan  |  9ms | 24ms | 37ms | 49ms | 61ms
Post-patch linear scan |  9ms | 24ms | 38ms | 49ms | 61ms
Indexed lookup |  3ms |  3ms |  3ms |  3ms |  3ms

I also tested the performance of ovsdb_query() by wrapping it in a loop
and measuring the time it took to perform 1000 linear scans on 1, 10,
100k, and 200k rows. This test showed that the new index checking code
did not slow down worst case lookups to a statistically detectable
degree.

Reported-at: https://issues.redhat.com/browse/FDP-590
Signed-off-by: Mike Pattrick 

---

v2:
 - Included txn in index code
 - Added benchmarks
 - Refactored code
 - Added more tests
 - Now a mock row is created to perform the search with standard
 functions
Signed-off-by: Mike Pattrick 
---
 ovsdb/execution.c|  20 +++--
 ovsdb/query.c| 174 +++
 ovsdb/query.h|   6 +-
 ovsdb/rbac.c |  15 ++--
 ovsdb/rbac.h |  10 ++-
 ovsdb/row.h  |  28 +++
 ovsdb/transaction.c  |  29 +--
 ovsdb/transaction.h  |   5 ++
 tests/ovsdb-execution.at | 108 +++-
 tests/ovsdb-macros.at|  10 +++
 tests/ovsdb-query.at |  18 ++--
 tests/ovsdb-server.at|   2 +-
 tests/ovsdb-tool.at  |   2 +-
 tests/test-ovsdb.c   |  15 +++-
 14 files changed, 363 insertions(+), 79 deletions(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index f4cc9e802..212839bca 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -459,7 +459,7 @@ ovsdb_execute_select(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 if (!error) {
 struct ovsdb_row_set rows = OVSDB_ROW_SET_INITIALIZER;
 
-ovsdb_query_distinct(table, , , );
+ovsdb_query_distinct(table, , , , x->txn);
 ovsdb_row_set_sort(, );
 json_object_put(result, "rows",
 ovsdb_row_set_to_json(, ));
@@ -545,8 +545,8 @@ ovsdb_execute_update(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 ur.row = row;
 ur.columns = 
 if (ovsdb_rbac_update(x->db, table, , , x->role,
-  x->id)) {
-ovsdb_query(table, , update_row_cb, );
+  x->id, x->txn)) {
+ovsdb_query(table, , update_row_cb, , x->txn);
 } else {
 error = ovsdb_perm_error("RBAC rules for client \"%s\" role "
  "\"%s\" prohibit modification of "
@@ -626,7 +626,7 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 json_integer_create(hmap_count(>rows)));
 } else {
 size_t row_count = 0;
-ovsdb_query(table, , count_row_cb, _count);
+ovsdb_query(table, , count_row_cb, _count, x->txn);
 json_object_put(result, "count",
 json_integer_create(row_count));
 }
@@ -636,8 +636,8 @@ ovsdb_execute_mutate(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 mr.mutations = 
 mr.error = 
 if (ovsdb_rbac_mutate(x->db, table, , , x->role,
-  x->id)) {
-ovsdb_query(table, , mutate_row_cb, );
+  x->id, x->txn)) {
+ovsdb_query(table, , mutate_row_cb, , x->txn);
 } else {
 error = ovsdb_perm_error("RBAC rules for client \"%s\" role "
  "\"%s\" prohibit mutate operation on "
@@ -693,8 +693,9 @@ ovsdb_execute_delete(struct ovsdb_execution *x, struct 
ovsdb_parser *parser,
 dr.table = table;
 dr.txn = x->txn;
 
-if (ovsdb_rbac_delete(x->db, table, , x->role, x->id)) {
-ovsdb_query(table, , delete_row_cb, );
+if (ovsdb_rbac_delete(x->db, table, , x->role, x->id,
+  x->txn)) {
+ovsdb_query(table, , delete_row_cb, , x->txn);
 } else {
 error 

Re: [ovs-dev] [PATCH] odp-execute: Set IPv6 traffic class in AVX implementation.

2024-06-17 Thread Finn, Emma
> -Original Message-
> From: Mike Pattrick 
> Sent: Thursday, June 13, 2024 6:53 PM
> To: Finn, Emma 
> Cc: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] odp-execute: Set IPv6 traffic class in AVX
> implementation.
> 
> On Wed, Jun 12, 2024 at 6:44 AM Emma Finn  wrote:
> >
> > The AVX implementation for the IPv6 action did not set traffic class
> > field. Adding support for this field to the AVX implementation.
> >
> > Signed-off-by: Emma Finn 
> > Reported-by: Eelco Chaudron 
> > ---
> >  lib/odp-execute-avx512.c | 8 
> >  lib/packets.c| 2 +-
> >  lib/packets.h| 1 +
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c index
> > a74a85dc1..569ea789e 100644
> > --- a/lib/odp-execute-avx512.c
> > +++ b/lib/odp-execute-avx512.c
> > @@ -741,6 +741,14 @@ action_avx512_set_ipv6(struct dp_packet_batch
> *batch, const struct nlattr *a)
> >  }
> >  /* Write back the modified IPv6 addresses. */
> >  _mm512_mask_storeu_epi64((void *) nh, 0x1F, v_new_hdr);
> > +
> > +/* Scalar method for setting IPv6 tclass field. */
> > +if (key->ipv6_tclass) {
> > +uint8_t old_tc = ntohl(get_16aligned_be32(>ip6_flow)) >> 
> > 20;
> > +uint8_t key_tc = (key->ipv6_tclass |
> > + (old_tc & ~mask->ipv6_tclass));
> > +packet_set_ipv6_tc(>ip6_flow, key_tc);
> > +}
> 
> Hello,
> 
> I'm wondering if we also need to set the flow label?
> 
> Thanks,
> M
> 

Flow label is being handled okay by the AVX implementation. 
It was only the traffic class field that was causing issues. 

The shuffle mask was ignoring the traffic class field.
And since the traffic class is not byte aligned, it was too difficult
to reorder the shuffle mask. Hence, after the AVX implementation has
stored back the ipv6 entire header, we can use the scalar method at the end
to update the traffic class only.  

Thanks,
Emma

> >  }
> >  }
> >  #endif /* HAVE_AVX512VBMI */
> > diff --git a/lib/packets.c b/lib/packets.c index ebf516d67..91c28daf0
> > 100644
> > --- a/lib/packets.c
> > +++ b/lib/packets.c
> > @@ -1299,7 +1299,7 @@ packet_set_ipv6_flow_label(ovs_16aligned_be32
> *flow_label, ovs_be32 flow_key)
> >  put_16aligned_be32(flow_label, new_label);  }
> >
> > -static void
> > +void
> >  packet_set_ipv6_tc(ovs_16aligned_be32 *flow_label, uint8_t tc)  {
> >  ovs_be32 old_label = get_16aligned_be32(flow_label); diff --git
> > a/lib/packets.h b/lib/packets.h index 8b6994809..a102f8163 100644
> > --- a/lib/packets.h
> > +++ b/lib/packets.h
> > @@ -1635,6 +1635,7 @@ void packet_set_ipv6_addr(struct dp_packet
> *packet, uint8_t proto,
> >bool recalculate_csum);  void
> > packet_set_ipv6_flow_label(ovs_16aligned_be32 *flow_label,
> >  ovs_be32 flow_key);
> > +void packet_set_ipv6_tc(ovs_16aligned_be32 *flow_label, uint8_t tc);
> >  void packet_set_tcp_port(struct dp_packet *, ovs_be16 src, ovs_be16
> > dst);  void packet_set_udp_port(struct dp_packet *, ovs_be16 src,
> > ovs_be16 dst);  void packet_set_sctp_port(struct dp_packet *, ovs_be16
> > src, ovs_be16 dst);
> > --
> > 2.34.1
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [v2] odp-execute: Check IPv4 checksum offload flag in AVX.

2024-06-17 Thread Emma Finn
The AVX implementation for IPv4 action did not check whether
the IPv4 checksum offload flag has been set and was incorrectly
calculating checksums in software. Adding a check to skip AVX
checksum calculation when offload flags are set.

Signed-off-by: Emma Finn 
Reported-by: Eelco Chaudron 
---
 lib/odp-execute-avx512.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 569ea789e..54bd556e1 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -473,7 +473,7 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch,
  * (v_pkt_masked). */
 __m256i v_new_hdr = _mm256_or_si256(v_key_shuf, v_pkt_masked);
 
-if (dp_packet_hwol_tx_ip_csum(packet)) {
+if (dp_packet_hwol_l3_ipv4(packet)) {
 dp_packet_ol_reset_ip_csum_good(packet);
 } else {
 ovs_be16 old_csum = ~nh->ip_csum;
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] selftests: openvswitch: Use bash as interpreter

2024-06-17 Thread Przemek Kitszel

On 6/17/24 10:28, Simon Horman wrote:

openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to
obtain the first character of $ns. Empirically, this is works with bash
but not dash. When run with dash these evaluate to an empty string and
printing an error to stdout.

  # dash -c 'ns=client; echo "${ns:0:1}"' 2>error
  # cat error
  dash: 1: Bad substitution
  # bash -c 'ns=client; echo "${ns:0:1}"' 2>error
  c
  # cat error

This leads to tests that neither pass nor fail.
F.e.

  TEST: arp_ping  [START]
  adding sandbox 'test_arp_ping'
  Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , }
  create namespaces
  ./openvswitch.sh: 282: eval: Bad substitution
  TEST: ct_connect_v4 [START]
  adding sandbox 'test_ct_connect_v4'
  Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , }
  ./openvswitch.sh: 322: eval: Bad substitution
  create namespaces

Resolve this by making openvswitch.sh a bash script.

Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow programming 
case")
Signed-off-by: Simon Horman 


That's good fix,
Reviewed-by: Przemek Kitszel 

sidenote: I like very much the idea to use the least powerful tool, like
sh vs bash, awk vs gawk, but it breaks when we forget what is outside of
the scope of the former/standard.
Perhaps for shell, we could convert all the selftests at once?


---
  tools/testing/selftests/net/openvswitch/openvswitch.sh | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 5cae53543849..15bca0708717 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
  # SPDX-License-Identifier: GPL-2.0
  #
  # OVS kernel module self tests





___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC net-next 3/7] selftests: openvswitch: Add set() and set_masked() support.

2024-06-17 Thread Adrián Moreno
On Thu, Jun 13, 2024 at 02:13:29PM GMT, Aaron Conole wrote:
> These will be used in upcoming commits to set specific attributes for
> interacting with tunnels.  Since set() will use the key parsing routine, we
> also make sure to prepend it with an open paren, for the action parsing to
> properly understand it.
>
> Signed-off-by: Aaron Conole 
> ---
>  .../selftests/net/openvswitch/ovs-dpctl.py| 39 +--
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index 73768f3af6e5..fee64c31d4d4 100644
> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> @@ -284,7 +284,7 @@ class ovsactions(nla):
>  ("OVS_ACTION_ATTR_UNSPEC", "none"),
>  ("OVS_ACTION_ATTR_OUTPUT", "uint32"),
>  ("OVS_ACTION_ATTR_USERSPACE", "userspace"),
> -("OVS_ACTION_ATTR_SET", "none"),
> +("OVS_ACTION_ATTR_SET", "ovskey"),
>  ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
>  ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
>  ("OVS_ACTION_ATTR_SAMPLE", "none"),
> @@ -292,7 +292,7 @@ class ovsactions(nla):
>  ("OVS_ACTION_ATTR_HASH", "none"),
>  ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
>  ("OVS_ACTION_ATTR_POP_MPLS", "flag"),
> -("OVS_ACTION_ATTR_SET_MASKED", "none"),
> +("OVS_ACTION_ATTR_SET_MASKED", "ovskey"),
>  ("OVS_ACTION_ATTR_CT", "ctact"),
>  ("OVS_ACTION_ATTR_TRUNC", "uint32"),
>  ("OVS_ACTION_ATTR_PUSH_ETH", "none"),
> @@ -469,6 +469,14 @@ class ovsactions(nla):
>  print_str += "clone("
>  print_str += datum.dpstr(more)
>  print_str += ")"
> +elif field[0] == "OVS_ACTION_ATTR_SET" or \
> + field[0] == "OVS_ACTION_ATTR_SET_MASKED":
> +print_str += "set"
> +if field[0] == "OVS_ACTION_ATTR_SET_MASKED":
> +print_str += "_masked"
> +print_str += "("
> +print_str += datum.dpstr(more)
> +print_str += ")"
>  else:
>  try:
>  print_str += datum.dpstr(more)
> @@ -547,6 +555,25 @@ class ovsactions(nla):
>  self["attrs"].append(("OVS_ACTION_ATTR_CLONE", subacts))
>  actstr = actstr[parsedLen:]
>  parsed = True
> +elif parse_starts_block(actstr, "set(", False):
> +parencount += 1
> +k = ovskey()
> +actstr = actstr[len("set("):]
> +actstr = k.parse(actstr, None)
> +self["attrs"].append(("OVS_ACTION_ATTR_SET", k))
> +if not actstr.startswith(")"):
> +actstr = ")" + actstr
> +parsed = True
> +elif parse_starts_block(actstr, "set_masked(", False):
> +parencount += 1
> +k = ovskey()
> +m = ovskey()
> +actstr = actstr[len("set_masked("):]
> +actstr = k.parse(actstr, m)
> +self["attrs"].append(("OVS_ACTION_ATTR_SET_MASKED", [k, m]))
> +if not actstr.startswith(")"):
> +actstr = ")" + actstr
> +parsed = True
>  elif parse_starts_block(actstr, "ct(", False):
>  parencount += 1
>  actstr = actstr[len("ct(") :]
> @@ -1312,7 +1339,7 @@ class ovskey(nla):
>  mask["attrs"].append([field[0], m])
>  self["attrs"].append([field[0], k])
>
> -flowstr = flowstr[strspn(flowstr, "),") :]
> +flowstr = flowstr[strspn(flowstr, "), ") :]
>
>  return flowstr
>
> @@ -1898,7 +1925,11 @@ class OvsFlow(GenericNetlinkSocket):
>  ):
>  print_str += "drop"
>  else:
> -print_str += actsmsg.dpstr(more)
> +if type(actsmsg) == "list":

nit: I belive the recommended way of comparing types is using
"isinstance":

https://www.flake8rules.com/rules/E721.html

Also, I don't see what can make actmsg be a list. It should always be an
instance of "ovsactions", right?


> +for act in actsmsg:
> +print_str += act.dpstr(more)
> +else:
> +print_str += actsmsg.dpstr(more)
>
>  return print_str
>
> --
> 2.45.1
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-17 Thread Ilya Maximets
On 6/17/24 13:55, Ilya Maximets wrote:
> On 6/3/24 20:56, Adrian Moreno wrote:
>> The OVS_ACTION_ATTR_SAMPLE action is, in essence,
>> observability-oriented.
>>
>> Apart from some corner case in which it's used a replacement of clone()
>> for old kernels, it's really only used for sFlow, IPFIX and now,
>> local emit_sample.
>>
>> With this in mind, it doesn't make much sense to report
>> OVS_DROP_LAST_ACTION inside sample actions.
>>
>> For instance, if the flow:
>>
>>   actions:sample(..,emit_sample(..)),2
>>
>> triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
>> confusing for users since the packet did reach its destination.
>>
>> This patch makes internal action execution silently consume the skb
>> instead of notifying a drop for this case.
>>
>> Unfortunately, this patch does not remove all potential sources of
>> confusion since, if the sample action itself is the last action, e.g:
>>
>> actions:sample(..,emit_sample(..))
>>
>> we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we aren't.
>>
>> Sadly, this case is difficult to solve without breaking the
>> optimization by which the skb is not cloned on last sample actions.
>> But, given explicit drop actions are now supported, OVS can just add one
>> after the last sample() and rewrite the flow as:
>>
>> actions:sample(..,emit_sample(..)),drop
>>
>> Signed-off-by: Adrian Moreno 
>> ---
>>  net/openvswitch/actions.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 33f6d93ba5e4..54fc1abcff95 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
>>  static struct action_flow_keys __percpu *flow_keys;
>>  static DEFINE_PER_CPU(int, exec_actions_level);
>>  
>> +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
>> +{
>> +/* Do not emit packet drops inside sample(). */
>> +if (OVS_CB(skb)->probability)
>> +consume_skb(skb);
>> +else
>> +ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +}
>> +
>>  /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
>>   * space. Return NULL if out of key spaces.
>>   */
>> @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct sk_buff 
>> *skb,
>>  if ((arg->probability != U32_MAX) &&
>>  (!arg->probability || get_random_u32() > arg->probability)) {
>>  if (last)
>> -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +ovs_drop_skb_last_action(skb);

Always consuming the skb at this point makes sense, since having smaple()
as a last action is a reasonable thing to have.  But this looks more like
a fix for the original drop reason patch set.

>>  return 0;
>>  }
>>  
>> @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, 
>> struct sk_buff *skb,
>>  }
>>  }
>>  
>> -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +ovs_drop_skb_last_action(skb);
> 
> I don't think I agree with this one.  If we have a sample() action with
> a lot of different actions inside and we reached the end while the last
> action didn't consume the skb, then we should report that.  E.g.
> "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
> cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
> 
> The only actions that are actually consuming the skb are "output",
> "userspace", "recirc" and now "emit_sample".  "output" and "recirc" are
> consuming the skb "naturally" by stealing it when it is the last action.
> "userspace" has an explicit check to consume the skb if it is the last
> action.  "emit_sample" should have the similar check.  It should likely
> be added at the point of action introduction instead of having a separate
> patch.
> 
> Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-17 Thread Ilya Maximets
On 6/3/24 20:56, Adrian Moreno wrote:
> The OVS_ACTION_ATTR_SAMPLE action is, in essence,
> observability-oriented.
> 
> Apart from some corner case in which it's used a replacement of clone()
> for old kernels, it's really only used for sFlow, IPFIX and now,
> local emit_sample.
> 
> With this in mind, it doesn't make much sense to report
> OVS_DROP_LAST_ACTION inside sample actions.
> 
> For instance, if the flow:
> 
>   actions:sample(..,emit_sample(..)),2
> 
> triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
> confusing for users since the packet did reach its destination.
> 
> This patch makes internal action execution silently consume the skb
> instead of notifying a drop for this case.
> 
> Unfortunately, this patch does not remove all potential sources of
> confusion since, if the sample action itself is the last action, e.g:
> 
> actions:sample(..,emit_sample(..))
> 
> we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we aren't.
> 
> Sadly, this case is difficult to solve without breaking the
> optimization by which the skb is not cloned on last sample actions.
> But, given explicit drop actions are now supported, OVS can just add one
> after the last sample() and rewrite the flow as:
> 
> actions:sample(..,emit_sample(..)),drop
> 
> Signed-off-by: Adrian Moreno 
> ---
>  net/openvswitch/actions.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 33f6d93ba5e4..54fc1abcff95 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
>  static struct action_flow_keys __percpu *flow_keys;
>  static DEFINE_PER_CPU(int, exec_actions_level);
>  
> +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
> +{
> + /* Do not emit packet drops inside sample(). */
> + if (OVS_CB(skb)->probability)
> + consume_skb(skb);
> + else
> + ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> +}
> +
>  /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
>   * space. Return NULL if out of key spaces.
>   */
> @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct sk_buff 
> *skb,
>   if ((arg->probability != U32_MAX) &&
>   (!arg->probability || get_random_u32() > arg->probability)) {
>   if (last)
> - ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> + ovs_drop_skb_last_action(skb);
>   return 0;
>   }
>  
> @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>   }
>   }
>  
> - ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> + ovs_drop_skb_last_action(skb);

I don't think I agree with this one.  If we have a sample() action with
a lot of different actions inside and we reached the end while the last
action didn't consume the skb, then we should report that.  E.g.
"sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.

The only actions that are actually consuming the skb are "output",
"userspace", "recirc" and now "emit_sample".  "output" and "recirc" are
consuming the skb "naturally" by stealing it when it is the last action.
"userspace" has an explicit check to consume the skb if it is the last
action.  "emit_sample" should have the similar check.  It should likely
be added at the point of action introduction instead of having a separate
patch.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 6/9] net: openvswitch: store sampling probability in cb.

2024-06-17 Thread Ilya Maximets
On 6/17/24 09:08, Adrián Moreno wrote:
> On Fri, Jun 14, 2024 at 12:55:59PM GMT, Aaron Conole wrote:
>> Adrian Moreno  writes:
>>
>>> The behavior of actions might not be the exact same if they are being
>>> executed inside a nested sample action. Store the probability of the
>>> parent sample action in the skb's cb area.
>>
>> What does that mean?
>>
> 
> Emit action, for instance, needs the probability so that psample
> consumers know what was the sampling rate applied. Also, the way we
> should inform about packet drops (via kfree_skb_reason) changes (see
> patch 7/9).
> 
>>> Use the probability in emit_sample to pass it down to psample.
>>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>  include/uapi/linux/openvswitch.h |  3 ++-
>>>  net/openvswitch/actions.c| 25 ++---
>>>  net/openvswitch/datapath.h   |  3 +++
>>>  net/openvswitch/vport.c  |  1 +
>>>  4 files changed, 28 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/openvswitch.h 
>>> b/include/uapi/linux/openvswitch.h
>>> index a0e9dde0584a..9d675725fa2b 100644
>>> --- a/include/uapi/linux/openvswitch.h
>>> +++ b/include/uapi/linux/openvswitch.h
>>> @@ -649,7 +649,8 @@ enum ovs_flow_attr {
>>>   * Actions are passed as nested attributes.
>>>   *
>>>   * Executes the specified actions with the given probability on a 
>>> per-packet
>>> - * basis.
>>> + * basis. Nested actions will be able to access the probability value of 
>>> the
>>> + * parent @OVS_ACTION_ATTR_SAMPLE.
>>>   */
>>>  enum ovs_sample_attr {
>>> OVS_SAMPLE_ATTR_UNSPEC,
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 3b4dba0ded59..33f6d93ba5e4 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct 
>>> sk_buff *skb,
>>> struct nlattr *sample_arg;
>>> int rem = nla_len(attr);
>>> const struct sample_arg *arg;
>>> +   u32 init_probability;
>>> bool clone_flow_key;
>>> +   int err;
>>>
>>> /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>>> sample_arg = nla_data(attr);
>>> arg = nla_data(sample_arg);
>>> actions = nla_next(sample_arg, );
>>> +   init_probability = OVS_CB(skb)->probability;
>>>
>>> if ((arg->probability != U32_MAX) &&
>>> (!arg->probability || get_random_u32() > arg->probability)) {
>>> @@ -1062,9 +1065,21 @@ static int sample(struct datapath *dp, struct 
>>> sk_buff *skb,
>>> return 0;
>>> }
>>>
>>> +   if (init_probability) {
>>> +   OVS_CB(skb)->probability = ((u64)OVS_CB(skb)->probability *
>>> +   arg->probability / U32_MAX);
>>> +   } else {
>>> +   OVS_CB(skb)->probability = arg->probability;
>>> +   }
>>> +
>>
>> I'm confused by this.  Eventually, integer arithmetic will practically
>> guarantee that nested sample() calls will go to 0.  So eventually, the
>> test above will be impossible to meet mathematically.
>>
>> OTOH, you could argue that a 1% of 50% is low anyway, but it still would
>> have a positive probability count, and still be possible for
>> get_random_u32() call to match.
>>
> 
> Using OVS's probability semantics, we can express probabilities as low
> as (100/U32_MAX)% which is pretty low indeed. However, just because the
> probability of executing the action is low I don't think we should not
> report it.
> 
> Rethinking the integer arithmetics, it's true that we should avoid
> hitting zero on the division, eg: nesting 6x 1% sampling rates will make
> the result be zero which will make probability restoration fail on the
> way back. Threrefore, the new probability should be at least 1.
> 
> 
>> I'm not sure about this particular change.  Why do we need it?
>>
> 
> Why do we need to propagate the probability down to nested "sample"
> actions? or why do we need to store the probability in the cb area in
> the first place?
> 
> The former: Just for correctness as only storing the last one would be
> incorrect. Although I don't know of any use for nested "sample" actions.

I think, we can drop this for now.  All the user interfaces specify
the probability per action.  So, it should be fine to report the
probability of the action that emitted the sample without taking into
account the whole timeline of that packet.  Besides, packet can leave
OVS and go back loosing the metadata, so it will not actually be a
full solution anyway.  Single-action metadata is easier to define.

> The latter: To pass it down to psample so that sample receivers know how
> the sampling rate applied (and, e.g: do throughput estimations like OVS
> does with IPFIX).
> 
> 
>>> clone_flow_key = !arg->exec;
>>> -   return clone_execute(dp, skb, key, 0, actions, rem, last,
>>> -clone_flow_key);
>>> +   err = clone_execute(dp, skb, key, 0, actions, rem, last,
>>> +   clone_flow_key);
>>> +
>>> +   if (!last)
>>
>> Is this 

Re: [ovs-dev] [PATCH net-next v2 5/9] net: openvswitch: add emit_sample action

2024-06-17 Thread Ilya Maximets
On 6/3/24 20:56, Adrian Moreno wrote:
> Add support for a new action: emit_sample.
> 
> This action accepts a u32 group id and a variable-length cookie and uses
> the psample multicast group to make the packet available for
> observability.
> 
> The maximum length of the user-defined cookie is set to 16, same as
> tc_cookie, to discourage using cookies that will not be offloadable.
> 
> Signed-off-by: Adrian Moreno 
> ---
>  Documentation/netlink/specs/ovs_flow.yaml | 17 
>  include/uapi/linux/openvswitch.h  | 25 
>  net/openvswitch/actions.c | 50 +++
>  net/openvswitch/flow_netlink.c| 33 ++-
>  4 files changed, 124 insertions(+), 1 deletion(-)

Some nits below, beside ones already mentioned.

> 
> diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
> b/Documentation/netlink/specs/ovs_flow.yaml
> index 4fdfc6b5cae9..a7ab5593a24f 100644
> --- a/Documentation/netlink/specs/ovs_flow.yaml
> +++ b/Documentation/netlink/specs/ovs_flow.yaml
> @@ -727,6 +727,12 @@ attribute-sets:
>  name: dec-ttl
>  type: nest
>  nested-attributes: dec-ttl-attrs
> +  -
> +name: emit-sample
> +type: nest
> +nested-attributes: emit-sample-attrs
> +doc: |
> +  Sends a packet sample to psample for external observation.
>-
>  name: tunnel-key-attrs
>  enum-name: ovs-tunnel-key-attr
> @@ -938,6 +944,17 @@ attribute-sets:
>-
>  name: gbp
>  type: u32
> +  -
> +name: emit-sample-attrs
> +enum-name: ovs-emit-sample-attr
> +name-prefix: ovs-emit-sample-attr-
> +attributes:
> +  -
> +name: group
> +type: u32
> +  -
> +name: cookie
> +type: binary
>  
>  operations:
>name-prefix: ovs-flow-cmd-
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index efc82c318fa2..a0e9dde0584a 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -914,6 +914,30 @@ struct check_pkt_len_arg {
>  };
>  #endif
>  
> +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
> +/**
> + * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE
> + * action.
> + *
> + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
> + * sample.
> + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that 
> contains
> + * user-defined metadata. The maximum length is 16 bytes.

s/16/OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE/

> + *
> + * Sends the packet to the psample multicast group with the specified group 
> and
> + * cookie. It is possible to combine this action with the
> + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being 
> emitted.
> + */
> +enum ovs_emit_sample_attr {
> + OVS_EMIT_SAMPLE_ATTR_UNPSEC,
> + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
> + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
> + __OVS_EMIT_SAMPLE_ATTR_MAX
> +};
> +
> +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
> +
> +
>  /**
>   * enum ovs_action_attr - Action types.
>   *
> @@ -1004,6 +1028,7 @@ enum ovs_action_attr {
>   OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
>   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
>   OVS_ACTION_ATTR_DROP, /* u32 error code. */
> + OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */
>  
>   __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
>  * from userspace. */
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 964225580824..3b4dba0ded59 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -24,6 +24,11 @@
>  #include 
>  #include 
>  #include 
> +
> +#if IS_ENABLED(CONFIG_PSAMPLE)
> +#include 
> +#endif
> +
>  #include 
>  
>  #include "datapath.h"
> @@ -1299,6 +1304,46 @@ static int execute_dec_ttl(struct sk_buff *skb, struct 
> sw_flow_key *key)
>   return 0;
>  }
>  
> +static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
> +const struct sw_flow_key *key,
> +const struct nlattr *attr)
> +{
> +#if IS_ENABLED(CONFIG_PSAMPLE)
> + struct psample_group psample_group = {};
> + struct psample_metadata md = {};
> + struct vport *input_vport;
> + const struct nlattr *a;
> + int rem;
> +
> + for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> +  a = nla_next(a, )) {

Since the action is strictly validated, can use use nla_for_each_attr()
or nla_for_each_nested() ?

> + switch (nla_type(a)) {
> + case OVS_EMIT_SAMPLE_ATTR_GROUP:
> + psample_group.group_num = nla_get_u32(a);
> + break;
> +
> + case OVS_EMIT_SAMPLE_ATTR_COOKIE:
> + 

Re: [ovs-dev] [PATCH net] selftests: openvswitch: Use bash as interpreter

2024-06-17 Thread Simon Horman
On Mon, Jun 17, 2024 at 12:05:11PM +0200, Przemek Kitszel wrote:
> On 6/17/24 10:28, Simon Horman wrote:
> > openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to
> > obtain the first character of $ns. Empirically, this is works with bash
> > but not dash. When run with dash these evaluate to an empty string and
> > printing an error to stdout.
> > 
> >   # dash -c 'ns=client; echo "${ns:0:1}"' 2>error
> >   # cat error
> >   dash: 1: Bad substitution
> >   # bash -c 'ns=client; echo "${ns:0:1}"' 2>error
> >   c
> >   # cat error
> > 
> > This leads to tests that neither pass nor fail.
> > F.e.
> > 
> >   TEST: arp_ping  
> > [START]
> >   adding sandbox 'test_arp_ping'
> >   Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , }
> >   create namespaces
> >   ./openvswitch.sh: 282: eval: Bad substitution
> >   TEST: ct_connect_v4 
> > [START]
> >   adding sandbox 'test_ct_connect_v4'
> >   Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , }
> >   ./openvswitch.sh: 322: eval: Bad substitution
> >   create namespaces
> > 
> > Resolve this by making openvswitch.sh a bash script.
> > 
> > Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow 
> > programming case")
> > Signed-off-by: Simon Horman 
> 
> That's good fix,
> Reviewed-by: Przemek Kitszel 
> 
> sidenote: I like very much the idea to use the least powerful tool, like
> sh vs bash, awk vs gawk, but it breaks when we forget what is outside of
> the scope of the former/standard.
> Perhaps for shell, we could convert all the selftests at once?

Thanks,

Now that you mention it, I have the same feelings.

Do we ever expect to use the minimal tools, when other
parts of the test suite depend on the enhanced ones?

...
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 4/9] net: psample: allow using rate as probability

2024-06-17 Thread Simon Horman
On Mon, Jun 17, 2024 at 06:32:14AM +, Adrián Moreno wrote:
> On Fri, Jun 14, 2024 at 05:11:30PM GMT, Simon Horman wrote:
> > On Mon, Jun 03, 2024 at 08:56:38PM +0200, Adrian Moreno wrote:
> > > Although not explicitly documented in the psample module itself, the
> > > definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.
> > >
> > > Quoting tc-sample(8):
> > > "RATE of 100 will lead to an average of one sampled packet out of every
> > > 100 observed."
> > >
> > > With this semantics, the rates that we can express with an unsigned
> > > 32-bits number are very unevenly distributed and concentrated towards
> > > "sampling few packets".
> > > For example, we can express a probability of 2.32E-8% but we
> > > cannot express anything between 100% and 50%.
> > >
> > > For sampling applications that are capable of sampling a decent
> > > amount of packets, this sampling rate semantics is not very useful.
> > >
> > > Add a new flag to the uAPI that indicates that the sampling rate is
> > > expressed in scaled probability, this is:
> > > - 0 is 0% probability, no packets get sampled.
> > > - U32_MAX is 100% probability, all packets get sampled.
> > >
> > > Signed-off-by: Adrian Moreno 
> >
> > Hi Adrian,
> >
> > Would it be possible to add appropriate documentation for
> > rate - both the original ratio variant, and the new probability
> > variant - somewhere?
> >
> 
> Hi Simon, thanks for the suggestion. Would the uapi header be a good
> place for such documentation?

Hi Adrian,

I didn't look closely, but that does sound like a good place to me.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [BUG][meter] ovs crash when add meter openflow

2024-06-17 Thread Simon Jones
It's PR: Bugfix of meter_set crash. by batmancn · Pull Request #425 ·
openvswitch/ovs (github.com) 


Simon Jones


Simon Jones  于2024年6月17日周一 15:35写道:

> This patch:
> ```
> $ git diff
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index b000aeea8..74fd7c11b 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
> OVS_GUARDED_BY(meter_mutex)
>  = HMAP_INITIALIZER(_id_to_police_idx);
>  static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
>  = HMAP_INITIALIZER(_idx_to_meter_id);
> +/* YSK2: if init tc. */
> +static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);
>
>  static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
>  static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
> @@ -2433,6 +2435,7 @@ netdev_tc_init_flow_api(struct netdev *netdev)
>  }
>
>  VLOG_INFO("added ingress qdisc to %s", netdev_get_name(netdev));
> +atomic_store_relaxed(_tc_init, true);
>
>  return 0;
>  }
> @@ -2549,6 +2552,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
>  uint32_t rate, burst;
>  bool add_policer;
>  int err;
> +bool init;
> +
> +atomic_read_relaxed(_tc_init, );
> +if (!init) {
> +VLOG_WARN("Do not call meter_set before init.");
> +return 0;
> +}
>
>  if (!config->bands || config->n_bands < 1 ||
>  config->bands[0].type != OFPMBT13_DROP) {
> ```
> 
> Simon Jones
>
>
> Simon Jones  于2024年6月17日周一 15:30写道:
>
>> I use this patch to try to fix BUG, I test several times, it's OK
>> ```
>> [root@bogon yusur_ovs]# git diff
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index b000aee..3330cb2 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
>> OVS_GUARDED_BY(meter_mutex)
>>  = HMAP_INITIALIZER(_id_to_police_idx);
>>  static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
>>  = HMAP_INITIALIZER(_idx_to_meter_id);
>> +static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);
>>
>>  static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
>>  static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
>> @@ -2549,6 +2551,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
>>  uint32_t rate, burst;
>>  bool add_policer;
>>  int err;
>> +bool init;
>> +
>> +atomic_read_relaxed(_tc_init, );
>> +if (!init)
>> +return 0;
>> +else
>> +VLOG_WARN("Do not call meter_set before init.");
>>
>>  if (!config->bands || config->n_bands < 1 ||
>>  config->bands[0].type != OFPMBT13_DROP) {
>> ```
>>
>> 
>> Simon Jones
>>
>>
>> Simon Jones  于2024年6月17日周一 11:13写道:
>>
>>> I found another cause of this BUG:
>>> ```
>>> In public-ovs code, @netdev_offload_dpdk and @netdev_offload_tc is
>>> register in @netdev_register_flow_api_provider.
>>> The @netdev_register_flow_api_provider is called in init stage,
>>> like @dpdk_init__ and @netdev_initialize.
>>> After register, @netdev_offload_dpdk and @netdev_offload_tc is in
>>> @netdev_flow_apis.
>>>
>>> Then ovs-vswitchd run @bridge_run.
>>> In @bridge_run, call @netdev_assign_flow_api, then
>>> call rfa->flow_api->init_flow_api of all rfa of  @netdev_flow_apis.
>>> The rfa is like p0 netdevice of DPDK type, or mip0 netdevice of system
>>> type.
>>> If DPDK type, it's  @netdev_offload_dpdk's  init_flow_api.
>>> If system type, it's  @netdev_offload_tc's   init_flow_api.
>>>
>>> Then the add meter command comes, also call @bridge_run.
>>> In  @bridge_run, at last call @meter_offload_set, then
>>> call rfa->flow_api->meter_set  of all rfa of  @netdev_flow_apis.
>>>
>>> For this BUG.
>>> Happens when ovs-vswitchd restart.
>>> As bridge/port/meter is all stored in ovsdb.
>>> If meter configure called before port configure, then 
>>> rfa->flow_api->meter_set
>>> will be called before rfa->flow_api->init_flow_api.
>>> Then BUG happens.
>>>
>>> ```
>>>
>>> 
>>> Simon Jones
>>>
>>>
>>> Simon Jones  于2024年6月17日周一 10:57写道:
>>>
 Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
 implement in ovs-dpdk, which means only one .meter_set implement in TC.
 ```
 const struct netdev_flow_api netdev_offload_dpdk = {
 .type = "dpdk_flow_api",
 .flow_put = netdev_offload_dpdk_flow_put,
 .flow_del = netdev_offload_dpdk_flow_del,
 .init_flow_api = netdev_offload_dpdk_init_flow_api,
 .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
 .flow_get = netdev_offload_dpdk_flow_get,
 .flow_flush = netdev_offload_dpdk_flow_flush,
 .hw_miss_packet_recover =
 netdev_offload_dpdk_hw_miss_packet_recover,
 .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
 };
 ```
 That's why public-ovs has NO BUG, because only 

Re: [ovs-dev] [PATCH net-next v2 2/9] net: sched: act_sample: add action cookie to sample

2024-06-17 Thread Ilya Maximets
On 6/3/24 20:56, Adrian Moreno wrote:
> If the action has a user_cookie, pass it along to the sample so it can
> be easily identified.
> 
> Signed-off-by: Adrian Moreno 
> ---
>  net/sched/act_sample.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index a69b53d54039..5c3f86ec964a 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -165,9 +165,11 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
>const struct tc_action *a,
>struct tcf_result *res)
>  {
> + u8 cookie_data[TC_COOKIE_MAX_SIZE] = {};

Is it necessary to initialize these 16 bytes on every call?
Might be expensive.  We're passing the data length around,
so the uninitialized parts should not be accessed.

Best regards, Ilya Maximets.

>   struct tcf_sample *s = to_sample(a);
>   struct psample_group *psample_group;
>   struct psample_metadata md = {};
> + struct tc_cookie *user_cookie;
>   int retval;
>  
>   tcf_lastuse_update(>tcf_tm);
> @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
>   if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
>   skb_push(skb, skb->mac_len);
>  
> + rcu_read_lock();
> + user_cookie = rcu_dereference(a->user_cookie);
> + if (user_cookie) {
> + memcpy(cookie_data, user_cookie->data,
> +user_cookie->len);
> + md.user_cookie = cookie_data;
> + md.user_cookie_len = user_cookie->len;
> + }
> + rcu_read_unlock();
> +
>   md.trunc_size = s->truncate ? s->trunc_size : skb->len;
>   psample_sample_packet(psample_group, skb, s->rate, );
>  

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] controller: Send RARP/GARP for VIF post link state is up.

2024-06-17 Thread Shibir Basak
Hi Mark,

In case we are not waiting to batch any other patches, can we merge this?

Thanks,
Shibir

From: Mark Michelson 
Date: Friday, 7 June 2024 at 1:54 AM
To: Shibir Basak , d...@openvswitch.org 

Subject: Re: [ovs-dev] [PATCH ovn] controller: Send RARP/GARP for VIF post link 
state is up.
!---|
  CAUTION: External Email

|---!

Thank you for the patch Shibir. It looks good to me, so

Acked-by: Mark Michelson 

On 5/27/24 14:24, Shibir Basak wrote:
> Currently, GARP/RARP broadcast is sent for VIFs (part of logical
> switch with localnet port) after iface-id is set.
> This fix is to avoid packet loss during migration if iface-id
> is set even before the VM migration is completed.
>
> Signed-off-by: Shibir Basak 
> Acked-by: Naveen Yerramneni 
> ---
>   controller/ovn-controller.c | 1 +
>   controller/pinctrl.c| 4 
>   2 files changed, 5 insertions(+)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 6b38f113d..982378a50 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1128,6 +1128,7 @@ ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>   ovsdb_idl_add_table(ovs_idl, _table_queue);
>   ovsdb_idl_add_column(ovs_idl, _queue_col_other_config);
>   ovsdb_idl_add_column(ovs_idl, _queue_col_external_ids);
> +ovsdb_idl_add_column(ovs_idl, _interface_col_link_state);
>
>   chassis_register_ovs_idl(ovs_idl);
>   encaps_register_ovs_idl(ovs_idl);
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index 6a2c3dc68..b5d3162b8 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -6375,6 +6375,10 @@ get_localnet_vifs_l3gwports(
>   if (!pb || pb->chassis != chassis) {
>   continue;
>   }
> +if (!iface_rec->link_state ||
> +strcmp(iface_rec->link_state, "up")) {
> +continue;
> +}
>   struct local_datapath *ld
>   = get_local_datapath(local_datapaths,
>pb->datapath->tunnel_key);
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net] selftests: openvswitch: Use bash as interpreter

2024-06-17 Thread Simon Horman
openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to
obtain the first character of $ns. Empirically, this is works with bash
but not dash. When run with dash these evaluate to an empty string and
printing an error to stdout.

 # dash -c 'ns=client; echo "${ns:0:1}"' 2>error
 # cat error
 dash: 1: Bad substitution
 # bash -c 'ns=client; echo "${ns:0:1}"' 2>error
 c
 # cat error

This leads to tests that neither pass nor fail.
F.e.

 TEST: arp_ping  [START]
 adding sandbox 'test_arp_ping'
 Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , }
 create namespaces
 ./openvswitch.sh: 282: eval: Bad substitution
 TEST: ct_connect_v4 [START]
 adding sandbox 'test_ct_connect_v4'
 Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , }
 ./openvswitch.sh: 322: eval: Bad substitution
 create namespaces

Resolve this by making openvswitch.sh a bash script.

Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow programming 
case")
Signed-off-by: Simon Horman 
---
 tools/testing/selftests/net/openvswitch/openvswitch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 5cae53543849..15bca0708717 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 #
 # OVS kernel module self tests

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netdev-dpdk: Use LSC interrupt mode.

2024-06-17 Thread David Marchand
On Mon, Jun 17, 2024 at 10:11 AM Ilya Maximets  wrote:
>
> On 6/17/24 09:46, David Marchand wrote:
> > On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets  wrote:
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 0fa37d5145..a260bc8485 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, 
> >>> const struct smap *args,
> >>>  }
> >>>  }
> >>>
> >>> -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", 
> >>> false);
> >>> +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> >>> +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) 
> >>> {
> >>> +if (smap_get(args, "dpdk-lsc-interrupt")) {
> >>> +VLOG_ERR("interface '%s': link status interrupt is not 
> >>> supported.",
> >>> + netdev_get_name(netdev));
> >>
> >> Since we're exiting with an error set, the message should be buffered
> >> into errp instead, so it can be visible in the database record and
> >> returned as a result of the ovs-vsctl.
> >>
> >> Also, we're using WARN level for all other configuration issues, so we
> >> should do that here as well.  ERR is usually some sort of internal error.
> >> And we're usually just using "%s: ..." and not "interface '%s': ...".
> >
> > Ok for ERR vs WARN.
> >
> > For the rest, well, I copied the logs right before.
> >
> > vf_mac = smap_get(args, "dpdk-vf-mac");
> > if (vf_mac) {
> > struct eth_addr mac;
> >
> > if (!dpdk_port_is_representor(dev)) {
> > VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
> >   "but 'options:dpdk-vf-mac' is only supported for "
> >   "VF representors.",
> >   netdev_get_name(netdev), vf_mac);
> > } else if (!eth_addr_from_string(vf_mac, )) {
> > VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
> >   netdev_get_name(netdev), vf_mac);
> > } else if (eth_addr_is_multicast(mac)) {
> > VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
> >   "address '%s'.", netdev_get_name(netdev), vf_mac);
> > } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> > dev->requested_hwaddr = mac;
> > netdev_request_reconfigure(netdev);
> > }
> > }
> >
> > lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> >
> >
> > So I'll fix the dpdk-vf-mac stuff (and double check the rest of this
> > function), then go with your suggestion for this added log of mine.
> >
> >
>
> We must not initialize errp if we do not fail with error, otherwise we leak
> the memory.  VF mac code does not fail the configuration, so we only log the
> warning.  All the paths that fail should set errp instead.
>

Talk about obvious...
I'll fix my stuff and leave the rest untouched.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netdev-dpdk: Use LSC interrupt mode.

2024-06-17 Thread Ilya Maximets
On 6/17/24 09:46, David Marchand wrote:
> On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets  wrote:
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 0fa37d5145..a260bc8485 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
>>> struct smap *args,
>>>  }
>>>  }
>>>
>>> -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>>> +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
>>> +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
>>> +if (smap_get(args, "dpdk-lsc-interrupt")) {
>>> +VLOG_ERR("interface '%s': link status interrupt is not 
>>> supported.",
>>> + netdev_get_name(netdev));
>>
>> Since we're exiting with an error set, the message should be buffered
>> into errp instead, so it can be visible in the database record and
>> returned as a result of the ovs-vsctl.
>>
>> Also, we're using WARN level for all other configuration issues, so we
>> should do that here as well.  ERR is usually some sort of internal error.
>> And we're usually just using "%s: ..." and not "interface '%s': ...".
> 
> Ok for ERR vs WARN.
> 
> For the rest, well, I copied the logs right before.
> 
> vf_mac = smap_get(args, "dpdk-vf-mac");
> if (vf_mac) {
> struct eth_addr mac;
> 
> if (!dpdk_port_is_representor(dev)) {
> VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
>   "but 'options:dpdk-vf-mac' is only supported for "
>   "VF representors.",
>   netdev_get_name(netdev), vf_mac);
> } else if (!eth_addr_from_string(vf_mac, )) {
> VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
>   netdev_get_name(netdev), vf_mac);
> } else if (eth_addr_is_multicast(mac)) {
> VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
>   "address '%s'.", netdev_get_name(netdev), vf_mac);
> } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> dev->requested_hwaddr = mac;
> netdev_request_reconfigure(netdev);
> }
> }
> 
> lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> 
> 
> So I'll fix the dpdk-vf-mac stuff (and double check the rest of this
> function), then go with your suggestion for this added log of mine.
> 
> 

We must not initialize errp if we do not fail with error, otherwise we leak
the memory.  VF mac code does not fail the configuration, so we only log the
warning.  All the paths that fail should set errp instead.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netdev-dpdk: Use LSC interrupt mode.

2024-06-17 Thread David Marchand
On Fri, Jun 14, 2024 at 6:22 PM Ilya Maximets  wrote:
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 0fa37d5145..a260bc8485 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -2397,7 +2397,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
> > struct smap *args,
> >  }
> >  }
> >
> > -lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> > +lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);
> > +if (lsc_interrupt_mode && !(*info.dev_flags & RTE_ETH_DEV_INTR_LSC)) {
> > +if (smap_get(args, "dpdk-lsc-interrupt")) {
> > +VLOG_ERR("interface '%s': link status interrupt is not 
> > supported.",
> > + netdev_get_name(netdev));
>
> Since we're exiting with an error set, the message should be buffered
> into errp instead, so it can be visible in the database record and
> returned as a result of the ovs-vsctl.
>
> Also, we're using WARN level for all other configuration issues, so we
> should do that here as well.  ERR is usually some sort of internal error.
> And we're usually just using "%s: ..." and not "interface '%s': ...".

Ok for ERR vs WARN.

For the rest, well, I copied the logs right before.

vf_mac = smap_get(args, "dpdk-vf-mac");
if (vf_mac) {
struct eth_addr mac;

if (!dpdk_port_is_representor(dev)) {
VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
  "but 'options:dpdk-vf-mac' is only supported for "
  "VF representors.",
  netdev_get_name(netdev), vf_mac);
} else if (!eth_addr_from_string(vf_mac, )) {
VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
  netdev_get_name(netdev), vf_mac);
} else if (eth_addr_is_multicast(mac)) {
VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
  "address '%s'.", netdev_get_name(netdev), vf_mac);
} else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
dev->requested_hwaddr = mac;
netdev_request_reconfigure(netdev);
}
}

lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", true);


So I'll fix the dpdk-vf-mac stuff (and double check the rest of this
function), then go with your suggestion for this added log of mine.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [BUG][meter] ovs crash when add meter openflow

2024-06-17 Thread Simon Jones
This patch:
```
$ git diff
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index b000aeea8..74fd7c11b 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
OVS_GUARDED_BY(meter_mutex)
 = HMAP_INITIALIZER(_id_to_police_idx);
 static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
 = HMAP_INITIALIZER(_idx_to_meter_id);
+/* YSK2: if init tc. */
+static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);

 static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
 static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
@@ -2433,6 +2435,7 @@ netdev_tc_init_flow_api(struct netdev *netdev)
 }

 VLOG_INFO("added ingress qdisc to %s", netdev_get_name(netdev));
+atomic_store_relaxed(_tc_init, true);

 return 0;
 }
@@ -2549,6 +2552,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
 uint32_t rate, burst;
 bool add_policer;
 int err;
+bool init;
+
+atomic_read_relaxed(_tc_init, );
+if (!init) {
+VLOG_WARN("Do not call meter_set before init.");
+return 0;
+}

 if (!config->bands || config->n_bands < 1 ||
 config->bands[0].type != OFPMBT13_DROP) {
```

Simon Jones


Simon Jones  于2024年6月17日周一 15:30写道:

> I use this patch to try to fix BUG, I test several times, it's OK
> ```
> [root@bogon yusur_ovs]# git diff
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index b000aee..3330cb2 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
> OVS_GUARDED_BY(meter_mutex)
>  = HMAP_INITIALIZER(_id_to_police_idx);
>  static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
>  = HMAP_INITIALIZER(_idx_to_meter_id);
> +static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);
>
>  static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
>  static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
> @@ -2549,6 +2551,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
>  uint32_t rate, burst;
>  bool add_policer;
>  int err;
> +bool init;
> +
> +atomic_read_relaxed(_tc_init, );
> +if (!init)
> +return 0;
> +else
> +VLOG_WARN("Do not call meter_set before init.");
>
>  if (!config->bands || config->n_bands < 1 ||
>  config->bands[0].type != OFPMBT13_DROP) {
> ```
>
> 
> Simon Jones
>
>
> Simon Jones  于2024年6月17日周一 11:13写道:
>
>> I found another cause of this BUG:
>> ```
>> In public-ovs code, @netdev_offload_dpdk and @netdev_offload_tc is
>> register in @netdev_register_flow_api_provider.
>> The @netdev_register_flow_api_provider is called in init stage,
>> like @dpdk_init__ and @netdev_initialize.
>> After register, @netdev_offload_dpdk and @netdev_offload_tc is in
>> @netdev_flow_apis.
>>
>> Then ovs-vswitchd run @bridge_run.
>> In @bridge_run, call @netdev_assign_flow_api, then
>> call rfa->flow_api->init_flow_api of all rfa of  @netdev_flow_apis.
>> The rfa is like p0 netdevice of DPDK type, or mip0 netdevice of system
>> type.
>> If DPDK type, it's  @netdev_offload_dpdk's  init_flow_api.
>> If system type, it's  @netdev_offload_tc's   init_flow_api.
>>
>> Then the add meter command comes, also call @bridge_run.
>> In  @bridge_run, at last call @meter_offload_set, then
>> call rfa->flow_api->meter_set  of all rfa of  @netdev_flow_apis.
>>
>> For this BUG.
>> Happens when ovs-vswitchd restart.
>> As bridge/port/meter is all stored in ovsdb.
>> If meter configure called before port configure, then 
>> rfa->flow_api->meter_set
>> will be called before rfa->flow_api->init_flow_api.
>> Then BUG happens.
>>
>> ```
>>
>> 
>> Simon Jones
>>
>>
>> Simon Jones  于2024年6月17日周一 10:57写道:
>>
>>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>>> ```
>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>> .type = "dpdk_flow_api",
>>> .flow_put = netdev_offload_dpdk_flow_put,
>>> .flow_del = netdev_offload_dpdk_flow_del,
>>> .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>> .flow_get = netdev_offload_dpdk_flow_get,
>>> .flow_flush = netdev_offload_dpdk_flow_flush,
>>> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>>> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>> };
>>> ```
>>> That's why public-ovs has NO BUG, because only one .meter_set implement
>>> in TC.
>>>
>>> But I add .meter_set in dpdk linke this:
>>> ```
>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>> .type = "dpdk_flow_api",
>>> .flow_put = netdev_offload_dpdk_flow_put,
>>> .flow_del = netdev_offload_dpdk_flow_del,
>>> .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>> .uninit_flow_api = 

Re: [ovs-dev] [BUG][meter] ovs crash when add meter openflow

2024-06-17 Thread Simon Jones
I use this patch to try to fix BUG, I test several times, it's OK
```
[root@bogon yusur_ovs]# git diff
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index b000aee..3330cb2 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -86,6 +86,8 @@ static struct hmap meter_id_to_police_idx
OVS_GUARDED_BY(meter_mutex)
 = HMAP_INITIALIZER(_id_to_police_idx);
 static struct hmap police_idx_to_meter_id OVS_GUARDED_BY(meter_mutex)
 = HMAP_INITIALIZER(_idx_to_meter_id);
+static atomic_bool is_tc_init = ATOMIC_VAR_INIT(false);

 static int meter_id_lookup(uint32_t meter_id, uint32_t *police_idx);
 static int police_idx_lookup(uint32_t police_idx, uint32_t *meter_id);
@@ -2549,6 +2551,13 @@ meter_tc_set_policer(ofproto_meter_id meter_id,
 uint32_t rate, burst;
 bool add_policer;
 int err;
+bool init;
+
+atomic_read_relaxed(_tc_init, );
+if (!init)
+return 0;
+else
+VLOG_WARN("Do not call meter_set before init.");

 if (!config->bands || config->n_bands < 1 ||
 config->bands[0].type != OFPMBT13_DROP) {
```


Simon Jones


Simon Jones  于2024年6月17日周一 11:13写道:

> I found another cause of this BUG:
> ```
> In public-ovs code, @netdev_offload_dpdk and @netdev_offload_tc is
> register in @netdev_register_flow_api_provider.
> The @netdev_register_flow_api_provider is called in init stage,
> like @dpdk_init__ and @netdev_initialize.
> After register, @netdev_offload_dpdk and @netdev_offload_tc is in
> @netdev_flow_apis.
>
> Then ovs-vswitchd run @bridge_run.
> In @bridge_run, call @netdev_assign_flow_api, then
> call rfa->flow_api->init_flow_api of all rfa of  @netdev_flow_apis.
> The rfa is like p0 netdevice of DPDK type, or mip0 netdevice of system
> type.
> If DPDK type, it's  @netdev_offload_dpdk's  init_flow_api.
> If system type, it's  @netdev_offload_tc's   init_flow_api.
>
> Then the add meter command comes, also call @bridge_run.
> In  @bridge_run, at last call @meter_offload_set, then
> call rfa->flow_api->meter_set  of all rfa of  @netdev_flow_apis.
>
> For this BUG.
> Happens when ovs-vswitchd restart.
> As bridge/port/meter is all stored in ovsdb.
> If meter configure called before port configure, then rfa->flow_api->meter_set
> will be called before rfa->flow_api->init_flow_api.
> Then BUG happens.
>
> ```
>
> 
> Simon Jones
>
>
> Simon Jones  于2024年6月17日周一 10:57写道:
>
>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>> ```
>> const struct netdev_flow_api netdev_offload_dpdk = {
>> .type = "dpdk_flow_api",
>> .flow_put = netdev_offload_dpdk_flow_put,
>> .flow_del = netdev_offload_dpdk_flow_del,
>> .init_flow_api = netdev_offload_dpdk_init_flow_api,
>> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>> .flow_get = netdev_offload_dpdk_flow_get,
>> .flow_flush = netdev_offload_dpdk_flow_flush,
>> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>> };
>> ```
>> That's why public-ovs has NO BUG, because only one .meter_set implement
>> in TC.
>>
>> But I add .meter_set in dpdk linke this:
>> ```
>> const struct netdev_flow_api netdev_offload_dpdk = {
>> .type = "dpdk_flow_api",
>> .flow_put = netdev_offload_dpdk_flow_put,
>> .flow_del = netdev_offload_dpdk_flow_del,
>> .init_flow_api = netdev_offload_dpdk_init_flow_api,
>> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>> .flow_get = netdev_offload_dpdk_flow_get,
>> .flow_flush = netdev_offload_dpdk_flow_flush,
>> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>> .meter_set = netdev_offload_dpdk_meter_set,
>> .meter_get = netdev_offload_dpdk_meter_get,
>> .meter_del = netdev_offload_dpdk_meter_del,
>> };
>> ```
>>
>> That's why I have BUG.
>>
>> So I think I should add some check...
>>
>> 
>> Simon Jones
>>
>>
>> Simon Jones  于2024年6月17日周一 10:06写道:
>>
>>> Oh, I'm using ovs-2.17.2, and I found that there is no .meter_set api
>>> implement in ovs-dpdk, which means only one .meter_set implement in TC.
>>> ```
>>> const struct netdev_flow_api netdev_offload_dpdk = {
>>> .type = "dpdk_flow_api",
>>> .flow_put = netdev_offload_dpdk_flow_put,
>>> .flow_del = netdev_offload_dpdk_flow_del,
>>> .init_flow_api = netdev_offload_dpdk_init_flow_api,
>>> .uninit_flow_api = netdev_offload_dpdk_uninit_flow_api,
>>> .flow_get = netdev_offload_dpdk_flow_get,
>>> .flow_flush = netdev_offload_dpdk_flow_flush,
>>> .hw_miss_packet_recover = netdev_offload_dpdk_hw_miss_packet_recover,
>>> .flow_get_n_flows = netdev_offload_dpdk_get_n_flows,
>>> };
>>> ```
>>> That's why public-ovs has NO BUG, because only one .meter_set implement
>>> in TC.
>>>
>>> But I add .meter_set in dpdk linke this:
>>> ```
>>> 

Re: [ovs-dev] [PATCH ovn v3 4/4] controller, northd: Add support for CT zone limits.

2024-06-17 Thread 0-day Robot
Bleep bloop.  Greetings Ales Musil, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 84 characters long (recommended limit is 79)
#497 FILE: ovn-nb.xml:726:
  type='{"type": "integer", "minInteger": 0, "maxInteger": 
4294967295}'>

WARNING: Line is 86 characters long (recommended limit is 79)
#514 FILE: ovn-nb.xml:1137:
type='{"type": "integer", "minInteger": 0, "maxInteger": 
4294967295}'>

WARNING: Line is 84 characters long (recommended limit is 79)
#532 FILE: ovn-nb.xml:2811:
  type='{"type": "integer", "minInteger": 0, "maxInteger": 
4294967295}'>

Lines checked: 651, Warnings: 3, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3 1/4] controller: Move CT zone handling into separate module.

2024-06-17 Thread 0-day Robot
Bleep bloop.  Greetings Ales Musil, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#190 FILE: controller/ct-zone.c:146:
/* XXX Add method to limit zone assignment to logical router

WARNING: Comment with 'xxx' marker
#268 FILE: controller/ct-zone.c:224:
/* xxx This is wasteful to assign a zone to each port--even if no

WARNING: Comment with 'xxx' marker
#269 FILE: controller/ct-zone.c:225:
 * xxx security policy is applied. */

Lines checked: , Warnings: 3, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 9/9] selftests: openvswitch: add emit_sample test

2024-06-17 Thread Adrián Moreno
On Fri, Jun 14, 2024 at 01:07:33PM GMT, Aaron Conole wrote:
> Adrian Moreno  writes:
>
> > Add a test to verify sampling packets via psample works.
> >
> > In order to do that, create a subcommand in ovs-dpctl.py to listen to
> > on the psample multicast group and print samples.
> >
> > In order to also test simultaneous sFlow and psample actions and
> > packet truncation, add missing parsing support for "userspace" and
> > "trunc" actions.
>
> Maybe split that into a separate patch.  This has a bugfix and 3
> features being pushed in.  I know it's already getting long as a series,
> so maybe it's okay to fold the userspace attribute bugfix with the parse
> support (since it wasn't really usable before).
>

OK. Sounds reasonable.

> > Signed-off-by: Adrian Moreno 
> > ---
> >  .../selftests/net/openvswitch/openvswitch.sh  |  99 +++-
> >  .../selftests/net/openvswitch/ovs-dpctl.py| 112 +-
> >  2 files changed, 204 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
> > b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > index 5cae53543849..f6e0ae3f6424 100755
> > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > @@ -20,7 +20,8 @@ tests="
> > nat_related_v4  ip4-nat-related: ICMP related 
> > matches work with SNAT
> > netlink_checks  ovsnl: validate netlink attrs 
> > and settings
> > upcall_interfaces   ovs: test the upcall interfaces
> > -   drop_reason drop: test drop reasons are 
> > emitted"
> > +   drop_reason drop: test drop reasons are 
> > emitted
> > +   emit_sample emit_sample: Sampling packets 
> > with psample"
> >
> >  info() {
> >  [ $VERBOSE = 0 ] || echo $*
> > @@ -170,6 +171,19 @@ ovs_drop_reason_count()
> > return `echo "$perf_output" | grep "$pattern" | wc -l`
> >  }
> >
> > +ovs_test_flow_fails () {
> > +   ERR_MSG="Flow actions may not be safe on all matching packets"
> > +
> > +   PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
> > +   ovs_add_flow $@ &> /dev/null $@ && return 1
> > +   POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
> > +
> > +   if [ "$PRE_TEST" == "$POST_TEST" ]; then
> > +   return 1
> > +   fi
> > +   return 0
> > +}
> > +
> >  usage() {
> > echo
> > echo "$0 [OPTIONS] [TEST]..."
> > @@ -184,6 +198,89 @@ usage() {
> > exit 1
> >  }
> >
> > +
> > +# emit_sample test
> > +# - use emit_sample to observe packets
> > +test_emit_sample() {
> > +   sbx_add "test_emit_sample" || return $?
> > +
> > +   # Add a datapath with per-vport dispatching.
> > +   ovs_add_dp "test_emit_sample" emit_sample -V 2:1 || return 1
> > +
> > +   info "create namespaces"
> > +   ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
> > +   client c0 c1 172.31.110.10/24 -u || return 1
> > +   ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
> > +   server s0 s1 172.31.110.20/24 -u || return 1
> > +
> > +   # Check if emit_sample actions can be configured.
> > +   ovs_add_flow "test_emit_sample" emit_sample \
> > +   'in_port(1),eth(),eth_type(0x0806),arp()' 'emit_sample(group=1)'
> > +   if [ $? == 1 ]; then
> > +   info "no support for emit_sample - skipping"
> > +   ovs_exit_sig
> > +   return $ksft_skip
> > +   fi
> > +
> > +   ovs_del_flows "test_emit_sample" emit_sample
> > +
> > +   # Allow ARP
> > +   ovs_add_flow "test_emit_sample" emit_sample \
> > +   'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
> > +   ovs_add_flow "test_emit_sample" emit_sample \
> > +   'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
> > +
> > +   # Test action verification.
> > +   OLDIFS=$IFS
> > +   IFS='*'
> > +   min_key='in_port(1),eth(),eth_type(0x0800),ipv4()'
> > +   for testcase in \
> > +   "cookie to 
> > large"*"emit_sample(group=1,cookie=1615141312111009080706050403020100)" \
> > +   "no group with cookie"*"emit_sample(cookie=abcd)" \
> > +   "no group"*"sample()";
> > +   do
> > +   set -- $testcase;
> > +   ovs_test_flow_fails "test_emit_sample" emit_sample $min_key $2
> > +   if [ $? == 1 ]; then
> > +   info "failed - $1"
> > +   return 1
> > +   fi
> > +   done
> > +   IFS=$OLDIFS
> > +
> > +   # Sample first 14 bytes of all traffic.
> > +   ovs_add_flow "test_emit_sample" emit_sample \
> > +   
> > "in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()" 
> > "trunc(14),emit_sample(group=1,cookie=c0ffee),2"
> > +
> > +   # Sample all traffic. In this case, use a sample() action with both
> > +   # emit_sample and an upcall emulating simultaneous local sampling and
> > +   # sFlow / IPFIX.
> > +   nlpid=$(grep -E "listening on upcall packet 

Re: [ovs-dev] [PATCH net-next v2 6/9] net: openvswitch: store sampling probability in cb.

2024-06-17 Thread Adrián Moreno
On Fri, Jun 14, 2024 at 12:55:59PM GMT, Aaron Conole wrote:
> Adrian Moreno  writes:
>
> > The behavior of actions might not be the exact same if they are being
> > executed inside a nested sample action. Store the probability of the
> > parent sample action in the skb's cb area.
>
> What does that mean?
>

Emit action, for instance, needs the probability so that psample
consumers know what was the sampling rate applied. Also, the way we
should inform about packet drops (via kfree_skb_reason) changes (see
patch 7/9).

> > Use the probability in emit_sample to pass it down to psample.
> >
> > Signed-off-by: Adrian Moreno 
> > ---
> >  include/uapi/linux/openvswitch.h |  3 ++-
> >  net/openvswitch/actions.c| 25 ++---
> >  net/openvswitch/datapath.h   |  3 +++
> >  net/openvswitch/vport.c  |  1 +
> >  4 files changed, 28 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/uapi/linux/openvswitch.h 
> > b/include/uapi/linux/openvswitch.h
> > index a0e9dde0584a..9d675725fa2b 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -649,7 +649,8 @@ enum ovs_flow_attr {
> >   * Actions are passed as nested attributes.
> >   *
> >   * Executes the specified actions with the given probability on a 
> > per-packet
> > - * basis.
> > + * basis. Nested actions will be able to access the probability value of 
> > the
> > + * parent @OVS_ACTION_ATTR_SAMPLE.
> >   */
> >  enum ovs_sample_attr {
> > OVS_SAMPLE_ATTR_UNSPEC,
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 3b4dba0ded59..33f6d93ba5e4 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct 
> > sk_buff *skb,
> > struct nlattr *sample_arg;
> > int rem = nla_len(attr);
> > const struct sample_arg *arg;
> > +   u32 init_probability;
> > bool clone_flow_key;
> > +   int err;
> >
> > /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
> > sample_arg = nla_data(attr);
> > arg = nla_data(sample_arg);
> > actions = nla_next(sample_arg, );
> > +   init_probability = OVS_CB(skb)->probability;
> >
> > if ((arg->probability != U32_MAX) &&
> > (!arg->probability || get_random_u32() > arg->probability)) {
> > @@ -1062,9 +1065,21 @@ static int sample(struct datapath *dp, struct 
> > sk_buff *skb,
> > return 0;
> > }
> >
> > +   if (init_probability) {
> > +   OVS_CB(skb)->probability = ((u64)OVS_CB(skb)->probability *
> > +   arg->probability / U32_MAX);
> > +   } else {
> > +   OVS_CB(skb)->probability = arg->probability;
> > +   }
> > +
>
> I'm confused by this.  Eventually, integer arithmetic will practically
> guarantee that nested sample() calls will go to 0.  So eventually, the
> test above will be impossible to meet mathematically.
>
> OTOH, you could argue that a 1% of 50% is low anyway, but it still would
> have a positive probability count, and still be possible for
> get_random_u32() call to match.
>

Using OVS's probability semantics, we can express probabilities as low
as (100/U32_MAX)% which is pretty low indeed. However, just because the
probability of executing the action is low I don't think we should not
report it.

Rethinking the integer arithmetics, it's true that we should avoid
hitting zero on the division, eg: nesting 6x 1% sampling rates will make
the result be zero which will make probability restoration fail on the
way back. Threrefore, the new probability should be at least 1.


> I'm not sure about this particular change.  Why do we need it?
>

Why do we need to propagate the probability down to nested "sample"
actions? or why do we need to store the probability in the cb area in
the first place?

The former: Just for correctness as only storing the last one would be
incorrect. Although I don't know of any use for nested "sample" actions.
The latter: To pass it down to psample so that sample receivers know how
the sampling rate applied (and, e.g: do throughput estimations like OVS
does with IPFIX).


> > clone_flow_key = !arg->exec;
> > -   return clone_execute(dp, skb, key, 0, actions, rem, last,
> > -clone_flow_key);
> > +   err = clone_execute(dp, skb, key, 0, actions, rem, last,
> > +   clone_flow_key);
> > +
> > +   if (!last)
>
> Is this right?  Don't we only want to set the probability on the last
> action?  Should the test be 'if (last)'?
>

This is restoring the parent's probability after the actions in the
current sample action have been executed.

If it was the last action there is no need to restore the probability
back to the parent's (or zero if it's there's only one level) since no
further action will require it. And more importantly, if it's the last
action, the packet gets free'ed inside that "branch" so we must not
access its memory.


> > + 

[ovs-dev] [PATCH ovn v3 4/4] controller, northd: Add support for CT zone limits.

2024-06-17 Thread Ales Musil
Add support for limiting the CT zone usage per Ls, LR or LSP.
When the limit is configured on logical switch it will also implicitly
set limits for all ports in that logical switch. The port configuration
can be overwritten individually and has priority over the whole logical
switch configuration.

The value 0 means unlimited, when the value is not specified it is
derived from OvS default CT limit specified for given OvS datapath.

Reported-at: https://bugzilla.redhat.com/2189924
Signed-off-by: Ales Musil 
---
v3: Rebase on top of latest main.
---
 NEWS|   3 +
 controller/ct-zone.c| 170 
 controller/ct-zone.h|  12 ++-
 controller/ovn-controller.c |  25 +-
 lib/ovn-util.c  |  17 
 lib/ovn-util.h  |   3 +
 northd/northd.c |   8 ++
 ovn-nb.xml  |  29 ++
 tests/ovn-controller.at |  99 +
 9 files changed, 345 insertions(+), 21 deletions(-)

diff --git a/NEWS b/NEWS
index 3bdc55172..22c1797e6 100644
--- a/NEWS
+++ b/NEWS
@@ -31,6 +31,9 @@ Post v24.03.0
 has been renamed to "options:ic-route-denylist" in order to comply with
 inclusive language guidelines. The previous name is still recognized to
 aid with backwards compatibility.
+  - Add support for CT zone limit that can be specified per LR
+(options:ct-zone-limit), LS (other_config:ct-zone-limit) or LSP
+(options:ct-zone-limit).
 
 OVN v24.03.0 - 01 Mar 2024
 --
diff --git a/controller/ct-zone.c b/controller/ct-zone.c
index 95faec2f1..820ec061e 100644
--- a/controller/ct-zone.c
+++ b/controller/ct-zone.c
@@ -34,6 +34,17 @@ static bool ct_zone_assign_unused(struct ct_zone_ctx *ctx,
 static bool ct_zone_remove(struct ct_zone_ctx *ctx, const char *name);
 static void ct_zone_add(struct ct_zone_ctx *ctx, const char *name,
 uint16_t zone, bool set_pending);
+static void ct_zone_limits_sync_per_dp(struct ct_zone_ctx *ctx,
+   const struct sbrec_datapath_binding *dp,
+   const char *name,
+   struct ovsdb_idl_index *pb_by_dp);
+static void ct_zone_limit_sync(struct ct_zone_ctx *ctx, const char *name,
+   int64_t limit);
+static int64_t ct_zone_get_dp_limit(const struct sbrec_datapath_binding *dp);
+static int64_t ct_zone_get_pb_limit(const struct sbrec_port_binding *pb);
+static int64_t ct_zone_limit_normalize(int64_t limit);
+static struct ovsrec_ct_zone *
+ct_zone_find_ovsrec(const struct ovsrec_datapath *dp, uint16_t zone_id);
 
 void
 ct_zones_restore(struct ct_zone_ctx *ctx,
@@ -196,11 +207,14 @@ ct_zones_update(const struct sset *local_lports,
 
 void
 ct_zones_commit(const struct ovsrec_bridge *br_int,
+const struct ovsrec_datapath *ovs_dp,
+struct ovsdb_idl_txn *ovs_idl_txn,
 struct shash *pending_ct_zones)
 {
 struct shash_node *iter;
 SHASH_FOR_EACH (iter, pending_ct_zones) {
 struct ct_zone_pending_entry *ctzpe = iter->data;
+struct ct_zone *ct_zone = >ct_zone;
 
 /* The transaction is open, so any pending entries in the
  * CT_ZONE_DB_QUEUED must be sent and any in CT_ZONE_DB_QUEUED
@@ -212,7 +226,7 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
 
 char *user_str = xasprintf("ct-zone-%s", iter->name);
 if (ctzpe->add) {
-char *zone_str = xasprintf("%"PRIu16, ctzpe->ct_zone.zone);
+char *zone_str = xasprintf("%"PRIu16, ct_zone->zone);
 struct smap_node *node =
 smap_get_node(_int->external_ids, user_str);
 if (!node || strcmp(node->value, zone_str)) {
@@ -227,6 +241,19 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
 }
 free(user_str);
 
+struct ovsrec_ct_zone *ovs_zone =
+ct_zone_find_ovsrec(ovs_dp, ct_zone->zone);
+if ((!ctzpe->add || ct_zone->limit < 0) && ovs_zone) {
+ovsrec_datapath_update_ct_zones_delkey(ovs_dp, ct_zone->zone);
+} else if (ctzpe->add && ct_zone->limit >= 0) {
+if (!ovs_zone) {
+ovs_zone = ovsrec_ct_zone_insert(ovs_idl_txn);
+ovsrec_datapath_update_ct_zones_setkey(ovs_dp, ct_zone->zone,
+   ovs_zone);
+}
+ovsrec_ct_zone_set_limit(ovs_zone, _zone->limit, 1);
+}
+
 ctzpe->state = CT_ZONE_DB_SENT;
 }
 }
@@ -247,8 +274,19 @@ ct_zones_pending_clear_commited(struct shash *pending)
 /* Returns "true" when there is no need for full recompute. */
 bool
 ct_zone_handle_dp_update(struct ct_zone_ctx *ctx,
- const struct sbrec_datapath_binding *dp)
+ const struct sbrec_datapath_binding *dp,
+ struct ovsdb_idl_index *pb_by_dp)
 

[ovs-dev] [PATCH ovn v3 3/4] controller: Prepare structure around CT zone limiting.

2024-06-17 Thread Ales Musil
In order to be able to store CT limits for specified zone, store the
zone inside separate struct instead of simap. This allows to add
the addition of limit without changing the whole infrastructure again.

This is a preparation step for the CT zone limits.

Signed-off-by: Ales Musil 
---
v3: Rebase on top of latest main.
v2: Fix NULL ptr deref.
---
 controller/ct-zone.c| 171 +---
 controller/ct-zone.h|  13 ++-
 controller/ofctrl.c |   2 +-
 controller/ovn-controller.c |  15 ++--
 controller/physical.c   |  17 ++--
 controller/physical.h   |   2 +-
 6 files changed, 128 insertions(+), 92 deletions(-)

diff --git a/controller/ct-zone.c b/controller/ct-zone.c
index e4f66a52a..95faec2f1 100644
--- a/controller/ct-zone.c
+++ b/controller/ct-zone.c
@@ -26,12 +26,14 @@ ct_zone_restore(const struct sbrec_datapath_binding_table 
*dp_table,
 struct ct_zone_ctx *ctx, const char *name, int zone);
 static void ct_zone_add_pending(struct shash *pending_ct_zones,
 enum ct_zone_pending_state state,
-int zone, bool add, const char *name);
+struct ct_zone *zone, bool add,
+const char *name);
 static int ct_zone_get_snat(const struct sbrec_datapath_binding *dp);
 static bool ct_zone_assign_unused(struct ct_zone_ctx *ctx,
   const char *zone_name, int *scan_start);
-static bool ct_zone_remove(struct ct_zone_ctx *ctx,
-   struct simap_node *ct_zone);
+static bool ct_zone_remove(struct ct_zone_ctx *ctx, const char *name);
+static void ct_zone_add(struct ct_zone_ctx *ctx, const char *name,
+uint16_t zone, bool set_pending);
 
 void
 ct_zones_restore(struct ct_zone_ctx *ctx,
@@ -47,7 +49,8 @@ ct_zones_restore(struct ct_zone_ctx *ctx,
 struct ct_zone_pending_entry *ctpe = pending_node->data;
 
 if (ctpe->add) {
-ct_zone_restore(dp_table, ctx, pending_node->name, ctpe->zone);
+ct_zone_restore(dp_table, ctx, pending_node->name,
+ctpe->ct_zone.zone);
 }
 }
 
@@ -91,7 +94,6 @@ void
 ct_zones_update(const struct sset *local_lports,
 const struct hmap *local_datapaths, struct ct_zone_ctx *ctx)
 {
-struct simap_node *ct_zone;
 int scan_start = 1;
 const char *user;
 struct sset all_users = SSET_INITIALIZER(_users);
@@ -132,12 +134,14 @@ ct_zones_update(const struct sset *local_lports,
 }
 
 /* Delete zones that do not exist in above sset. */
-SIMAP_FOR_EACH_SAFE (ct_zone, >current) {
-if (!sset_contains(_users, ct_zone->name)) {
-ct_zone_remove(ctx, ct_zone);
-} else if (!simap_find(_snat_zones, ct_zone->name)) {
-bitmap_set1(unreq_snat_zones_map, ct_zone->data);
-simap_put(_snat_zones, ct_zone->name, ct_zone->data);
+struct shash_node *node;
+SHASH_FOR_EACH_SAFE (node, >current) {
+struct ct_zone *ct_zone = node->data;
+if (!sset_contains(_users, node->name)) {
+ct_zone_remove(ctx, node->name);
+} else if (!simap_find(_snat_zones, node->name)) {
+bitmap_set1(unreq_snat_zones_map, ct_zone->zone);
+simap_put(_snat_zones, node->name, ct_zone->zone);
 }
 }
 
@@ -152,7 +156,7 @@ ct_zones_update(const struct sset *local_lports,
 struct simap_node *unreq_node;
 SIMAP_FOR_EACH_SAFE (unreq_node, _snat_zones) {
 if (unreq_node->data == snat_req_node->data) {
-simap_find_and_delete(>current, unreq_node->name);
+ct_zone_remove(ctx, unreq_node->name);
 simap_delete(_snat_zones, unreq_node);
 }
 }
@@ -163,26 +167,12 @@ ct_zones_update(const struct sset *local_lports,
 bitmap_set0(unreq_snat_zones_map, snat_req_node->data);
 }
 
-struct simap_node *node = simap_find(>current,
- snat_req_node->name);
-if (node) {
-if (node->data != snat_req_node->data) {
-/* Zone request has changed for this node. delete old entry and
- * create new one*/
-ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
-snat_req_node->data, true,
-snat_req_node->name);
-bitmap_set0(ctx->bitmap, node->data);
-}
-bitmap_set1(ctx->bitmap, snat_req_node->data);
-node->data = snat_req_node->data;
-} else {
-ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
-snat_req_node->data, true,
-snat_req_node->name);
-bitmap_set1(ctx->bitmap, snat_req_node->data);
-

[ovs-dev] [PATCH ovn v3 1/4] controller: Move CT zone handling into separate module.

2024-06-17 Thread Ales Musil
Move the CT zone handling specific bits into its own module. This
allows for easier changes done within the module and separates the
logic that is unrelated from ovn-controller.

Signed-off-by: Ales Musil 
---
v3: Rebase on top of latest main.
---
 controller/automake.mk  |   4 +-
 controller/ct-zone.c| 378 ++
 controller/ct-zone.h|  74 +++
 controller/ofctrl.c |   3 +-
 controller/ovn-controller.c | 393 +++-
 controller/ovn-controller.h |  21 +-
 controller/pinctrl.c|   2 +-
 tests/ovn.at|   4 +-
 8 files changed, 486 insertions(+), 393 deletions(-)
 create mode 100644 controller/ct-zone.c
 create mode 100644 controller/ct-zone.h

diff --git a/controller/automake.mk b/controller/automake.mk
index 1b1b3aeb1..ed93cfb3c 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -47,7 +47,9 @@ controller_ovn_controller_SOURCES = \
controller/mac-cache.h \
controller/mac-cache.c \
controller/statctrl.h \
-   controller/statctrl.c
+   controller/statctrl.c \
+   controller/ct-zone.h \
+   controller/ct-zone.c
 
 controller_ovn_controller_LDADD = lib/libovn.la $(OVS_LIBDIR)/libopenvswitch.la
 man_MANS += controller/ovn-controller.8
diff --git a/controller/ct-zone.c b/controller/ct-zone.c
new file mode 100644
index 0..3e37fedb6
--- /dev/null
+++ b/controller/ct-zone.c
@@ -0,0 +1,378 @@
+/* Copyright (c) 2024, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "ct-zone.h"
+#include "local_data.h"
+#include "openvswitch/vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(ct_zone);
+
+static void
+ct_zone_restore(const struct sbrec_datapath_binding_table *dp_table,
+struct ct_zone_ctx *ctx, const char *name, int zone);
+static void ct_zone_add_pending(struct shash *pending_ct_zones,
+enum ct_zone_pending_state state,
+int zone, bool add, const char *name);
+
+void
+ct_zones_restore(struct ct_zone_ctx *ctx,
+ const struct ovsrec_open_vswitch_table *ovs_table,
+ const struct sbrec_datapath_binding_table *dp_table,
+ const struct ovsrec_bridge *br_int)
+{
+memset(ctx->bitmap, 0, sizeof ctx->bitmap);
+bitmap_set1(ctx->bitmap, 0); /* Zone 0 is reserved. */
+
+struct shash_node *pending_node;
+SHASH_FOR_EACH (pending_node, >pending) {
+struct ct_zone_pending_entry *ctpe = pending_node->data;
+
+if (ctpe->add) {
+ct_zone_restore(dp_table, ctx, pending_node->name, ctpe->zone);
+}
+}
+
+const struct ovsrec_open_vswitch *cfg;
+cfg = ovsrec_open_vswitch_table_first(ovs_table);
+if (!cfg) {
+return;
+}
+
+if (!br_int) {
+/* If the integration bridge hasn't been defined, assume that
+ * any existing ct-zone definitions aren't valid. */
+return;
+}
+
+struct smap_node *node;
+SMAP_FOR_EACH (node, _int->external_ids) {
+if (strncmp(node->key, "ct-zone-", 8)) {
+continue;
+}
+
+const char *user = node->key + 8;
+if (!user[0]) {
+continue;
+}
+
+if (shash_find(>pending, user)) {
+continue;
+}
+
+unsigned int zone;
+if (!str_to_uint(node->value, 10, )) {
+continue;
+}
+
+ct_zone_restore(dp_table, ctx, user, zone);
+}
+}
+
+bool
+ct_zone_assign_unused(struct ct_zone_ctx *ctx, const char *zone_name,
+  int *scan_start)
+{
+/* We assume that there are 64K zones and that we own them all. */
+int zone = bitmap_scan(ctx->bitmap, 0, *scan_start, MAX_CT_ZONES + 1);
+if (zone == MAX_CT_ZONES + 1) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_WARN_RL(, "exhausted all ct zones");
+return false;
+}
+
+*scan_start = zone + 1;
+
+ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
+zone, true, zone_name);
+
+bitmap_set1(ctx->bitmap, zone);
+simap_put(>current, zone_name, zone);
+return true;
+}
+
+bool
+ct_zone_remove(struct ct_zone_ctx *ctx, const char *name)
+{
+struct simap_node *ct_zone = simap_find(>current, name);
+if (!ct_zone) {
+return false;
+}
+
+VLOG_DBG("removing ct zone %"PRId32" for '%s'", 

[ovs-dev] [PATCH ovn v3 2/4] controller: Further encapsulate the CT zone handling.

2024-06-17 Thread Ales Musil
Move more code into the new ct-zone module and encapsulate
functionality that is strictly related to CT zone handling.

Signed-off-by: Ales Musil 
---
v3: Rebase on top of latest main.
---
 controller/ct-zone.c| 156 +---
 controller/ct-zone.h|   8 +-
 controller/ovn-controller.c |  49 ++-
 3 files changed, 118 insertions(+), 95 deletions(-)

diff --git a/controller/ct-zone.c b/controller/ct-zone.c
index 3e37fedb6..e4f66a52a 100644
--- a/controller/ct-zone.c
+++ b/controller/ct-zone.c
@@ -27,6 +27,11 @@ ct_zone_restore(const struct sbrec_datapath_binding_table 
*dp_table,
 static void ct_zone_add_pending(struct shash *pending_ct_zones,
 enum ct_zone_pending_state state,
 int zone, bool add, const char *name);
+static int ct_zone_get_snat(const struct sbrec_datapath_binding *dp);
+static bool ct_zone_assign_unused(struct ct_zone_ctx *ctx,
+  const char *zone_name, int *scan_start);
+static bool ct_zone_remove(struct ct_zone_ctx *ctx,
+   struct simap_node *ct_zone);
 
 void
 ct_zones_restore(struct ct_zone_ctx *ctx,
@@ -82,47 +87,6 @@ ct_zones_restore(struct ct_zone_ctx *ctx,
 }
 }
 
-bool
-ct_zone_assign_unused(struct ct_zone_ctx *ctx, const char *zone_name,
-  int *scan_start)
-{
-/* We assume that there are 64K zones and that we own them all. */
-int zone = bitmap_scan(ctx->bitmap, 0, *scan_start, MAX_CT_ZONES + 1);
-if (zone == MAX_CT_ZONES + 1) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
-VLOG_WARN_RL(, "exhausted all ct zones");
-return false;
-}
-
-*scan_start = zone + 1;
-
-ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
-zone, true, zone_name);
-
-bitmap_set1(ctx->bitmap, zone);
-simap_put(>current, zone_name, zone);
-return true;
-}
-
-bool
-ct_zone_remove(struct ct_zone_ctx *ctx, const char *name)
-{
-struct simap_node *ct_zone = simap_find(>current, name);
-if (!ct_zone) {
-return false;
-}
-
-VLOG_DBG("removing ct zone %"PRId32" for '%s'", ct_zone->data,
- ct_zone->name);
-
-ct_zone_add_pending(>pending, CT_ZONE_OF_QUEUED,
-ct_zone->data, false, ct_zone->name);
-bitmap_set0(ctx->bitmap, ct_zone->data);
-simap_delete(>current, ct_zone);
-
-return true;
-}
-
 void
 ct_zones_update(const struct sset *local_lports,
 const struct hmap *local_datapaths, struct ct_zone_ctx *ctx)
@@ -170,7 +134,7 @@ ct_zones_update(const struct sset *local_lports,
 /* Delete zones that do not exist in above sset. */
 SIMAP_FOR_EACH_SAFE (ct_zone, >current) {
 if (!sset_contains(_users, ct_zone->name)) {
-ct_zone_remove(ctx, ct_zone->name);
+ct_zone_remove(ctx, ct_zone);
 } else if (!simap_find(_snat_zones, ct_zone->name)) {
 bitmap_set1(unreq_snat_zones_map, ct_zone->data);
 simap_put(_snat_zones, ct_zone->name, ct_zone->data);
@@ -277,12 +241,6 @@ ct_zones_commit(const struct ovsrec_bridge *br_int,
 }
 }
 
-int
-ct_zone_get_snat(const struct sbrec_datapath_binding *dp)
-{
-return smap_get_int(>external_ids, "snat-ct-zone", -1);
-}
-
 void
 ct_zones_pending_clear_commited(struct shash *pending)
 {
@@ -296,6 +254,108 @@ ct_zones_pending_clear_commited(struct shash *pending)
 }
 }
 
+/* Returns "true" when there is no need for full recompute. */
+bool
+ct_zone_handle_dp_update(struct ct_zone_ctx *ctx,
+ const struct sbrec_datapath_binding *dp)
+{
+int req_snat_zone = ct_zone_get_snat(dp);
+if (req_snat_zone == -1) {
+/* datapath snat ct zone is not set.  This condition will also hit
+ * when CMS clears the snat-ct-zone for the logical router.
+ * In this case there is no harm in using the previosly specified
+ * snat ct zone for this datapath.  Also it is hard to know
+ * if this option was cleared or if this option is never set. */
+return true;
+}
+
+const char *name = smap_get(>external_ids, "name");
+if (!name) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_ERR_RL(, "Missing name for datapath '"UUID_FMT"' skipping"
+"zone check.", UUID_ARGS(>header_.uuid));
+return true;
+}
+
+/* Check if the requested snat zone has changed for the datapath
+ * or not.  If so, then fall back to full recompute of
+ * ct_zone engine. */
+char *snat_dp_zone_key = alloc_nat_zone_key(name, "snat");
+struct simap_node *simap_node =
+simap_find(>current, snat_dp_zone_key);
+free(snat_dp_zone_key);
+if (!simap_node || simap_node->data != req_snat_zone) {
+/* There is no entry yet or the requested snat zone has changed.
+ * Trigger full recompute of 

[ovs-dev] [PATCH ovn v3 0/4] Add ability to limit CT entries per LS/LR/LSP

2024-06-17 Thread Ales Musil
Add ability that allows to set CT limits per logical switch, logical
router or logical switch port. When the limit is applied to logical
switch it will be implicitly set for all logical ports in the logical
switch. This can be overwritten individually per port.

To achieve this there is a small refactor of the CT zone handling logic
which allows us to get the zone limiting more easily.

Ales Musil (4):
  controller: Move CT zone handling into separate module.
  controller: Further encapsulate the CT zone handling.
  controller: Prepare structure around CT zone limiting.
  controller, northd: Add support for CT zone limits.

 NEWS|   3 +
 controller/automake.mk  |   4 +-
 controller/ct-zone.c| 605 
 controller/ct-zone.h|  89 ++
 controller/ofctrl.c |   5 +-
 controller/ovn-controller.c | 452 +++
 controller/ovn-controller.h |  21 +-
 controller/physical.c   |  17 +-
 controller/physical.h   |   2 +-
 controller/pinctrl.c|   2 +-
 lib/ovn-util.c  |  17 +
 lib/ovn-util.h  |   3 +
 northd/northd.c |   8 +
 ovn-nb.xml  |  29 ++
 tests/ovn-controller.at |  99 ++
 tests/ovn.at|   4 +-
 16 files changed, 918 insertions(+), 442 deletions(-)
 create mode 100644 controller/ct-zone.c
 create mode 100644 controller/ct-zone.h

-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 4/9] net: psample: allow using rate as probability

2024-06-17 Thread Adrián Moreno
On Fri, Jun 14, 2024 at 05:11:30PM GMT, Simon Horman wrote:
> On Mon, Jun 03, 2024 at 08:56:38PM +0200, Adrian Moreno wrote:
> > Although not explicitly documented in the psample module itself, the
> > definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.
> >
> > Quoting tc-sample(8):
> > "RATE of 100 will lead to an average of one sampled packet out of every
> > 100 observed."
> >
> > With this semantics, the rates that we can express with an unsigned
> > 32-bits number are very unevenly distributed and concentrated towards
> > "sampling few packets".
> > For example, we can express a probability of 2.32E-8% but we
> > cannot express anything between 100% and 50%.
> >
> > For sampling applications that are capable of sampling a decent
> > amount of packets, this sampling rate semantics is not very useful.
> >
> > Add a new flag to the uAPI that indicates that the sampling rate is
> > expressed in scaled probability, this is:
> > - 0 is 0% probability, no packets get sampled.
> > - U32_MAX is 100% probability, all packets get sampled.
> >
> > Signed-off-by: Adrian Moreno 
>
> Hi Adrian,
>
> Would it be possible to add appropriate documentation for
> rate - both the original ratio variant, and the new probability
> variant - somewhere?
>

Hi Simon, thanks for the suggestion. Would the uapi header be a good
place for such documentation?

> That aside, this looks good to me.
>
> ...
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev