Re: [ovs-dev] [PATCH v29 0/8] Add offload support for sFlow

2024-07-01 Thread Chris Mi via dev

On 6/27/2024 1:24 PM, Chris Mi wrote:

On 6/18/2024 2:27 PM, Chris Mi wrote:

On 5/16/2024 5:09 AM, Ilya Maximets wrote:

On 3/26/24 03:30, Chris Mi wrote:

This patch set adds offload support for sFlow.

Psample is a genetlink channel for packet sampling. TC action 
act_sample

uses psample to send sampled packets to userspace.

When offloading sample action to TC, userspace creates a unique ID to
map sFlow action and tunnel info and passes this ID to kernel instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info and send
sampled packet to the right sFlow monitoring host.


Hi, Chris, others.

I've been looking through the set and a psample in general for the past
few days.  There are few fairly minor issues in the set like UBsan crash
because of incorrect OVS_RETURNS_NONNULL annotation and a few style and
formatting issues, which are not hard to fix.  However, I encountered
an issue with a way tunnel metadata is recovered that I'm not sure we
can actually fix.

The algorithm of work in this patch set is described in the cover letter
above.  The key point is then packet-1 goes though MISS upcall we 
install

a new datapath flow into TC and we remember the tunnel metadata of the
packet-1 associated with a sample ID.  When the packet-2 hits the 
datapath

flow (tc actions), it gets sent to psample and ovs-vswitchd reads this
packet from psample netlink socket and presents it as an ACTION upcall.
Doing that it finds tunnel metadata using the sample ID and uses it as a
tunnel metadata for packet-2.  The key of the problem is that this is
metadata from packet-1 and it can be different from metadata of 
packet-2.


An example of this will be having two separate TCP streams going through
the same VxLAN tunnel.  For load balancing reasons most UDP tunnels
choose source port of the outer UDP header based on RSS or 5-tuple hash
of the inner packet.  This means that two TCP streams going through the
same tunnel will have different UDP source port in the outer header.

When a packet from a first stream hits a MISS upcall, datapath flow will
be installed and the sample ID will be associated with the metadata of
the first stream.  Chances are this datapath flow will not have exact
match on the source port of the tunnel metadata.  That means that a
packet from the second stream will successfully match on the installed
datapath flow and will go to psample with the ID of the first stream.
OVS will read it from the psample socket and send to sFlow collector
with metadata it found by the ID, i.e. with a tunnel metadata that
contains tunnel source port of the first TCP stream, which is incorrect.

The test below reproduces the issue.  It's not a real test, it always 
fails

on purpose and not very pretty, but what it does is that it creates a
tunnel, then starts netcat server on one side and sends two files to it
from the other side.  These two transfers are separate TCP 
connections, so

they use different source ports, that is causing the tunnel to choose
different UDP source ports for them.  After the test failure we can 
see in

the sflow.log that all the values for 'tunnel4_in_src_port' are the same
for both streams and some exchanges prior to that.  However, if offload
is disabled, the values will be different as they should.

So, unless we can ensure that packets from different streams do not 
match

on the same datapath flow, this schema doesn't work.

The only way to ensure that packets from different streams within the 
same

tunnel do not match on the same datapath flow is to always have an exact
match on the whole tunnel metadata.  But I don't think that is a good 
idea,
because this way we'll have a separate datapath flow per TCP stream, 
which
will be very bad for performance.  And it will be bad for hardware 
offload

since hardware NICs normally have a more limited amount of available
resources.

This leads us to conclusion that only non-tunneling traffic can be 
offloaded
this way.  For this to work we'll have to add an explicit match on 
tunnel

metadata being empty (can that be expressed in TC?)

But at this point a question arises if this even worth implementing, 
taking
into account that it requires a lot of infrastructure changes 
throughout all
the layers of OVS code just for sampling of non-tunnel packets, i.e. 
a very

narrow use case.

Also, my understanding was that there is a plan to offload other 
types of
userspace() action in the future, such as controller() action using 
the same
psample infrastructure.  But that will not be possible for the same 
reason.
In addition to tunnel metadata we will also have to provide 
connection tracking
information, which is even harder, because conntrack state is more 
dynamic and
is only actually known to the datapath.  psample can't supply this 
information

to the userspace.

Thoughts?


Hi Ilya,

Sorry for the late reply.

I applied your following diff and reproduced the issue

[ovs-dev] OVS "soft freeze" for 3.4 is in effect.

2024-07-01 Thread Ilya Maximets
Hi.  As described in Documentation/internals/release-process.rst, we are
in a "soft freeze" state:

   During the freeze, we ask committers to refrain from applying patches that
   add new features unless those patches were already being publicly discussed
   and reviewed before the freeze began.  Bug fixes are welcome at any time.
   Please propose and discuss exceptions on ovs-dev.

We should branch for version 3.4 in two weeks from now, on Monday, Jul 15.
With that, release should be on Thursday, Aug 15.


Currently, on the mailing list we have a few patches and patch sets that were
already discussed / reviewed to some extent, and we do also await new versions
for several patches and patch sets for which changes were requested.  These,
of course, can be accepted.

If there are some new patches that never been reviewed / not posted yet,
please, propose an exception in reply to this email and it can be discussed.

One potential exception I'm aware of is a non-RFC version of the local sampling
patch set:
  
https://patchwork.ozlabs.org/project/openvswitch/cover/20240605202337.2904041-1-amore...@redhat.com/
Design for this feature has been extensively discussed in the past few months
and the kernel counterpart for the feature is almost ready.  So, unless there
are objections, it should be safe to get this during soft freeze.  Of course,
assuming that the kernel patches will be accepted and we'll get a non-RFC
version of the patch set in a reasonable time for review.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 07/10] selftests: openvswitch: add psample action

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> Add sample and psample action support to ovs-dpctl.py.
>
> Refactor common attribute parsing logic into an external function.
>
> Signed-off-by: Adrian Moreno 
> ---

Reviewed-by: Aaron Conole 

>  .../selftests/net/openvswitch/ovs-dpctl.py| 162 +-
>  1 file changed, 161 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index 182a09975975..dcc400a21a22 100644
> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> @@ -8,6 +8,7 @@ import argparse
>  import errno
>  import ipaddress
>  import logging
> +import math
>  import multiprocessing
>  import re
>  import socket
> @@ -60,6 +61,7 @@ OVS_FLOW_CMD_DEL = 2
>  OVS_FLOW_CMD_GET = 3
>  OVS_FLOW_CMD_SET = 4
>  
> +UINT32_MAX = 0x
>  
>  def macstr(mac):
>  outstr = ":".join(["%02X" % i for i in mac])
> @@ -281,6 +283,75 @@ def parse_extract_field(
>  return str_skipped, data
>  
>  
> +def parse_attrs(actstr, attr_desc):
> +"""Parses the given action string and returns a list of netlink
> +attributes based on a list of attribute descriptions.
> +
> +Each element in the attribute description list is a tuple such as:
> +(name, attr_name, parse_func)
> +where:
> +name: is the string representing the attribute
> +attr_name: is the name of the attribute as defined in the uAPI.
> +parse_func: is a callable accepting a string and returning either
> +a single object (the parsed attribute value) or a tuple of
> +two values (the parsed attribute value and the remaining string)
> +
> +Returns a list of attributes and the remaining string.
> +"""
> +def parse_attr(actstr, key, func):
> +actstr = actstr[len(key) :]
> +
> +if not func:
> +return None, actstr
> +
> +delim = actstr[0]
> +actstr = actstr[1:]
> +
> +if delim == "=":
> +pos = strcspn(actstr, ",)")
> +ret = func(actstr[:pos])
> +else:
> +ret = func(actstr)
> +
> +if isinstance(ret, tuple):
> +(datum, actstr) = ret
> +else:
> +datum = ret
> +actstr = actstr[strcspn(actstr, ",)"):]
> +
> +if delim == "(":
> +if not actstr or actstr[0] != ")":
> +raise ValueError("Action contains unbalanced parentheses")
> +
> +actstr = actstr[1:]
> +
> +actstr = actstr[strspn(actstr, ", ") :]
> +
> +return datum, actstr
> +
> +attrs = []
> +attr_desc = list(attr_desc)
> +while actstr and actstr[0] != ")" and attr_desc:
> +found = False
> +for i, (key, attr, func) in enumerate(attr_desc):
> +if actstr.startswith(key):
> +datum, actstr = parse_attr(actstr, key, func)
> +attrs.append([attr, datum])
> +found = True
> +del attr_desc[i]
> +
> +if not found:
> +raise ValueError("Unknown attribute: '%s'" % actstr)
> +
> +actstr = actstr[strspn(actstr, ", ") :]
> +
> +if actstr[0] != ")":
> +raise ValueError("Action string contains extra garbage or has "
> + "unbalanced parenthesis: '%s'" % actstr)
> +
> +return attrs, actstr[1:]
> +
> +
>  class ovs_dp_msg(genlmsg):
>  # include the OVS version
>  # We need a custom header rather than just being able to rely on
> @@ -299,7 +370,7 @@ class ovsactions(nla):
>  ("OVS_ACTION_ATTR_SET", "ovskey"),
>  ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
>  ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
> -("OVS_ACTION_ATTR_SAMPLE", "none"),
> +("OVS_ACTION_ATTR_SAMPLE", "sample"),
>  ("OVS_ACTION_ATTR_RECIRC", "uint32"),
>  ("OVS_ACTION_ATTR_HASH", "none"),
>  ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
> @@ -318,8 +389,85 @@ class ovsactions(nla):
>  ("OVS_ACTION_ATTR_ADD_MPLS", "none"),
>  ("OVS_ACTION_ATTR_DEC_TTL", "none"),
>  ("OVS_ACTION_ATTR_DROP", "uint32"),
> +("OVS_ACTION_ATTR_PSAMPLE", "psample"),
>  )
>  
> +class psample(nla):
> +nla_flags = NLA_F_NESTED
> +
> +nla_map = (
> +("OVS_PSAMPLE_ATTR_UNSPEC", "none"),
> +("OVS_PSAMPLE_ATTR_GROUP", "uint32"),
> +("OVS_PSAMPLE_ATTR_COOKIE", "array(uint8)"),
> +)
> +
> +def dpstr(self, more=False):
> +args = "group=%d" % self.get_attr("OVS_PSAMPLE_ATTR_GROUP")
> +
> +cookie = self.get_attr("OVS_PSAMPLE_ATTR_COOKIE")
> +if cookie:
> +args += ",cookie(%s)" % \
> +"".join(format(x, "02x") for x in cookie)
> +
> +return "psample(%s)" % args
> +
> +def parse(self, actstr):
> +desc = (
> +

Re: [ovs-dev] [PATCH net-next v7 10/10] selftests: openvswitch: add psample test

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> Add a test to verify sampling packets via psample works.
>
> In order to do that, create a subcommand in ovs-dpctl.py to listen to
> on the psample multicast group and print samples.
>
> Signed-off-by: Adrian Moreno 
> ---
>  .../selftests/net/openvswitch/openvswitch.sh  | 115 +-
>  .../selftests/net/openvswitch/ovs-dpctl.py|  73 ++-
>  2 files changed, 182 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
> b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> index 15bca0708717..02a366e01004 100755
> --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> @@ -20,7 +20,8 @@ tests="
>   nat_related_v4  ip4-nat-related: ICMP related 
> matches work with SNAT
>   netlink_checks  ovsnl: validate netlink attrs 
> and settings
>   upcall_interfaces   ovs: test the upcall interfaces
> - drop_reason drop: test drop reasons are 
> emitted"
> + drop_reason drop: test drop reasons are 
> emitted
> + psample psample: Sampling packets with 
> psample"
>  
>  info() {
>  [ $VERBOSE = 0 ] || echo $*
> @@ -102,12 +103,21 @@ ovs_netns_spawn_daemon() {
>   shift
>   netns=$1
>   shift
> - info "spawning cmd: $*"
> - ip netns exec $netns $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
> + if [ "$netns" == "_default" ]; then
> + $*  >> $ovs_dir/stdout  2>> $ovs_dir/stderr &
> + else
> + ip netns exec $netns $*  >> $ovs_dir/stdout  2>> 
> $ovs_dir/stderr &
> + fi
>   pid=$!
>   ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null"
>  }
>  
> +ovs_spawn_daemon() {
> + sbx=$1
> + shift
> + ovs_netns_spawn_daemon $sbx "_default" $*
> +}
> +
>  ovs_add_netns_and_veths () {
>   info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
>   ovs_sbx "$1" ip netns add "$3" || return 1
> @@ -170,6 +180,19 @@ ovs_drop_reason_count()
>   return `echo "$perf_output" | grep "$pattern" | wc -l`
>  }
>  
> +ovs_test_flow_fails () {
> + ERR_MSG="Flow actions may not be safe on all matching packets"
> +
> + PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
> + ovs_add_flow $@ &> /dev/null $@ && return 1
> + POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
> +
> + if [ "$PRE_TEST" == "$POST_TEST" ]; then
> + return 1
> + fi
> + return 0
> +}
> +
>  usage() {
>   echo
>   echo "$0 [OPTIONS] [TEST]..."
> @@ -184,6 +207,92 @@ usage() {
>   exit 1
>  }
>  
> +
> +# psample test
> +# - use psample to observe packets
> +test_psample() {
> + sbx_add "test_psample" || return $?
> +
> + # Add a datapath with per-vport dispatching.
> + ovs_add_dp "test_psample" psample -V 2:1 || return 1
> +
> + info "create namespaces"
> + ovs_add_netns_and_veths "test_psample" "psample" \
> + client c0 c1 172.31.110.10/24 -u || return 1
> + ovs_add_netns_and_veths "test_psample" "psample" \
> + server s0 s1 172.31.110.20/24 -u || return 1
> +
> + # Check if psample actions can be configured.
> + ovs_add_flow "test_psample" psample \
> + 'in_port(1),eth(),eth_type(0x0806),arp()' 'psample(group=1)'

Might be good to redirect this stdout/stderr line to /dev/null -
otherwise on an unsupported system there will be the following extra
splat:

  Traceback (most recent call last):
File 
"/home/aconole/git/linux/tools/testing/selftests/net/openvswitch/ovs-dpctl.py", 
line 2774, in 
  sys.exit(main(sys.argv))
   ...
File "/usr/lib/python3.12/site-packages/pyroute2/netlink/nlsocket.py", line 
489, in get
  raise msg['header']['error']
  pyroute2.netlink.exceptions.NetlinkError: (22, 'Invalid argument')

> + if [ $? == 1 ]; then
> + info "no support for psample - skipping"
> + ovs_exit_sig
> + return $ksft_skip
> + fi
> +
> + ovs_del_flows "test_psample" psample
> +
> + # Test action verification.
> + OLDIFS=$IFS
> + IFS='*'
> + min_key='in_port(1),eth(),eth_type(0x0800),ipv4()'
> + for testcase in \
> + "cookie to 
> large"*"psample(group=1,cookie=1615141312111009080706050403020100)" \
> + "no group with cookie"*"psample(cookie=abcd)" \
> + "no group"*"psample()";
> + do
> + set -- $testcase;
> + ovs_test_flow_fails "test_psample" psample $min_key $2
> + if [ $? == 1 ]; then
> + info "failed - $1"
> + return 1
> + fi
> + done
> + IFS=$OLDIFS
> +
> + ovs_del_flows "test_psample" psample
> + # Allow ARP
> + ovs_add_flow "test_psample" psample \
> + 'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
> 

Re: [ovs-dev] [PATCH net-next v7 10/10] selftests: openvswitch: add psample test

2024-07-01 Thread Ilya Maximets
On 6/30/24 21:57, Adrian Moreno wrote:
> Add a test to verify sampling packets via psample works.
> 
> In order to do that, create a subcommand in ovs-dpctl.py to listen to
> on the psample multicast group and print samples.
> 
> Signed-off-by: Adrian Moreno 
> ---
>  .../selftests/net/openvswitch/openvswitch.sh  | 115 +-
>  .../selftests/net/openvswitch/ovs-dpctl.py|  73 ++-
>  2 files changed, 182 insertions(+), 6 deletions(-)


This version seems to work correctly with and without arguments.

Tested-by: Ilya Maximets 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 06/10] net: openvswitch: store sampling probability in cb.

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> When a packet sample is observed, the sampling rate that was used is
> important to estimate the real frequency of such event.
>
> Store the probability of the parent sample action in the skb's cb area
> and use it in psample action to pass it down to psample module.
>
> Acked-by: Eelco Chaudron 
> Reviewed-by: Ilya Maximets 
> Signed-off-by: Adrian Moreno 
> ---

Reviewed-by: Aaron Conole 

>  include/uapi/linux/openvswitch.h |  3 ++-
>  net/openvswitch/actions.c| 20 +---
>  net/openvswitch/datapath.h   |  3 +++
>  net/openvswitch/vport.c  |  1 +
>  4 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 3dd653748725..3a701bd1f31b 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -649,7 +649,8 @@ enum ovs_flow_attr {
>   * Actions are passed as nested attributes.
>   *
>   * Executes the specified actions with the given probability on a per-packet
> - * basis.
> + * basis. Nested actions will be able to access the probability value of the
> + * parent @OVS_ACTION_ATTR_SAMPLE.
>   */
>  enum ovs_sample_attr {
>   OVS_SAMPLE_ATTR_UNSPEC,
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index a035b7e677dd..34af6bce4085 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff 
> *skb,
>   struct nlattr *sample_arg;
>   int rem = nla_len(attr);
>   const struct sample_arg *arg;
> + u32 init_probability;
>   bool clone_flow_key;
> + int err;
>  
>   /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
>   sample_arg = nla_data(attr);
>   arg = nla_data(sample_arg);
>   actions = nla_next(sample_arg, &rem);
> + init_probability = OVS_CB(skb)->probability;
>  
>   if ((arg->probability != U32_MAX) &&
>   (!arg->probability || get_random_u32() > arg->probability)) {
> @@ -1062,9 +1065,16 @@ static int sample(struct datapath *dp, struct sk_buff 
> *skb,
>   return 0;
>   }
>  
> + OVS_CB(skb)->probability = arg->probability;
> +
>   clone_flow_key = !arg->exec;
> - return clone_execute(dp, skb, key, 0, actions, rem, last,
> -  clone_flow_key);
> + err = clone_execute(dp, skb, key, 0, actions, rem, last,
> + clone_flow_key);
> +
> + if (!last)
> + OVS_CB(skb)->probability = init_probability;
> +
> + return err;
>  }
>  
>  /* When 'last' is true, clone() should always consume the 'skb'.
> @@ -1311,6 +1321,7 @@ static void execute_psample(struct datapath *dp, struct 
> sk_buff *skb,
>   struct psample_group psample_group = {};
>   struct psample_metadata md = {};
>   const struct nlattr *a;
> + u32 rate;
>   int rem;
>  
>   nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) {
> @@ -1329,8 +1340,11 @@ static void execute_psample(struct datapath *dp, 
> struct sk_buff *skb,
>   psample_group.net = ovs_dp_get_net(dp);
>   md.in_ifindex = OVS_CB(skb)->input_vport->dev->ifindex;
>   md.trunc_size = skb->len - OVS_CB(skb)->cutlen;
> + md.rate_as_probability = 1;
> +
> + rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX;
>  
> - psample_sample_packet(&psample_group, skb, 0, &md);
> + psample_sample_packet(&psample_group, skb, rate, &md);
>  }
>  #else
>  static inline void execute_psample(struct datapath *dp, struct sk_buff *skb,
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 0cd29971a907..9ca6231ea647 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -115,12 +115,15 @@ struct datapath {
>   * fragmented.
>   * @acts_origlen: The netlink size of the flow actions applied to this skb.
>   * @cutlen: The number of bytes from the packet end to be removed.
> + * @probability: The sampling probability that was applied to this skb; 0 
> means
> + * no sampling has occurred; U32_MAX means 100% probability.
>   */
>  struct ovs_skb_cb {
>   struct vport*input_vport;
>   u16 mru;
>   u16 acts_origlen;
>   u32 cutlen;
> + u32 probability;
>  };
>  #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
>  
> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> index 972ae01a70f7..8732f6e51ae5 100644
> --- a/net/openvswitch/vport.c
> +++ b/net/openvswitch/vport.c
> @@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff 
> *skb,
>   OVS_CB(skb)->input_vport = vport;
>   OVS_CB(skb)->mru = 0;
>   OVS_CB(skb)->cutlen = 0;
> + OVS_CB(skb)->probability = 0;
>   if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) {
>   u32 mark;


Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> Add support for a new action: psample.
>
> This action accepts a u32 group id and a variable-length cookie and uses
> the psample multicast group to make the packet available for
> observability.
>
> The maximum length of the user-defined cookie is set to 16, same as
> tc_cookie, to discourage using cookies that will not be offloadable.
>
> Acked-by: Eelco Chaudron 
> Signed-off-by: Adrian Moreno 
> ---

Hi Adrian,

Just some nits below.

>  Documentation/netlink/specs/ovs_flow.yaml | 17 
>  include/uapi/linux/openvswitch.h  | 28 ++
>  net/openvswitch/Kconfig   |  1 +
>  net/openvswitch/actions.c | 47 +++
>  net/openvswitch/flow_netlink.c| 32 ++-
>  5 files changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
> b/Documentation/netlink/specs/ovs_flow.yaml
> index 4fdfc6b5cae9..46f5d1cd8a5f 100644
> --- a/Documentation/netlink/specs/ovs_flow.yaml
> +++ b/Documentation/netlink/specs/ovs_flow.yaml
> @@ -727,6 +727,12 @@ attribute-sets:
>  name: dec-ttl
>  type: nest
>  nested-attributes: dec-ttl-attrs
> +  -
> +name: psample
> +type: nest
> +nested-attributes: psample-attrs
> +doc: |
> +  Sends a packet sample to psample for external observation.
>-
>  name: tunnel-key-attrs
>  enum-name: ovs-tunnel-key-attr
> @@ -938,6 +944,17 @@ attribute-sets:
>-
>  name: gbp
>  type: u32
> +  -
> +name: psample-attrs
> +enum-name: ovs-psample-attr
> +name-prefix: ovs-psample-attr-
> +attributes:
> +  -
> +name: group
> +type: u32
> +  -
> +name: cookie
> +type: binary
>  
>  operations:
>name-prefix: ovs-flow-cmd-
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index efc82c318fa2..3dd653748725 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -914,6 +914,31 @@ struct check_pkt_len_arg {
>  };
>  #endif
>  
> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
> +/**
> + * enum ovs_psample_attr - Attributes for %OVS_ACTION_ATTR_PSAMPLE
> + * action.
> + *
> + * @OVS_PSAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
> + * sample.
> + * @OVS_PSAMPLE_ATTR_COOKIE: An optional variable-length binary cookie that
> + * contains user-defined metadata. The maximum length is
> + * OVS_PSAMPLE_COOKIE_MAX_SIZE bytes.
> + *
> + * Sends the packet to the psample multicast group with the specified group 
> and
> + * cookie. It is possible to combine this action with the
> + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the sample.
> + */
> +enum ovs_psample_attr {
> + OVS_PSAMPLE_ATTR_GROUP = 1, /* u32 number. */
> + OVS_PSAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
> +
> + /* private: */
> + __OVS_PSAMPLE_ATTR_MAX
> +};
> +
> +#define OVS_PSAMPLE_ATTR_MAX (__OVS_PSAMPLE_ATTR_MAX - 1)
> +
>  /**
>   * enum ovs_action_attr - Action types.
>   *
> @@ -966,6 +991,8 @@ struct check_pkt_len_arg {
>   * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS
>   * argument.
>   * @OVS_ACTION_ATTR_DROP: Explicit drop action.
> + * @OVS_ACTION_ATTR_PSAMPLE: Send a sample of the packet to external 
> observers
> + * via psample.
>   *
>   * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not 
> all
>   * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
> @@ -1004,6 +1031,7 @@ enum ovs_action_attr {
>   OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
>   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
>   OVS_ACTION_ATTR_DROP, /* u32 error code. */
> + OVS_ACTION_ATTR_PSAMPLE,  /* Nested OVS_PSAMPLE_ATTR_*. */
>  
>   __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
>  * from userspace. */
> diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
> index 29a7081858cd..2535f3f9f462 100644
> --- a/net/openvswitch/Kconfig
> +++ b/net/openvswitch/Kconfig
> @@ -10,6 +10,7 @@ config OPENVSWITCH
>  (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
>(!NF_NAT || NF_NAT) && \
>(!NETFILTER_CONNCOUNT || 
> NETFILTER_CONNCOUNT)))
> + depends on PSAMPLE || !PSAMPLE
>   select LIBCRC32C
>   select MPLS
>   select NET_MPLS_GSO
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 964225580824..a035b7e677dd 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -24,6 +24,11 @@
>  #include 
>  #include 
>  #include 
> +
> +#if IS_ENABLED(CONFIG_PSAMPLE)
> +#include 
> +#endif
> +
>  #include 
>  
>  #include "datapath.h"
> @@ -1299,6 +1304,39 @@ static int ex

Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action

2024-07-01 Thread Ilya Maximets
On 6/30/24 21:57, Adrian Moreno wrote:
> Add support for a new action: psample.
> 
> This action accepts a u32 group id and a variable-length cookie and uses
> the psample multicast group to make the packet available for
> observability.
> 
> The maximum length of the user-defined cookie is set to 16, same as
> tc_cookie, to discourage using cookies that will not be offloadable.
> 
> Acked-by: Eelco Chaudron 
> Signed-off-by: Adrian Moreno 
> ---
>  Documentation/netlink/specs/ovs_flow.yaml | 17 
>  include/uapi/linux/openvswitch.h  | 28 ++
>  net/openvswitch/Kconfig   |  1 +
>  net/openvswitch/actions.c | 47 +++
>  net/openvswitch/flow_netlink.c| 32 ++-
>  5 files changed, 124 insertions(+), 1 deletion(-)

Thanks for addressing the comments!  The new name also
seems reasonable to me.

Reviewed-by: Ilya Maximets 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 04/10] net: psample: allow using rate as probability

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> Although not explicitly documented in the psample module itself, the
> definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.
>
> Quoting tc-sample(8):
> "RATE of 100 will lead to an average of one sampled packet out of every
> 100 observed."
>
> With this semantics, the rates that we can express with an unsigned
> 32-bits number are very unevenly distributed and concentrated towards
> "sampling few packets".
> For example, we can express a probability of 2.32E-8% but we
> cannot express anything between 100% and 50%.
>
> For sampling applications that are capable of sampling a decent
> amount of packets, this sampling rate semantics is not very useful.
>
> Add a new flag to the uAPI that indicates that the sampling rate is
> expressed in scaled probability, this is:
> - 0 is 0% probability, no packets get sampled.
> - U32_MAX is 100% probability, all packets get sampled.
>
> Acked-by: Eelco Chaudron 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Adrian Moreno 
> ---

Reviewed-by: Aaron Conole 

>  include/net/psample.h|  3 ++-
>  include/uapi/linux/psample.h | 10 +-
>  net/psample/psample.c|  3 +++
>  3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/psample.h b/include/net/psample.h
> index 2ac71260a546..c52e9ebd88dd 100644
> --- a/include/net/psample.h
> +++ b/include/net/psample.h
> @@ -24,7 +24,8 @@ struct psample_metadata {
>   u8 out_tc_valid:1,
>  out_tc_occ_valid:1,
>  latency_valid:1,
> -unused:5;
> +rate_as_probability:1,
> +unused:4;
>   const u8 *user_cookie;
>   u32 user_cookie_len;
>  };
> diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
> index e80637e1d97b..b765f0e81f20 100644
> --- a/include/uapi/linux/psample.h
> +++ b/include/uapi/linux/psample.h
> @@ -8,7 +8,11 @@ enum {
>   PSAMPLE_ATTR_ORIGSIZE,
>   PSAMPLE_ATTR_SAMPLE_GROUP,
>   PSAMPLE_ATTR_GROUP_SEQ,
> - PSAMPLE_ATTR_SAMPLE_RATE,
> + PSAMPLE_ATTR_SAMPLE_RATE,   /* u32, ratio between observed and
> +  * sampled packets or scaled probability
> +  * if PSAMPLE_ATTR_SAMPLE_PROBABILITY
> +  * is set.
> +  */
>   PSAMPLE_ATTR_DATA,
>   PSAMPLE_ATTR_GROUP_REFCOUNT,
>   PSAMPLE_ATTR_TUNNEL,
> @@ -20,6 +24,10 @@ enum {
>   PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
>   PSAMPLE_ATTR_PROTO, /* u16 */
>   PSAMPLE_ATTR_USER_COOKIE,   /* binary, user provided data */
> + PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in
> +  * PSAMPLE_ATTR_SAMPLE_RATE as a
> +  * probability scaled 0 - U32_MAX.
> +  */
>  
>   __PSAMPLE_ATTR_MAX
>  };
> diff --git a/net/psample/psample.c b/net/psample/psample.c
> index 1c76f3e48dcd..f48b5b9cd409 100644
> --- a/net/psample/psample.c
> +++ b/net/psample/psample.c
> @@ -497,6 +497,9 @@ void psample_sample_packet(struct psample_group *group, 
> struct sk_buff *skb,
>   md->user_cookie))
>   goto error;
>  
> + if (md->rate_as_probability)
> + nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY);
> +
>   genlmsg_end(nl_skb, data);
>   genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
>   PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 03/10] net: psample: skip packet copy if no listeners

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> If nobody is listening on the multicast group, generating the sample,
> which involves copying packet data, seems completely unnecessary.
>
> Return fast in this case.
>
> Acked-by: Eelco Chaudron 
> Reviewed-by: Ido Schimmel 
> Reviewed-by: Simon Horman 
> Signed-off-by: Adrian Moreno 
> ---

Reviewed-by: Aaron Conole 

>  net/psample/psample.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/psample/psample.c b/net/psample/psample.c
> index b37488f426bc..1c76f3e48dcd 100644
> --- a/net/psample/psample.c
> +++ b/net/psample/psample.c
> @@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, 
> struct sk_buff *skb,
>   void *data;
>   int ret;
>  
> + if (!genl_has_listeners(&psample_nl_family, group->net,
> + PSAMPLE_NL_MCGRP_SAMPLE))
> + return;
> +
>   meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>  (out_ifindex ? nla_total_size(sizeof(u16)) : 0) +
>  (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) +

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 02/10] net: sched: act_sample: add action cookie to sample

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> If the action has a user_cookie, pass it along to the sample so it can
> be easily identified.
>
> Acked-by: Eelco Chaudron 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Adrian Moreno 
> ---

Reviewed-by: Aaron Conole 

>  net/sched/act_sample.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> index a69b53d54039..2ceb4d141b71 100644
> --- a/net/sched/act_sample.c
> +++ b/net/sched/act_sample.c
> @@ -167,7 +167,9 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
>  {
>   struct tcf_sample *s = to_sample(a);
>   struct psample_group *psample_group;
> + u8 cookie_data[TC_COOKIE_MAX_SIZE];
>   struct psample_metadata md = {};
> + struct tc_cookie *user_cookie;
>   int retval;
>  
>   tcf_lastuse_update(&s->tcf_tm);
> @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb,
>   if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
>   skb_push(skb, skb->mac_len);
>  
> + rcu_read_lock();
> + user_cookie = rcu_dereference(a->user_cookie);
> + if (user_cookie) {
> + memcpy(cookie_data, user_cookie->data,
> +user_cookie->len);
> + md.user_cookie = cookie_data;
> + md.user_cookie_len = user_cookie->len;
> + }
> + rcu_read_unlock();
> +
>   md.trunc_size = s->truncate ? s->trunc_size : skb->len;
>   psample_sample_packet(psample_group, skb, s->rate, &md);

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 01/10] net: psample: add user cookie

2024-07-01 Thread Aaron Conole
Adrian Moreno  writes:

> Add a user cookie to the sample metadata so that sample emitters can
> provide more contextual information to samples.
>
> If present, send the user cookie in a new attribute:
> PSAMPLE_ATTR_USER_COOKIE.
>
> Acked-by: Eelco Chaudron 
> Reviewed-by: Simon Horman 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Adrian Moreno 
> ---

Reviewed-by: Aaron Conole 

>  include/net/psample.h| 2 ++
>  include/uapi/linux/psample.h | 1 +
>  net/psample/psample.c| 9 -
>  3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/psample.h b/include/net/psample.h
> index 0509d2d6be67..2ac71260a546 100644
> --- a/include/net/psample.h
> +++ b/include/net/psample.h
> @@ -25,6 +25,8 @@ struct psample_metadata {
>  out_tc_occ_valid:1,
>  latency_valid:1,
>  unused:5;
> + const u8 *user_cookie;
> + u32 user_cookie_len;
>  };
>  
>  struct psample_group *psample_group_get(struct net *net, u32 group_num);
> diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
> index e585db5bf2d2..e80637e1d97b 100644
> --- a/include/uapi/linux/psample.h
> +++ b/include/uapi/linux/psample.h
> @@ -19,6 +19,7 @@ enum {
>   PSAMPLE_ATTR_LATENCY,   /* u64, nanoseconds */
>   PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
>   PSAMPLE_ATTR_PROTO, /* u16 */
> + PSAMPLE_ATTR_USER_COOKIE,   /* binary, user provided data */
>  
>   __PSAMPLE_ATTR_MAX
>  };
> diff --git a/net/psample/psample.c b/net/psample/psample.c
> index a5d9b8446f77..b37488f426bc 100644
> --- a/net/psample/psample.c
> +++ b/net/psample/psample.c
> @@ -386,7 +386,9 @@ void psample_sample_packet(struct psample_group *group, 
> struct sk_buff *skb,
>  nla_total_size(sizeof(u32)) +/* group_num */
>  nla_total_size(sizeof(u32)) +/* seq */
>  nla_total_size_64bit(sizeof(u64)) +  /* timestamp */
> -nla_total_size(sizeof(u16)); /* protocol */
> +nla_total_size(sizeof(u16)) +/* protocol */
> +(md->user_cookie_len ?
> + nla_total_size(md->user_cookie_len) : 0); /* user cookie */
>  
>  #ifdef CONFIG_INET
>   tun_info = skb_tunnel_info(skb);
> @@ -486,6 +488,11 @@ void psample_sample_packet(struct psample_group *group, 
> struct sk_buff *skb,
>   }
>  #endif
>  
> + if (md->user_cookie && md->user_cookie_len &&
> + nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len,
> + md->user_cookie))
> + goto error;
> +
>   genlmsg_end(nl_skb, data);
>   genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0,
>   PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC);

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-discuss] [branch-23.09] OVN upgrade for distributed gateway -- need inputs

2024-07-01 Thread aginwala
Hello team:

Part of upgrading OVN from 2.11* versions to the latest 23.09, we have
found issues in existing north south gateways where failover doesn't work
as expected when connected to the latest 23.09 raft control plane.

OVS datapath version 2.16.0-2 on host. Tunneling protocol is stt
ovn-controller is running in a container with hostnetwork with version
23.09 on the north south gateways using ovs user space 3.2.*

Following the
https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html#fail-safe-upgrade
, even if setting northd version on the gateways, when one of the gateway's
ovn controller is stopped, there is no failover triggered.

For chassis, things work as expected where we see updates in ovn controller
logs for schema upgrade and data path works fine as expected for the
workloads scheduled on that chassis.

For n-s gateways, we don't see ovn-controller claiming lports  when
failover is triggered.
Also if there are any suggestions for interconnection gateway upgrade
procedure too, would be great.

Let us know for any suggestions/recommendations as we continue to debug.


Regards,
Aliasgar
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/2] ovs-monitor-ipsec: LibreSwan v5 support.

2024-07-01 Thread Simon Horman
On Fri, Jun 28, 2024 at 12:24:06AM -0400, Mike Pattrick wrote:
> In version 5, LibreSwan made significant command line interface changes.
> This includes changing the order or command line parameters and removing
> the "ipsec auto" command.
> 
> To maintain compatibility with previous versions, the ipsec.d version
> check is repurposed for this. Checking the version proved simpler than
> removing use of auto.
> 
> There was also a change to ipsec status command that effected the tests.
> However, this change was backwards compatible.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-645
> Reported-by: Ilya Maximets 
> Signed-off-by: Mike Pattrick 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 1/2] ovs-monitor-ipsec: LibreSwan autodetect verion.

2024-07-01 Thread Simon Horman
On Fri, Jun 28, 2024 at 12:24:05AM -0400, Mike Pattrick wrote:
> Previously a change was made to LibreSwan necessitating the detection
> of version numbers. However, this change didn't properly account for all
> possible output from "ipsec version". When installed from the git
> repository, LibreSwan will report versions differently then when
> installed from a package.
> 
> Fixes: 3ddb31f60487 ("ovs-monitor-ipsec: LibreSwan autodetect paths.")
> Signed-off-by: Mike Pattrick 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action

2024-07-01 Thread Michal Kubiak
On Sun, Jun 30, 2024 at 09:57:26PM +0200, Adrian Moreno wrote:
> Add support for a new action: psample.
> 
> This action accepts a u32 group id and a variable-length cookie and uses
> the psample multicast group to make the packet available for
> observability.
> 
> The maximum length of the user-defined cookie is set to 16, same as
> tc_cookie, to discourage using cookies that will not be offloadable.
> 
> Acked-by: Eelco Chaudron 
> Signed-off-by: Adrian Moreno 
> ---
>  Documentation/netlink/specs/ovs_flow.yaml | 17 
>  include/uapi/linux/openvswitch.h  | 28 ++
>  net/openvswitch/Kconfig   |  1 +
>  net/openvswitch/actions.c | 47 +++
>  net/openvswitch/flow_netlink.c| 32 ++-
>  5 files changed, 124 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
> b/Documentation/netlink/specs/ovs_flow.yaml
> index 4fdfc6b5cae9..46f5d1cd8a5f 100644
> --- a/Documentation/netlink/specs/ovs_flow.yaml
> +++ b/Documentation/netlink/specs/ovs_flow.yaml
> @@ -727,6 +727,12 @@ attribute-sets:
>  name: dec-ttl
>  type: nest
>  nested-attributes: dec-ttl-attrs
> +  -
> +name: psample
> +type: nest
> +nested-attributes: psample-attrs
> +doc: |
> +  Sends a packet sample to psample for external observation.
>-
>  name: tunnel-key-attrs
>  enum-name: ovs-tunnel-key-attr
> @@ -938,6 +944,17 @@ attribute-sets:
>-
>  name: gbp
>  type: u32
> +  -
> +name: psample-attrs
> +enum-name: ovs-psample-attr
> +name-prefix: ovs-psample-attr-
> +attributes:
> +  -
> +name: group
> +type: u32
> +  -
> +name: cookie
> +type: binary
>  
>  operations:
>name-prefix: ovs-flow-cmd-
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index efc82c318fa2..3dd653748725 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -914,6 +914,31 @@ struct check_pkt_len_arg {
>  };
>  #endif
>  
> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16

In your patch #2 you use "TC_COOKIE_MAX_SIZE" as an array size for your
cookie. I know that now OVS_PSAMPLE_COOKIE_MAX_SIZE == TC_COOKIE_MAX_SIZE,
so this size will be validated correctly.
But how likely is that those 2 constants will have different values in the
future?
Would it be reasonable to create more strict dependency between those
macros, e.g.:

#define OVS_PSAMPLE_COOKIE_MAX_SIZE TC_COOKIE_MAX_SIZE

or, at least, add a comment that the size shouldn't be bigger than
TC_COOKIE_MAX_SIZE?
I'm just considering the risk of exceeding the array from the patch #2 when
somebody increases OVS_PSAMPLE_COOKIE_MAX_SIZE in the future.

Thanks,
Michal

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 01/10] net: psample: add user cookie

2024-07-01 Thread Michal Kubiak
On Sun, Jun 30, 2024 at 09:57:22PM +0200, Adrian Moreno wrote:
> Add a user cookie to the sample metadata so that sample emitters can
> provide more contextual information to samples.
> 
> If present, send the user cookie in a new attribute:
> PSAMPLE_ATTR_USER_COOKIE.
> 
> Acked-by: Eelco Chaudron 
> Reviewed-by: Simon Horman 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Adrian Moreno 
> ---
> 

Reviewed-by: Michal Kubiak 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action

2024-07-01 Thread Adrián Moreno
On Mon, Jul 01, 2024 at 01:40:39PM GMT, Michal Kubiak wrote:
> On Sun, Jun 30, 2024 at 09:57:26PM +0200, Adrian Moreno wrote:
> > Add support for a new action: psample.
> >
> > This action accepts a u32 group id and a variable-length cookie and uses
> > the psample multicast group to make the packet available for
> > observability.
> >
> > The maximum length of the user-defined cookie is set to 16, same as
> > tc_cookie, to discourage using cookies that will not be offloadable.
> >
> > Acked-by: Eelco Chaudron 
> > Signed-off-by: Adrian Moreno 
> > ---
> >  Documentation/netlink/specs/ovs_flow.yaml | 17 
> >  include/uapi/linux/openvswitch.h  | 28 ++
> >  net/openvswitch/Kconfig   |  1 +
> >  net/openvswitch/actions.c | 47 +++
> >  net/openvswitch/flow_netlink.c| 32 ++-
> >  5 files changed, 124 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
> > b/Documentation/netlink/specs/ovs_flow.yaml
> > index 4fdfc6b5cae9..46f5d1cd8a5f 100644
> > --- a/Documentation/netlink/specs/ovs_flow.yaml
> > +++ b/Documentation/netlink/specs/ovs_flow.yaml
> > @@ -727,6 +727,12 @@ attribute-sets:
> >  name: dec-ttl
> >  type: nest
> >  nested-attributes: dec-ttl-attrs
> > +  -
> > +name: psample
> > +type: nest
> > +nested-attributes: psample-attrs
> > +doc: |
> > +  Sends a packet sample to psample for external observation.
> >-
> >  name: tunnel-key-attrs
> >  enum-name: ovs-tunnel-key-attr
> > @@ -938,6 +944,17 @@ attribute-sets:
> >-
> >  name: gbp
> >  type: u32
> > +  -
> > +name: psample-attrs
> > +enum-name: ovs-psample-attr
> > +name-prefix: ovs-psample-attr-
> > +attributes:
> > +  -
> > +name: group
> > +type: u32
> > +  -
> > +name: cookie
> > +type: binary
> >
> >  operations:
> >name-prefix: ovs-flow-cmd-
> > diff --git a/include/uapi/linux/openvswitch.h 
> > b/include/uapi/linux/openvswitch.h
> > index efc82c318fa2..3dd653748725 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -914,6 +914,31 @@ struct check_pkt_len_arg {
> >  };
> >  #endif
> >
> > +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>
> In your patch #2 you use "TC_COOKIE_MAX_SIZE" as an array size for your
> cookie. I know that now OVS_PSAMPLE_COOKIE_MAX_SIZE == TC_COOKIE_MAX_SIZE,
> so this size will be validated correctly.
> But how likely is that those 2 constants will have different values in the
> future?
> Would it be reasonable to create more strict dependency between those
> macros, e.g.:
>
> #define OVS_PSAMPLE_COOKIE_MAX_SIZE TC_COOKIE_MAX_SIZE
>
> or, at least, add a comment that the size shouldn't be bigger than
> TC_COOKIE_MAX_SIZE?
> I'm just considering the risk of exceeding the array from the patch #2 when
> somebody increases OVS_PSAMPLE_COOKIE_MAX_SIZE in the future.
>
> Thanks,
> Michal
>

Hi Michal,

Thanks for sharing your thoughts.

I tried to keep the dependency between both cookie sizes loose.

I don't want a change in TC_COOKIE_MAX_SIZE to inadvertently modify
OVS_PSAMPLE_COOKIE_MAX_SIZE because OVS might not need a bigger cookie
and even if it does, backwards compatibility needs to be guaranteed:
meaning OVS userspace will have to detect the new size and use it or
fall back to a smaller cookie for older kernels. All this needs to be
known and worked on in userspace.

On the other hand, I intentionally made OVS's "psample" action as
similar as possible to act_psample, including setting the same cookie
size to begin with. The reason is that I think we should try to implement
tc-flower offloading of this action using act_sample, plus 16 seemed a
very reasonable max value.

When we decide to support offloading the "psample" action, this must
be done entirely in userspace. OVS must create a act_sample action
(instead of the OVS "psample" one) via netlink. In no circumstances the
openvswitch kmod interacts with tc directly.

Now, back to your concern. If OVS_PSAMPLE_COOKIE_MAX_SIZE grows and
TC_COOKIE_MAX_SIZE does not *and* we already support offloading this
action to tc, the only consequence is that OVS userspace has a
problem because the tc's netlink interface will reject cookies larger
than TC_COOKIE_MAX_SIZE [1].
This guarantees that the array in patch #2 is never overflown.

OVS will have to deal with the different sizes and try to squeeze the
data into TC_COOKIE_MAX_SIZE or fail to offload the action altogether.

Psample does not have a size limit so different parts of the kernel can
use psample with different internal max-sizes without any restriction.

I hope this clears your concerns.

[1] https://github.com/torvalds/linux/blob/master/net/sched/act_api.c#L1299

Thanks.
Adrián

___
dev mailing list
d...@openvswitch.org
https://m

Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.

2024-07-01 Thread Eelco Chaudron


On 1 Jul 2024, at 14:22, Adrián Moreno wrote:

> On Mon, Jul 01, 2024 at 01:30:12PM GMT, Eelco Chaudron wrote:
>>
>>
>> On 28 Jun 2024, at 18:40, Adrián Moreno wrote:
>>
>>> On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote:
 On 5 Jun 2024, at 22:23, Adrian Moreno wrote:

> Use the newly added emit_sample to implement OpenFlow sample() actions
> with local sampling configuration.

 See some comments below.

 Cheers,

 Eelco

> Signed-off-by: Adrian Moreno 
> ---
>  ofproto/ofproto-dpif-lsample.c |  17 
>  ofproto/ofproto-dpif-lsample.h |   6 ++
>  ofproto/ofproto-dpif-xlate.c   | 163 -
>  ofproto/ofproto-dpif-xlate.h   |   3 +-
>  ofproto/ofproto-dpif.c |   2 +-
>  5 files changed, 144 insertions(+), 47 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-lsample.c 
> b/ofproto/ofproto-dpif-lsample.c
> index 7bdafac19..2c0b5da89 100644
> --- a/ofproto/ofproto-dpif-lsample.c
> +++ b/ofproto/ofproto-dpif-lsample.c
> @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample 
> *lsample,
>  return changed;
>  }
>
> +/* Returns the group_id of the exporter with the given collector_set_id, 
> if it
> + * exists. */

 nit: The below will fit on one line

 /* Returns the exporter group_id for a given collector_set_id, if it 
 exists. */

>>>
>>> Ack.
>>>
> +bool
> +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t 
> collector_set_id,
> +  uint32_t *group_id)
> +{
> +struct lsample_exporter_node *node;
> +bool found = false;
> +
> +node = dpif_lsample_find_exporter_node(ps, collector_set_id);
> +if (node) {
> +found = true;
> +*group_id = node->exporter.options.group_id;
> +}
> +return found;
> +}
> +
>  struct dpif_lsample *
>  dpif_lsample_create(void)
>  {
> diff --git a/ofproto/ofproto-dpif-lsample.h 
> b/ofproto/ofproto-dpif-lsample.h
> index c23ea8372..f13425575 100644
> --- a/ofproto/ofproto-dpif-lsample.h
> +++ b/ofproto/ofproto-dpif-lsample.h
> @@ -19,6 +19,7 @@
>
>  #include 
>  #include 
> +#include 

 Maybe add in alphabetical order.

>>>
>>> Ack.
>>>
>  struct dpif_lsample;
>  struct ofproto_lsample_options;
> @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *,
>const struct ofproto_lsample_options *,
>size_t n_opts);
>
> +bool dpif_lsample_get_group_id(struct dpif_lsample *,
> +   uint32_t collector_set_id,
> +   uint32_t *group_id);
> +
>  #endif /* OFPROTO_DPIF_LSAMPLE_H */
> +

 Accedantely added a newline?

>>>
>>> Ack.
>>>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 7c4950895..5bd215d31 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -47,6 +47,7 @@
>  #include "ofproto/ofproto-dpif-ipfix.h"
>  #include "ofproto/ofproto-dpif-mirror.h"
>  #include "ofproto/ofproto-dpif-monitor.h"
> +#include "ofproto/ofproto-dpif-lsample.h"
>  #include "ofproto/ofproto-dpif-sflow.h"
>  #include "ofproto/ofproto-dpif-trace.h"
>  #include "ofproto/ofproto-dpif-xlate-cache.h"
> @@ -117,6 +118,7 @@ struct xbridge {
>  struct dpif_sflow *sflow; /* SFlow handle, or null. */
>  struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */
>  struct netflow *netflow;  /* Netflow handle, or null. */
> +struct dpif_lsample *lsample; /* Local sample handle, or null. */
>  struct stp *stp;  /* STP or null if disabled. */
>  struct rstp *rstp;/* RSTP or null if disabled. */
>
> @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, 
> struct dpif *,
>const struct dpif_sflow *,
>const struct dpif_ipfix *,
>const struct netflow *,
> +  const struct dpif_lsample *,
>bool forward_bpdu, bool has_in_band,
>const struct dpif_backer_support *,
>const struct xbridge_addr *);
> @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge,
>const struct dpif_sflow *sflow,
>const struct dpif_ipfix *ipfix,
>const struct netflow *netflow,
> +  const struct dpif_lsample *lsample,

 nit: I would move above netflow, as you also do the actual init/unref 
>>>

[ovs-dev] [PATCH 5/5] conntrack: Use a per zone default limit.

2024-07-01 Thread Paolo Valerio
Before this change the default limit, instead of being considered
per-zone, was considered as a global value that every new entry was
checked against during the creation. This was not the intended
behavior as the default limit should be inherited by each zone instead
of being an aggregate number.

This change corrects that by removing the default limit from the cmap
and making it global (atomic). Now, whenever a new connection needs to
be committed, if default_zone_limit is set and the entry for the zone
doesn't exist, a new entry for that zone is lazily created, marked as
default. All subsequent packets for that zone will undergo the regular
lookup process.

Operations such as creation/deletion are modified accordingly taking
into account this new behavior.

Worth noting that OVS_REQUIRES(ct->ct_lock) is not a strict
requirement for zone_limit_lookup_or_default(), however since the
function operates under the lock and it can create an entry in the
slow path, the lock requirement is enforced in order to make thread
safety checks work. The function can still be moved outside the
creation lock or any lock, keeping the fastpath lockless (turning
zone_limit_lookup_protected() to its unprotected version) and locking
only in the slow path (replacing zone_limit_create__() with
zone_limit_create__().

The patch also extends `conntrack - limit by zone` test in order to
check the behavior, and while at it, update test's packet-out to use
compose-packet function.

Fixes: a7f33fdbfb67 ("conntrack: Support zone limits.")
Reported-at: https://issues.redhat.com/browse/FDP-122
Reported-by: Ilya Maximets 
Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |   7 +-
 lib/conntrack.c | 233 +++-
 tests/system-traffic.at |  64 +++
 3 files changed, 231 insertions(+), 73 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 2770470d1..46b212754 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -198,10 +198,11 @@ enum ct_ephemeral_range {
 #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
+#define ZONE_LIMIT_CONN_DEFAULT -1
 
 struct conntrack_zone_limit {
 int32_t zone;
-atomic_uint32_t limit;
+atomic_int64_t limit;
 atomic_count count;
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 };
@@ -212,6 +213,9 @@ struct conntrack {
 struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
 struct cmap timeout_policies OVS_GUARDED;
+uint32_t zone_limit_seq OVS_GUARDED; /* Used to disambiguate zone limit
+  * counts. */
+atomic_uint32_t default_zone_limit;
 
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
@@ -234,7 +238,6 @@ struct conntrack {
  * control context.  */
 
 struct ipf *ipf; /* Fragmentation handling context. */
-uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
 atomic_uint32_t sweep_ms; /* Next sweep interval. */
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index ac0790e11..0e128a0c6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -270,6 +270,7 @@ conntrack_init(void)
 atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
 atomic_init(&ct->tcp_seq_chk, true);
 atomic_init(&ct->sweep_ms, 2);
+atomic_init(&ct->default_zone_limit, 0);
 latch_init(&ct->clean_thread_exit);
 ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 ct->ipf = ipf_init();
@@ -296,6 +297,28 @@ zone_key_hash(int32_t zone, uint32_t basis)
 return hash;
 }
 
+static int64_t
+zone_limit_get_limit__(struct conntrack_zone_limit *czl)
+{
+int64_t limit;
+atomic_read_relaxed(&czl->limit, &limit);
+
+return limit;
+}
+
+static int64_t
+zone_limit_get_limit(struct conntrack *ct, struct conntrack_zone_limit *czl)
+{
+int64_t limit = zone_limit_get_limit__(czl);
+
+if (limit == ZONE_LIMIT_CONN_DEFAULT) {
+atomic_read_relaxed(&ct->default_zone_limit, &limit);
+limit = limit ? : -1;
+}
+
+return limit;
+}
+
 static struct zone_limit *
 zone_limit_lookup_protected(struct conntrack *ct, int32_t zone)
 OVS_REQUIRES(ct->ct_lock)
@@ -323,11 +346,56 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone)
 return NULL;
 }
 
+static struct zone_limit *
+zone_limit_create__(struct conntrack *ct, int32_t zone, int64_t limit)
+OVS_REQUIRES(ct->ct_lock)
+{
+struct zone_limit *zl = NULL;
+
+if (zone > DEFAULT_ZONE && zone <= MAX_ZONE) {
+zl = xmalloc(sizeof *zl);
+atomic_init(&zl->czl.limit, limit);
+atomic_count_init(&zl->czl.count, 0);
+zl->czl.zone = zone;
+

[ovs-dev] [PATCH 4/5] conntrack: Turn zl local limit into atomic.

2024-07-01 Thread Paolo Valerio
while at it, changes struct zone_limit initialization in
zone_limit_create() in order to use atomic init operations instead of
relying on memset() which, although correctly initializes the struct,
is semantially not aware of atomics.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack.c | 19 ---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 2c625d710..2770470d1 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -201,7 +201,7 @@ enum ct_ephemeral_range {
 
 struct conntrack_zone_limit {
 int32_t zone;
-uint32_t limit;
+atomic_uint32_t limit;
 atomic_count count;
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 0481a8c8a..ac0790e11 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -341,7 +341,7 @@ zone_limit_get(struct conntrack *ct, int32_t zone)
 struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
 if (zl) {
 czl.zone = zl->czl.zone;
-czl.limit = zl->czl.limit;
+atomic_read_relaxed(&zl->czl.limit, &czl.limit);
 czl.count = atomic_count_get(&zl->czl.count);
 }
 return czl;
@@ -358,8 +358,9 @@ zone_limit_create(struct conntrack *ct, int32_t zone, 
uint32_t limit)
 }
 
 if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) {
-zl = xzalloc(sizeof *zl);
-zl->czl.limit = limit;
+zl = xmalloc(sizeof *zl);
+atomic_init(&zl->czl.limit, limit);
+atomic_count_init(&zl->czl.count, 0);
 zl->czl.zone = zone;
 zl->czl.zone_limit_seq = ct->zone_limit_seq++;
 uint32_t hash = zone_key_hash(zone, ct->hash_basis);
@@ -376,7 +377,7 @@ zone_limit_update(struct conntrack *ct, int32_t zone, 
uint32_t limit)
 int err = 0;
 struct zone_limit *zl = zone_limit_lookup(ct, zone);
 if (zl) {
-zl->czl.limit = limit;
+atomic_store_relaxed(&zl->czl.limit, limit);
 VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone);
 } else {
 ovs_mutex_lock(&ct->ct_lock);
@@ -916,12 +917,16 @@ conn_not_found(struct conntrack *ct, struct dp_packet 
*pkt,
 }
 
 if (commit) {
+uint32_t czl_limit;
 struct conn_key_node *fwd_key_node, *rev_key_node;
 struct zone_limit *zl = zone_limit_lookup_or_default(ct,
  ctx->key.zone);
-if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) {
-COVERAGE_INC(conntrack_zone_full);
-return nc;
+if (zl) {
+atomic_read_relaxed(&zl->czl.limit, &czl_limit);
+if (atomic_count_get(&zl->czl.count) >= czl_limit) {
+COVERAGE_INC(conntrack_zone_full);
+return nc;
+}
 }
 
 unsigned int n_conn_limit;
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/5] conntrack: Do not use atomics to report zones info.

2024-07-01 Thread Paolo Valerio
Atomics are not needed when reporting zone limits.
Remove the restriction by defining a non-atomic common structure
to report such data.
The change also access atomics using the related operations to
retrieve atomics reporting only the fields required by the requesting
level instead of relying of struct copy.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |  8 
 lib/conntrack.c | 11 ++-
 lib/conntrack.h |  9 -
 lib/dpif-netdev.c   |  6 +++---
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 6c65caa07..2c625d710 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -198,6 +198,14 @@ enum ct_ephemeral_range {
 #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
+
+struct conntrack_zone_limit {
+int32_t zone;
+uint32_t limit;
+atomic_count count;
+uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+};
+
 struct conntrack {
 struct ovs_mutex ct_lock; /* Protects the following fields. */
 struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e90ade32f..0481a8c8a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -330,18 +330,19 @@ zone_limit_lookup_or_default(struct conntrack *ct, 
int32_t zone)
 return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
 }
 
-struct conntrack_zone_limit
+struct conntrack_zone_info
 zone_limit_get(struct conntrack *ct, int32_t zone)
 {
-struct conntrack_zone_limit czl = {
+struct conntrack_zone_info czl = {
 .zone = DEFAULT_ZONE,
 .limit = 0,
-.count = ATOMIC_COUNT_INIT(0),
-.zone_limit_seq = 0,
+.count = 0,
 };
 struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
 if (zl) {
-czl = zl->czl;
+czl.zone = zl->czl.zone;
+czl.limit = zl->czl.limit;
+czl.count = atomic_count_get(&zl->czl.count);
 }
 return czl;
 }
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 13bb02ea9..c3136e955 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -115,11 +115,10 @@ struct conntrack_dump {
 uint16_t current_zone;
 };
 
-struct conntrack_zone_limit {
+struct conntrack_zone_info {
 int32_t zone;
 uint32_t limit;
-atomic_count count;
-uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+unsigned int count;
 };
 
 struct timeout_policy {
@@ -161,8 +160,8 @@ int conntrack_set_sweep_interval(struct conntrack *ct, 
uint32_t ms);
 uint32_t conntrack_get_sweep_interval(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
-struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
-   int32_t zone);
+struct conntrack_zone_info zone_limit_get(struct conntrack *ct,
+  int32_t zone);
 int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit);
 int zone_limit_delete(struct conntrack *ct, int32_t zone);
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c7f9e1490..3fbfcfa2b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9729,7 +9729,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
struct ovs_list *zone_limits_reply)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
-struct conntrack_zone_limit czl;
+struct conntrack_zone_info czl;
 
 if (!ovs_list_is_empty(zone_limits_request)) {
 struct ct_dpif_zone_limit *zone_limit;
@@ -9738,7 +9738,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
 if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) {
 ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone,
 czl.limit,
-atomic_count_get(&czl.count));
+czl.count);
 } else {
 return EINVAL;
 }
@@ -9754,7 +9754,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
 czl = zone_limit_get(dp->conntrack, z);
 if (czl.zone == z) {
 ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit,
-atomic_count_get(&czl.count));
+czl.count);
 }
 }
 }
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/5] conntrack: Add zone limit coverage counter.

2024-07-01 Thread Paolo Valerio
Similarly to what it's done for conntrack_full, add
conntrack_zone_full increased when new entries are not added due to
reaching the zone limit.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index db44f8237..e90ade32f 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -50,6 +50,7 @@ COVERAGE_DEFINE(conntrack_full);
 COVERAGE_DEFINE(conntrack_l3csum_err);
 COVERAGE_DEFINE(conntrack_l4csum_err);
 COVERAGE_DEFINE(conntrack_lookup_natted_miss);
+COVERAGE_DEFINE(conntrack_zone_full);
 
 struct conn_lookup_ctx {
 struct conn_key key;
@@ -918,6 +919,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 struct zone_limit *zl = zone_limit_lookup_or_default(ct,
  ctx->key.zone);
 if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) {
+COVERAGE_INC(conntrack_zone_full);
 return nc;
 }
 
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/5] conntrack: Correctly annotate conntrack member.

2024-07-01 Thread Paolo Valerio
While at it update no longer valid comment.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 71367f211..6c65caa07 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -199,11 +199,12 @@ enum ct_ephemeral_range {
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
 struct conntrack {
-struct ovs_mutex ct_lock; /* Protects 2 following fields. */
+struct ovs_mutex ct_lock; /* Protects the following fields. */
 struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
-struct rculist exp_lists[N_EXP_LISTS];
+struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
 struct cmap timeout_policies OVS_GUARDED;
+
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/5] Fix default zone limit

2024-07-01 Thread Paolo Valerio
Ilya reported a problem with zone limits in particular when a default is
set but a zone specific limit is not.
 
As per subject, the series (path 5/5) addresses this issue by creating an
entry in the zone limit map as soon as the first packet for a zone is seen
(and the default limit is in place).
A test was extended to exercise the behavior and make sure the kernel
and userspace datapath behave consistently.

Paolo Valerio (5):
  conntrack: Correctly annotate conntrack member.
  conntrack: Add zone limit coverage counter.
  conntrack: Do not use atomics to report zones info.
  conntrack: Turn zl local limit into atomic.
  conntrack: Use a per zone default limit.

 lib/conntrack-private.h |  18 ++-
 lib/conntrack.c | 241 +++-
 lib/conntrack.h |   9 +-
 lib/dpif-netdev.c   |   6 +-
 tests/system-traffic.at |  64 ---
 5 files changed, 256 insertions(+), 82 deletions(-)

-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.

2024-07-01 Thread Adrián Moreno
On Mon, Jul 01, 2024 at 01:30:12PM GMT, Eelco Chaudron wrote:
>
>
> On 28 Jun 2024, at 18:40, Adrián Moreno wrote:
>
> > On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote:
> >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote:
> >>
> >>> Use the newly added emit_sample to implement OpenFlow sample() actions
> >>> with local sampling configuration.
> >>
> >> See some comments below.
> >>
> >> Cheers,
> >>
> >> Eelco
> >>
> >>> Signed-off-by: Adrian Moreno 
> >>> ---
> >>>  ofproto/ofproto-dpif-lsample.c |  17 
> >>>  ofproto/ofproto-dpif-lsample.h |   6 ++
> >>>  ofproto/ofproto-dpif-xlate.c   | 163 -
> >>>  ofproto/ofproto-dpif-xlate.h   |   3 +-
> >>>  ofproto/ofproto-dpif.c |   2 +-
> >>>  5 files changed, 144 insertions(+), 47 deletions(-)
> >>>
> >>> diff --git a/ofproto/ofproto-dpif-lsample.c 
> >>> b/ofproto/ofproto-dpif-lsample.c
> >>> index 7bdafac19..2c0b5da89 100644
> >>> --- a/ofproto/ofproto-dpif-lsample.c
> >>> +++ b/ofproto/ofproto-dpif-lsample.c
> >>> @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample 
> >>> *lsample,
> >>>  return changed;
> >>>  }
> >>>
> >>> +/* Returns the group_id of the exporter with the given collector_set_id, 
> >>> if it
> >>> + * exists. */
> >>
> >> nit: The below will fit on one line
> >>
> >> /* Returns the exporter group_id for a given collector_set_id, if it 
> >> exists. */
> >>
> >
> > Ack.
> >
> >>> +bool
> >>> +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t 
> >>> collector_set_id,
> >>> +  uint32_t *group_id)
> >>> +{
> >>> +struct lsample_exporter_node *node;
> >>> +bool found = false;
> >>> +
> >>> +node = dpif_lsample_find_exporter_node(ps, collector_set_id);
> >>> +if (node) {
> >>> +found = true;
> >>> +*group_id = node->exporter.options.group_id;
> >>> +}
> >>> +return found;
> >>> +}
> >>> +
> >>>  struct dpif_lsample *
> >>>  dpif_lsample_create(void)
> >>>  {
> >>> diff --git a/ofproto/ofproto-dpif-lsample.h 
> >>> b/ofproto/ofproto-dpif-lsample.h
> >>> index c23ea8372..f13425575 100644
> >>> --- a/ofproto/ofproto-dpif-lsample.h
> >>> +++ b/ofproto/ofproto-dpif-lsample.h
> >>> @@ -19,6 +19,7 @@
> >>>
> >>>  #include 
> >>>  #include 
> >>> +#include 
> >>
> >> Maybe add in alphabetical order.
> >>
> >
> > Ack.
> >
> >>>  struct dpif_lsample;
> >>>  struct ofproto_lsample_options;
> >>> @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *,
> >>>const struct ofproto_lsample_options *,
> >>>size_t n_opts);
> >>>
> >>> +bool dpif_lsample_get_group_id(struct dpif_lsample *,
> >>> +   uint32_t collector_set_id,
> >>> +   uint32_t *group_id);
> >>> +
> >>>  #endif /* OFPROTO_DPIF_LSAMPLE_H */
> >>> +
> >>
> >> Accedantely added a newline?
> >>
> >
> > Ack.
> >
> >>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >>> index 7c4950895..5bd215d31 100644
> >>> --- a/ofproto/ofproto-dpif-xlate.c
> >>> +++ b/ofproto/ofproto-dpif-xlate.c
> >>> @@ -47,6 +47,7 @@
> >>>  #include "ofproto/ofproto-dpif-ipfix.h"
> >>>  #include "ofproto/ofproto-dpif-mirror.h"
> >>>  #include "ofproto/ofproto-dpif-monitor.h"
> >>> +#include "ofproto/ofproto-dpif-lsample.h"
> >>>  #include "ofproto/ofproto-dpif-sflow.h"
> >>>  #include "ofproto/ofproto-dpif-trace.h"
> >>>  #include "ofproto/ofproto-dpif-xlate-cache.h"
> >>> @@ -117,6 +118,7 @@ struct xbridge {
> >>>  struct dpif_sflow *sflow; /* SFlow handle, or null. */
> >>>  struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */
> >>>  struct netflow *netflow;  /* Netflow handle, or null. */
> >>> +struct dpif_lsample *lsample; /* Local sample handle, or null. */
> >>>  struct stp *stp;  /* STP or null if disabled. */
> >>>  struct rstp *rstp;/* RSTP or null if disabled. */
> >>>
> >>> @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, 
> >>> struct dpif *,
> >>>const struct dpif_sflow *,
> >>>const struct dpif_ipfix *,
> >>>const struct netflow *,
> >>> +  const struct dpif_lsample *,
> >>>bool forward_bpdu, bool has_in_band,
> >>>const struct dpif_backer_support *,
> >>>const struct xbridge_addr *);
> >>> @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge,
> >>>const struct dpif_sflow *sflow,
> >>>const struct dpif_ipfix *ipfix,
> >>>const struct netflow *netflow,
> >>> +  const struct dpif_lsample *lsample,
> >>
> >> nit: I would move above netflow, as you also do the actual init/unref 
> >> before
> >>  netflow.
> >
> > Ack.
> >
> >>> 

Re: [ovs-dev] [RFC v2 9/9] tests: Test local sampling.

2024-07-01 Thread Adrián Moreno
On Mon, Jul 01, 2024 at 01:33:23PM GMT, Eelco Chaudron wrote:
>
>
> On 28 Jun 2024, at 16:11, Adrián Moreno wrote:
>
> > On Wed, Jun 26, 2024 at 02:15:02PM GMT, Eelco Chaudron wrote:
> >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote:
> >>
> >>> Test simultaneous IPFIX and local sampling including slow-path.
> >>
> >> I guess Ilya's comments on this patch summarize most of my comments. I had
> >> one small additional question. See below.
> >>
> >> //Eelco
> >>
> >>> Signed-off-by: Adrian Moreno 
> >>> ---
> >>>  tests/system-common-macros.at |   4 ++
> >>>  tests/system-traffic.at   | 105 ++
> >>>  2 files changed, 109 insertions(+)
> >>>
> >>> diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
> >>> index 2a68cd664..22b8885e4 100644
> >>> --- a/tests/system-common-macros.at
> >>> +++ b/tests/system-common-macros.at
> >>> @@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION],
> >>>  # OVS_CHECK_DROP_ACTION()
> >>>  m4_define([OVS_CHECK_DROP_ACTION],
> >>>  [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" 
> >>> ovs-vswitchd.log])])
> >>> +
> >>> +# OVS_CHECK_EMIT_SAMPLE()
> >>> +m4_define([OVS_CHECK_EMIT_SAMPLE],
> >>> +[AT_SKIP_IF([! grep -q "Datapath supports emit_sample" 
> >>> ovs-vswitchd.log])])
> >>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> >>> index bd7647cbe..babc56b56 100644
> >>> --- a/tests/system-traffic.at
> >>> +++ b/tests/system-traffic.at
> >>> @@ -8977,3 +8977,108 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: 
> >>> * * *5002 *2000 *b85e *00
> >>>
> >>>  OVS_TRAFFIC_VSWITCHD_STOP
> >>>  AT_CLEANUP
> >>> +
> >>> +AT_SETUP([emit_sample])
> >>> +OVS_TRAFFIC_VSWITCHD_START()
> >>> +OVS_CHECK_EMIT_SAMPLE()
> >>> +
> >>> +ADD_NAMESPACES(at_ns0, at_ns1)
> >>> +NS_CHECK_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], 
> >>> [0], [ignore])
> >>> +NS_CHECK_EXEC([at_ns1], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], 
> >>> [0], [ignore])
> >>> +
> >>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "e4:11:22:33:44:55")
> >>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "e4:11:22:33:44:66")
> >>> +
> >>> +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e4:11:22:33:44:66])
> >>> +NS_CHECK_EXEC([at_ns1], [arp -s 10.1.1.1 e4:11:22:33:44:55])
> >>> +
> >>> +
> >>> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg 
> >>> ofproto_dpif_upcall:dbg])
> >>> +
> >>> +AT_CHECK([ovs-vsctl -- --id=@br0 get Bridge br0 \
> >>> +-- --id=@ipfix create IPFIX 
> >>> targets=\"127.0.0.1:4739\" \
> >>> +-- create Flow_Sample_Collector_Set id=1 bridge=@br0 
> >>> ipfix=@ipfix, local_sample_group=10 \
> >>> +-- create Flow_Sample_Collector_Set id=2 bridge=@br0 
> >>> ipfix=@ipfix, local_sample_group=12],
> >>> + [0], [ignore])
> >>> +
> >>> +AT_DATA([flows.txt], [dnl
> >>> +in_port=ovs-p0,ip 
> >>> actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),output(port=ovs-p1,max_len=100)
> >>> +in_port=ovs-p1,ip 
> >>> actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),output(port=ovs-p0,max_len=100)
> >>> +])
> >>> +
> >>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> >>> +
> >>> +OVS_DAEMONIZE([ovstest test-psample > psample.out], [psample1.pid])
> >>> +
> >>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 1 10.1.1.2 | FORMAT_PING], [0], [dnl
> >>> +1 packets transmitted, 1 received, 0% packet loss, time 0ms
> >>> +])
> >>> +
> >>> +m4_define([SAMPLE1], [group_id=0xa 
> >>> obs_domain=0x,obs_point=0x 
> >>> .*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2])
> >>> +m4_define([SAMPLE2], [group_id=0xc 
> >>> obs_domain=0x,obs_point=0x 
> >>> .*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1])
> >>> +AT_CHECK([grep -E 'SAMPLE1' psample.out >/dev/null])
> >>> +AT_CHECK([grep -E 'SAMPLE2' psample.out >/dev/null])
> >>> +
> >>> +AT_CHECK([ovs-ofctl dump-ipfix-flow br0 | sed 's/tx pkts=[[0-9]]*/tx 
> >>> pkts=24/' | sed 's/tx errs=[[0-9]]*/tx errs=0/'], [0], [dnl
> >>
> >> Why are we making tx pkts always 24, and are we waiving any potential tx 
> >> errors?
> >> Is this something you have seen not being consistent, if so any idea why?
> >>
> >
> > Yes. By the time we request the statistics, IPFIX cache should contain
> > the flows but they might not have been sent yet. When they are sent they
> > will fail (because IPFIX target is localhost and it returns the socket
> > returns an error if the listening socket does not exist) so the number
> > of errors might vary.
>
> Your explanation makes sense to me. Maybe just add it as a comment so people 
> looking at the test know why.
>

Ack.

> > I can look deeper into making it more consistent but it looked like a
> > quick workaround.
> >
> >>> +NXST_IPFIX_FLOW reply (xid=0x2): 2 ids
> >>> +  id   1: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 
> >>> ok=0, tx pkts=24
> >>> + 

Re: [ovs-dev] [PATCH v3] netdev-offload-dpdk: Support offload of set dscp action.

2024-07-01 Thread Sunyang Wu via dev
Hi, Ilya, Thank you for your valuable advice!
In lines 7390-7406 of the 'ofproto_dpif xlate.c' file, show below:
case OFPACT_SET_IP_DSCP:
if (is_ip_any(flow)) {
WC_MASK_FIELD(wc, nw_proto);
wc->masks.nw_tos |= IP_DSCP_MASK;
flow->nw_tos &= ~IP_DSCP_MASK;
flow->nw_tos |= ofpact_get_SET_IP_DSCP(a)->dscp;
}
break;

case OFPACT_SET_IP_ECN:
if (is_ip_any(flow)) {
WC_MASK_FIELD(wc, nw_proto);
wc->masks.nw_tos |= IP_ECN_MASK;
flow->nw_tos &= ~IP_ECN_MASK;
flow->nw_tos |= ofpact_get_SET_IP_ECN(a)->ecn;
}
break;
when parsing and setting the DSCP and ECN actions, different masks are applied 
to the `nw_tos` field respectively, 
and a mask operation is performed on `nw_tos`. Additionally, the same action 
identifier is used for offloading both DSCP
and ECN, ensuring that setting DSCP and setting ECN do not interfere with each 
other.

Regarding the `add_set_flow_action__` function, the mask is cleared to zero. 
This clearing operation is intended
for subsequent checks to determine if all the fields issued are supported. This 
mask will not be passed to the DPDK
side, so it will not affect the offloading operation.

On 6/28/24 23:08, Ilya Maximets wrote:
> On 6/20/24 09:35, Sunyang Wu via dev wrote:
>> Add the "set dscp action" parsing function, so that the "set dscp 
>> action" can be offloaded.
>>
>> Signed-off-by: Sunyang Wu 
>> ---
>>  lib/netdev-offload-dpdk.c | 19 ++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>>
>
> Hi, Sunyang Wu.  Thanks for the patch!
> See some comments inline.
>
> Best regards, Ilya Maximets.
>
>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c 
>> index 623005b1c..524942457 100644
>> --- a/lib/netdev-offload-dpdk.c
>> +++ b/lib/netdev-offload-dpdk.c
>> @@ -791,6 +791,17 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
>>  ds_put_format(s, "port %"PRIu16" ", ntohs(set_tp->port));
>>  }
>>  ds_put_cstr(s, "/ ");
>> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP ||
>> +   actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) {
>> +const struct rte_flow_action_set_dscp *set_dscp = actions->conf;
>> +char *dirstr = actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP
>> +   ? "set_ipv4_dscp " : "set_ipv6_dscp ";
>> +
>> +ds_put_cstr(s, dirstr);
>> +if (set_dscp) {
>> +ds_put_format(s, "dscp_value %d ", set_dscp->dscp);
>> +}
>> +ds_put_cstr(s, "/ ");
>>  } else if (actions->type == RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN) {
>>  const struct rte_flow_action_of_push_vlan *of_push_vlan =
>>  actions->conf;
>> @@ -1835,7 +1846,10 @@ add_set_flow_action__(struct flow_actions *actions,
>>  if (is_all_zeros(mask, size)) {
>>  return 0;
>>  }
>> -if (!is_all_ones(mask, size)) {
>> +if (!is_all_ones(mask, size) &&
>> +/* set dscp need patial mask */
>> +attr != RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP &&
>> +attr != RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) {
>
> This doesn't seem right.  We need to check that the mask contains all 
> the DSCP bits and that it doesn't have anything else inside.
>
> For example, if we try to set ECN bits, we'll have them in a mask.
> In this case is_all_zeros() check will fail and then this check will 
> allow us to proceed as well.  In the end, we'll end up setting DSCP 
> bits to all zeroes, or worse, to whatever was in the first six bits of 
> nw_tos.
>

Also, this function will clear the whole mask below, but we should only clear 
DSCP bits in this case.

>>  VLOG_DBG_RL(&rl, "Partial mask is not supported");
>>  return -1;
>>  }
>> @@ -1912,6 +1926,7 @@ parse_set_actions(struct flow_actions *actions,
>>  add_set_flow_action(ipv4_src, 
>> RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC);
>>  add_set_flow_action(ipv4_dst, 
>> RTE_FLOW_ACTION_TYPE_SET_IPV4_DST);
>>  add_set_flow_action(ipv4_ttl, 
>> RTE_FLOW_ACTION_TYPE_SET_TTL);
>> +add_set_flow_action(ipv4_tos, 
>> + RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP);z
>>
>>  if (mask && !is_all_zeros(mask, sizeof *mask)) {
>>  VLOG_DBG_RL(&rl, "Unsupported IPv4 set action"); @@ 
>> -1924,6 +1939,8 @@ parse_set_actions(struct flow_actions *actions,
>>  add_set_flow_action(ipv6_src, 
>> RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC);
>>  add_set_flow_action(ipv6_dst, 
>> RTE_FLOW_ACTION_TYPE_SET_IPV6_DST);
>>  add_set_flow_action(ipv6_hlimit, 
>> RTE_FLOW_ACTION_TYPE_SET_TTL);
>> +add_set_flow_action(ipv6_tclass,
>> +RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP);
>>
>>  if (mask && !is_all_zeros(mask, sizeof *mask)) {
>> 

Re: [ovs-dev] [RFC v2 9/9] tests: Test local sampling.

2024-07-01 Thread Eelco Chaudron


On 28 Jun 2024, at 16:11, Adrián Moreno wrote:

> On Wed, Jun 26, 2024 at 02:15:02PM GMT, Eelco Chaudron wrote:
>> On 5 Jun 2024, at 22:23, Adrian Moreno wrote:
>>
>>> Test simultaneous IPFIX and local sampling including slow-path.
>>
>> I guess Ilya's comments on this patch summarize most of my comments. I had
>> one small additional question. See below.
>>
>> //Eelco
>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>  tests/system-common-macros.at |   4 ++
>>>  tests/system-traffic.at   | 105 ++
>>>  2 files changed, 109 insertions(+)
>>>
>>> diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
>>> index 2a68cd664..22b8885e4 100644
>>> --- a/tests/system-common-macros.at
>>> +++ b/tests/system-common-macros.at
>>> @@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION],
>>>  # OVS_CHECK_DROP_ACTION()
>>>  m4_define([OVS_CHECK_DROP_ACTION],
>>>  [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" 
>>> ovs-vswitchd.log])])
>>> +
>>> +# OVS_CHECK_EMIT_SAMPLE()
>>> +m4_define([OVS_CHECK_EMIT_SAMPLE],
>>> +[AT_SKIP_IF([! grep -q "Datapath supports emit_sample" 
>>> ovs-vswitchd.log])])
>>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>>> index bd7647cbe..babc56b56 100644
>>> --- a/tests/system-traffic.at
>>> +++ b/tests/system-traffic.at
>>> @@ -8977,3 +8977,108 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: 
>>> * * *5002 *2000 *b85e *00
>>>
>>>  OVS_TRAFFIC_VSWITCHD_STOP
>>>  AT_CLEANUP
>>> +
>>> +AT_SETUP([emit_sample])
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +OVS_CHECK_EMIT_SAMPLE()
>>> +
>>> +ADD_NAMESPACES(at_ns0, at_ns1)
>>> +NS_CHECK_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], [0], 
>>> [ignore])
>>> +NS_CHECK_EXEC([at_ns1], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], [0], 
>>> [ignore])
>>> +
>>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "e4:11:22:33:44:55")
>>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "e4:11:22:33:44:66")
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e4:11:22:33:44:66])
>>> +NS_CHECK_EXEC([at_ns1], [arp -s 10.1.1.1 e4:11:22:33:44:55])
>>> +
>>> +
>>> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg 
>>> ofproto_dpif_upcall:dbg])
>>> +
>>> +AT_CHECK([ovs-vsctl -- --id=@br0 get Bridge br0 \
>>> +-- --id=@ipfix create IPFIX targets=\"127.0.0.1:4739\" 
>>> \
>>> +-- create Flow_Sample_Collector_Set id=1 bridge=@br0 
>>> ipfix=@ipfix, local_sample_group=10 \
>>> +-- create Flow_Sample_Collector_Set id=2 bridge=@br0 
>>> ipfix=@ipfix, local_sample_group=12],
>>> + [0], [ignore])
>>> +
>>> +AT_DATA([flows.txt], [dnl
>>> +in_port=ovs-p0,ip 
>>> actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),output(port=ovs-p1,max_len=100)
>>> +in_port=ovs-p1,ip 
>>> actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),output(port=ovs-p0,max_len=100)
>>> +])
>>> +
>>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
>>> +
>>> +OVS_DAEMONIZE([ovstest test-psample > psample.out], [psample1.pid])
>>> +
>>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 1 10.1.1.2 | FORMAT_PING], [0], [dnl
>>> +1 packets transmitted, 1 received, 0% packet loss, time 0ms
>>> +])
>>> +
>>> +m4_define([SAMPLE1], [group_id=0xa 
>>> obs_domain=0x,obs_point=0x 
>>> .*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2])
>>> +m4_define([SAMPLE2], [group_id=0xc 
>>> obs_domain=0x,obs_point=0x 
>>> .*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1])
>>> +AT_CHECK([grep -E 'SAMPLE1' psample.out >/dev/null])
>>> +AT_CHECK([grep -E 'SAMPLE2' psample.out >/dev/null])
>>> +
>>> +AT_CHECK([ovs-ofctl dump-ipfix-flow br0 | sed 's/tx pkts=[[0-9]]*/tx 
>>> pkts=24/' | sed 's/tx errs=[[0-9]]*/tx errs=0/'], [0], [dnl
>>
>> Why are we making tx pkts always 24, and are we waiving any potential tx 
>> errors?
>> Is this something you have seen not being consistent, if so any idea why?
>>
>
> Yes. By the time we request the statistics, IPFIX cache should contain
> the flows but they might not have been sent yet. When they are sent they
> will fail (because IPFIX target is localhost and it returns the socket
> returns an error if the listening socket does not exist) so the number
> of errors might vary.

Your explanation makes sense to me. Maybe just add it as a comment so people 
looking at the test know why.

> I can look deeper into making it more consistent but it looked like a
> quick workaround.
>
>>> +NXST_IPFIX_FLOW reply (xid=0x2): 2 ids
>>> +  id   1: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 ok=0, 
>>> tx pkts=24
>>> +  pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0
>>> +  id   2: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 ok=0, 
>>> tx pkts=24
>>> +  pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0
>>> +])
>>> +
>>> +AT_CHECK([ovs-appctl lsample/show br0], [0], [dnl
>>> +local sample statistics

Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.

2024-07-01 Thread Eelco Chaudron


On 28 Jun 2024, at 18:40, Adrián Moreno wrote:

> On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote:
>> On 5 Jun 2024, at 22:23, Adrian Moreno wrote:
>>
>>> Use the newly added emit_sample to implement OpenFlow sample() actions
>>> with local sampling configuration.
>>
>> See some comments below.
>>
>> Cheers,
>>
>> Eelco
>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>  ofproto/ofproto-dpif-lsample.c |  17 
>>>  ofproto/ofproto-dpif-lsample.h |   6 ++
>>>  ofproto/ofproto-dpif-xlate.c   | 163 -
>>>  ofproto/ofproto-dpif-xlate.h   |   3 +-
>>>  ofproto/ofproto-dpif.c |   2 +-
>>>  5 files changed, 144 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/ofproto/ofproto-dpif-lsample.c b/ofproto/ofproto-dpif-lsample.c
>>> index 7bdafac19..2c0b5da89 100644
>>> --- a/ofproto/ofproto-dpif-lsample.c
>>> +++ b/ofproto/ofproto-dpif-lsample.c
>>> @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample *lsample,
>>>  return changed;
>>>  }
>>>
>>> +/* Returns the group_id of the exporter with the given collector_set_id, 
>>> if it
>>> + * exists. */
>>
>> nit: The below will fit on one line
>>
>> /* Returns the exporter group_id for a given collector_set_id, if it exists. 
>> */
>>
>
> Ack.
>
>>> +bool
>>> +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t 
>>> collector_set_id,
>>> +  uint32_t *group_id)
>>> +{
>>> +struct lsample_exporter_node *node;
>>> +bool found = false;
>>> +
>>> +node = dpif_lsample_find_exporter_node(ps, collector_set_id);
>>> +if (node) {
>>> +found = true;
>>> +*group_id = node->exporter.options.group_id;
>>> +}
>>> +return found;
>>> +}
>>> +
>>>  struct dpif_lsample *
>>>  dpif_lsample_create(void)
>>>  {
>>> diff --git a/ofproto/ofproto-dpif-lsample.h b/ofproto/ofproto-dpif-lsample.h
>>> index c23ea8372..f13425575 100644
>>> --- a/ofproto/ofproto-dpif-lsample.h
>>> +++ b/ofproto/ofproto-dpif-lsample.h
>>> @@ -19,6 +19,7 @@
>>>
>>>  #include 
>>>  #include 
>>> +#include 
>>
>> Maybe add in alphabetical order.
>>
>
> Ack.
>
>>>  struct dpif_lsample;
>>>  struct ofproto_lsample_options;
>>> @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *,
>>>const struct ofproto_lsample_options *,
>>>size_t n_opts);
>>>
>>> +bool dpif_lsample_get_group_id(struct dpif_lsample *,
>>> +   uint32_t collector_set_id,
>>> +   uint32_t *group_id);
>>> +
>>>  #endif /* OFPROTO_DPIF_LSAMPLE_H */
>>> +
>>
>> Accedantely added a newline?
>>
>
> Ack.
>
>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>> index 7c4950895..5bd215d31 100644
>>> --- a/ofproto/ofproto-dpif-xlate.c
>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>> @@ -47,6 +47,7 @@
>>>  #include "ofproto/ofproto-dpif-ipfix.h"
>>>  #include "ofproto/ofproto-dpif-mirror.h"
>>>  #include "ofproto/ofproto-dpif-monitor.h"
>>> +#include "ofproto/ofproto-dpif-lsample.h"
>>>  #include "ofproto/ofproto-dpif-sflow.h"
>>>  #include "ofproto/ofproto-dpif-trace.h"
>>>  #include "ofproto/ofproto-dpif-xlate-cache.h"
>>> @@ -117,6 +118,7 @@ struct xbridge {
>>>  struct dpif_sflow *sflow; /* SFlow handle, or null. */
>>>  struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */
>>>  struct netflow *netflow;  /* Netflow handle, or null. */
>>> +struct dpif_lsample *lsample; /* Local sample handle, or null. */
>>>  struct stp *stp;  /* STP or null if disabled. */
>>>  struct rstp *rstp;/* RSTP or null if disabled. */
>>>
>>> @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, struct 
>>> dpif *,
>>>const struct dpif_sflow *,
>>>const struct dpif_ipfix *,
>>>const struct netflow *,
>>> +  const struct dpif_lsample *,
>>>bool forward_bpdu, bool has_in_band,
>>>const struct dpif_backer_support *,
>>>const struct xbridge_addr *);
>>> @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge,
>>>const struct dpif_sflow *sflow,
>>>const struct dpif_ipfix *ipfix,
>>>const struct netflow *netflow,
>>> +  const struct dpif_lsample *lsample,
>>
>> nit: I would move above netflow, as you also do the actual init/unref before
>>  netflow.
>
> Ack.
>
>>>bool forward_bpdu, bool has_in_band,
>>>const struct dpif_backer_support *support,
>>>const struct xbridge_addr *addr)
>>> @@ -1099,6 +1103,11 @@ xlate_xbridge_set(struct xbridge *xbridge,
>>>  xbridge->ipfix = dpif_ipfix_ref(ipfix);
>>>  }
>>>
>>> +if (xbridge->lsample != lsample) {
>>> + 

Re: [ovs-dev] [RFC v2 4/9] vswitchd: Add local sampling to vswitchd schema.

2024-07-01 Thread Eelco Chaudron


On 28 Jun 2024, at 15:53, Adrián Moreno wrote:

> On Thu, Jun 27, 2024 at 03:57:03PM GMT, Eelco Chaudron wrote:
>>
>>
>> On 27 Jun 2024, at 13:37, Adrián Moreno wrote:
>>
>>> On Wed, Jun 26, 2024 at 02:11:00PM GMT, Eelco Chaudron wrote:
 On 5 Jun 2024, at 22:23, Adrian Moreno wrote:

> Add as new column in the Flow_Sample_Collector_Set table named
> "local_sample_group" which enables this feature.
>
> Signed-off-by: Adrian Moreno 
> ---
>  NEWS   |  4 ++
>  vswitchd/bridge.c  | 78 +++---
>  vswitchd/vswitch.ovsschema |  9 -
>  vswitchd/vswitch.xml   | 39 +--
>  4 files changed, 119 insertions(+), 11 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index b92cec532..1c05a7120 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -7,6 +7,10 @@ Post-v3.3.0
> - The primary development branch has been renamed from 'master' to 
> 'main'.
>   The OVS tree remains hosted on GitHub.
>   https://github.com/openvswitch/ovs.git
> +   - Local sampling is introduced. It reuses the OpenFlow sample action 
> and
> + allows samples to be emitted locally (instead of via IPFIX) in a
> + datapath-specific manner via the new datapath action called 
> "emit_sample".
> + Linux kernel datapath is the first to support this feature.
>
>
>  v3.3.0 - 16 Feb 2024
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 95a65fcdc..cd7dc6646 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -288,6 +288,7 @@ static void bridge_configure_mac_table(struct bridge 
> *);
>  static void bridge_configure_mcast_snooping(struct bridge *);
>  static void bridge_configure_sflow(struct bridge *, int 
> *sflow_bridge_number);
>  static void bridge_configure_ipfix(struct bridge *);
> +static void bridge_configure_lsample(struct bridge *);
>  static void bridge_configure_spanning_tree(struct bridge *);
>  static void bridge_configure_tables(struct bridge *);
>  static void bridge_configure_dp_desc(struct bridge *);
> @@ -989,6 +990,7 @@ bridge_reconfigure(const struct ovsrec_open_vswitch 
> *ovs_cfg)
>  bridge_configure_netflow(br);
>  bridge_configure_sflow(br, &sflow_bridge_number);
>  bridge_configure_ipfix(br);
> +bridge_configure_lsample(br);
>  bridge_configure_spanning_tree(br);
>  bridge_configure_tables(br);
>  bridge_configure_dp_desc(br);
> @@ -1537,10 +1539,11 @@ ovsrec_ipfix_is_valid(const struct ovsrec_ipfix 
> *ipfix)
>  return ipfix && ipfix->n_targets > 0;
>  }
>
> -/* Returns whether a Flow_Sample_Collector_Set row is valid. */
> +/* Returns whether a Flow_Sample_Collector_Set row constains valid IPFIX

 constains -> 'contains a'

>>>
>>> Ack.
>>>
> + * configuration. */
>  static bool
> -ovsrec_fscs_is_valid(const struct ovsrec_flow_sample_collector_set *fscs,
> - const struct bridge *br)
> +ovsrec_fscs_is_valid_ipfix(const struct ovsrec_flow_sample_collector_set 
> *fscs,
> +   const struct bridge *br)
>  {
>  return ovsrec_ipfix_is_valid(fscs->ipfix) && fscs->bridge == br->cfg;
>  }
> @@ -1558,7 +1561,7 @@ bridge_configure_ipfix(struct bridge *br)
>  const char *virtual_obs_id;
>
>  OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) {
> -if (ovsrec_fscs_is_valid(fe_cfg, br)) {
> +if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) {
>  n_fe_opts++;
>  }
>  }
> @@ -1621,7 +1624,7 @@ bridge_configure_ipfix(struct bridge *br)
>  fe_opts = xcalloc(n_fe_opts, sizeof *fe_opts);
>  opts = fe_opts;
>  OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) {
> -if (ovsrec_fscs_is_valid(fe_cfg, br)) {
> +if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) {
>  opts->collector_set_id = fe_cfg->id;
>  sset_init(&opts->targets);
>  sset_add_array(&opts->targets, fe_cfg->ipfix->targets,
> @@ -1667,6 +1670,71 @@ bridge_configure_ipfix(struct bridge *br)
>  }
>  }
>
> +/* Returns whether a Flow_Sample_Collector_Set row contains valid local
> + * sampling configuraiton. */

 ...row contains a valid local...
++

 configuraiton -> configuration
>>>
>>> Ack.
>>>

> +static bool
> +ovsrec_fscs_is_valid_local(const struct ovsrec_flow_sample_collector_set 
> *fscs,
> +   const struct bridge *br)
> +{
> +return fscs->local_sample_group && fscs->n_local_sample_group == 1 &&
> +   fscs->bridge == br->cfg;
> +}
> +
> +/

Re: [ovs-dev] [RFC v2 1/9] odp-util: Add support OVS_ACTION_ATTR_EMIT_SAMPLE.

2024-07-01 Thread Eelco Chaudron


On 28 Jun 2024, at 15:31, Adrián Moreno wrote:

> On Thu, Jun 27, 2024 at 03:42:38PM GMT, Eelco Chaudron wrote:
>>
>>
>> On 27 Jun 2024, at 12:45, Adrián Moreno wrote:
>>
>>> On Wed, Jun 26, 2024 at 02:08:27PM GMT, Eelco Chaudron wrote:
 On 5 Jun 2024, at 22:23, Adrian Moreno wrote:

> Add support for parsing and formatting the new action.
>
> Also, flag OVS_ACTION_ATTR_SAMPLE as requiring datapath assistance if it
> contains a nested OVS_ACTION_ATTR_EMIT_SAMPLE. The reason is that the
> sampling rate form the parent "sample" is made available to the nested
> "emit_sample" by the kernel.

 Hi Adrian,

 Thanks for this series! This email kicks off my review of the series,
 see comments below and in the other patches.

 Cheers,

 Eelco

> Signed-off-by: Adrian Moreno 
> ---
>  include/linux/openvswitch.h  |  25 +
>  lib/dpif-netdev.c|   1 +
>  lib/dpif.c   |   1 +
>  lib/odp-execute.c|  25 -
>  lib/odp-util.c   | 103 +++
>  lib/odp-util.h   |   3 +
>  ofproto/ofproto-dpif-ipfix.c |   1 +
>  ofproto/ofproto-dpif-sflow.c |   1 +
>  python/ovs/flow/odp.py   |   8 +++
>  tests/odp.at |  16 ++
>  10 files changed, 183 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index d9fb991ef..b4e0647bd 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -992,6 +992,30 @@ struct check_pkt_len_arg {
>  };
>  #endif
>
> +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
> +/**
> + * enum ovs_emit_sample_attr - Attributes for 
> %OVS_ACTION_ATTR_EMIT_SAMPLE
> + * action.
> + *
> + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of 
> the
> + * sample.
> + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that 
> contains
> + * user-defined metadata. The maximum length is 16 bytes.
> + *
> + * Sends the packet to the psample multicast group with the specified 
> group and
> + * cookie. It is possible to combine this action with the
> + * %OVS_ACTION_ATTR_TRUNC to limit the size of the packet being emitted.

 I'll start the discussion again on the naming. The name "emit_sample()"
 does not seem appropriate. This function's primary role is to copy the
 packet and send it to a local collector, which varies depending on the
 datapath. For the kernel datapath, this collector is psample, while for
 userspace, it will likely be some kind of probe. This action is distinct
 from the sample() action by design; it is a standalone action that can
 be combined with others.

 Furthermore, the action itself does not involve taking a sample; it
 consistently pushes the packet to the local collector. Therefore, I
 suggest renaming "emit_sample()" to "emit_local()". This same goes for
 all the derivative ATTR naming.

>>>
>>> Let's discuss this in the kernel ml.
>>>
> + */
> +enum ovs_emit_sample_attr {
> + OVS_EMIT_SAMPLE_ATTR_UNPSEC,
> + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
> + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */

 Fix comment alignment? Maybe also change the order to be alphabetical?

>>>
>>> What do you mean by comment alignment?
>>
>> Well you are using tabs to index which might result in it looking like this:
>>
>> +OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
>> +OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
>>
>> (Most) of the other comments use spaces after the definition.
>>
> + __OVS_EMIT_SAMPLE_ATTR_MAX
> +};
> +
> +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
> +
> +
>  /**
>   * enum ovs_action_attr - Action types.
>   *
> @@ -1087,6 +,7 @@ enum ovs_action_attr {
>   OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
>   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
>   OVS_ACTION_ATTR_DROP, /* u32 xlate_error. */
> + OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */
>
>  #ifndef __KERNEL__
>   OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c7f9e1490..c1171890c 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -9519,6 +9519,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
> *packets_,
>  case OVS_ACTION_ATTR_DROP:
>  case OVS_ACTION_ATTR_ADD_MPLS:
>  case OVS_ACTION_ATTR_DEC_TTL:
> +case OVS_ACTION_ATTR_EMIT_SAMPLE:
>  case __OVS_ACTION_ATTR_MAX:
>  OVS_NOT_REACHED();
>

[ovs-dev] [PATCH v5 3/3] ovsdb: Optimize monitor update by directly serializing data into ds.

2024-07-01 Thread Grigorii Nazarov
Currently serialization is performed by first converting
the internal data representation into JSON objects, followed by
serializing these objects by jsonrpc. This process results in lots of
allocations for these intermediate objects. Consequently, this
not only increases peak memory consumption, but also
demands significantly more CPU work.

By forming row-update JSONs directly in `ds`, which is then used
to create 'serialized object' JSONs, the overall speed increased
by a factor of 2.3.

A local benchmark was run on a proprietary Southbound backup.
Both versions, before and after applying the patch, were measured.
For each version, there were two runs with 10 parallel clients,
and two runs with 30 parallel clients. CPU time was recorded
after startup (before clients started running) and after
all clients received all updates. Clients were essentially running
`ovsdb-client monitor-cond unix:pipe OVN_Southbound
'[[\"actions\",\"!=\",\"sdfdfsd\"]]' Logical_Flow`.
Similar results were obtained with other requests
that required a significant amount of data transfer.
The backup size is about 600 MB. Results are measured in seconds.

Before  After
Baseline x10:   9.53108.54
Baseline x10:   9.62108.67
Baseline x30:   9.69307.04
Baseline x30:   9.65303.32
Patch x10:  9.6752.57
Patch x10:  9.5753.12
Patch x30:  9.53136.33
Patch x30:  9.63135.88

Signed-off-by: Grigorii Nazarov 
---
v2: updated title
v3: fixed bracing
v4: changed patch number from 4/4 to 3/3

 include/openvswitch/json.h |   1 +
 lib/json.c |   8 ++
 lib/ovsdb-data.c   | 105 +++
 lib/ovsdb-data.h   |   3 +
 ovsdb/monitor.c| 210 -
 5 files changed, 254 insertions(+), 73 deletions(-)

diff --git a/include/openvswitch/json.h b/include/openvswitch/json.h
index 555440760..80b9479c7 100644
--- a/include/openvswitch/json.h
+++ b/include/openvswitch/json.h
@@ -81,6 +81,7 @@ struct json *json_boolean_create(bool);
 struct json *json_string_create(const char *);
 struct json *json_string_create_nocopy(char *);
 struct json *json_serialized_object_create(const struct json *);
+struct json *json_serialized_object_create_from_string(const char *);
 struct json *json_integer_create(long long int);
 struct json *json_real_create(double);
 
diff --git a/lib/json.c b/lib/json.c
index d40e93857..66b1f571f 100644
--- a/lib/json.c
+++ b/lib/json.c
@@ -199,6 +199,14 @@ json_serialized_object_create(const struct json *src)
 return json;
 }
 
+struct json *
+json_serialized_object_create_from_string(const char *s)
+{
+struct json *json = json_create(JSON_SERIALIZED_OBJECT);
+json->string = xstrdup(s);
+return json;
+}
+
 struct json *
 json_serialized_object_create_with_yield(const struct json *src)
 {
diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c
index defb048d7..f32b7975a 100644
--- a/lib/ovsdb-data.c
+++ b/lib/ovsdb-data.c
@@ -492,6 +492,45 @@ ovsdb_atom_to_json__(const union ovsdb_atom *atom, enum 
ovsdb_atomic_type type,
 }
 }
 
+/* Serializes 'atom', of the specified 'type', into 'ds' in JSON format.
+ *
+ * Refer to RFC 7047 for the format of the JSON that this function produces. */
+static void
+ovsdb_atom_to_json_ds(const union ovsdb_atom *atom,
+  enum ovsdb_atomic_type type, struct ds *ds)
+{
+switch (type) {
+case OVSDB_TYPE_VOID:
+OVS_NOT_REACHED();
+
+case OVSDB_TYPE_INTEGER:
+ds_put_format(ds, "%lld", (long long) atom->integer);
+return;
+
+case OVSDB_TYPE_REAL:
+ds_put_format(ds, "%.*g", DBL_DIG, atom->real);
+return;
+
+case OVSDB_TYPE_BOOLEAN:
+ds_put_cstr(ds, atom->boolean ? "true" : "false");
+return;
+
+case OVSDB_TYPE_STRING:
+json_to_ds(atom->s, JSSF_SORT, ds);
+return;
+
+case OVSDB_TYPE_UUID:
+ds_put_cstr(ds, "[\"uuid\",\"");
+ds_put_format(ds, UUID_FMT, UUID_ARGS(&atom->uuid));
+ds_put_cstr(ds, "\"]");
+return;
+
+case OVSDB_N_TYPES:
+default:
+OVS_NOT_REACHED();
+}
+}
+
 struct json *
 ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type)
 {
@@ -1445,6 +1484,23 @@ ovsdb_base_to_json(const union ovsdb_atom *atom,
 }
 }
 
+static void
+ovsdb_base_to_json_ds(const union ovsdb_atom *atom,
+  const struct ovsdb_base_type *base,
+  bool use_row_names,
+  struct ds *ds)
+{
+if (!use_row_names
+|| base->type != OVSDB_TYPE_UUID
+|| !base->uuid.refTableName) {
+ovsdb_atom_to_json_ds(atom, base->type, ds);
+} else {
+ds_put_cstr(ds, "[\"named-uuid\",\"");
+ds_put_format(ds, UUID_ROW_FMT, UUID_ARGS(&atom->uuid));
+ds_put_cstr(ds, "\"]");
+}
+}
+
 static struct json *
 ovsdb_datum_to_json__(const struct ovsdb_datum *datum,
   const struct ovsdb

[ovs-dev] [PATCH v5 2/3] lib/json: Simplify string serialization code.

2024-07-01 Thread Grigorii Nazarov
Signed-off-by: Grigorii Nazarov 
---
There's an open question on whether this function should exist, or being
placed in header etc. However, no decision was made yet.

v2: fixed title
v4: changed patch number from 3/4 to 2/3

 lib/json.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/lib/json.c b/lib/json.c
index 001f6e6ab..d40e93857 100644
--- a/lib/json.c
+++ b/lib/json.c
@@ -127,7 +127,9 @@ static void json_parser_input(struct json_parser *, struct 
json_token *);
 
 static void json_error(struct json_parser *p, const char *format, ...)
 OVS_PRINTF_FORMAT(2, 3);
-
+
+static void json_serialize_string(const char *, struct ds *);
+
 const char *
 json_type_to_string(enum json_type type)
 {
@@ -987,11 +989,7 @@ exit:
 void
 json_string_escape(const char *in, struct ds *out)
 {
-struct json json = {
-.type = JSON_STRING,
-.string = CONST_CAST(char *, in),
-};
-json_to_ds(&json, 0, out);
+json_serialize_string(in, out);
 }
 
 static void
@@ -1572,7 +1570,6 @@ static void json_serialize_object(const struct shash 
*object,
   struct json_serializer *);
 static void json_serialize_array(const struct json_array *,
  struct json_serializer *);
-static void json_serialize_string(const char *, struct ds *);
 
 /* Converts 'json' to a string in JSON format, encoded in UTF-8, and returns
  * that string.  The caller is responsible for freeing the returned string,
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 1/3] ovsdb: Simplify UUID formatting code.

2024-07-01 Thread Grigorii Nazarov
Signed-off-by: Grigorii Nazarov 
---
v2: fixed title
v4: changed patch number from 2/4 to 1/3

 lib/ovsdb-data.c | 8 +---
 lib/uuid.h   | 1 +
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c
index abb923ad8..defb048d7 100644
--- a/lib/ovsdb-data.c
+++ b/lib/ovsdb-data.c
@@ -2582,14 +2582,8 @@ char *
 ovsdb_data_row_name(const struct uuid *uuid)
 {
 char *name;
-char *p;
 
-name = xasprintf("row"UUID_FMT, UUID_ARGS(uuid));
-for (p = name; *p != '\0'; p++) {
-if (*p == '-') {
-*p = '_';
-}
-}
+name = xasprintf(UUID_ROW_FMT, UUID_ARGS(uuid));
 
 return name;
 }
diff --git a/lib/uuid.h b/lib/uuid.h
index fa49354f6..6a8069f68 100644
--- a/lib/uuid.h
+++ b/lib/uuid.h
@@ -34,6 +34,7 @@ extern "C" {
  */
 #define UUID_LEN 36
 #define UUID_FMT "%08x-%04x-%04x-%04x-%04x%08x"
+#define UUID_ROW_FMT "row%08x_%04x_%04x_%04x_%04x%08x"
 #define UUID_ARGS(UUID) \
 ((unsigned int) ((UUID)->parts[0])),\
 ((unsigned int) ((UUID)->parts[1] >> 16)),  \
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 0/3] Optimize json serialization 2.3 times

2024-07-01 Thread Grigorii Nazarov
This patch series optimizes monitor and monitor-cond commands by
optimizing server-side json serialization. This is achieved by removing
intermediate representation. Most importantly, this greatly reduces
allocations and deallocations done in process. This also does reduce
peak memory consumption, but it wasn't a bottleneck and thus wasn't
precisely measured. See patch 3/3 description for more details.

This can be further improved by building complete response in ds. Also
benificial will be to reduce calls to stdio.h library. Proper profiling
is required to identify even further improvements.

The patches 1/3 and 2/3 contain (almost) independent improvements, that were
all-too-easy to be done during the optimization process.

Patch 3/3 is the main part.

Original patch 1/4 was removed due to it not being relevant to the purpose
of series.

v1->v2:
fixed commit message titles
v2->v3:
fixed incorrect bracing in touched code in PATCH 4/4
v3->v4:
removed patch 1/4
fixed incorrectly sent v3 series
added cover letter
v4->v5:
fixed yet again incorrectly sent v4 series

Grigorii Nazarov (3):
  ovsdb: Simplify UUID formatting code.
  lib/json: Simplify string serialization code.
  ovsdb: Optimize monitor update by directly serializing data into ds.

 include/openvswitch/json.h |   1 +
 lib/json.c |  19 ++--
 lib/ovsdb-data.c   | 113 ++--
 lib/ovsdb-data.h   |   3 +
 lib/uuid.h |   1 +
 ovsdb/monitor.c| 210 -
 6 files changed, 260 insertions(+), 87 deletions(-)

-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 2/3] lib/json: Simplify string serialization code.

2024-07-01 Thread Grigorii Nazarov
Signed-off-by: Grigorii Nazarov 
---
There's an open question on whether this function should exist, or being
placed in header etc. However, no decision was made yet.

v2: fixed title
v4: changed patch number from 3/4 to 2/3

 lib/json.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/lib/json.c b/lib/json.c
index 001f6e6ab..d40e93857 100644
--- a/lib/json.c
+++ b/lib/json.c
@@ -127,7 +127,9 @@ static void json_parser_input(struct json_parser *, struct 
json_token *);
 
 static void json_error(struct json_parser *p, const char *format, ...)
 OVS_PRINTF_FORMAT(2, 3);
-
+
+static void json_serialize_string(const char *, struct ds *);
+
 const char *
 json_type_to_string(enum json_type type)
 {
@@ -987,11 +989,7 @@ exit:
 void
 json_string_escape(const char *in, struct ds *out)
 {
-struct json json = {
-.type = JSON_STRING,
-.string = CONST_CAST(char *, in),
-};
-json_to_ds(&json, 0, out);
+json_serialize_string(in, out);
 }
 
 static void
@@ -1572,7 +1570,6 @@ static void json_serialize_object(const struct shash 
*object,
   struct json_serializer *);
 static void json_serialize_array(const struct json_array *,
  struct json_serializer *);
-static void json_serialize_string(const char *, struct ds *);
 
 /* Converts 'json' to a string in JSON format, encoded in UTF-8, and returns
  * that string.  The caller is responsible for freeing the returned string,
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 3/3] ovsdb: Optimize monitor update by directly serializing data into ds.

2024-07-01 Thread Grigorii Nazarov
Currently serialization is performed by first converting
the internal data representation into JSON objects, followed by
serializing these objects by jsonrpc. This process results in lots of
allocations for these intermediate objects. Consequently, this
not only increases peak memory consumption, but also
demands significantly more CPU work.

By forming row-update JSONs directly in `ds`, which is then used
to create 'serialized object' JSONs, the overall speed increased
by a factor of 2.3.

A local benchmark was run on a proprietary Southbound backup.
Both versions, before and after applying the patch, were measured.
For each version, there were two runs with 10 parallel clients,
and two runs with 30 parallel clients. CPU time was recorded
after startup (before clients started running) and after
all clients received all updates. Clients were essentially running
`ovsdb-client monitor-cond unix:pipe OVN_Southbound
'[[\"actions\",\"!=\",\"sdfdfsd\"]]' Logical_Flow`.
Similar results were obtained with other requests
that required a significant amount of data transfer.
The backup size is about 600 MB. Results are measured in seconds.

Before  After
Baseline x10:   9.53108.54
Baseline x10:   9.62108.67
Baseline x30:   9.69307.04
Baseline x30:   9.65303.32
Patch x10:  9.6752.57
Patch x10:  9.5753.12
Patch x30:  9.53136.33
Patch x30:  9.63135.88

Signed-off-by: Grigorii Nazarov 
---
v2: updated title
v3: fixed bracing
v4: changed patch number from 4/4 to 3/3

 include/openvswitch/json.h |   1 +
 lib/json.c |   8 ++
 lib/ovsdb-data.c   | 105 +++
 lib/ovsdb-data.h   |   3 +
 ovsdb/monitor.c| 210 -
 5 files changed, 254 insertions(+), 73 deletions(-)

diff --git a/include/openvswitch/json.h b/include/openvswitch/json.h
index 555440760..80b9479c7 100644
--- a/include/openvswitch/json.h
+++ b/include/openvswitch/json.h
@@ -81,6 +81,7 @@ struct json *json_boolean_create(bool);
 struct json *json_string_create(const char *);
 struct json *json_string_create_nocopy(char *);
 struct json *json_serialized_object_create(const struct json *);
+struct json *json_serialized_object_create_from_string(const char *);
 struct json *json_integer_create(long long int);
 struct json *json_real_create(double);
 
diff --git a/lib/json.c b/lib/json.c
index d40e93857..66b1f571f 100644
--- a/lib/json.c
+++ b/lib/json.c
@@ -199,6 +199,14 @@ json_serialized_object_create(const struct json *src)
 return json;
 }
 
+struct json *
+json_serialized_object_create_from_string(const char *s)
+{
+struct json *json = json_create(JSON_SERIALIZED_OBJECT);
+json->string = xstrdup(s);
+return json;
+}
+
 struct json *
 json_serialized_object_create_with_yield(const struct json *src)
 {
diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c
index defb048d7..f32b7975a 100644
--- a/lib/ovsdb-data.c
+++ b/lib/ovsdb-data.c
@@ -492,6 +492,45 @@ ovsdb_atom_to_json__(const union ovsdb_atom *atom, enum 
ovsdb_atomic_type type,
 }
 }
 
+/* Serializes 'atom', of the specified 'type', into 'ds' in JSON format.
+ *
+ * Refer to RFC 7047 for the format of the JSON that this function produces. */
+static void
+ovsdb_atom_to_json_ds(const union ovsdb_atom *atom,
+  enum ovsdb_atomic_type type, struct ds *ds)
+{
+switch (type) {
+case OVSDB_TYPE_VOID:
+OVS_NOT_REACHED();
+
+case OVSDB_TYPE_INTEGER:
+ds_put_format(ds, "%lld", (long long) atom->integer);
+return;
+
+case OVSDB_TYPE_REAL:
+ds_put_format(ds, "%.*g", DBL_DIG, atom->real);
+return;
+
+case OVSDB_TYPE_BOOLEAN:
+ds_put_cstr(ds, atom->boolean ? "true" : "false");
+return;
+
+case OVSDB_TYPE_STRING:
+json_to_ds(atom->s, JSSF_SORT, ds);
+return;
+
+case OVSDB_TYPE_UUID:
+ds_put_cstr(ds, "[\"uuid\",\"");
+ds_put_format(ds, UUID_FMT, UUID_ARGS(&atom->uuid));
+ds_put_cstr(ds, "\"]");
+return;
+
+case OVSDB_N_TYPES:
+default:
+OVS_NOT_REACHED();
+}
+}
+
 struct json *
 ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type)
 {
@@ -1445,6 +1484,23 @@ ovsdb_base_to_json(const union ovsdb_atom *atom,
 }
 }
 
+static void
+ovsdb_base_to_json_ds(const union ovsdb_atom *atom,
+  const struct ovsdb_base_type *base,
+  bool use_row_names,
+  struct ds *ds)
+{
+if (!use_row_names
+|| base->type != OVSDB_TYPE_UUID
+|| !base->uuid.refTableName) {
+ovsdb_atom_to_json_ds(atom, base->type, ds);
+} else {
+ds_put_cstr(ds, "[\"named-uuid\",\"");
+ds_put_format(ds, UUID_ROW_FMT, UUID_ARGS(&atom->uuid));
+ds_put_cstr(ds, "\"]");
+}
+}
+
 static struct json *
 ovsdb_datum_to_json__(const struct ovsdb_datum *datum,
   const struct ovsdb

[ovs-dev] [PATCH v4 1/3] ovsdb: Simplify UUID formatting code.

2024-07-01 Thread Grigorii Nazarov
Signed-off-by: Grigorii Nazarov 
---
v2: fixed title
v4: changed patch number from 2/4 to 1/3

 lib/ovsdb-data.c | 8 +---
 lib/uuid.h   | 1 +
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c
index abb923ad8..defb048d7 100644
--- a/lib/ovsdb-data.c
+++ b/lib/ovsdb-data.c
@@ -2582,14 +2582,8 @@ char *
 ovsdb_data_row_name(const struct uuid *uuid)
 {
 char *name;
-char *p;
 
-name = xasprintf("row"UUID_FMT, UUID_ARGS(uuid));
-for (p = name; *p != '\0'; p++) {
-if (*p == '-') {
-*p = '_';
-}
-}
+name = xasprintf(UUID_ROW_FMT, UUID_ARGS(uuid));
 
 return name;
 }
diff --git a/lib/uuid.h b/lib/uuid.h
index fa49354f6..6a8069f68 100644
--- a/lib/uuid.h
+++ b/lib/uuid.h
@@ -34,6 +34,7 @@ extern "C" {
  */
 #define UUID_LEN 36
 #define UUID_FMT "%08x-%04x-%04x-%04x-%04x%08x"
+#define UUID_ROW_FMT "row%08x_%04x_%04x_%04x_%04x%08x"
 #define UUID_ARGS(UUID) \
 ((unsigned int) ((UUID)->parts[0])),\
 ((unsigned int) ((UUID)->parts[1] >> 16)),  \
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4 0/3] Optimize json serialization 2.3 times

2024-07-01 Thread Grigorii Nazarov
This patch series optimizes monitor and monitor-cond commands by
optimizing server-side json serialization. This is achieved by removing
intermediate representation. Most importantly, this greatly reduces
allocations and deallocations done in process. This also does reduce
peak memory consumption, but it wasn't a bottleneck and thus wasn't
precisely measured. See patch 3/3 description for more details.

This can be further improved by building complete response in ds. Also
benificial will be to reduce calls to stdio.h library. Proper profiling
is required to identify even further improvements.

The patches 1/3 and 2/3 contain (almost) independent improvements, that were
all-too-easy to be done during the optimization process.

Patch 3/3 is the main part.

Original patch 1/4 was removed due to it not being relevant to the purpose
of series.

v1->v2:
fixed commit message titles
v2->v3:
fixed incorrect bracing in touched code in PATCH 4/4
v3->v4:
removed patch 1/4
fixed incorrectly sent v3 series
added cover letter

Grigorii Nazarov (3):
  ovsdb: Simplify UUID formatting code.
  lib/json: Simplify string serialization code.
  ovsdb: Optimize monitor update by directly serializing data into ds.

 include/openvswitch/json.h |   1 +
 lib/json.c |  19 ++--
 lib/ovsdb-data.c   | 113 ++--
 lib/ovsdb-data.h   |   3 +
 lib/uuid.h |   1 +
 ovsdb/monitor.c| 210 -
 6 files changed, 260 insertions(+), 87 deletions(-)

-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] RFC OVN: fabric integration

2024-07-01 Thread Gurpreet Singh
If there is no or minimal overhead of adding an LR or restricting the routing 
to LR, then perhaps keeping that logical separation makes sense. I think the 
design we evaluate or look at has to focus very tightly on performance as any 
non-trivial forwarding performance impact will affect the adoption of the 
solution. 

Regards
Gurpreet

> On Jun 28, 2024, at 11:56 AM, Ilya Maximets  wrote:
> 
> On 6/28/24 17:38, Dumitru Ceara wrote:
>> On 6/28/24 15:05, Ilya Maximets wrote:
>>> On 6/28/24 11:03, Ales Musil wrote:
 Hi Frode,
 
 looking forward to the RFC. AFAIU it means that the routes would be 
 exposed on
 LR, more specifically GW router. Would it make sense to allow this 
 behavior for
 provider networks (LS with localnet port)? In that case we could advertise
 chassis-local information from logical routers attached to BGP-enabled
 switches. E.g.: FIPs, LBs. It would cover the use case for distributed
 routers. To achieve that we should have BGP peers for each chassis that 
 the LS
 is local on.
>>> 
>>> I haven't read the whole thing yet, but can we, please, stop adding routing 
>>> features
>>> to switches? :)  If someone wants routing, they should use a router, IMO.
>>> 
>> 
>> I'm fairly certain that there are precedents in "classic" network
>> appliances: switches that can do a certain amount of routing (even run BGP).
>> 
>> In this case we could add a logical router but I'm not sure that
>> simplifies things.
>> 
> 
> "classic" network appliances are a subject for restrictions of a physical
> material world.  It's just way easier and cheaper to acquire and install
> a single physical box instead of N.  This is not a problem for a virtual
> network.  AP+router+switch+modem combo boxes are also "classic" last mile
> network appliances that we just call a "router".  It doesn't mean we should
> implement one.
> 
> The distinction between OVN logical routers and switches is there for a
> reason.  That is so you can look at the logical topology and understand
> how it works more or less without diving into configuration of every single
> part of it.  If switches do routing and routers do switching, what's the
> point of differentiation?  It only brings more confusion.
> 
> Best regards, Ilya Maximets.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev