Re: [ovs-dev] [PATCH v29 0/8] Add offload support for sFlow
On 6/27/2024 1:24 PM, Chris Mi wrote: On 6/18/2024 2:27 PM, Chris Mi wrote: On 5/16/2024 5:09 AM, Ilya Maximets wrote: On 3/26/24 03:30, Chris Mi wrote: This patch set adds offload support for sFlow. Psample is a genetlink channel for packet sampling. TC action act_sample uses psample to send sampled packets to userspace. When offloading sample action to TC, userspace creates a unique ID to map sFlow action and tunnel info and passes this ID to kernel instead of the sFlow info. psample will send this ID and sampled packet to userspace. Using the ID, userspace can recover the sFlow info and send sampled packet to the right sFlow monitoring host. Hi, Chris, others. I've been looking through the set and a psample in general for the past few days. There are few fairly minor issues in the set like UBsan crash because of incorrect OVS_RETURNS_NONNULL annotation and a few style and formatting issues, which are not hard to fix. However, I encountered an issue with a way tunnel metadata is recovered that I'm not sure we can actually fix. The algorithm of work in this patch set is described in the cover letter above. The key point is then packet-1 goes though MISS upcall we install a new datapath flow into TC and we remember the tunnel metadata of the packet-1 associated with a sample ID. When the packet-2 hits the datapath flow (tc actions), it gets sent to psample and ovs-vswitchd reads this packet from psample netlink socket and presents it as an ACTION upcall. Doing that it finds tunnel metadata using the sample ID and uses it as a tunnel metadata for packet-2. The key of the problem is that this is metadata from packet-1 and it can be different from metadata of packet-2. An example of this will be having two separate TCP streams going through the same VxLAN tunnel. For load balancing reasons most UDP tunnels choose source port of the outer UDP header based on RSS or 5-tuple hash of the inner packet. This means that two TCP streams going through the same tunnel will have different UDP source port in the outer header. When a packet from a first stream hits a MISS upcall, datapath flow will be installed and the sample ID will be associated with the metadata of the first stream. Chances are this datapath flow will not have exact match on the source port of the tunnel metadata. That means that a packet from the second stream will successfully match on the installed datapath flow and will go to psample with the ID of the first stream. OVS will read it from the psample socket and send to sFlow collector with metadata it found by the ID, i.e. with a tunnel metadata that contains tunnel source port of the first TCP stream, which is incorrect. The test below reproduces the issue. It's not a real test, it always fails on purpose and not very pretty, but what it does is that it creates a tunnel, then starts netcat server on one side and sends two files to it from the other side. These two transfers are separate TCP connections, so they use different source ports, that is causing the tunnel to choose different UDP source ports for them. After the test failure we can see in the sflow.log that all the values for 'tunnel4_in_src_port' are the same for both streams and some exchanges prior to that. However, if offload is disabled, the values will be different as they should. So, unless we can ensure that packets from different streams do not match on the same datapath flow, this schema doesn't work. The only way to ensure that packets from different streams within the same tunnel do not match on the same datapath flow is to always have an exact match on the whole tunnel metadata. But I don't think that is a good idea, because this way we'll have a separate datapath flow per TCP stream, which will be very bad for performance. And it will be bad for hardware offload since hardware NICs normally have a more limited amount of available resources. This leads us to conclusion that only non-tunneling traffic can be offloaded this way. For this to work we'll have to add an explicit match on tunnel metadata being empty (can that be expressed in TC?) But at this point a question arises if this even worth implementing, taking into account that it requires a lot of infrastructure changes throughout all the layers of OVS code just for sampling of non-tunnel packets, i.e. a very narrow use case. Also, my understanding was that there is a plan to offload other types of userspace() action in the future, such as controller() action using the same psample infrastructure. But that will not be possible for the same reason. In addition to tunnel metadata we will also have to provide connection tracking information, which is even harder, because conntrack state is more dynamic and is only actually known to the datapath. psample can't supply this information to the userspace. Thoughts? Hi Ilya, Sorry for the late reply. I applied your following diff and reproduced the issue
[ovs-dev] OVS "soft freeze" for 3.4 is in effect.
Hi. As described in Documentation/internals/release-process.rst, we are in a "soft freeze" state: During the freeze, we ask committers to refrain from applying patches that add new features unless those patches were already being publicly discussed and reviewed before the freeze began. Bug fixes are welcome at any time. Please propose and discuss exceptions on ovs-dev. We should branch for version 3.4 in two weeks from now, on Monday, Jul 15. With that, release should be on Thursday, Aug 15. Currently, on the mailing list we have a few patches and patch sets that were already discussed / reviewed to some extent, and we do also await new versions for several patches and patch sets for which changes were requested. These, of course, can be accepted. If there are some new patches that never been reviewed / not posted yet, please, propose an exception in reply to this email and it can be discussed. One potential exception I'm aware of is a non-RFC version of the local sampling patch set: https://patchwork.ozlabs.org/project/openvswitch/cover/20240605202337.2904041-1-amore...@redhat.com/ Design for this feature has been extensively discussed in the past few months and the kernel counterpart for the feature is almost ready. So, unless there are objections, it should be safe to get this during soft freeze. Of course, assuming that the kernel patches will be accepted and we'll get a non-RFC version of the patch set in a reasonable time for review. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 07/10] selftests: openvswitch: add psample action
Adrian Moreno writes: > Add sample and psample action support to ovs-dpctl.py. > > Refactor common attribute parsing logic into an external function. > > Signed-off-by: Adrian Moreno > --- Reviewed-by: Aaron Conole > .../selftests/net/openvswitch/ovs-dpctl.py| 162 +- > 1 file changed, 161 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > index 182a09975975..dcc400a21a22 100644 > --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py > @@ -8,6 +8,7 @@ import argparse > import errno > import ipaddress > import logging > +import math > import multiprocessing > import re > import socket > @@ -60,6 +61,7 @@ OVS_FLOW_CMD_DEL = 2 > OVS_FLOW_CMD_GET = 3 > OVS_FLOW_CMD_SET = 4 > > +UINT32_MAX = 0x > > def macstr(mac): > outstr = ":".join(["%02X" % i for i in mac]) > @@ -281,6 +283,75 @@ def parse_extract_field( > return str_skipped, data > > > +def parse_attrs(actstr, attr_desc): > +"""Parses the given action string and returns a list of netlink > +attributes based on a list of attribute descriptions. > + > +Each element in the attribute description list is a tuple such as: > +(name, attr_name, parse_func) > +where: > +name: is the string representing the attribute > +attr_name: is the name of the attribute as defined in the uAPI. > +parse_func: is a callable accepting a string and returning either > +a single object (the parsed attribute value) or a tuple of > +two values (the parsed attribute value and the remaining string) > + > +Returns a list of attributes and the remaining string. > +""" > +def parse_attr(actstr, key, func): > +actstr = actstr[len(key) :] > + > +if not func: > +return None, actstr > + > +delim = actstr[0] > +actstr = actstr[1:] > + > +if delim == "=": > +pos = strcspn(actstr, ",)") > +ret = func(actstr[:pos]) > +else: > +ret = func(actstr) > + > +if isinstance(ret, tuple): > +(datum, actstr) = ret > +else: > +datum = ret > +actstr = actstr[strcspn(actstr, ",)"):] > + > +if delim == "(": > +if not actstr or actstr[0] != ")": > +raise ValueError("Action contains unbalanced parentheses") > + > +actstr = actstr[1:] > + > +actstr = actstr[strspn(actstr, ", ") :] > + > +return datum, actstr > + > +attrs = [] > +attr_desc = list(attr_desc) > +while actstr and actstr[0] != ")" and attr_desc: > +found = False > +for i, (key, attr, func) in enumerate(attr_desc): > +if actstr.startswith(key): > +datum, actstr = parse_attr(actstr, key, func) > +attrs.append([attr, datum]) > +found = True > +del attr_desc[i] > + > +if not found: > +raise ValueError("Unknown attribute: '%s'" % actstr) > + > +actstr = actstr[strspn(actstr, ", ") :] > + > +if actstr[0] != ")": > +raise ValueError("Action string contains extra garbage or has " > + "unbalanced parenthesis: '%s'" % actstr) > + > +return attrs, actstr[1:] > + > + > class ovs_dp_msg(genlmsg): > # include the OVS version > # We need a custom header rather than just being able to rely on > @@ -299,7 +370,7 @@ class ovsactions(nla): > ("OVS_ACTION_ATTR_SET", "ovskey"), > ("OVS_ACTION_ATTR_PUSH_VLAN", "none"), > ("OVS_ACTION_ATTR_POP_VLAN", "flag"), > -("OVS_ACTION_ATTR_SAMPLE", "none"), > +("OVS_ACTION_ATTR_SAMPLE", "sample"), > ("OVS_ACTION_ATTR_RECIRC", "uint32"), > ("OVS_ACTION_ATTR_HASH", "none"), > ("OVS_ACTION_ATTR_PUSH_MPLS", "none"), > @@ -318,8 +389,85 @@ class ovsactions(nla): > ("OVS_ACTION_ATTR_ADD_MPLS", "none"), > ("OVS_ACTION_ATTR_DEC_TTL", "none"), > ("OVS_ACTION_ATTR_DROP", "uint32"), > +("OVS_ACTION_ATTR_PSAMPLE", "psample"), > ) > > +class psample(nla): > +nla_flags = NLA_F_NESTED > + > +nla_map = ( > +("OVS_PSAMPLE_ATTR_UNSPEC", "none"), > +("OVS_PSAMPLE_ATTR_GROUP", "uint32"), > +("OVS_PSAMPLE_ATTR_COOKIE", "array(uint8)"), > +) > + > +def dpstr(self, more=False): > +args = "group=%d" % self.get_attr("OVS_PSAMPLE_ATTR_GROUP") > + > +cookie = self.get_attr("OVS_PSAMPLE_ATTR_COOKIE") > +if cookie: > +args += ",cookie(%s)" % \ > +"".join(format(x, "02x") for x in cookie) > + > +return "psample(%s)" % args > + > +def parse(self, actstr): > +desc = ( > +
Re: [ovs-dev] [PATCH net-next v7 10/10] selftests: openvswitch: add psample test
Adrian Moreno writes: > Add a test to verify sampling packets via psample works. > > In order to do that, create a subcommand in ovs-dpctl.py to listen to > on the psample multicast group and print samples. > > Signed-off-by: Adrian Moreno > --- > .../selftests/net/openvswitch/openvswitch.sh | 115 +- > .../selftests/net/openvswitch/ovs-dpctl.py| 73 ++- > 2 files changed, 182 insertions(+), 6 deletions(-) > > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh > b/tools/testing/selftests/net/openvswitch/openvswitch.sh > index 15bca0708717..02a366e01004 100755 > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh > @@ -20,7 +20,8 @@ tests=" > nat_related_v4 ip4-nat-related: ICMP related > matches work with SNAT > netlink_checks ovsnl: validate netlink attrs > and settings > upcall_interfaces ovs: test the upcall interfaces > - drop_reason drop: test drop reasons are > emitted" > + drop_reason drop: test drop reasons are > emitted > + psample psample: Sampling packets with > psample" > > info() { > [ $VERBOSE = 0 ] || echo $* > @@ -102,12 +103,21 @@ ovs_netns_spawn_daemon() { > shift > netns=$1 > shift > - info "spawning cmd: $*" > - ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & > + if [ "$netns" == "_default" ]; then > + $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & > + else > + ip netns exec $netns $* >> $ovs_dir/stdout 2>> > $ovs_dir/stderr & > + fi > pid=$! > ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null" > } > > +ovs_spawn_daemon() { > + sbx=$1 > + shift > + ovs_netns_spawn_daemon $sbx "_default" $* > +} > + > ovs_add_netns_and_veths () { > info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}" > ovs_sbx "$1" ip netns add "$3" || return 1 > @@ -170,6 +180,19 @@ ovs_drop_reason_count() > return `echo "$perf_output" | grep "$pattern" | wc -l` > } > > +ovs_test_flow_fails () { > + ERR_MSG="Flow actions may not be safe on all matching packets" > + > + PRE_TEST=$(dmesg | grep -c "${ERR_MSG}") > + ovs_add_flow $@ &> /dev/null $@ && return 1 > + POST_TEST=$(dmesg | grep -c "${ERR_MSG}") > + > + if [ "$PRE_TEST" == "$POST_TEST" ]; then > + return 1 > + fi > + return 0 > +} > + > usage() { > echo > echo "$0 [OPTIONS] [TEST]..." > @@ -184,6 +207,92 @@ usage() { > exit 1 > } > > + > +# psample test > +# - use psample to observe packets > +test_psample() { > + sbx_add "test_psample" || return $? > + > + # Add a datapath with per-vport dispatching. > + ovs_add_dp "test_psample" psample -V 2:1 || return 1 > + > + info "create namespaces" > + ovs_add_netns_and_veths "test_psample" "psample" \ > + client c0 c1 172.31.110.10/24 -u || return 1 > + ovs_add_netns_and_veths "test_psample" "psample" \ > + server s0 s1 172.31.110.20/24 -u || return 1 > + > + # Check if psample actions can be configured. > + ovs_add_flow "test_psample" psample \ > + 'in_port(1),eth(),eth_type(0x0806),arp()' 'psample(group=1)' Might be good to redirect this stdout/stderr line to /dev/null - otherwise on an unsupported system there will be the following extra splat: Traceback (most recent call last): File "/home/aconole/git/linux/tools/testing/selftests/net/openvswitch/ovs-dpctl.py", line 2774, in sys.exit(main(sys.argv)) ... File "/usr/lib/python3.12/site-packages/pyroute2/netlink/nlsocket.py", line 489, in get raise msg['header']['error'] pyroute2.netlink.exceptions.NetlinkError: (22, 'Invalid argument') > + if [ $? == 1 ]; then > + info "no support for psample - skipping" > + ovs_exit_sig > + return $ksft_skip > + fi > + > + ovs_del_flows "test_psample" psample > + > + # Test action verification. > + OLDIFS=$IFS > + IFS='*' > + min_key='in_port(1),eth(),eth_type(0x0800),ipv4()' > + for testcase in \ > + "cookie to > large"*"psample(group=1,cookie=1615141312111009080706050403020100)" \ > + "no group with cookie"*"psample(cookie=abcd)" \ > + "no group"*"psample()"; > + do > + set -- $testcase; > + ovs_test_flow_fails "test_psample" psample $min_key $2 > + if [ $? == 1 ]; then > + info "failed - $1" > + return 1 > + fi > + done > + IFS=$OLDIFS > + > + ovs_del_flows "test_psample" psample > + # Allow ARP > + ovs_add_flow "test_psample" psample \ > + 'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1 >
Re: [ovs-dev] [PATCH net-next v7 10/10] selftests: openvswitch: add psample test
On 6/30/24 21:57, Adrian Moreno wrote: > Add a test to verify sampling packets via psample works. > > In order to do that, create a subcommand in ovs-dpctl.py to listen to > on the psample multicast group and print samples. > > Signed-off-by: Adrian Moreno > --- > .../selftests/net/openvswitch/openvswitch.sh | 115 +- > .../selftests/net/openvswitch/ovs-dpctl.py| 73 ++- > 2 files changed, 182 insertions(+), 6 deletions(-) This version seems to work correctly with and without arguments. Tested-by: Ilya Maximets ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 06/10] net: openvswitch: store sampling probability in cb.
Adrian Moreno writes: > When a packet sample is observed, the sampling rate that was used is > important to estimate the real frequency of such event. > > Store the probability of the parent sample action in the skb's cb area > and use it in psample action to pass it down to psample module. > > Acked-by: Eelco Chaudron > Reviewed-by: Ilya Maximets > Signed-off-by: Adrian Moreno > --- Reviewed-by: Aaron Conole > include/uapi/linux/openvswitch.h | 3 ++- > net/openvswitch/actions.c| 20 +--- > net/openvswitch/datapath.h | 3 +++ > net/openvswitch/vport.c | 1 + > 4 files changed, 23 insertions(+), 4 deletions(-) > > diff --git a/include/uapi/linux/openvswitch.h > b/include/uapi/linux/openvswitch.h > index 3dd653748725..3a701bd1f31b 100644 > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -649,7 +649,8 @@ enum ovs_flow_attr { > * Actions are passed as nested attributes. > * > * Executes the specified actions with the given probability on a per-packet > - * basis. > + * basis. Nested actions will be able to access the probability value of the > + * parent @OVS_ACTION_ATTR_SAMPLE. > */ > enum ovs_sample_attr { > OVS_SAMPLE_ATTR_UNSPEC, > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index a035b7e677dd..34af6bce4085 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff > *skb, > struct nlattr *sample_arg; > int rem = nla_len(attr); > const struct sample_arg *arg; > + u32 init_probability; > bool clone_flow_key; > + int err; > > /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ > sample_arg = nla_data(attr); > arg = nla_data(sample_arg); > actions = nla_next(sample_arg, &rem); > + init_probability = OVS_CB(skb)->probability; > > if ((arg->probability != U32_MAX) && > (!arg->probability || get_random_u32() > arg->probability)) { > @@ -1062,9 +1065,16 @@ static int sample(struct datapath *dp, struct sk_buff > *skb, > return 0; > } > > + OVS_CB(skb)->probability = arg->probability; > + > clone_flow_key = !arg->exec; > - return clone_execute(dp, skb, key, 0, actions, rem, last, > - clone_flow_key); > + err = clone_execute(dp, skb, key, 0, actions, rem, last, > + clone_flow_key); > + > + if (!last) > + OVS_CB(skb)->probability = init_probability; > + > + return err; > } > > /* When 'last' is true, clone() should always consume the 'skb'. > @@ -1311,6 +1321,7 @@ static void execute_psample(struct datapath *dp, struct > sk_buff *skb, > struct psample_group psample_group = {}; > struct psample_metadata md = {}; > const struct nlattr *a; > + u32 rate; > int rem; > > nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) { > @@ -1329,8 +1340,11 @@ static void execute_psample(struct datapath *dp, > struct sk_buff *skb, > psample_group.net = ovs_dp_get_net(dp); > md.in_ifindex = OVS_CB(skb)->input_vport->dev->ifindex; > md.trunc_size = skb->len - OVS_CB(skb)->cutlen; > + md.rate_as_probability = 1; > + > + rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX; > > - psample_sample_packet(&psample_group, skb, 0, &md); > + psample_sample_packet(&psample_group, skb, rate, &md); > } > #else > static inline void execute_psample(struct datapath *dp, struct sk_buff *skb, > diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h > index 0cd29971a907..9ca6231ea647 100644 > --- a/net/openvswitch/datapath.h > +++ b/net/openvswitch/datapath.h > @@ -115,12 +115,15 @@ struct datapath { > * fragmented. > * @acts_origlen: The netlink size of the flow actions applied to this skb. > * @cutlen: The number of bytes from the packet end to be removed. > + * @probability: The sampling probability that was applied to this skb; 0 > means > + * no sampling has occurred; U32_MAX means 100% probability. > */ > struct ovs_skb_cb { > struct vport*input_vport; > u16 mru; > u16 acts_origlen; > u32 cutlen; > + u32 probability; > }; > #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb) > > diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c > index 972ae01a70f7..8732f6e51ae5 100644 > --- a/net/openvswitch/vport.c > +++ b/net/openvswitch/vport.c > @@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff > *skb, > OVS_CB(skb)->input_vport = vport; > OVS_CB(skb)->mru = 0; > OVS_CB(skb)->cutlen = 0; > + OVS_CB(skb)->probability = 0; > if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) { > u32 mark;
Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action
Adrian Moreno writes: > Add support for a new action: psample. > > This action accepts a u32 group id and a variable-length cookie and uses > the psample multicast group to make the packet available for > observability. > > The maximum length of the user-defined cookie is set to 16, same as > tc_cookie, to discourage using cookies that will not be offloadable. > > Acked-by: Eelco Chaudron > Signed-off-by: Adrian Moreno > --- Hi Adrian, Just some nits below. > Documentation/netlink/specs/ovs_flow.yaml | 17 > include/uapi/linux/openvswitch.h | 28 ++ > net/openvswitch/Kconfig | 1 + > net/openvswitch/actions.c | 47 +++ > net/openvswitch/flow_netlink.c| 32 ++- > 5 files changed, 124 insertions(+), 1 deletion(-) > > diff --git a/Documentation/netlink/specs/ovs_flow.yaml > b/Documentation/netlink/specs/ovs_flow.yaml > index 4fdfc6b5cae9..46f5d1cd8a5f 100644 > --- a/Documentation/netlink/specs/ovs_flow.yaml > +++ b/Documentation/netlink/specs/ovs_flow.yaml > @@ -727,6 +727,12 @@ attribute-sets: > name: dec-ttl > type: nest > nested-attributes: dec-ttl-attrs > + - > +name: psample > +type: nest > +nested-attributes: psample-attrs > +doc: | > + Sends a packet sample to psample for external observation. >- > name: tunnel-key-attrs > enum-name: ovs-tunnel-key-attr > @@ -938,6 +944,17 @@ attribute-sets: >- > name: gbp > type: u32 > + - > +name: psample-attrs > +enum-name: ovs-psample-attr > +name-prefix: ovs-psample-attr- > +attributes: > + - > +name: group > +type: u32 > + - > +name: cookie > +type: binary > > operations: >name-prefix: ovs-flow-cmd- > diff --git a/include/uapi/linux/openvswitch.h > b/include/uapi/linux/openvswitch.h > index efc82c318fa2..3dd653748725 100644 > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -914,6 +914,31 @@ struct check_pkt_len_arg { > }; > #endif > > +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 > +/** > + * enum ovs_psample_attr - Attributes for %OVS_ACTION_ATTR_PSAMPLE > + * action. > + * > + * @OVS_PSAMPLE_ATTR_GROUP: 32-bit number to identify the source of the > + * sample. > + * @OVS_PSAMPLE_ATTR_COOKIE: An optional variable-length binary cookie that > + * contains user-defined metadata. The maximum length is > + * OVS_PSAMPLE_COOKIE_MAX_SIZE bytes. > + * > + * Sends the packet to the psample multicast group with the specified group > and > + * cookie. It is possible to combine this action with the > + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the sample. > + */ > +enum ovs_psample_attr { > + OVS_PSAMPLE_ATTR_GROUP = 1, /* u32 number. */ > + OVS_PSAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ > + > + /* private: */ > + __OVS_PSAMPLE_ATTR_MAX > +}; > + > +#define OVS_PSAMPLE_ATTR_MAX (__OVS_PSAMPLE_ATTR_MAX - 1) > + > /** > * enum ovs_action_attr - Action types. > * > @@ -966,6 +991,8 @@ struct check_pkt_len_arg { > * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS > * argument. > * @OVS_ACTION_ATTR_DROP: Explicit drop action. > + * @OVS_ACTION_ATTR_PSAMPLE: Send a sample of the packet to external > observers > + * via psample. > * > * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not > all > * fields within a header are modifiable, e.g. the IPv4 protocol and fragment > @@ -1004,6 +1031,7 @@ enum ovs_action_attr { > OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ > OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ > OVS_ACTION_ATTR_DROP, /* u32 error code. */ > + OVS_ACTION_ATTR_PSAMPLE, /* Nested OVS_PSAMPLE_ATTR_*. */ > > __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted > * from userspace. */ > diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig > index 29a7081858cd..2535f3f9f462 100644 > --- a/net/openvswitch/Kconfig > +++ b/net/openvswitch/Kconfig > @@ -10,6 +10,7 @@ config OPENVSWITCH > (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \ >(!NF_NAT || NF_NAT) && \ >(!NETFILTER_CONNCOUNT || > NETFILTER_CONNCOUNT))) > + depends on PSAMPLE || !PSAMPLE > select LIBCRC32C > select MPLS > select NET_MPLS_GSO > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index 964225580824..a035b7e677dd 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -24,6 +24,11 @@ > #include > #include > #include > + > +#if IS_ENABLED(CONFIG_PSAMPLE) > +#include > +#endif > + > #include > > #include "datapath.h" > @@ -1299,6 +1304,39 @@ static int ex
Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action
On 6/30/24 21:57, Adrian Moreno wrote: > Add support for a new action: psample. > > This action accepts a u32 group id and a variable-length cookie and uses > the psample multicast group to make the packet available for > observability. > > The maximum length of the user-defined cookie is set to 16, same as > tc_cookie, to discourage using cookies that will not be offloadable. > > Acked-by: Eelco Chaudron > Signed-off-by: Adrian Moreno > --- > Documentation/netlink/specs/ovs_flow.yaml | 17 > include/uapi/linux/openvswitch.h | 28 ++ > net/openvswitch/Kconfig | 1 + > net/openvswitch/actions.c | 47 +++ > net/openvswitch/flow_netlink.c| 32 ++- > 5 files changed, 124 insertions(+), 1 deletion(-) Thanks for addressing the comments! The new name also seems reasonable to me. Reviewed-by: Ilya Maximets ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 04/10] net: psample: allow using rate as probability
Adrian Moreno writes: > Although not explicitly documented in the psample module itself, the > definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample. > > Quoting tc-sample(8): > "RATE of 100 will lead to an average of one sampled packet out of every > 100 observed." > > With this semantics, the rates that we can express with an unsigned > 32-bits number are very unevenly distributed and concentrated towards > "sampling few packets". > For example, we can express a probability of 2.32E-8% but we > cannot express anything between 100% and 50%. > > For sampling applications that are capable of sampling a decent > amount of packets, this sampling rate semantics is not very useful. > > Add a new flag to the uAPI that indicates that the sampling rate is > expressed in scaled probability, this is: > - 0 is 0% probability, no packets get sampled. > - U32_MAX is 100% probability, all packets get sampled. > > Acked-by: Eelco Chaudron > Reviewed-by: Ido Schimmel > Signed-off-by: Adrian Moreno > --- Reviewed-by: Aaron Conole > include/net/psample.h| 3 ++- > include/uapi/linux/psample.h | 10 +- > net/psample/psample.c| 3 +++ > 3 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/include/net/psample.h b/include/net/psample.h > index 2ac71260a546..c52e9ebd88dd 100644 > --- a/include/net/psample.h > +++ b/include/net/psample.h > @@ -24,7 +24,8 @@ struct psample_metadata { > u8 out_tc_valid:1, > out_tc_occ_valid:1, > latency_valid:1, > -unused:5; > +rate_as_probability:1, > +unused:4; > const u8 *user_cookie; > u32 user_cookie_len; > }; > diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h > index e80637e1d97b..b765f0e81f20 100644 > --- a/include/uapi/linux/psample.h > +++ b/include/uapi/linux/psample.h > @@ -8,7 +8,11 @@ enum { > PSAMPLE_ATTR_ORIGSIZE, > PSAMPLE_ATTR_SAMPLE_GROUP, > PSAMPLE_ATTR_GROUP_SEQ, > - PSAMPLE_ATTR_SAMPLE_RATE, > + PSAMPLE_ATTR_SAMPLE_RATE, /* u32, ratio between observed and > + * sampled packets or scaled probability > + * if PSAMPLE_ATTR_SAMPLE_PROBABILITY > + * is set. > + */ > PSAMPLE_ATTR_DATA, > PSAMPLE_ATTR_GROUP_REFCOUNT, > PSAMPLE_ATTR_TUNNEL, > @@ -20,6 +24,10 @@ enum { > PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ > PSAMPLE_ATTR_PROTO, /* u16 */ > PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ > + PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in > + * PSAMPLE_ATTR_SAMPLE_RATE as a > + * probability scaled 0 - U32_MAX. > + */ > > __PSAMPLE_ATTR_MAX > }; > diff --git a/net/psample/psample.c b/net/psample/psample.c > index 1c76f3e48dcd..f48b5b9cd409 100644 > --- a/net/psample/psample.c > +++ b/net/psample/psample.c > @@ -497,6 +497,9 @@ void psample_sample_packet(struct psample_group *group, > struct sk_buff *skb, > md->user_cookie)) > goto error; > > + if (md->rate_as_probability) > + nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY); > + > genlmsg_end(nl_skb, data); > genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0, > PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 03/10] net: psample: skip packet copy if no listeners
Adrian Moreno writes: > If nobody is listening on the multicast group, generating the sample, > which involves copying packet data, seems completely unnecessary. > > Return fast in this case. > > Acked-by: Eelco Chaudron > Reviewed-by: Ido Schimmel > Reviewed-by: Simon Horman > Signed-off-by: Adrian Moreno > --- Reviewed-by: Aaron Conole > net/psample/psample.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/net/psample/psample.c b/net/psample/psample.c > index b37488f426bc..1c76f3e48dcd 100644 > --- a/net/psample/psample.c > +++ b/net/psample/psample.c > @@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, > struct sk_buff *skb, > void *data; > int ret; > > + if (!genl_has_listeners(&psample_nl_family, group->net, > + PSAMPLE_NL_MCGRP_SAMPLE)) > + return; > + > meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) + > (out_ifindex ? nla_total_size(sizeof(u16)) : 0) + > (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) + ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 02/10] net: sched: act_sample: add action cookie to sample
Adrian Moreno writes: > If the action has a user_cookie, pass it along to the sample so it can > be easily identified. > > Acked-by: Eelco Chaudron > Reviewed-by: Ido Schimmel > Signed-off-by: Adrian Moreno > --- Reviewed-by: Aaron Conole > net/sched/act_sample.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c > index a69b53d54039..2ceb4d141b71 100644 > --- a/net/sched/act_sample.c > +++ b/net/sched/act_sample.c > @@ -167,7 +167,9 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, > { > struct tcf_sample *s = to_sample(a); > struct psample_group *psample_group; > + u8 cookie_data[TC_COOKIE_MAX_SIZE]; > struct psample_metadata md = {}; > + struct tc_cookie *user_cookie; > int retval; > > tcf_lastuse_update(&s->tcf_tm); > @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, > if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev)) > skb_push(skb, skb->mac_len); > > + rcu_read_lock(); > + user_cookie = rcu_dereference(a->user_cookie); > + if (user_cookie) { > + memcpy(cookie_data, user_cookie->data, > +user_cookie->len); > + md.user_cookie = cookie_data; > + md.user_cookie_len = user_cookie->len; > + } > + rcu_read_unlock(); > + > md.trunc_size = s->truncate ? s->trunc_size : skb->len; > psample_sample_packet(psample_group, skb, s->rate, &md); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 01/10] net: psample: add user cookie
Adrian Moreno writes: > Add a user cookie to the sample metadata so that sample emitters can > provide more contextual information to samples. > > If present, send the user cookie in a new attribute: > PSAMPLE_ATTR_USER_COOKIE. > > Acked-by: Eelco Chaudron > Reviewed-by: Simon Horman > Reviewed-by: Ido Schimmel > Signed-off-by: Adrian Moreno > --- Reviewed-by: Aaron Conole > include/net/psample.h| 2 ++ > include/uapi/linux/psample.h | 1 + > net/psample/psample.c| 9 - > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/include/net/psample.h b/include/net/psample.h > index 0509d2d6be67..2ac71260a546 100644 > --- a/include/net/psample.h > +++ b/include/net/psample.h > @@ -25,6 +25,8 @@ struct psample_metadata { > out_tc_occ_valid:1, > latency_valid:1, > unused:5; > + const u8 *user_cookie; > + u32 user_cookie_len; > }; > > struct psample_group *psample_group_get(struct net *net, u32 group_num); > diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h > index e585db5bf2d2..e80637e1d97b 100644 > --- a/include/uapi/linux/psample.h > +++ b/include/uapi/linux/psample.h > @@ -19,6 +19,7 @@ enum { > PSAMPLE_ATTR_LATENCY, /* u64, nanoseconds */ > PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ > PSAMPLE_ATTR_PROTO, /* u16 */ > + PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ > > __PSAMPLE_ATTR_MAX > }; > diff --git a/net/psample/psample.c b/net/psample/psample.c > index a5d9b8446f77..b37488f426bc 100644 > --- a/net/psample/psample.c > +++ b/net/psample/psample.c > @@ -386,7 +386,9 @@ void psample_sample_packet(struct psample_group *group, > struct sk_buff *skb, > nla_total_size(sizeof(u32)) +/* group_num */ > nla_total_size(sizeof(u32)) +/* seq */ > nla_total_size_64bit(sizeof(u64)) + /* timestamp */ > -nla_total_size(sizeof(u16)); /* protocol */ > +nla_total_size(sizeof(u16)) +/* protocol */ > +(md->user_cookie_len ? > + nla_total_size(md->user_cookie_len) : 0); /* user cookie */ > > #ifdef CONFIG_INET > tun_info = skb_tunnel_info(skb); > @@ -486,6 +488,11 @@ void psample_sample_packet(struct psample_group *group, > struct sk_buff *skb, > } > #endif > > + if (md->user_cookie && md->user_cookie_len && > + nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len, > + md->user_cookie)) > + goto error; > + > genlmsg_end(nl_skb, data); > genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0, > PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [ovs-discuss] [branch-23.09] OVN upgrade for distributed gateway -- need inputs
Hello team: Part of upgrading OVN from 2.11* versions to the latest 23.09, we have found issues in existing north south gateways where failover doesn't work as expected when connected to the latest 23.09 raft control plane. OVS datapath version 2.16.0-2 on host. Tunneling protocol is stt ovn-controller is running in a container with hostnetwork with version 23.09 on the north south gateways using ovs user space 3.2.* Following the https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html#fail-safe-upgrade , even if setting northd version on the gateways, when one of the gateway's ovn controller is stopped, there is no failover triggered. For chassis, things work as expected where we see updates in ovn controller logs for schema upgrade and data path works fine as expected for the workloads scheduled on that chassis. For n-s gateways, we don't see ovn-controller claiming lports when failover is triggered. Also if there are any suggestions for interconnection gateway upgrade procedure too, would be great. Let us know for any suggestions/recommendations as we continue to debug. Regards, Aliasgar ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1 2/2] ovs-monitor-ipsec: LibreSwan v5 support.
On Fri, Jun 28, 2024 at 12:24:06AM -0400, Mike Pattrick wrote: > In version 5, LibreSwan made significant command line interface changes. > This includes changing the order or command line parameters and removing > the "ipsec auto" command. > > To maintain compatibility with previous versions, the ipsec.d version > check is repurposed for this. Checking the version proved simpler than > removing use of auto. > > There was also a change to ipsec status command that effected the tests. > However, this change was backwards compatible. > > Reported-at: https://issues.redhat.com/browse/FDP-645 > Reported-by: Ilya Maximets > Signed-off-by: Mike Pattrick Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v1 1/2] ovs-monitor-ipsec: LibreSwan autodetect verion.
On Fri, Jun 28, 2024 at 12:24:05AM -0400, Mike Pattrick wrote: > Previously a change was made to LibreSwan necessitating the detection > of version numbers. However, this change didn't properly account for all > possible output from "ipsec version". When installed from the git > repository, LibreSwan will report versions differently then when > installed from a package. > > Fixes: 3ddb31f60487 ("ovs-monitor-ipsec: LibreSwan autodetect paths.") > Signed-off-by: Mike Pattrick Acked-by: Simon Horman ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action
On Sun, Jun 30, 2024 at 09:57:26PM +0200, Adrian Moreno wrote: > Add support for a new action: psample. > > This action accepts a u32 group id and a variable-length cookie and uses > the psample multicast group to make the packet available for > observability. > > The maximum length of the user-defined cookie is set to 16, same as > tc_cookie, to discourage using cookies that will not be offloadable. > > Acked-by: Eelco Chaudron > Signed-off-by: Adrian Moreno > --- > Documentation/netlink/specs/ovs_flow.yaml | 17 > include/uapi/linux/openvswitch.h | 28 ++ > net/openvswitch/Kconfig | 1 + > net/openvswitch/actions.c | 47 +++ > net/openvswitch/flow_netlink.c| 32 ++- > 5 files changed, 124 insertions(+), 1 deletion(-) > > diff --git a/Documentation/netlink/specs/ovs_flow.yaml > b/Documentation/netlink/specs/ovs_flow.yaml > index 4fdfc6b5cae9..46f5d1cd8a5f 100644 > --- a/Documentation/netlink/specs/ovs_flow.yaml > +++ b/Documentation/netlink/specs/ovs_flow.yaml > @@ -727,6 +727,12 @@ attribute-sets: > name: dec-ttl > type: nest > nested-attributes: dec-ttl-attrs > + - > +name: psample > +type: nest > +nested-attributes: psample-attrs > +doc: | > + Sends a packet sample to psample for external observation. >- > name: tunnel-key-attrs > enum-name: ovs-tunnel-key-attr > @@ -938,6 +944,17 @@ attribute-sets: >- > name: gbp > type: u32 > + - > +name: psample-attrs > +enum-name: ovs-psample-attr > +name-prefix: ovs-psample-attr- > +attributes: > + - > +name: group > +type: u32 > + - > +name: cookie > +type: binary > > operations: >name-prefix: ovs-flow-cmd- > diff --git a/include/uapi/linux/openvswitch.h > b/include/uapi/linux/openvswitch.h > index efc82c318fa2..3dd653748725 100644 > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -914,6 +914,31 @@ struct check_pkt_len_arg { > }; > #endif > > +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 In your patch #2 you use "TC_COOKIE_MAX_SIZE" as an array size for your cookie. I know that now OVS_PSAMPLE_COOKIE_MAX_SIZE == TC_COOKIE_MAX_SIZE, so this size will be validated correctly. But how likely is that those 2 constants will have different values in the future? Would it be reasonable to create more strict dependency between those macros, e.g.: #define OVS_PSAMPLE_COOKIE_MAX_SIZE TC_COOKIE_MAX_SIZE or, at least, add a comment that the size shouldn't be bigger than TC_COOKIE_MAX_SIZE? I'm just considering the risk of exceeding the array from the patch #2 when somebody increases OVS_PSAMPLE_COOKIE_MAX_SIZE in the future. Thanks, Michal ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 01/10] net: psample: add user cookie
On Sun, Jun 30, 2024 at 09:57:22PM +0200, Adrian Moreno wrote: > Add a user cookie to the sample metadata so that sample emitters can > provide more contextual information to samples. > > If present, send the user cookie in a new attribute: > PSAMPLE_ATTR_USER_COOKIE. > > Acked-by: Eelco Chaudron > Reviewed-by: Simon Horman > Reviewed-by: Ido Schimmel > Signed-off-by: Adrian Moreno > --- > Reviewed-by: Michal Kubiak ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v7 05/10] net: openvswitch: add psample action
On Mon, Jul 01, 2024 at 01:40:39PM GMT, Michal Kubiak wrote: > On Sun, Jun 30, 2024 at 09:57:26PM +0200, Adrian Moreno wrote: > > Add support for a new action: psample. > > > > This action accepts a u32 group id and a variable-length cookie and uses > > the psample multicast group to make the packet available for > > observability. > > > > The maximum length of the user-defined cookie is set to 16, same as > > tc_cookie, to discourage using cookies that will not be offloadable. > > > > Acked-by: Eelco Chaudron > > Signed-off-by: Adrian Moreno > > --- > > Documentation/netlink/specs/ovs_flow.yaml | 17 > > include/uapi/linux/openvswitch.h | 28 ++ > > net/openvswitch/Kconfig | 1 + > > net/openvswitch/actions.c | 47 +++ > > net/openvswitch/flow_netlink.c| 32 ++- > > 5 files changed, 124 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/netlink/specs/ovs_flow.yaml > > b/Documentation/netlink/specs/ovs_flow.yaml > > index 4fdfc6b5cae9..46f5d1cd8a5f 100644 > > --- a/Documentation/netlink/specs/ovs_flow.yaml > > +++ b/Documentation/netlink/specs/ovs_flow.yaml > > @@ -727,6 +727,12 @@ attribute-sets: > > name: dec-ttl > > type: nest > > nested-attributes: dec-ttl-attrs > > + - > > +name: psample > > +type: nest > > +nested-attributes: psample-attrs > > +doc: | > > + Sends a packet sample to psample for external observation. > >- > > name: tunnel-key-attrs > > enum-name: ovs-tunnel-key-attr > > @@ -938,6 +944,17 @@ attribute-sets: > >- > > name: gbp > > type: u32 > > + - > > +name: psample-attrs > > +enum-name: ovs-psample-attr > > +name-prefix: ovs-psample-attr- > > +attributes: > > + - > > +name: group > > +type: u32 > > + - > > +name: cookie > > +type: binary > > > > operations: > >name-prefix: ovs-flow-cmd- > > diff --git a/include/uapi/linux/openvswitch.h > > b/include/uapi/linux/openvswitch.h > > index efc82c318fa2..3dd653748725 100644 > > --- a/include/uapi/linux/openvswitch.h > > +++ b/include/uapi/linux/openvswitch.h > > @@ -914,6 +914,31 @@ struct check_pkt_len_arg { > > }; > > #endif > > > > +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 > > In your patch #2 you use "TC_COOKIE_MAX_SIZE" as an array size for your > cookie. I know that now OVS_PSAMPLE_COOKIE_MAX_SIZE == TC_COOKIE_MAX_SIZE, > so this size will be validated correctly. > But how likely is that those 2 constants will have different values in the > future? > Would it be reasonable to create more strict dependency between those > macros, e.g.: > > #define OVS_PSAMPLE_COOKIE_MAX_SIZE TC_COOKIE_MAX_SIZE > > or, at least, add a comment that the size shouldn't be bigger than > TC_COOKIE_MAX_SIZE? > I'm just considering the risk of exceeding the array from the patch #2 when > somebody increases OVS_PSAMPLE_COOKIE_MAX_SIZE in the future. > > Thanks, > Michal > Hi Michal, Thanks for sharing your thoughts. I tried to keep the dependency between both cookie sizes loose. I don't want a change in TC_COOKIE_MAX_SIZE to inadvertently modify OVS_PSAMPLE_COOKIE_MAX_SIZE because OVS might not need a bigger cookie and even if it does, backwards compatibility needs to be guaranteed: meaning OVS userspace will have to detect the new size and use it or fall back to a smaller cookie for older kernels. All this needs to be known and worked on in userspace. On the other hand, I intentionally made OVS's "psample" action as similar as possible to act_psample, including setting the same cookie size to begin with. The reason is that I think we should try to implement tc-flower offloading of this action using act_sample, plus 16 seemed a very reasonable max value. When we decide to support offloading the "psample" action, this must be done entirely in userspace. OVS must create a act_sample action (instead of the OVS "psample" one) via netlink. In no circumstances the openvswitch kmod interacts with tc directly. Now, back to your concern. If OVS_PSAMPLE_COOKIE_MAX_SIZE grows and TC_COOKIE_MAX_SIZE does not *and* we already support offloading this action to tc, the only consequence is that OVS userspace has a problem because the tc's netlink interface will reject cookies larger than TC_COOKIE_MAX_SIZE [1]. This guarantees that the array in patch #2 is never overflown. OVS will have to deal with the different sizes and try to squeeze the data into TC_COOKIE_MAX_SIZE or fail to offload the action altogether. Psample does not have a size limit so different parts of the kernel can use psample with different internal max-sizes without any restriction. I hope this clears your concerns. [1] https://github.com/torvalds/linux/blob/master/net/sched/act_api.c#L1299 Thanks. Adrián ___ dev mailing list d...@openvswitch.org https://m
Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.
On 1 Jul 2024, at 14:22, Adrián Moreno wrote: > On Mon, Jul 01, 2024 at 01:30:12PM GMT, Eelco Chaudron wrote: >> >> >> On 28 Jun 2024, at 18:40, Adrián Moreno wrote: >> >>> On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote: On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > Use the newly added emit_sample to implement OpenFlow sample() actions > with local sampling configuration. See some comments below. Cheers, Eelco > Signed-off-by: Adrian Moreno > --- > ofproto/ofproto-dpif-lsample.c | 17 > ofproto/ofproto-dpif-lsample.h | 6 ++ > ofproto/ofproto-dpif-xlate.c | 163 - > ofproto/ofproto-dpif-xlate.h | 3 +- > ofproto/ofproto-dpif.c | 2 +- > 5 files changed, 144 insertions(+), 47 deletions(-) > > diff --git a/ofproto/ofproto-dpif-lsample.c > b/ofproto/ofproto-dpif-lsample.c > index 7bdafac19..2c0b5da89 100644 > --- a/ofproto/ofproto-dpif-lsample.c > +++ b/ofproto/ofproto-dpif-lsample.c > @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample > *lsample, > return changed; > } > > +/* Returns the group_id of the exporter with the given collector_set_id, > if it > + * exists. */ nit: The below will fit on one line /* Returns the exporter group_id for a given collector_set_id, if it exists. */ >>> >>> Ack. >>> > +bool > +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t > collector_set_id, > + uint32_t *group_id) > +{ > +struct lsample_exporter_node *node; > +bool found = false; > + > +node = dpif_lsample_find_exporter_node(ps, collector_set_id); > +if (node) { > +found = true; > +*group_id = node->exporter.options.group_id; > +} > +return found; > +} > + > struct dpif_lsample * > dpif_lsample_create(void) > { > diff --git a/ofproto/ofproto-dpif-lsample.h > b/ofproto/ofproto-dpif-lsample.h > index c23ea8372..f13425575 100644 > --- a/ofproto/ofproto-dpif-lsample.h > +++ b/ofproto/ofproto-dpif-lsample.h > @@ -19,6 +19,7 @@ > > #include > #include > +#include Maybe add in alphabetical order. >>> >>> Ack. >>> > struct dpif_lsample; > struct ofproto_lsample_options; > @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *, >const struct ofproto_lsample_options *, >size_t n_opts); > > +bool dpif_lsample_get_group_id(struct dpif_lsample *, > + uint32_t collector_set_id, > + uint32_t *group_id); > + > #endif /* OFPROTO_DPIF_LSAMPLE_H */ > + Accedantely added a newline? >>> >>> Ack. >>> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 7c4950895..5bd215d31 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -47,6 +47,7 @@ > #include "ofproto/ofproto-dpif-ipfix.h" > #include "ofproto/ofproto-dpif-mirror.h" > #include "ofproto/ofproto-dpif-monitor.h" > +#include "ofproto/ofproto-dpif-lsample.h" > #include "ofproto/ofproto-dpif-sflow.h" > #include "ofproto/ofproto-dpif-trace.h" > #include "ofproto/ofproto-dpif-xlate-cache.h" > @@ -117,6 +118,7 @@ struct xbridge { > struct dpif_sflow *sflow; /* SFlow handle, or null. */ > struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */ > struct netflow *netflow; /* Netflow handle, or null. */ > +struct dpif_lsample *lsample; /* Local sample handle, or null. */ > struct stp *stp; /* STP or null if disabled. */ > struct rstp *rstp;/* RSTP or null if disabled. */ > > @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, > struct dpif *, >const struct dpif_sflow *, >const struct dpif_ipfix *, >const struct netflow *, > + const struct dpif_lsample *, >bool forward_bpdu, bool has_in_band, >const struct dpif_backer_support *, >const struct xbridge_addr *); > @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge, >const struct dpif_sflow *sflow, >const struct dpif_ipfix *ipfix, >const struct netflow *netflow, > + const struct dpif_lsample *lsample, nit: I would move above netflow, as you also do the actual init/unref >>>
[ovs-dev] [PATCH 5/5] conntrack: Use a per zone default limit.
Before this change the default limit, instead of being considered per-zone, was considered as a global value that every new entry was checked against during the creation. This was not the intended behavior as the default limit should be inherited by each zone instead of being an aggregate number. This change corrects that by removing the default limit from the cmap and making it global (atomic). Now, whenever a new connection needs to be committed, if default_zone_limit is set and the entry for the zone doesn't exist, a new entry for that zone is lazily created, marked as default. All subsequent packets for that zone will undergo the regular lookup process. Operations such as creation/deletion are modified accordingly taking into account this new behavior. Worth noting that OVS_REQUIRES(ct->ct_lock) is not a strict requirement for zone_limit_lookup_or_default(), however since the function operates under the lock and it can create an entry in the slow path, the lock requirement is enforced in order to make thread safety checks work. The function can still be moved outside the creation lock or any lock, keeping the fastpath lockless (turning zone_limit_lookup_protected() to its unprotected version) and locking only in the slow path (replacing zone_limit_create__() with zone_limit_create__(). The patch also extends `conntrack - limit by zone` test in order to check the behavior, and while at it, update test's packet-out to use compose-packet function. Fixes: a7f33fdbfb67 ("conntrack: Support zone limits.") Reported-at: https://issues.redhat.com/browse/FDP-122 Reported-by: Ilya Maximets Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 7 +- lib/conntrack.c | 233 +++- tests/system-traffic.at | 64 +++ 3 files changed, 231 insertions(+), 73 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 2770470d1..46b212754 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -198,10 +198,11 @@ enum ct_ephemeral_range { #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \ FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__)) +#define ZONE_LIMIT_CONN_DEFAULT -1 struct conntrack_zone_limit { int32_t zone; -atomic_uint32_t limit; +atomic_int64_t limit; atomic_count count; uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ }; @@ -212,6 +213,9 @@ struct conntrack { struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED; struct cmap zone_limits OVS_GUARDED; struct cmap timeout_policies OVS_GUARDED; +uint32_t zone_limit_seq OVS_GUARDED; /* Used to disambiguate zone limit + * counts. */ +atomic_uint32_t default_zone_limit; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ @@ -234,7 +238,6 @@ struct conntrack { * control context. */ struct ipf *ipf; /* Fragmentation handling context. */ -uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */ atomic_uint32_t sweep_ms; /* Next sweep interval. */ }; diff --git a/lib/conntrack.c b/lib/conntrack.c index ac0790e11..0e128a0c6 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -270,6 +270,7 @@ conntrack_init(void) atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT); atomic_init(&ct->tcp_seq_chk, true); atomic_init(&ct->sweep_ms, 2); +atomic_init(&ct->default_zone_limit, 0); latch_init(&ct->clean_thread_exit); ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct); ct->ipf = ipf_init(); @@ -296,6 +297,28 @@ zone_key_hash(int32_t zone, uint32_t basis) return hash; } +static int64_t +zone_limit_get_limit__(struct conntrack_zone_limit *czl) +{ +int64_t limit; +atomic_read_relaxed(&czl->limit, &limit); + +return limit; +} + +static int64_t +zone_limit_get_limit(struct conntrack *ct, struct conntrack_zone_limit *czl) +{ +int64_t limit = zone_limit_get_limit__(czl); + +if (limit == ZONE_LIMIT_CONN_DEFAULT) { +atomic_read_relaxed(&ct->default_zone_limit, &limit); +limit = limit ? : -1; +} + +return limit; +} + static struct zone_limit * zone_limit_lookup_protected(struct conntrack *ct, int32_t zone) OVS_REQUIRES(ct->ct_lock) @@ -323,11 +346,56 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone) return NULL; } +static struct zone_limit * +zone_limit_create__(struct conntrack *ct, int32_t zone, int64_t limit) +OVS_REQUIRES(ct->ct_lock) +{ +struct zone_limit *zl = NULL; + +if (zone > DEFAULT_ZONE && zone <= MAX_ZONE) { +zl = xmalloc(sizeof *zl); +atomic_init(&zl->czl.limit, limit); +atomic_count_init(&zl->czl.count, 0); +zl->czl.zone = zone; +
[ovs-dev] [PATCH 4/5] conntrack: Turn zl local limit into atomic.
while at it, changes struct zone_limit initialization in zone_limit_create() in order to use atomic init operations instead of relying on memset() which, although correctly initializes the struct, is semantially not aware of atomics. Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 2 +- lib/conntrack.c | 19 --- 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 2c625d710..2770470d1 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -201,7 +201,7 @@ enum ct_ephemeral_range { struct conntrack_zone_limit { int32_t zone; -uint32_t limit; +atomic_uint32_t limit; atomic_count count; uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ }; diff --git a/lib/conntrack.c b/lib/conntrack.c index 0481a8c8a..ac0790e11 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -341,7 +341,7 @@ zone_limit_get(struct conntrack *ct, int32_t zone) struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone); if (zl) { czl.zone = zl->czl.zone; -czl.limit = zl->czl.limit; +atomic_read_relaxed(&zl->czl.limit, &czl.limit); czl.count = atomic_count_get(&zl->czl.count); } return czl; @@ -358,8 +358,9 @@ zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit) } if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) { -zl = xzalloc(sizeof *zl); -zl->czl.limit = limit; +zl = xmalloc(sizeof *zl); +atomic_init(&zl->czl.limit, limit); +atomic_count_init(&zl->czl.count, 0); zl->czl.zone = zone; zl->czl.zone_limit_seq = ct->zone_limit_seq++; uint32_t hash = zone_key_hash(zone, ct->hash_basis); @@ -376,7 +377,7 @@ zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) int err = 0; struct zone_limit *zl = zone_limit_lookup(ct, zone); if (zl) { -zl->czl.limit = limit; +atomic_store_relaxed(&zl->czl.limit, limit); VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone); } else { ovs_mutex_lock(&ct->ct_lock); @@ -916,12 +917,16 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } if (commit) { +uint32_t czl_limit; struct conn_key_node *fwd_key_node, *rev_key_node; struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); -if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { -COVERAGE_INC(conntrack_zone_full); -return nc; +if (zl) { +atomic_read_relaxed(&zl->czl.limit, &czl_limit); +if (atomic_count_get(&zl->czl.count) >= czl_limit) { +COVERAGE_INC(conntrack_zone_full); +return nc; +} } unsigned int n_conn_limit; -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 3/5] conntrack: Do not use atomics to report zones info.
Atomics are not needed when reporting zone limits. Remove the restriction by defining a non-atomic common structure to report such data. The change also access atomics using the related operations to retrieve atomics reporting only the fields required by the requesting level instead of relying of struct copy. Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 8 lib/conntrack.c | 11 ++- lib/conntrack.h | 9 - lib/dpif-netdev.c | 6 +++--- 4 files changed, 21 insertions(+), 13 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 6c65caa07..2c625d710 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -198,6 +198,14 @@ enum ct_ephemeral_range { #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \ FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__)) + +struct conntrack_zone_limit { +int32_t zone; +uint32_t limit; +atomic_count count; +uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ +}; + struct conntrack { struct ovs_mutex ct_lock; /* Protects the following fields. */ struct cmap conns[UINT16_MAX + 1] OVS_GUARDED; diff --git a/lib/conntrack.c b/lib/conntrack.c index e90ade32f..0481a8c8a 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -330,18 +330,19 @@ zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE); } -struct conntrack_zone_limit +struct conntrack_zone_info zone_limit_get(struct conntrack *ct, int32_t zone) { -struct conntrack_zone_limit czl = { +struct conntrack_zone_info czl = { .zone = DEFAULT_ZONE, .limit = 0, -.count = ATOMIC_COUNT_INIT(0), -.zone_limit_seq = 0, +.count = 0, }; struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone); if (zl) { -czl = zl->czl; +czl.zone = zl->czl.zone; +czl.limit = zl->czl.limit; +czl.count = atomic_count_get(&zl->czl.count); } return czl; } diff --git a/lib/conntrack.h b/lib/conntrack.h index 13bb02ea9..c3136e955 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -115,11 +115,10 @@ struct conntrack_dump { uint16_t current_zone; }; -struct conntrack_zone_limit { +struct conntrack_zone_info { int32_t zone; uint32_t limit; -atomic_count count; -uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ +unsigned int count; }; struct timeout_policy { @@ -161,8 +160,8 @@ int conntrack_set_sweep_interval(struct conntrack *ct, uint32_t ms); uint32_t conntrack_get_sweep_interval(struct conntrack *ct); bool conntrack_get_tcp_seq_chk(struct conntrack *ct); struct ipf *conntrack_ipf_ctx(struct conntrack *ct); -struct conntrack_zone_limit zone_limit_get(struct conntrack *ct, - int32_t zone); +struct conntrack_zone_info zone_limit_get(struct conntrack *ct, + int32_t zone); int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit); int zone_limit_delete(struct conntrack *ct, int32_t zone); diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c7f9e1490..3fbfcfa2b 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -9729,7 +9729,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, struct ovs_list *zone_limits_reply) { struct dp_netdev *dp = get_dp_netdev(dpif); -struct conntrack_zone_limit czl; +struct conntrack_zone_info czl; if (!ovs_list_is_empty(zone_limits_request)) { struct ct_dpif_zone_limit *zone_limit; @@ -9738,7 +9738,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) { ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone, czl.limit, -atomic_count_get(&czl.count)); +czl.count); } else { return EINVAL; } @@ -9754,7 +9754,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, czl = zone_limit_get(dp->conntrack, z); if (czl.zone == z) { ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit, -atomic_count_get(&czl.count)); +czl.count); } } } -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 2/5] conntrack: Add zone limit coverage counter.
Similarly to what it's done for conntrack_full, add conntrack_zone_full increased when new entries are not added due to reaching the zone limit. Signed-off-by: Paolo Valerio --- lib/conntrack.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/conntrack.c b/lib/conntrack.c index db44f8237..e90ade32f 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -50,6 +50,7 @@ COVERAGE_DEFINE(conntrack_full); COVERAGE_DEFINE(conntrack_l3csum_err); COVERAGE_DEFINE(conntrack_l4csum_err); COVERAGE_DEFINE(conntrack_lookup_natted_miss); +COVERAGE_DEFINE(conntrack_zone_full); struct conn_lookup_ctx { struct conn_key key; @@ -918,6 +919,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { +COVERAGE_INC(conntrack_zone_full); return nc; } -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 1/5] conntrack: Correctly annotate conntrack member.
While at it update no longer valid comment. Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 71367f211..6c65caa07 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -199,11 +199,12 @@ enum ct_ephemeral_range { FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__)) struct conntrack { -struct ovs_mutex ct_lock; /* Protects 2 following fields. */ +struct ovs_mutex ct_lock; /* Protects the following fields. */ struct cmap conns[UINT16_MAX + 1] OVS_GUARDED; -struct rculist exp_lists[N_EXP_LISTS]; +struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED; struct cmap zone_limits OVS_GUARDED; struct cmap timeout_policies OVS_GUARDED; + uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */ -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 0/5] Fix default zone limit
Ilya reported a problem with zone limits in particular when a default is set but a zone specific limit is not. As per subject, the series (path 5/5) addresses this issue by creating an entry in the zone limit map as soon as the first packet for a zone is seen (and the default limit is in place). A test was extended to exercise the behavior and make sure the kernel and userspace datapath behave consistently. Paolo Valerio (5): conntrack: Correctly annotate conntrack member. conntrack: Add zone limit coverage counter. conntrack: Do not use atomics to report zones info. conntrack: Turn zl local limit into atomic. conntrack: Use a per zone default limit. lib/conntrack-private.h | 18 ++- lib/conntrack.c | 241 +++- lib/conntrack.h | 9 +- lib/dpif-netdev.c | 6 +- tests/system-traffic.at | 64 --- 5 files changed, 256 insertions(+), 82 deletions(-) -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.
On Mon, Jul 01, 2024 at 01:30:12PM GMT, Eelco Chaudron wrote: > > > On 28 Jun 2024, at 18:40, Adrián Moreno wrote: > > > On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote: > >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > >> > >>> Use the newly added emit_sample to implement OpenFlow sample() actions > >>> with local sampling configuration. > >> > >> See some comments below. > >> > >> Cheers, > >> > >> Eelco > >> > >>> Signed-off-by: Adrian Moreno > >>> --- > >>> ofproto/ofproto-dpif-lsample.c | 17 > >>> ofproto/ofproto-dpif-lsample.h | 6 ++ > >>> ofproto/ofproto-dpif-xlate.c | 163 - > >>> ofproto/ofproto-dpif-xlate.h | 3 +- > >>> ofproto/ofproto-dpif.c | 2 +- > >>> 5 files changed, 144 insertions(+), 47 deletions(-) > >>> > >>> diff --git a/ofproto/ofproto-dpif-lsample.c > >>> b/ofproto/ofproto-dpif-lsample.c > >>> index 7bdafac19..2c0b5da89 100644 > >>> --- a/ofproto/ofproto-dpif-lsample.c > >>> +++ b/ofproto/ofproto-dpif-lsample.c > >>> @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample > >>> *lsample, > >>> return changed; > >>> } > >>> > >>> +/* Returns the group_id of the exporter with the given collector_set_id, > >>> if it > >>> + * exists. */ > >> > >> nit: The below will fit on one line > >> > >> /* Returns the exporter group_id for a given collector_set_id, if it > >> exists. */ > >> > > > > Ack. > > > >>> +bool > >>> +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t > >>> collector_set_id, > >>> + uint32_t *group_id) > >>> +{ > >>> +struct lsample_exporter_node *node; > >>> +bool found = false; > >>> + > >>> +node = dpif_lsample_find_exporter_node(ps, collector_set_id); > >>> +if (node) { > >>> +found = true; > >>> +*group_id = node->exporter.options.group_id; > >>> +} > >>> +return found; > >>> +} > >>> + > >>> struct dpif_lsample * > >>> dpif_lsample_create(void) > >>> { > >>> diff --git a/ofproto/ofproto-dpif-lsample.h > >>> b/ofproto/ofproto-dpif-lsample.h > >>> index c23ea8372..f13425575 100644 > >>> --- a/ofproto/ofproto-dpif-lsample.h > >>> +++ b/ofproto/ofproto-dpif-lsample.h > >>> @@ -19,6 +19,7 @@ > >>> > >>> #include > >>> #include > >>> +#include > >> > >> Maybe add in alphabetical order. > >> > > > > Ack. > > > >>> struct dpif_lsample; > >>> struct ofproto_lsample_options; > >>> @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *, > >>>const struct ofproto_lsample_options *, > >>>size_t n_opts); > >>> > >>> +bool dpif_lsample_get_group_id(struct dpif_lsample *, > >>> + uint32_t collector_set_id, > >>> + uint32_t *group_id); > >>> + > >>> #endif /* OFPROTO_DPIF_LSAMPLE_H */ > >>> + > >> > >> Accedantely added a newline? > >> > > > > Ack. > > > >>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > >>> index 7c4950895..5bd215d31 100644 > >>> --- a/ofproto/ofproto-dpif-xlate.c > >>> +++ b/ofproto/ofproto-dpif-xlate.c > >>> @@ -47,6 +47,7 @@ > >>> #include "ofproto/ofproto-dpif-ipfix.h" > >>> #include "ofproto/ofproto-dpif-mirror.h" > >>> #include "ofproto/ofproto-dpif-monitor.h" > >>> +#include "ofproto/ofproto-dpif-lsample.h" > >>> #include "ofproto/ofproto-dpif-sflow.h" > >>> #include "ofproto/ofproto-dpif-trace.h" > >>> #include "ofproto/ofproto-dpif-xlate-cache.h" > >>> @@ -117,6 +118,7 @@ struct xbridge { > >>> struct dpif_sflow *sflow; /* SFlow handle, or null. */ > >>> struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */ > >>> struct netflow *netflow; /* Netflow handle, or null. */ > >>> +struct dpif_lsample *lsample; /* Local sample handle, or null. */ > >>> struct stp *stp; /* STP or null if disabled. */ > >>> struct rstp *rstp;/* RSTP or null if disabled. */ > >>> > >>> @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, > >>> struct dpif *, > >>>const struct dpif_sflow *, > >>>const struct dpif_ipfix *, > >>>const struct netflow *, > >>> + const struct dpif_lsample *, > >>>bool forward_bpdu, bool has_in_band, > >>>const struct dpif_backer_support *, > >>>const struct xbridge_addr *); > >>> @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge, > >>>const struct dpif_sflow *sflow, > >>>const struct dpif_ipfix *ipfix, > >>>const struct netflow *netflow, > >>> + const struct dpif_lsample *lsample, > >> > >> nit: I would move above netflow, as you also do the actual init/unref > >> before > >> netflow. > > > > Ack. > > > >>>
Re: [ovs-dev] [RFC v2 9/9] tests: Test local sampling.
On Mon, Jul 01, 2024 at 01:33:23PM GMT, Eelco Chaudron wrote: > > > On 28 Jun 2024, at 16:11, Adrián Moreno wrote: > > > On Wed, Jun 26, 2024 at 02:15:02PM GMT, Eelco Chaudron wrote: > >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > >> > >>> Test simultaneous IPFIX and local sampling including slow-path. > >> > >> I guess Ilya's comments on this patch summarize most of my comments. I had > >> one small additional question. See below. > >> > >> //Eelco > >> > >>> Signed-off-by: Adrian Moreno > >>> --- > >>> tests/system-common-macros.at | 4 ++ > >>> tests/system-traffic.at | 105 ++ > >>> 2 files changed, 109 insertions(+) > >>> > >>> diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at > >>> index 2a68cd664..22b8885e4 100644 > >>> --- a/tests/system-common-macros.at > >>> +++ b/tests/system-common-macros.at > >>> @@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION], > >>> # OVS_CHECK_DROP_ACTION() > >>> m4_define([OVS_CHECK_DROP_ACTION], > >>> [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" > >>> ovs-vswitchd.log])]) > >>> + > >>> +# OVS_CHECK_EMIT_SAMPLE() > >>> +m4_define([OVS_CHECK_EMIT_SAMPLE], > >>> +[AT_SKIP_IF([! grep -q "Datapath supports emit_sample" > >>> ovs-vswitchd.log])]) > >>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at > >>> index bd7647cbe..babc56b56 100644 > >>> --- a/tests/system-traffic.at > >>> +++ b/tests/system-traffic.at > >>> @@ -8977,3 +8977,108 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: > >>> * * *5002 *2000 *b85e *00 > >>> > >>> OVS_TRAFFIC_VSWITCHD_STOP > >>> AT_CLEANUP > >>> + > >>> +AT_SETUP([emit_sample]) > >>> +OVS_TRAFFIC_VSWITCHD_START() > >>> +OVS_CHECK_EMIT_SAMPLE() > >>> + > >>> +ADD_NAMESPACES(at_ns0, at_ns1) > >>> +NS_CHECK_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], > >>> [0], [ignore]) > >>> +NS_CHECK_EXEC([at_ns1], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], > >>> [0], [ignore]) > >>> + > >>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "e4:11:22:33:44:55") > >>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "e4:11:22:33:44:66") > >>> + > >>> +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e4:11:22:33:44:66]) > >>> +NS_CHECK_EXEC([at_ns1], [arp -s 10.1.1.1 e4:11:22:33:44:55]) > >>> + > >>> + > >>> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg > >>> ofproto_dpif_upcall:dbg]) > >>> + > >>> +AT_CHECK([ovs-vsctl -- --id=@br0 get Bridge br0 \ > >>> +-- --id=@ipfix create IPFIX > >>> targets=\"127.0.0.1:4739\" \ > >>> +-- create Flow_Sample_Collector_Set id=1 bridge=@br0 > >>> ipfix=@ipfix, local_sample_group=10 \ > >>> +-- create Flow_Sample_Collector_Set id=2 bridge=@br0 > >>> ipfix=@ipfix, local_sample_group=12], > >>> + [0], [ignore]) > >>> + > >>> +AT_DATA([flows.txt], [dnl > >>> +in_port=ovs-p0,ip > >>> actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),output(port=ovs-p1,max_len=100) > >>> +in_port=ovs-p1,ip > >>> actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),output(port=ovs-p0,max_len=100) > >>> +]) > >>> + > >>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) > >>> + > >>> +OVS_DAEMONIZE([ovstest test-psample > psample.out], [psample1.pid]) > >>> + > >>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 1 10.1.1.2 | FORMAT_PING], [0], [dnl > >>> +1 packets transmitted, 1 received, 0% packet loss, time 0ms > >>> +]) > >>> + > >>> +m4_define([SAMPLE1], [group_id=0xa > >>> obs_domain=0x,obs_point=0x > >>> .*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2]) > >>> +m4_define([SAMPLE2], [group_id=0xc > >>> obs_domain=0x,obs_point=0x > >>> .*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1]) > >>> +AT_CHECK([grep -E 'SAMPLE1' psample.out >/dev/null]) > >>> +AT_CHECK([grep -E 'SAMPLE2' psample.out >/dev/null]) > >>> + > >>> +AT_CHECK([ovs-ofctl dump-ipfix-flow br0 | sed 's/tx pkts=[[0-9]]*/tx > >>> pkts=24/' | sed 's/tx errs=[[0-9]]*/tx errs=0/'], [0], [dnl > >> > >> Why are we making tx pkts always 24, and are we waiving any potential tx > >> errors? > >> Is this something you have seen not being consistent, if so any idea why? > >> > > > > Yes. By the time we request the statistics, IPFIX cache should contain > > the flows but they might not have been sent yet. When they are sent they > > will fail (because IPFIX target is localhost and it returns the socket > > returns an error if the listening socket does not exist) so the number > > of errors might vary. > > Your explanation makes sense to me. Maybe just add it as a comment so people > looking at the test know why. > Ack. > > I can look deeper into making it more consistent but it looked like a > > quick workaround. > > > >>> +NXST_IPFIX_FLOW reply (xid=0x2): 2 ids > >>> + id 1: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 > >>> ok=0, tx pkts=24 > >>> +
Re: [ovs-dev] [PATCH v3] netdev-offload-dpdk: Support offload of set dscp action.
Hi, Ilya, Thank you for your valuable advice! In lines 7390-7406 of the 'ofproto_dpif xlate.c' file, show below: case OFPACT_SET_IP_DSCP: if (is_ip_any(flow)) { WC_MASK_FIELD(wc, nw_proto); wc->masks.nw_tos |= IP_DSCP_MASK; flow->nw_tos &= ~IP_DSCP_MASK; flow->nw_tos |= ofpact_get_SET_IP_DSCP(a)->dscp; } break; case OFPACT_SET_IP_ECN: if (is_ip_any(flow)) { WC_MASK_FIELD(wc, nw_proto); wc->masks.nw_tos |= IP_ECN_MASK; flow->nw_tos &= ~IP_ECN_MASK; flow->nw_tos |= ofpact_get_SET_IP_ECN(a)->ecn; } break; when parsing and setting the DSCP and ECN actions, different masks are applied to the `nw_tos` field respectively, and a mask operation is performed on `nw_tos`. Additionally, the same action identifier is used for offloading both DSCP and ECN, ensuring that setting DSCP and setting ECN do not interfere with each other. Regarding the `add_set_flow_action__` function, the mask is cleared to zero. This clearing operation is intended for subsequent checks to determine if all the fields issued are supported. This mask will not be passed to the DPDK side, so it will not affect the offloading operation. On 6/28/24 23:08, Ilya Maximets wrote: > On 6/20/24 09:35, Sunyang Wu via dev wrote: >> Add the "set dscp action" parsing function, so that the "set dscp >> action" can be offloaded. >> >> Signed-off-by: Sunyang Wu >> --- >> lib/netdev-offload-dpdk.c | 19 ++- >> 1 file changed, 18 insertions(+), 1 deletion(-) >> > > Hi, Sunyang Wu. Thanks for the patch! > See some comments inline. > > Best regards, Ilya Maximets. > >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c >> index 623005b1c..524942457 100644 >> --- a/lib/netdev-offload-dpdk.c >> +++ b/lib/netdev-offload-dpdk.c >> @@ -791,6 +791,17 @@ dump_flow_action(struct ds *s, struct ds *s_extra, >> ds_put_format(s, "port %"PRIu16" ", ntohs(set_tp->port)); >> } >> ds_put_cstr(s, "/ "); >> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP || >> + actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) { >> +const struct rte_flow_action_set_dscp *set_dscp = actions->conf; >> +char *dirstr = actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP >> + ? "set_ipv4_dscp " : "set_ipv6_dscp "; >> + >> +ds_put_cstr(s, dirstr); >> +if (set_dscp) { >> +ds_put_format(s, "dscp_value %d ", set_dscp->dscp); >> +} >> +ds_put_cstr(s, "/ "); >> } else if (actions->type == RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN) { >> const struct rte_flow_action_of_push_vlan *of_push_vlan = >> actions->conf; >> @@ -1835,7 +1846,10 @@ add_set_flow_action__(struct flow_actions *actions, >> if (is_all_zeros(mask, size)) { >> return 0; >> } >> -if (!is_all_ones(mask, size)) { >> +if (!is_all_ones(mask, size) && >> +/* set dscp need patial mask */ >> +attr != RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP && >> +attr != RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) { > > This doesn't seem right. We need to check that the mask contains all > the DSCP bits and that it doesn't have anything else inside. > > For example, if we try to set ECN bits, we'll have them in a mask. > In this case is_all_zeros() check will fail and then this check will > allow us to proceed as well. In the end, we'll end up setting DSCP > bits to all zeroes, or worse, to whatever was in the first six bits of > nw_tos. > Also, this function will clear the whole mask below, but we should only clear DSCP bits in this case. >> VLOG_DBG_RL(&rl, "Partial mask is not supported"); >> return -1; >> } >> @@ -1912,6 +1926,7 @@ parse_set_actions(struct flow_actions *actions, >> add_set_flow_action(ipv4_src, >> RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC); >> add_set_flow_action(ipv4_dst, >> RTE_FLOW_ACTION_TYPE_SET_IPV4_DST); >> add_set_flow_action(ipv4_ttl, >> RTE_FLOW_ACTION_TYPE_SET_TTL); >> +add_set_flow_action(ipv4_tos, >> + RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP);z >> >> if (mask && !is_all_zeros(mask, sizeof *mask)) { >> VLOG_DBG_RL(&rl, "Unsupported IPv4 set action"); @@ >> -1924,6 +1939,8 @@ parse_set_actions(struct flow_actions *actions, >> add_set_flow_action(ipv6_src, >> RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC); >> add_set_flow_action(ipv6_dst, >> RTE_FLOW_ACTION_TYPE_SET_IPV6_DST); >> add_set_flow_action(ipv6_hlimit, >> RTE_FLOW_ACTION_TYPE_SET_TTL); >> +add_set_flow_action(ipv6_tclass, >> +RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP); >> >> if (mask && !is_all_zeros(mask, sizeof *mask)) { >>
Re: [ovs-dev] [RFC v2 9/9] tests: Test local sampling.
On 28 Jun 2024, at 16:11, Adrián Moreno wrote: > On Wed, Jun 26, 2024 at 02:15:02PM GMT, Eelco Chaudron wrote: >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote: >> >>> Test simultaneous IPFIX and local sampling including slow-path. >> >> I guess Ilya's comments on this patch summarize most of my comments. I had >> one small additional question. See below. >> >> //Eelco >> >>> Signed-off-by: Adrian Moreno >>> --- >>> tests/system-common-macros.at | 4 ++ >>> tests/system-traffic.at | 105 ++ >>> 2 files changed, 109 insertions(+) >>> >>> diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at >>> index 2a68cd664..22b8885e4 100644 >>> --- a/tests/system-common-macros.at >>> +++ b/tests/system-common-macros.at >>> @@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION], >>> # OVS_CHECK_DROP_ACTION() >>> m4_define([OVS_CHECK_DROP_ACTION], >>> [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" >>> ovs-vswitchd.log])]) >>> + >>> +# OVS_CHECK_EMIT_SAMPLE() >>> +m4_define([OVS_CHECK_EMIT_SAMPLE], >>> +[AT_SKIP_IF([! grep -q "Datapath supports emit_sample" >>> ovs-vswitchd.log])]) >>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at >>> index bd7647cbe..babc56b56 100644 >>> --- a/tests/system-traffic.at >>> +++ b/tests/system-traffic.at >>> @@ -8977,3 +8977,108 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: >>> * * *5002 *2000 *b85e *00 >>> >>> OVS_TRAFFIC_VSWITCHD_STOP >>> AT_CLEANUP >>> + >>> +AT_SETUP([emit_sample]) >>> +OVS_TRAFFIC_VSWITCHD_START() >>> +OVS_CHECK_EMIT_SAMPLE() >>> + >>> +ADD_NAMESPACES(at_ns0, at_ns1) >>> +NS_CHECK_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], [0], >>> [ignore]) >>> +NS_CHECK_EXEC([at_ns1], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], [0], >>> [ignore]) >>> + >>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "e4:11:22:33:44:55") >>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "e4:11:22:33:44:66") >>> + >>> +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e4:11:22:33:44:66]) >>> +NS_CHECK_EXEC([at_ns1], [arp -s 10.1.1.1 e4:11:22:33:44:55]) >>> + >>> + >>> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg >>> ofproto_dpif_upcall:dbg]) >>> + >>> +AT_CHECK([ovs-vsctl -- --id=@br0 get Bridge br0 \ >>> +-- --id=@ipfix create IPFIX targets=\"127.0.0.1:4739\" >>> \ >>> +-- create Flow_Sample_Collector_Set id=1 bridge=@br0 >>> ipfix=@ipfix, local_sample_group=10 \ >>> +-- create Flow_Sample_Collector_Set id=2 bridge=@br0 >>> ipfix=@ipfix, local_sample_group=12], >>> + [0], [ignore]) >>> + >>> +AT_DATA([flows.txt], [dnl >>> +in_port=ovs-p0,ip >>> actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),output(port=ovs-p1,max_len=100) >>> +in_port=ovs-p1,ip >>> actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),output(port=ovs-p0,max_len=100) >>> +]) >>> + >>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) >>> + >>> +OVS_DAEMONIZE([ovstest test-psample > psample.out], [psample1.pid]) >>> + >>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 1 10.1.1.2 | FORMAT_PING], [0], [dnl >>> +1 packets transmitted, 1 received, 0% packet loss, time 0ms >>> +]) >>> + >>> +m4_define([SAMPLE1], [group_id=0xa >>> obs_domain=0x,obs_point=0x >>> .*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2]) >>> +m4_define([SAMPLE2], [group_id=0xc >>> obs_domain=0x,obs_point=0x >>> .*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1]) >>> +AT_CHECK([grep -E 'SAMPLE1' psample.out >/dev/null]) >>> +AT_CHECK([grep -E 'SAMPLE2' psample.out >/dev/null]) >>> + >>> +AT_CHECK([ovs-ofctl dump-ipfix-flow br0 | sed 's/tx pkts=[[0-9]]*/tx >>> pkts=24/' | sed 's/tx errs=[[0-9]]*/tx errs=0/'], [0], [dnl >> >> Why are we making tx pkts always 24, and are we waiving any potential tx >> errors? >> Is this something you have seen not being consistent, if so any idea why? >> > > Yes. By the time we request the statistics, IPFIX cache should contain > the flows but they might not have been sent yet. When they are sent they > will fail (because IPFIX target is localhost and it returns the socket > returns an error if the listening socket does not exist) so the number > of errors might vary. Your explanation makes sense to me. Maybe just add it as a comment so people looking at the test know why. > I can look deeper into making it more consistent but it looked like a > quick workaround. > >>> +NXST_IPFIX_FLOW reply (xid=0x2): 2 ids >>> + id 1: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 ok=0, >>> tx pkts=24 >>> + pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0 >>> + id 2: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 ok=0, >>> tx pkts=24 >>> + pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0 >>> +]) >>> + >>> +AT_CHECK([ovs-appctl lsample/show br0], [0], [dnl >>> +local sample statistics
Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.
On 28 Jun 2024, at 18:40, Adrián Moreno wrote: > On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote: >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote: >> >>> Use the newly added emit_sample to implement OpenFlow sample() actions >>> with local sampling configuration. >> >> See some comments below. >> >> Cheers, >> >> Eelco >> >>> Signed-off-by: Adrian Moreno >>> --- >>> ofproto/ofproto-dpif-lsample.c | 17 >>> ofproto/ofproto-dpif-lsample.h | 6 ++ >>> ofproto/ofproto-dpif-xlate.c | 163 - >>> ofproto/ofproto-dpif-xlate.h | 3 +- >>> ofproto/ofproto-dpif.c | 2 +- >>> 5 files changed, 144 insertions(+), 47 deletions(-) >>> >>> diff --git a/ofproto/ofproto-dpif-lsample.c b/ofproto/ofproto-dpif-lsample.c >>> index 7bdafac19..2c0b5da89 100644 >>> --- a/ofproto/ofproto-dpif-lsample.c >>> +++ b/ofproto/ofproto-dpif-lsample.c >>> @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample *lsample, >>> return changed; >>> } >>> >>> +/* Returns the group_id of the exporter with the given collector_set_id, >>> if it >>> + * exists. */ >> >> nit: The below will fit on one line >> >> /* Returns the exporter group_id for a given collector_set_id, if it exists. >> */ >> > > Ack. > >>> +bool >>> +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t >>> collector_set_id, >>> + uint32_t *group_id) >>> +{ >>> +struct lsample_exporter_node *node; >>> +bool found = false; >>> + >>> +node = dpif_lsample_find_exporter_node(ps, collector_set_id); >>> +if (node) { >>> +found = true; >>> +*group_id = node->exporter.options.group_id; >>> +} >>> +return found; >>> +} >>> + >>> struct dpif_lsample * >>> dpif_lsample_create(void) >>> { >>> diff --git a/ofproto/ofproto-dpif-lsample.h b/ofproto/ofproto-dpif-lsample.h >>> index c23ea8372..f13425575 100644 >>> --- a/ofproto/ofproto-dpif-lsample.h >>> +++ b/ofproto/ofproto-dpif-lsample.h >>> @@ -19,6 +19,7 @@ >>> >>> #include >>> #include >>> +#include >> >> Maybe add in alphabetical order. >> > > Ack. > >>> struct dpif_lsample; >>> struct ofproto_lsample_options; >>> @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *, >>>const struct ofproto_lsample_options *, >>>size_t n_opts); >>> >>> +bool dpif_lsample_get_group_id(struct dpif_lsample *, >>> + uint32_t collector_set_id, >>> + uint32_t *group_id); >>> + >>> #endif /* OFPROTO_DPIF_LSAMPLE_H */ >>> + >> >> Accedantely added a newline? >> > > Ack. > >>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c >>> index 7c4950895..5bd215d31 100644 >>> --- a/ofproto/ofproto-dpif-xlate.c >>> +++ b/ofproto/ofproto-dpif-xlate.c >>> @@ -47,6 +47,7 @@ >>> #include "ofproto/ofproto-dpif-ipfix.h" >>> #include "ofproto/ofproto-dpif-mirror.h" >>> #include "ofproto/ofproto-dpif-monitor.h" >>> +#include "ofproto/ofproto-dpif-lsample.h" >>> #include "ofproto/ofproto-dpif-sflow.h" >>> #include "ofproto/ofproto-dpif-trace.h" >>> #include "ofproto/ofproto-dpif-xlate-cache.h" >>> @@ -117,6 +118,7 @@ struct xbridge { >>> struct dpif_sflow *sflow; /* SFlow handle, or null. */ >>> struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */ >>> struct netflow *netflow; /* Netflow handle, or null. */ >>> +struct dpif_lsample *lsample; /* Local sample handle, or null. */ >>> struct stp *stp; /* STP or null if disabled. */ >>> struct rstp *rstp;/* RSTP or null if disabled. */ >>> >>> @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, struct >>> dpif *, >>>const struct dpif_sflow *, >>>const struct dpif_ipfix *, >>>const struct netflow *, >>> + const struct dpif_lsample *, >>>bool forward_bpdu, bool has_in_band, >>>const struct dpif_backer_support *, >>>const struct xbridge_addr *); >>> @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge, >>>const struct dpif_sflow *sflow, >>>const struct dpif_ipfix *ipfix, >>>const struct netflow *netflow, >>> + const struct dpif_lsample *lsample, >> >> nit: I would move above netflow, as you also do the actual init/unref before >> netflow. > > Ack. > >>>bool forward_bpdu, bool has_in_band, >>>const struct dpif_backer_support *support, >>>const struct xbridge_addr *addr) >>> @@ -1099,6 +1103,11 @@ xlate_xbridge_set(struct xbridge *xbridge, >>> xbridge->ipfix = dpif_ipfix_ref(ipfix); >>> } >>> >>> +if (xbridge->lsample != lsample) { >>> +
Re: [ovs-dev] [RFC v2 4/9] vswitchd: Add local sampling to vswitchd schema.
On 28 Jun 2024, at 15:53, Adrián Moreno wrote: > On Thu, Jun 27, 2024 at 03:57:03PM GMT, Eelco Chaudron wrote: >> >> >> On 27 Jun 2024, at 13:37, Adrián Moreno wrote: >> >>> On Wed, Jun 26, 2024 at 02:11:00PM GMT, Eelco Chaudron wrote: On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > Add as new column in the Flow_Sample_Collector_Set table named > "local_sample_group" which enables this feature. > > Signed-off-by: Adrian Moreno > --- > NEWS | 4 ++ > vswitchd/bridge.c | 78 +++--- > vswitchd/vswitch.ovsschema | 9 - > vswitchd/vswitch.xml | 39 +-- > 4 files changed, 119 insertions(+), 11 deletions(-) > > diff --git a/NEWS b/NEWS > index b92cec532..1c05a7120 100644 > --- a/NEWS > +++ b/NEWS > @@ -7,6 +7,10 @@ Post-v3.3.0 > - The primary development branch has been renamed from 'master' to > 'main'. > The OVS tree remains hosted on GitHub. > https://github.com/openvswitch/ovs.git > + - Local sampling is introduced. It reuses the OpenFlow sample action > and > + allows samples to be emitted locally (instead of via IPFIX) in a > + datapath-specific manner via the new datapath action called > "emit_sample". > + Linux kernel datapath is the first to support this feature. > > > v3.3.0 - 16 Feb 2024 > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > index 95a65fcdc..cd7dc6646 100644 > --- a/vswitchd/bridge.c > +++ b/vswitchd/bridge.c > @@ -288,6 +288,7 @@ static void bridge_configure_mac_table(struct bridge > *); > static void bridge_configure_mcast_snooping(struct bridge *); > static void bridge_configure_sflow(struct bridge *, int > *sflow_bridge_number); > static void bridge_configure_ipfix(struct bridge *); > +static void bridge_configure_lsample(struct bridge *); > static void bridge_configure_spanning_tree(struct bridge *); > static void bridge_configure_tables(struct bridge *); > static void bridge_configure_dp_desc(struct bridge *); > @@ -989,6 +990,7 @@ bridge_reconfigure(const struct ovsrec_open_vswitch > *ovs_cfg) > bridge_configure_netflow(br); > bridge_configure_sflow(br, &sflow_bridge_number); > bridge_configure_ipfix(br); > +bridge_configure_lsample(br); > bridge_configure_spanning_tree(br); > bridge_configure_tables(br); > bridge_configure_dp_desc(br); > @@ -1537,10 +1539,11 @@ ovsrec_ipfix_is_valid(const struct ovsrec_ipfix > *ipfix) > return ipfix && ipfix->n_targets > 0; > } > > -/* Returns whether a Flow_Sample_Collector_Set row is valid. */ > +/* Returns whether a Flow_Sample_Collector_Set row constains valid IPFIX constains -> 'contains a' >>> >>> Ack. >>> > + * configuration. */ > static bool > -ovsrec_fscs_is_valid(const struct ovsrec_flow_sample_collector_set *fscs, > - const struct bridge *br) > +ovsrec_fscs_is_valid_ipfix(const struct ovsrec_flow_sample_collector_set > *fscs, > + const struct bridge *br) > { > return ovsrec_ipfix_is_valid(fscs->ipfix) && fscs->bridge == br->cfg; > } > @@ -1558,7 +1561,7 @@ bridge_configure_ipfix(struct bridge *br) > const char *virtual_obs_id; > > OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) { > -if (ovsrec_fscs_is_valid(fe_cfg, br)) { > +if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) { > n_fe_opts++; > } > } > @@ -1621,7 +1624,7 @@ bridge_configure_ipfix(struct bridge *br) > fe_opts = xcalloc(n_fe_opts, sizeof *fe_opts); > opts = fe_opts; > OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) { > -if (ovsrec_fscs_is_valid(fe_cfg, br)) { > +if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) { > opts->collector_set_id = fe_cfg->id; > sset_init(&opts->targets); > sset_add_array(&opts->targets, fe_cfg->ipfix->targets, > @@ -1667,6 +1670,71 @@ bridge_configure_ipfix(struct bridge *br) > } > } > > +/* Returns whether a Flow_Sample_Collector_Set row contains valid local > + * sampling configuraiton. */ ...row contains a valid local... ++ configuraiton -> configuration >>> >>> Ack. >>> > +static bool > +ovsrec_fscs_is_valid_local(const struct ovsrec_flow_sample_collector_set > *fscs, > + const struct bridge *br) > +{ > +return fscs->local_sample_group && fscs->n_local_sample_group == 1 && > + fscs->bridge == br->cfg; > +} > + > +/
Re: [ovs-dev] [RFC v2 1/9] odp-util: Add support OVS_ACTION_ATTR_EMIT_SAMPLE.
On 28 Jun 2024, at 15:31, Adrián Moreno wrote: > On Thu, Jun 27, 2024 at 03:42:38PM GMT, Eelco Chaudron wrote: >> >> >> On 27 Jun 2024, at 12:45, Adrián Moreno wrote: >> >>> On Wed, Jun 26, 2024 at 02:08:27PM GMT, Eelco Chaudron wrote: On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > Add support for parsing and formatting the new action. > > Also, flag OVS_ACTION_ATTR_SAMPLE as requiring datapath assistance if it > contains a nested OVS_ACTION_ATTR_EMIT_SAMPLE. The reason is that the > sampling rate form the parent "sample" is made available to the nested > "emit_sample" by the kernel. Hi Adrian, Thanks for this series! This email kicks off my review of the series, see comments below and in the other patches. Cheers, Eelco > Signed-off-by: Adrian Moreno > --- > include/linux/openvswitch.h | 25 + > lib/dpif-netdev.c| 1 + > lib/dpif.c | 1 + > lib/odp-execute.c| 25 - > lib/odp-util.c | 103 +++ > lib/odp-util.h | 3 + > ofproto/ofproto-dpif-ipfix.c | 1 + > ofproto/ofproto-dpif-sflow.c | 1 + > python/ovs/flow/odp.py | 8 +++ > tests/odp.at | 16 ++ > 10 files changed, 183 insertions(+), 1 deletion(-) > > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h > index d9fb991ef..b4e0647bd 100644 > --- a/include/linux/openvswitch.h > +++ b/include/linux/openvswitch.h > @@ -992,6 +992,30 @@ struct check_pkt_len_arg { > }; > #endif > > +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16 > +/** > + * enum ovs_emit_sample_attr - Attributes for > %OVS_ACTION_ATTR_EMIT_SAMPLE > + * action. > + * > + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of > the > + * sample. > + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that > contains > + * user-defined metadata. The maximum length is 16 bytes. > + * > + * Sends the packet to the psample multicast group with the specified > group and > + * cookie. It is possible to combine this action with the > + * %OVS_ACTION_ATTR_TRUNC to limit the size of the packet being emitted. I'll start the discussion again on the naming. The name "emit_sample()" does not seem appropriate. This function's primary role is to copy the packet and send it to a local collector, which varies depending on the datapath. For the kernel datapath, this collector is psample, while for userspace, it will likely be some kind of probe. This action is distinct from the sample() action by design; it is a standalone action that can be combined with others. Furthermore, the action itself does not involve taking a sample; it consistently pushes the packet to the local collector. Therefore, I suggest renaming "emit_sample()" to "emit_local()". This same goes for all the derivative ATTR naming. >>> >>> Let's discuss this in the kernel ml. >>> > + */ > +enum ovs_emit_sample_attr { > + OVS_EMIT_SAMPLE_ATTR_UNPSEC, > + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */ > + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ Fix comment alignment? Maybe also change the order to be alphabetical? >>> >>> What do you mean by comment alignment? >> >> Well you are using tabs to index which might result in it looking like this: >> >> +OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */ >> +OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ >> >> (Most) of the other comments use spaces after the definition. >> > + __OVS_EMIT_SAMPLE_ATTR_MAX > +}; > + > +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1) > + > + > /** > * enum ovs_action_attr - Action types. > * > @@ -1087,6 +,7 @@ enum ovs_action_attr { > OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ > OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ > OVS_ACTION_ATTR_DROP, /* u32 xlate_error. */ > + OVS_ACTION_ATTR_EMIT_SAMPLE, /* Nested OVS_EMIT_SAMPLE_ATTR_*. */ > > #ifndef __KERNEL__ > OVS_ACTION_ATTR_TUNNEL_PUSH, /* struct ovs_action_push_tnl*/ > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index c7f9e1490..c1171890c 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -9519,6 +9519,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch > *packets_, > case OVS_ACTION_ATTR_DROP: > case OVS_ACTION_ATTR_ADD_MPLS: > case OVS_ACTION_ATTR_DEC_TTL: > +case OVS_ACTION_ATTR_EMIT_SAMPLE: > case __OVS_ACTION_ATTR_MAX: > OVS_NOT_REACHED(); >
[ovs-dev] [PATCH v5 3/3] ovsdb: Optimize monitor update by directly serializing data into ds.
Currently serialization is performed by first converting the internal data representation into JSON objects, followed by serializing these objects by jsonrpc. This process results in lots of allocations for these intermediate objects. Consequently, this not only increases peak memory consumption, but also demands significantly more CPU work. By forming row-update JSONs directly in `ds`, which is then used to create 'serialized object' JSONs, the overall speed increased by a factor of 2.3. A local benchmark was run on a proprietary Southbound backup. Both versions, before and after applying the patch, were measured. For each version, there were two runs with 10 parallel clients, and two runs with 30 parallel clients. CPU time was recorded after startup (before clients started running) and after all clients received all updates. Clients were essentially running `ovsdb-client monitor-cond unix:pipe OVN_Southbound '[[\"actions\",\"!=\",\"sdfdfsd\"]]' Logical_Flow`. Similar results were obtained with other requests that required a significant amount of data transfer. The backup size is about 600 MB. Results are measured in seconds. Before After Baseline x10: 9.53108.54 Baseline x10: 9.62108.67 Baseline x30: 9.69307.04 Baseline x30: 9.65303.32 Patch x10: 9.6752.57 Patch x10: 9.5753.12 Patch x30: 9.53136.33 Patch x30: 9.63135.88 Signed-off-by: Grigorii Nazarov --- v2: updated title v3: fixed bracing v4: changed patch number from 4/4 to 3/3 include/openvswitch/json.h | 1 + lib/json.c | 8 ++ lib/ovsdb-data.c | 105 +++ lib/ovsdb-data.h | 3 + ovsdb/monitor.c| 210 - 5 files changed, 254 insertions(+), 73 deletions(-) diff --git a/include/openvswitch/json.h b/include/openvswitch/json.h index 555440760..80b9479c7 100644 --- a/include/openvswitch/json.h +++ b/include/openvswitch/json.h @@ -81,6 +81,7 @@ struct json *json_boolean_create(bool); struct json *json_string_create(const char *); struct json *json_string_create_nocopy(char *); struct json *json_serialized_object_create(const struct json *); +struct json *json_serialized_object_create_from_string(const char *); struct json *json_integer_create(long long int); struct json *json_real_create(double); diff --git a/lib/json.c b/lib/json.c index d40e93857..66b1f571f 100644 --- a/lib/json.c +++ b/lib/json.c @@ -199,6 +199,14 @@ json_serialized_object_create(const struct json *src) return json; } +struct json * +json_serialized_object_create_from_string(const char *s) +{ +struct json *json = json_create(JSON_SERIALIZED_OBJECT); +json->string = xstrdup(s); +return json; +} + struct json * json_serialized_object_create_with_yield(const struct json *src) { diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c index defb048d7..f32b7975a 100644 --- a/lib/ovsdb-data.c +++ b/lib/ovsdb-data.c @@ -492,6 +492,45 @@ ovsdb_atom_to_json__(const union ovsdb_atom *atom, enum ovsdb_atomic_type type, } } +/* Serializes 'atom', of the specified 'type', into 'ds' in JSON format. + * + * Refer to RFC 7047 for the format of the JSON that this function produces. */ +static void +ovsdb_atom_to_json_ds(const union ovsdb_atom *atom, + enum ovsdb_atomic_type type, struct ds *ds) +{ +switch (type) { +case OVSDB_TYPE_VOID: +OVS_NOT_REACHED(); + +case OVSDB_TYPE_INTEGER: +ds_put_format(ds, "%lld", (long long) atom->integer); +return; + +case OVSDB_TYPE_REAL: +ds_put_format(ds, "%.*g", DBL_DIG, atom->real); +return; + +case OVSDB_TYPE_BOOLEAN: +ds_put_cstr(ds, atom->boolean ? "true" : "false"); +return; + +case OVSDB_TYPE_STRING: +json_to_ds(atom->s, JSSF_SORT, ds); +return; + +case OVSDB_TYPE_UUID: +ds_put_cstr(ds, "[\"uuid\",\""); +ds_put_format(ds, UUID_FMT, UUID_ARGS(&atom->uuid)); +ds_put_cstr(ds, "\"]"); +return; + +case OVSDB_N_TYPES: +default: +OVS_NOT_REACHED(); +} +} + struct json * ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type) { @@ -1445,6 +1484,23 @@ ovsdb_base_to_json(const union ovsdb_atom *atom, } } +static void +ovsdb_base_to_json_ds(const union ovsdb_atom *atom, + const struct ovsdb_base_type *base, + bool use_row_names, + struct ds *ds) +{ +if (!use_row_names +|| base->type != OVSDB_TYPE_UUID +|| !base->uuid.refTableName) { +ovsdb_atom_to_json_ds(atom, base->type, ds); +} else { +ds_put_cstr(ds, "[\"named-uuid\",\""); +ds_put_format(ds, UUID_ROW_FMT, UUID_ARGS(&atom->uuid)); +ds_put_cstr(ds, "\"]"); +} +} + static struct json * ovsdb_datum_to_json__(const struct ovsdb_datum *datum, const struct ovsdb
[ovs-dev] [PATCH v5 2/3] lib/json: Simplify string serialization code.
Signed-off-by: Grigorii Nazarov --- There's an open question on whether this function should exist, or being placed in header etc. However, no decision was made yet. v2: fixed title v4: changed patch number from 3/4 to 2/3 lib/json.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/lib/json.c b/lib/json.c index 001f6e6ab..d40e93857 100644 --- a/lib/json.c +++ b/lib/json.c @@ -127,7 +127,9 @@ static void json_parser_input(struct json_parser *, struct json_token *); static void json_error(struct json_parser *p, const char *format, ...) OVS_PRINTF_FORMAT(2, 3); - + +static void json_serialize_string(const char *, struct ds *); + const char * json_type_to_string(enum json_type type) { @@ -987,11 +989,7 @@ exit: void json_string_escape(const char *in, struct ds *out) { -struct json json = { -.type = JSON_STRING, -.string = CONST_CAST(char *, in), -}; -json_to_ds(&json, 0, out); +json_serialize_string(in, out); } static void @@ -1572,7 +1570,6 @@ static void json_serialize_object(const struct shash *object, struct json_serializer *); static void json_serialize_array(const struct json_array *, struct json_serializer *); -static void json_serialize_string(const char *, struct ds *); /* Converts 'json' to a string in JSON format, encoded in UTF-8, and returns * that string. The caller is responsible for freeing the returned string, -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v5 1/3] ovsdb: Simplify UUID formatting code.
Signed-off-by: Grigorii Nazarov --- v2: fixed title v4: changed patch number from 2/4 to 1/3 lib/ovsdb-data.c | 8 +--- lib/uuid.h | 1 + 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c index abb923ad8..defb048d7 100644 --- a/lib/ovsdb-data.c +++ b/lib/ovsdb-data.c @@ -2582,14 +2582,8 @@ char * ovsdb_data_row_name(const struct uuid *uuid) { char *name; -char *p; -name = xasprintf("row"UUID_FMT, UUID_ARGS(uuid)); -for (p = name; *p != '\0'; p++) { -if (*p == '-') { -*p = '_'; -} -} +name = xasprintf(UUID_ROW_FMT, UUID_ARGS(uuid)); return name; } diff --git a/lib/uuid.h b/lib/uuid.h index fa49354f6..6a8069f68 100644 --- a/lib/uuid.h +++ b/lib/uuid.h @@ -34,6 +34,7 @@ extern "C" { */ #define UUID_LEN 36 #define UUID_FMT "%08x-%04x-%04x-%04x-%04x%08x" +#define UUID_ROW_FMT "row%08x_%04x_%04x_%04x_%04x%08x" #define UUID_ARGS(UUID) \ ((unsigned int) ((UUID)->parts[0])),\ ((unsigned int) ((UUID)->parts[1] >> 16)), \ -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v5 0/3] Optimize json serialization 2.3 times
This patch series optimizes monitor and monitor-cond commands by optimizing server-side json serialization. This is achieved by removing intermediate representation. Most importantly, this greatly reduces allocations and deallocations done in process. This also does reduce peak memory consumption, but it wasn't a bottleneck and thus wasn't precisely measured. See patch 3/3 description for more details. This can be further improved by building complete response in ds. Also benificial will be to reduce calls to stdio.h library. Proper profiling is required to identify even further improvements. The patches 1/3 and 2/3 contain (almost) independent improvements, that were all-too-easy to be done during the optimization process. Patch 3/3 is the main part. Original patch 1/4 was removed due to it not being relevant to the purpose of series. v1->v2: fixed commit message titles v2->v3: fixed incorrect bracing in touched code in PATCH 4/4 v3->v4: removed patch 1/4 fixed incorrectly sent v3 series added cover letter v4->v5: fixed yet again incorrectly sent v4 series Grigorii Nazarov (3): ovsdb: Simplify UUID formatting code. lib/json: Simplify string serialization code. ovsdb: Optimize monitor update by directly serializing data into ds. include/openvswitch/json.h | 1 + lib/json.c | 19 ++-- lib/ovsdb-data.c | 113 ++-- lib/ovsdb-data.h | 3 + lib/uuid.h | 1 + ovsdb/monitor.c| 210 - 6 files changed, 260 insertions(+), 87 deletions(-) -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 2/3] lib/json: Simplify string serialization code.
Signed-off-by: Grigorii Nazarov --- There's an open question on whether this function should exist, or being placed in header etc. However, no decision was made yet. v2: fixed title v4: changed patch number from 3/4 to 2/3 lib/json.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/lib/json.c b/lib/json.c index 001f6e6ab..d40e93857 100644 --- a/lib/json.c +++ b/lib/json.c @@ -127,7 +127,9 @@ static void json_parser_input(struct json_parser *, struct json_token *); static void json_error(struct json_parser *p, const char *format, ...) OVS_PRINTF_FORMAT(2, 3); - + +static void json_serialize_string(const char *, struct ds *); + const char * json_type_to_string(enum json_type type) { @@ -987,11 +989,7 @@ exit: void json_string_escape(const char *in, struct ds *out) { -struct json json = { -.type = JSON_STRING, -.string = CONST_CAST(char *, in), -}; -json_to_ds(&json, 0, out); +json_serialize_string(in, out); } static void @@ -1572,7 +1570,6 @@ static void json_serialize_object(const struct shash *object, struct json_serializer *); static void json_serialize_array(const struct json_array *, struct json_serializer *); -static void json_serialize_string(const char *, struct ds *); /* Converts 'json' to a string in JSON format, encoded in UTF-8, and returns * that string. The caller is responsible for freeing the returned string, -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 3/3] ovsdb: Optimize monitor update by directly serializing data into ds.
Currently serialization is performed by first converting the internal data representation into JSON objects, followed by serializing these objects by jsonrpc. This process results in lots of allocations for these intermediate objects. Consequently, this not only increases peak memory consumption, but also demands significantly more CPU work. By forming row-update JSONs directly in `ds`, which is then used to create 'serialized object' JSONs, the overall speed increased by a factor of 2.3. A local benchmark was run on a proprietary Southbound backup. Both versions, before and after applying the patch, were measured. For each version, there were two runs with 10 parallel clients, and two runs with 30 parallel clients. CPU time was recorded after startup (before clients started running) and after all clients received all updates. Clients were essentially running `ovsdb-client monitor-cond unix:pipe OVN_Southbound '[[\"actions\",\"!=\",\"sdfdfsd\"]]' Logical_Flow`. Similar results were obtained with other requests that required a significant amount of data transfer. The backup size is about 600 MB. Results are measured in seconds. Before After Baseline x10: 9.53108.54 Baseline x10: 9.62108.67 Baseline x30: 9.69307.04 Baseline x30: 9.65303.32 Patch x10: 9.6752.57 Patch x10: 9.5753.12 Patch x30: 9.53136.33 Patch x30: 9.63135.88 Signed-off-by: Grigorii Nazarov --- v2: updated title v3: fixed bracing v4: changed patch number from 4/4 to 3/3 include/openvswitch/json.h | 1 + lib/json.c | 8 ++ lib/ovsdb-data.c | 105 +++ lib/ovsdb-data.h | 3 + ovsdb/monitor.c| 210 - 5 files changed, 254 insertions(+), 73 deletions(-) diff --git a/include/openvswitch/json.h b/include/openvswitch/json.h index 555440760..80b9479c7 100644 --- a/include/openvswitch/json.h +++ b/include/openvswitch/json.h @@ -81,6 +81,7 @@ struct json *json_boolean_create(bool); struct json *json_string_create(const char *); struct json *json_string_create_nocopy(char *); struct json *json_serialized_object_create(const struct json *); +struct json *json_serialized_object_create_from_string(const char *); struct json *json_integer_create(long long int); struct json *json_real_create(double); diff --git a/lib/json.c b/lib/json.c index d40e93857..66b1f571f 100644 --- a/lib/json.c +++ b/lib/json.c @@ -199,6 +199,14 @@ json_serialized_object_create(const struct json *src) return json; } +struct json * +json_serialized_object_create_from_string(const char *s) +{ +struct json *json = json_create(JSON_SERIALIZED_OBJECT); +json->string = xstrdup(s); +return json; +} + struct json * json_serialized_object_create_with_yield(const struct json *src) { diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c index defb048d7..f32b7975a 100644 --- a/lib/ovsdb-data.c +++ b/lib/ovsdb-data.c @@ -492,6 +492,45 @@ ovsdb_atom_to_json__(const union ovsdb_atom *atom, enum ovsdb_atomic_type type, } } +/* Serializes 'atom', of the specified 'type', into 'ds' in JSON format. + * + * Refer to RFC 7047 for the format of the JSON that this function produces. */ +static void +ovsdb_atom_to_json_ds(const union ovsdb_atom *atom, + enum ovsdb_atomic_type type, struct ds *ds) +{ +switch (type) { +case OVSDB_TYPE_VOID: +OVS_NOT_REACHED(); + +case OVSDB_TYPE_INTEGER: +ds_put_format(ds, "%lld", (long long) atom->integer); +return; + +case OVSDB_TYPE_REAL: +ds_put_format(ds, "%.*g", DBL_DIG, atom->real); +return; + +case OVSDB_TYPE_BOOLEAN: +ds_put_cstr(ds, atom->boolean ? "true" : "false"); +return; + +case OVSDB_TYPE_STRING: +json_to_ds(atom->s, JSSF_SORT, ds); +return; + +case OVSDB_TYPE_UUID: +ds_put_cstr(ds, "[\"uuid\",\""); +ds_put_format(ds, UUID_FMT, UUID_ARGS(&atom->uuid)); +ds_put_cstr(ds, "\"]"); +return; + +case OVSDB_N_TYPES: +default: +OVS_NOT_REACHED(); +} +} + struct json * ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type) { @@ -1445,6 +1484,23 @@ ovsdb_base_to_json(const union ovsdb_atom *atom, } } +static void +ovsdb_base_to_json_ds(const union ovsdb_atom *atom, + const struct ovsdb_base_type *base, + bool use_row_names, + struct ds *ds) +{ +if (!use_row_names +|| base->type != OVSDB_TYPE_UUID +|| !base->uuid.refTableName) { +ovsdb_atom_to_json_ds(atom, base->type, ds); +} else { +ds_put_cstr(ds, "[\"named-uuid\",\""); +ds_put_format(ds, UUID_ROW_FMT, UUID_ARGS(&atom->uuid)); +ds_put_cstr(ds, "\"]"); +} +} + static struct json * ovsdb_datum_to_json__(const struct ovsdb_datum *datum, const struct ovsdb
[ovs-dev] [PATCH v4 1/3] ovsdb: Simplify UUID formatting code.
Signed-off-by: Grigorii Nazarov --- v2: fixed title v4: changed patch number from 2/4 to 1/3 lib/ovsdb-data.c | 8 +--- lib/uuid.h | 1 + 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c index abb923ad8..defb048d7 100644 --- a/lib/ovsdb-data.c +++ b/lib/ovsdb-data.c @@ -2582,14 +2582,8 @@ char * ovsdb_data_row_name(const struct uuid *uuid) { char *name; -char *p; -name = xasprintf("row"UUID_FMT, UUID_ARGS(uuid)); -for (p = name; *p != '\0'; p++) { -if (*p == '-') { -*p = '_'; -} -} +name = xasprintf(UUID_ROW_FMT, UUID_ARGS(uuid)); return name; } diff --git a/lib/uuid.h b/lib/uuid.h index fa49354f6..6a8069f68 100644 --- a/lib/uuid.h +++ b/lib/uuid.h @@ -34,6 +34,7 @@ extern "C" { */ #define UUID_LEN 36 #define UUID_FMT "%08x-%04x-%04x-%04x-%04x%08x" +#define UUID_ROW_FMT "row%08x_%04x_%04x_%04x_%04x%08x" #define UUID_ARGS(UUID) \ ((unsigned int) ((UUID)->parts[0])),\ ((unsigned int) ((UUID)->parts[1] >> 16)), \ -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v4 0/3] Optimize json serialization 2.3 times
This patch series optimizes monitor and monitor-cond commands by optimizing server-side json serialization. This is achieved by removing intermediate representation. Most importantly, this greatly reduces allocations and deallocations done in process. This also does reduce peak memory consumption, but it wasn't a bottleneck and thus wasn't precisely measured. See patch 3/3 description for more details. This can be further improved by building complete response in ds. Also benificial will be to reduce calls to stdio.h library. Proper profiling is required to identify even further improvements. The patches 1/3 and 2/3 contain (almost) independent improvements, that were all-too-easy to be done during the optimization process. Patch 3/3 is the main part. Original patch 1/4 was removed due to it not being relevant to the purpose of series. v1->v2: fixed commit message titles v2->v3: fixed incorrect bracing in touched code in PATCH 4/4 v3->v4: removed patch 1/4 fixed incorrectly sent v3 series added cover letter Grigorii Nazarov (3): ovsdb: Simplify UUID formatting code. lib/json: Simplify string serialization code. ovsdb: Optimize monitor update by directly serializing data into ds. include/openvswitch/json.h | 1 + lib/json.c | 19 ++-- lib/ovsdb-data.c | 113 ++-- lib/ovsdb-data.h | 3 + lib/uuid.h | 1 + ovsdb/monitor.c| 210 - 6 files changed, 260 insertions(+), 87 deletions(-) -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC OVN: fabric integration
If there is no or minimal overhead of adding an LR or restricting the routing to LR, then perhaps keeping that logical separation makes sense. I think the design we evaluate or look at has to focus very tightly on performance as any non-trivial forwarding performance impact will affect the adoption of the solution. Regards Gurpreet > On Jun 28, 2024, at 11:56 AM, Ilya Maximets wrote: > > On 6/28/24 17:38, Dumitru Ceara wrote: >> On 6/28/24 15:05, Ilya Maximets wrote: >>> On 6/28/24 11:03, Ales Musil wrote: Hi Frode, looking forward to the RFC. AFAIU it means that the routes would be exposed on LR, more specifically GW router. Would it make sense to allow this behavior for provider networks (LS with localnet port)? In that case we could advertise chassis-local information from logical routers attached to BGP-enabled switches. E.g.: FIPs, LBs. It would cover the use case for distributed routers. To achieve that we should have BGP peers for each chassis that the LS is local on. >>> >>> I haven't read the whole thing yet, but can we, please, stop adding routing >>> features >>> to switches? :) If someone wants routing, they should use a router, IMO. >>> >> >> I'm fairly certain that there are precedents in "classic" network >> appliances: switches that can do a certain amount of routing (even run BGP). >> >> In this case we could add a logical router but I'm not sure that >> simplifies things. >> > > "classic" network appliances are a subject for restrictions of a physical > material world. It's just way easier and cheaper to acquire and install > a single physical box instead of N. This is not a problem for a virtual > network. AP+router+switch+modem combo boxes are also "classic" last mile > network appliances that we just call a "router". It doesn't mean we should > implement one. > > The distinction between OVN logical routers and switches is there for a > reason. That is so you can look at the logical topology and understand > how it works more or less without diving into configuration of every single > part of it. If switches do routing and routers do switching, what's the > point of differentiation? It only brings more confusion. > > Best regards, Ilya Maximets. > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev