Re: [ovs-dev] [PATCH net-next v3 0/7] selftests: net: Switch pmtu.sh to use the internal ovs script.
On Fri, 28 Jun 2024 14:04:09 -0400 Aaron Conole wrote: > > I also added the OvS tests themselves, and those are not passing, yet: > > https://netdev.bots.linux.dev/contest.html?test=openvswitch-sh > > Could you take a look and LMK if these are likely env issues or > > something bad in the test itself? > > I saw that. I was looking for a place in the nipa repository where I > could submit a small fix, because I noticed in the stdout: > > make -C tools/testing/selftests TARGETS="net/openvswitch" > TEST_PROGS=openvvswitch.sh TEST_GEN_PROGS="" run_tests > > and I think the TEST_PROGS=openvvswitch.sh is misspelled (but it seems > to not matter too much for the run_test target). :o that's a weird bug, whatever is echoing back the input from the VMs stdin to stdout is duplicating 74th character?! but as you say looks like the VM gets the right input, it's just the echo. > From what I understand, there are two things causing it to be flaky. > First, the module detection is a bit flaky (and that's why it results is > some 'skip' reports). Additionally, the connection oriented tests > include negative cases and those hit timeouts. The default is to > declare failure after 45s. That can be seen in: > > > https://netdev-3.bots.linux.dev/vmksft-net/results/659601/91-openvswitch-sh/stdout > ... > # timeout set to 45 > ... > # TEST: nat_connect_v4 > [START] > # Terminated > # Terminated > > This is showing that the timeout is too short. > > I have patches ready for these issues, but I didn't know if you would > like me to submit config config meaning make OvS built in? We have a number of tests using module auto-loading, I don't think there were major issues with it. (well, the rebuilding of modules is a bit questionable with vng, but we do 'make mrproper' between each build to work around that). > and settings files to go under net/openvswitch, > or if you would prefer to see the openvswitch.sh script, and > ovs-dpctl.py utilities move out of their net/openvswitch/ directory. If > the latter, I can submit patches quickly with config and settings (and a > small change to the script itself) that addresses these. If you'd > prefer the former (moving around the files), I'll need to spend some > additional time modifying pmtu and doing a larger test. I don't have a > strong opinion on either approach. Since we talked about it a while back I gave in and implemented support for combining multiple targets in a single runner. So it doesn't matter any more (barring bugs in NIPA), up to you. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] vswitchd: Only lock pages that are faulted in.
On 6/28/24 16:51, Eelco Chaudron wrote: > > > On 14 Jun 2024, at 14:22, Ilya Maximets wrote: > >> The main purpose of locking the memory is to ensure that OVS can keep >> doing what it did before in case of increased memory pressure, e.g., >> during VM ingest / migration. Fulfilling this requirement can be >> achieved without locking all the allocated memory, but only the pages >> already accessed in the past (faulted in). Processing of the new >> traffic involves new memory allocations. Latency on these operations >> can't be guaranteed by the locking. The main difference would be >> the pre-faulting of the stack memory. However, in order to revalidate >> or process upcalls on the same traffic, the same amount of stack is >> likely needed, so all the necessary memory will already be faulted in. >> >> Switch 'mlockall' to MCL_ONFAULT to avoid consuming unnecessarily >> large amounts of RAM on systems with high core counts. For example, >> in a densely populated OVN cluster this saves about 650 MB of RAM per >> node on a system with 64 cores. This equates to 320 GB of allocated >> but unused RAM in a 500 node cluster. >> >> This also makes OVS better suited by default for small systems with >> limited amount of memory. >> >> The MCL_ONFAULT flag was introduced in Linux kernel 4.4 and wasn't >> available at the time of '--mlockall' introduction, but we can use it >> now. Falling back to an old way of locking in case we're running on >> an older kernel just in case. >> >> Only locking the faulted in pages also makes locking compatible with >> vhost post-copy live migration by default, because we'll no longer >> pre-fault all the guest's memory. Post-copy relies on userfaultfd >> to work on shared huge pages, which is only available in 4.11+ kernels. >> So, technically, it should not be possible for MCL_ONFAULT to fail and >> the call without it to succeed. But keeping the check just in case >> for now. >> >> Signed-off-by: Ilya Maximets > > This change looks good. I did some performance testing and I noticed not > negative impacts. > > Acked-by: Eelco Chaudron > Thanks, Eelco and Simon! Applied. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] netdev-offload-dpdk: Support offload of set dscp action.
On 6/28/24 23:08, Ilya Maximets wrote: > On 6/20/24 09:35, Sunyang Wu via dev wrote: >> Add the "set dscp action" parsing function, >> so that the "set dscp action" can be offloaded. >> >> Signed-off-by: Sunyang Wu >> --- >> lib/netdev-offload-dpdk.c | 19 ++- >> 1 file changed, 18 insertions(+), 1 deletion(-) >> > > Hi, Sunyang Wu. Thanks for the patch! > See some comments inline. > > Best regards, Ilya Maximets. > >> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c >> index 623005b1c..524942457 100644 >> --- a/lib/netdev-offload-dpdk.c >> +++ b/lib/netdev-offload-dpdk.c >> @@ -791,6 +791,17 @@ dump_flow_action(struct ds *s, struct ds *s_extra, >> ds_put_format(s, "port %"PRIu16" ", ntohs(set_tp->port)); >> } >> ds_put_cstr(s, "/ "); >> +} else if (actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP || >> + actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) { >> +const struct rte_flow_action_set_dscp *set_dscp = actions->conf; >> +char *dirstr = actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP >> + ? "set_ipv4_dscp " : "set_ipv6_dscp "; >> + >> +ds_put_cstr(s, dirstr); >> +if (set_dscp) { >> +ds_put_format(s, "dscp_value %d ", set_dscp->dscp); >> +} >> +ds_put_cstr(s, "/ "); >> } else if (actions->type == RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN) { >> const struct rte_flow_action_of_push_vlan *of_push_vlan = >> actions->conf; >> @@ -1835,7 +1846,10 @@ add_set_flow_action__(struct flow_actions *actions, >> if (is_all_zeros(mask, size)) { >> return 0; >> } >> -if (!is_all_ones(mask, size)) { >> +if (!is_all_ones(mask, size) && >> +/* set dscp need patial mask */ >> +attr != RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP && >> +attr != RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) { > > This doesn't seem right. We need to check that the mask contains > all the DSCP bits and that it doesn't have anything else inside. > > For example, if we try to set ECN bits, we'll have them in a mask. > In this case is_all_zeros() check will fail and then this check > will allow us to proceed as well. In the end, we'll end up setting > DSCP bits to all zeroes, or worse, to whatever was in the first > six bits of nw_tos. > Also, this function will clear the whole mask below, but we should only clear DSCP bits in this case. >> VLOG_DBG_RL(&rl, "Partial mask is not supported"); >> return -1; >> } >> @@ -1912,6 +1926,7 @@ parse_set_actions(struct flow_actions *actions, >> add_set_flow_action(ipv4_src, >> RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC); >> add_set_flow_action(ipv4_dst, >> RTE_FLOW_ACTION_TYPE_SET_IPV4_DST); >> add_set_flow_action(ipv4_ttl, RTE_FLOW_ACTION_TYPE_SET_TTL); >> +add_set_flow_action(ipv4_tos, >> RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP);z >> >> if (mask && !is_all_zeros(mask, sizeof *mask)) { >> VLOG_DBG_RL(&rl, "Unsupported IPv4 set action"); >> @@ -1924,6 +1939,8 @@ parse_set_actions(struct flow_actions *actions, >> add_set_flow_action(ipv6_src, >> RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC); >> add_set_flow_action(ipv6_dst, >> RTE_FLOW_ACTION_TYPE_SET_IPV6_DST); >> add_set_flow_action(ipv6_hlimit, RTE_FLOW_ACTION_TYPE_SET_TTL); >> +add_set_flow_action(ipv6_tclass, >> +RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP); >> >> if (mask && !is_all_zeros(mask, sizeof *mask)) { >> VLOG_DBG_RL(&rl, "Unsupported IPv6 set action"); > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v5 2/2] pinctrl: Handle arp/nd for other address families.
On Wed, Jun 12, 2024 at 2:50 AM Felix Huettner via dev wrote: > > Previously we could only generate ARP requests from IPv4 packets > and NS requests from IPv6 packets. This was the case because we rely on > information in the packet to generate the ARP/NS requests. > > However in case of ARP/NS requests originating from the Logical_Router > pipeline for nexthop lookups we overwrite the affected fields > afterwards. This overwrite is done by the userdata openflow actions. > Because of this we actually do not rely on any information of the IPv4/6 > packets in these cases. > > Unfortunately we can not easily determine if we are actually later > overwriting the affected fields. The approach now is to use the fields > from the IP header if we have a matching IP version and default to some > values otherwise. In case we overwrite this data afterwards we are > generally good. If we do not overwrite this data because of some bug we > will send out invalid ARP/NS requests. They will hopefully be dropped by > the rest of the network. > > The alternative would have been to introduce new arp/nd_ns actions where > we guarantee this overwrite. This would not suffer from the above > limitations, but would require a coordination on upgrades between all > ovn-controllers and northd. > > Signed-off-by: Felix Huettner Hi Felix, Thanks for the patch series and sorry for the delay in reviews. Looks like this patch series requires another rebase. I tested this patch series using the ovn-fake-mutlinode - [1] . After starting it, I added the below static route ovn-nbctl lr-route-add lr1 172.15.0.0/24 3001::b And in the 'ovn-chassis-1' container I ran this command [root@ovn-chassis-1 ~]# ip netns exec sw01p1 ping 172.15.0.50 Since we don't know the next hop, the below logical flow is hit (which is as expected) table=24(lr_in_arp_request ), priority=200 , match=(eth.dst == 00:00:00:00:00:00 && reg9[9] == 0 && xxreg0 == 3001::4), action=(nd_ns { eth.dst = 33:33:ff:00:00:04; ip6.dst = ff02::1:ff00:4; nd.target = 3001::b; output; }; output;) And If I look into the ovn-controller logs (after enabling vconn:dbg) I see that ovn-controller receives the icmp ping packet and it generates IPv6 NS packet which is also as expected 2024-06-28T21:39:06.303Z|00020|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: received: NXT_PACKET_IN2 (OF1.5) (xid=0x0): cookie=0x77e2d8b total_len=98 reg0=0x3001,reg1=0xac10016e,reg3=0xb,reg4=0x3001,reg7=0xa,reg9=0x4,reg10=0x1,reg11=0x1,reg12=0x3,reg14=0x1,reg15=0x3,metadata=0x3,in_port=5 (via action) data_len=98 (unbuffered) userdata=00.00.00.09.00.00.00.00.00.1c.00.18.00.80.00.00.00.00.00.00.00.01.de.10.80.00.3e.10.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.25.00.00.00 continuation.bridge=b72b4e65-0756-4215-bfbf-c3db3652e8ee continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,37) continuation.odp_port=5 icmp,vlan_tci=0x,dl_src=30:51:00:00:00:03,dl_dst=00:00:00:00:00:00,nw_src=11.0.0.3,nw_dst=172.15.0.50,nw_tos=0,nw_ecn=0,nw_ttl=63,nw_frag=no,icmp_type=8,icmp_code=0 icmp_csum:cf9b 2024-06-28T21:39:06.304Z|00021|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_PACKET_OUT (OF1.5) (xid=0x413): in_port=CONTROLLER actions=set_field:0x3001->reg0,set_field:0xac10016e->reg1,set_field:0xb->reg3,set_field:0x3001->reg4,set_field:0xa->reg7,set_field:0x4->reg9,set_field:0x1->reg10,set_field:0x1->reg11,set_field:0x3->reg12,set_field:0x1->reg14,set_field:0x3->reg15,set_field:0x3->metadata,move:NXM_NX_XXREG0[]->NXM_NX_ND_TARGET[],resubmit(,37) data_len=86 icmp6,vlan_tci=0x,dl_src=30:51:00:00:00:03,dl_dst=33:33:ff:ff:ff:ff,ipv6_src=fe80::3251:ff:fe00:3,ipv6_dst=ff02::1::,ipv6_label=0x0,nw_tos=0,nw_ecn=0,nw_ttl=255,nw_frag=no,icmp_type=135,icmp_code=0,nd_target=:::::::,nd_sll=30:51:00:00:00:03,nd_tll=00:00:00:00:00:00 icmp6_csum:1877 -- But strangely when this packet leaves the physical interface 'eth2' on ovn-chassis-1, the nd_target is wrongly populated tcpdump on the physical interface (connected to br-ex) --- 17:37:40.289881 30:51:00:00:00:03 > 33:33:ff:ff:ff:ff, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::3251:ff:fe00:3 > ff02::1::: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has 3001:0:ac10:16e::b source link-address option (1), length 8 (1): 30:51:00:00:00:03 --- You can see that the nd_target is '3001:0:ac10:16e::b and not '3001::b', which is strange. The inner action of "nd_ns" clearly sets nd.target = 3001::b. Can you please verify in your end if that is the case ? You can perhaps use ovn-fake-multinode if you can't reproduce in your setup. Thanks Numan > --- > v4->v5: rebase > v4: newly added > > controller/pinctrl.c | 52 +++-- > lib/actions.c| 4 +- > northd/northd.c | 9 +- > tests/ovn-northd.at | 8 +- > tests/ovn.at | 268 +
Re: [ovs-dev] [PATCH 1/1] netdev-dpdk: Check pending reset when adding device.
On 6/13/24 11:40, David Marchand wrote: > On Wed, Jun 12, 2024 at 4:33 PM Kevin Traynor wrote: >> >> When a device reset interrupt event (RTE_ETH_EVENT_INTR_RESET) >> is detected for a DPDK device added to OVS, a device reset is >> performed. >> >> If a device reset interrupt event is detected for a device before >> it is added to OVS, device reset is not called. >> >> If that device is later attempted to be added to OVS, it may fail >> while being configured if it is still pending a reset as pending >> reset is not checked when adding a device. >> >> A simple way to force a reset event from the ice driver for an >> iavf device is to set the mac address after binding iavf dev to >> vfio but before adding to OVS. (note: should not be set like this >> in normal case). e.g. >> >> $ echo 2 > /sys/class/net/ens3f0/device/sriov_numvfs >> $ ./devbind.py -b vfio-pci :d8:01.1 >> $ ip link set ens3f0 vf 1 mac 26:ab:e6:6f:79:4d >> $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \ >> options:dpdk-devargs=:d8:01.1 >> >> |dpdk|ERR|Port1 dev_configure = -1 >> |netdev_dpdk|WARN|Interface dpdk0 eth_dev setup error >> Operation not permitted >> |netdev_dpdk|ERR|Interface dpdk0(rxq:1 txq:5 lsc interrupt mode:false) >> configure error: Operation not permitted >> |dpif_netdev|ERR|Failed to set interface dpdk0 new configuration >> >> Add a check if there was any previous device reset interrupt events >> when a device is added to OVS. If there was, perform the reset >> before continuing with the rest of the configuration. >> >> netdev_dpdk_pending_reset[] already tracks device reset interrupt >> events for all devices, so it can be reused to check if there is a >> reset needed during configuration of newly added devices. By extending >> it's usage, dev->reset_needed is no longer needed. >> > > Maybe add a Fixes: and I think we should backport this. > >> Signed-off-by: Kevin Traynor >> --- >> lib/netdev-dpdk.c | 20 >> 1 file changed, 12 insertions(+), 8 deletions(-) > > The patch looks good to me. > Reviewed-by: David Marchand Thanks, Kevin and David! I fixed the nits and applied the change. Also backported to branch-3.3. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3] netdev-offload-dpdk: Support offload of set dscp action.
On 6/20/24 09:35, Sunyang Wu via dev wrote: > Add the "set dscp action" parsing function, > so that the "set dscp action" can be offloaded. > > Signed-off-by: Sunyang Wu > --- > lib/netdev-offload-dpdk.c | 19 ++- > 1 file changed, 18 insertions(+), 1 deletion(-) > Hi, Sunyang Wu. Thanks for the patch! See some comments inline. Best regards, Ilya Maximets. > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c > index 623005b1c..524942457 100644 > --- a/lib/netdev-offload-dpdk.c > +++ b/lib/netdev-offload-dpdk.c > @@ -791,6 +791,17 @@ dump_flow_action(struct ds *s, struct ds *s_extra, > ds_put_format(s, "port %"PRIu16" ", ntohs(set_tp->port)); > } > ds_put_cstr(s, "/ "); > +} else if (actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP || > + actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) { > +const struct rte_flow_action_set_dscp *set_dscp = actions->conf; > +char *dirstr = actions->type == RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP > + ? "set_ipv4_dscp " : "set_ipv6_dscp "; > + > +ds_put_cstr(s, dirstr); > +if (set_dscp) { > +ds_put_format(s, "dscp_value %d ", set_dscp->dscp); > +} > +ds_put_cstr(s, "/ "); > } else if (actions->type == RTE_FLOW_ACTION_TYPE_OF_PUSH_VLAN) { > const struct rte_flow_action_of_push_vlan *of_push_vlan = > actions->conf; > @@ -1835,7 +1846,10 @@ add_set_flow_action__(struct flow_actions *actions, > if (is_all_zeros(mask, size)) { > return 0; > } > -if (!is_all_ones(mask, size)) { > +if (!is_all_ones(mask, size) && > +/* set dscp need patial mask */ > +attr != RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP && > +attr != RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP) { This doesn't seem right. We need to check that the mask contains all the DSCP bits and that it doesn't have anything else inside. For example, if we try to set ECN bits, we'll have them in a mask. In this case is_all_zeros() check will fail and then this check will allow us to proceed as well. In the end, we'll end up setting DSCP bits to all zeroes, or worse, to whatever was in the first six bits of nw_tos. > VLOG_DBG_RL(&rl, "Partial mask is not supported"); > return -1; > } > @@ -1912,6 +1926,7 @@ parse_set_actions(struct flow_actions *actions, > add_set_flow_action(ipv4_src, RTE_FLOW_ACTION_TYPE_SET_IPV4_SRC); > add_set_flow_action(ipv4_dst, RTE_FLOW_ACTION_TYPE_SET_IPV4_DST); > add_set_flow_action(ipv4_ttl, RTE_FLOW_ACTION_TYPE_SET_TTL); > +add_set_flow_action(ipv4_tos, > RTE_FLOW_ACTION_TYPE_SET_IPV4_DSCP);z > > if (mask && !is_all_zeros(mask, sizeof *mask)) { > VLOG_DBG_RL(&rl, "Unsupported IPv4 set action"); > @@ -1924,6 +1939,8 @@ parse_set_actions(struct flow_actions *actions, > add_set_flow_action(ipv6_src, RTE_FLOW_ACTION_TYPE_SET_IPV6_SRC); > add_set_flow_action(ipv6_dst, RTE_FLOW_ACTION_TYPE_SET_IPV6_DST); > add_set_flow_action(ipv6_hlimit, RTE_FLOW_ACTION_TYPE_SET_TTL); > +add_set_flow_action(ipv6_tclass, > +RTE_FLOW_ACTION_TYPE_SET_IPV6_DSCP); > > if (mask && !is_all_zeros(mask, sizeof *mask)) { > VLOG_DBG_RL(&rl, "Unsupported IPv6 set action"); ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Fri, Jun 28, 2024 at 4:50 AM David Marchand wrote: > > On Thu, Jun 27, 2024 at 4:04 PM David Marchand > wrote: > > @@ -926,6 +926,17 @@ netdev_send(struct netdev *netdev, int qid, > > struct dp_packet_batch *batch, > > return netdev_send_tso(netdev, qid, batch, > > concurrent_txq); > > } > > } > > +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) { > > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > > +if (!dp_packet_hwol_is_tso(packet) && > > +!dp_packet_hwol_is_tunnel_vxlan(packet) && > > +!dp_packet_hwol_is_tunnel_geneve(packet)) { > > +continue; > > Erm, less buggy: Thanks for the review David! These are some good ideas. I'll send in a v3 with them included. Cheers, M > > +if (!dp_packet_hwol_is_tso(packet) || > +(!dp_packet_hwol_is_tunnel_vxlan(packet) && > + !dp_packet_hwol_is_tunnel_geneve(packet))) { > +continue; > +} > > > +} > > +if (dp_packet_hwol_is_outer_udp_cksum(packet)) { > > +return netdev_send_tso(netdev, qid, batch, > > concurrent_txq); > > +} > > +} > > } > > } > > > -- > David Marchand > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v6 05/10] net: openvswitch: add psample action
On Fri, Jun 28, 2024 at 01:05:41PM +0200, Adrian Moreno wrote: > Add support for a new action: psample. > > This action accepts a u32 group id and a variable-length cookie and uses > the psample multicast group to make the packet available for > observability. > > The maximum length of the user-defined cookie is set to 16, same as > tc_cookie, to discourage using cookies that will not be offloadable. > > Signed-off-by: Adrian Moreno ... > diff --git a/include/uapi/linux/openvswitch.h > b/include/uapi/linux/openvswitch.h > index efc82c318fa2..07086759556b 100644 > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -914,6 +914,31 @@ struct check_pkt_len_arg { > }; > #endif > > +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 > +/** > + * enum ovs_pample_attr - Attributes for %OVS_ACTION_ATTR_PSAMPLE nit: s/ovs_pample_attr/ovs_psample_attr/ > + * action. > + * > + * @OVS_PSAMPLE_ATTR_GROUP: 32-bit number to identify the source of the > + * sample. > + * @OVS_PSAMPLE_ATTR_COOKIE: An optional variable-length binary cookie that > + * contains user-defined metadata. The maximum length is > + * OVS_PSAMPLE_COOKIE_MAX_SIZE bytes. > + * > + * Sends the packet to the psample multicast group with the specified group > and > + * cookie. It is possible to combine this action with the > + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the sample. > + */ > +enum ovs_psample_attr { > + OVS_PSAMPLE_ATTR_GROUP = 1, /* u32 number. */ > + OVS_PSAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ > + > + /* private: */ > + __OVS_PSAMPLE_ATTR_MAX > +}; ... ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH v3] python: ovsdb-idl: Add custom transaction operations.
It can be useful to be able to send raw transaction operations through the Idl's connection. For example, to clean up MAC_Binding entries for floating IPs without having to monitor the MAC_Binding table which can be quite large. Signed-off-by: Terry Wilson --- NEWS | 2 ++ python/ovs/db/idl.py | 21 - tests/ovsdb-idl.at | 27 tests/test-ovsdb.py | 73 4 files changed, 122 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index d05f2d0f8..fca983e5a 100644 --- a/NEWS +++ b/NEWS @@ -12,6 +12,8 @@ Post-v3.3.0 * Link status changes are now handled via interrupt mode if the DPDK driver supports it. It is possible to revert to polling mode by setting per interface 'options:dpdk-lsc-interrupt' to 'false'. + - Python: + * Add custom transaction support to the Idl via add_op(). v3.3.0 - 16 Feb 2024 diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py index b6d5ed697..c1caab7b2 100644 --- a/python/ovs/db/idl.py +++ b/python/ovs/db/idl.py @@ -1708,6 +1708,8 @@ class Transaction(object): self._inserted_rows = {} # Map from UUID to _InsertedRow +self._operations = [] + def add_comment(self, comment): """Appends 'comment' to the comments that will be passed to the OVSDB server when this transaction is committed. (The comment will be @@ -1843,7 +1845,7 @@ class Transaction(object): "rows": [rows]}) # Add updates. -any_updates = False +any_updates = bool(self._operations) for row in self._txn_rows.values(): if row._changes is None: if row._table.is_root: @@ -1978,6 +1980,8 @@ class Transaction(object): operations.append({"op": "comment", "comment": "\n".join(self._comments)}) +operations += self._operations + # Dry run? if self.dry_run: operations.append({"op": "abort"}) @@ -1996,6 +2000,21 @@ class Transaction(object): self.__disassemble() return self._status +def add_op(self, op): +"""Add a raw OVSDB operation to the transaction + +This can be useful for re-using the existing Idl connection to take +actions that are difficult or expensive to do with the Idl itself, e.g. +bulk deleting rows from the server without downloading them into a +local cache. + +All ops are applied after any other operations in the transaction + +:param op: An "op" for an OVSDB "transact" request (rfc 7047 Sec 5.2) +:type op: dict +""" +self._operations.append(op) + def commit_block(self): """Attempts to commit this transaction, blocking until the commit either succeeds or fails. Returns the final commit status, which may diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at index b9dc0bdea..9070ea051 100644 --- a/tests/ovsdb-idl.at +++ b/tests/ovsdb-idl.at @@ -2863,6 +2863,33 @@ OVSDB_CHECK_IDL_PERS_UUID_INSERT([simple idl, persistent uuid insert], [['This UUID would duplicate a UUID already present within the table or deleted within the same transaction']]) +OVSDB_CHECK_IDL_PY([simple idl, python, add_op], + [], + [['insert 1, insert 2, insert 3, insert 1' \ +'add_op {"op": "delete", "table": "simple", "where": [["i", "==", 1]]}' \ +'add_op {"op": "insert", "table": "simple", "row": {"i": 2}}, delete 3' \ +'insert 2, add_op {"op": "update", "table": "simple", "row": {"i": 1}, "where": [["i", "==", 2]]}' + ]], + [[000: empty +001: commit, status=success +002: table simple: i=1 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1> +002: table simple: i=1 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2> +002: table simple: i=2 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3> +002: table simple: i=3 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<4> +003: commit, status=success +004: table simple: i=2 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3> +004: table simple: i=3 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<4> +005: commit, status=success +006: table simple: i=2 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3> +006: table simple: i=2 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<5> +007: commit, status=success +008: table simple: i=1 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3> +008: table simple: i=1 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<5> +008: table simple: i=1 r=0 b=false s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<6> +009: done +]],[],sort) + + m4_define([OVSDB_CHECK_IDL_CHANGE_AWARE], [AT_SETUP([simple idl, database change aware, online conversion - $1]) AT_KEYWORDS([ovsdb server idl db_change_aware conversion $1]) diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py index 67a4
Re: [ovs-dev] [PATCH net-next v6 05/10] net: openvswitch: add psample action
On 28 Jun 2024, at 13:05, Adrian Moreno wrote: > Add support for a new action: psample. > > This action accepts a u32 group id and a variable-length cookie and uses > the psample multicast group to make the packet available for > observability. > > The maximum length of the user-defined cookie is set to 16, same as > tc_cookie, to discourage using cookies that will not be offloadable. > > Signed-off-by: Adrian Moreno I think this patch looks good. After some offline discussion on alignment with the userspace model, we decided to proceed with a psample() specific action. With that in mind, and considering the additional changes, this patch looks good to me. Acked-by: Eelco Chaudron echau...@redhat.com Cheers, Eelco ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v6 05/10] net: openvswitch: add psample action
Adrian Moreno writes: > Add support for a new action: psample. > > This action accepts a u32 group id and a variable-length cookie and uses > the psample multicast group to make the packet available for > observability. > > The maximum length of the user-defined cookie is set to 16, same as > tc_cookie, to discourage using cookies that will not be offloadable. > > Signed-off-by: Adrian Moreno > --- I didn't thoroughly review this, but just wanted to comment that I like the idea of a psample action here specific to the actual action that is being performed on the packet - psample. Much like we do for userspace and other actions. > Documentation/netlink/specs/ovs_flow.yaml | 17 > include/uapi/linux/openvswitch.h | 28 ++ > net/openvswitch/Kconfig | 1 + > net/openvswitch/actions.c | 47 +++ > net/openvswitch/flow_netlink.c| 32 ++- > 5 files changed, 124 insertions(+), 1 deletion(-) > > diff --git a/Documentation/netlink/specs/ovs_flow.yaml > b/Documentation/netlink/specs/ovs_flow.yaml > index 4fdfc6b5cae9..46f5d1cd8a5f 100644 > --- a/Documentation/netlink/specs/ovs_flow.yaml > +++ b/Documentation/netlink/specs/ovs_flow.yaml > @@ -727,6 +727,12 @@ attribute-sets: > name: dec-ttl > type: nest > nested-attributes: dec-ttl-attrs > + - > +name: psample > +type: nest > +nested-attributes: psample-attrs > +doc: | > + Sends a packet sample to psample for external observation. >- > name: tunnel-key-attrs > enum-name: ovs-tunnel-key-attr > @@ -938,6 +944,17 @@ attribute-sets: >- > name: gbp > type: u32 > + - > +name: psample-attrs > +enum-name: ovs-psample-attr > +name-prefix: ovs-psample-attr- > +attributes: > + - > +name: group > +type: u32 > + - > +name: cookie > +type: binary > > operations: >name-prefix: ovs-flow-cmd- > diff --git a/include/uapi/linux/openvswitch.h > b/include/uapi/linux/openvswitch.h > index efc82c318fa2..07086759556b 100644 > --- a/include/uapi/linux/openvswitch.h > +++ b/include/uapi/linux/openvswitch.h > @@ -914,6 +914,31 @@ struct check_pkt_len_arg { > }; > #endif > > +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 > +/** > + * enum ovs_pample_attr - Attributes for %OVS_ACTION_ATTR_PSAMPLE > + * action. > + * > + * @OVS_PSAMPLE_ATTR_GROUP: 32-bit number to identify the source of the > + * sample. > + * @OVS_PSAMPLE_ATTR_COOKIE: An optional variable-length binary cookie that > + * contains user-defined metadata. The maximum length is > + * OVS_PSAMPLE_COOKIE_MAX_SIZE bytes. > + * > + * Sends the packet to the psample multicast group with the specified group > and > + * cookie. It is possible to combine this action with the > + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the sample. > + */ > +enum ovs_psample_attr { > + OVS_PSAMPLE_ATTR_GROUP = 1, /* u32 number. */ > + OVS_PSAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ > + > + /* private: */ > + __OVS_PSAMPLE_ATTR_MAX > +}; > + > +#define OVS_PSAMPLE_ATTR_MAX (__OVS_PSAMPLE_ATTR_MAX - 1) > + > /** > * enum ovs_action_attr - Action types. > * > @@ -966,6 +991,8 @@ struct check_pkt_len_arg { > * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS > * argument. > * @OVS_ACTION_ATTR_DROP: Explicit drop action. > + * @OVS_ACTION_ATTR_PSAMPLE: Send a sample of the packet to external > observers > + * via psample. > * > * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not > all > * fields within a header are modifiable, e.g. the IPv4 protocol and fragment > @@ -1004,6 +1031,7 @@ enum ovs_action_attr { > OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ > OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ > OVS_ACTION_ATTR_DROP, /* u32 error code. */ > + OVS_ACTION_ATTR_PSAMPLE, /* Nested OVS_PSAMPLE_ATTR_*. */ > > __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted > * from userspace. */ > diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig > index 29a7081858cd..2535f3f9f462 100644 > --- a/net/openvswitch/Kconfig > +++ b/net/openvswitch/Kconfig > @@ -10,6 +10,7 @@ config OPENVSWITCH > (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \ >(!NF_NAT || NF_NAT) && \ >(!NETFILTER_CONNCOUNT || > NETFILTER_CONNCOUNT))) > + depends on PSAMPLE || !PSAMPLE > select LIBCRC32C > select MPLS > select NET_MPLS_GSO > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index 964225580824..a035b7e677dd 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -24,6 +24,11 @@ > #include
Re: [ovs-dev] [PATCH net-next v3 0/7] selftests: net: Switch pmtu.sh to use the internal ovs script.
Jakub Kicinski writes: > On Tue, 25 Jun 2024 13:22:38 -0400 Aaron Conole wrote: >> Currently, if a user wants to run pmtu.sh and cover all the provided test >> cases, they need to install the Open vSwitch userspace utilities. This >> dependency is difficult for users as well as CI environments, because the >> userspace build and setup may require lots of support and devel packages >> to be installed, system setup to be correct, and things like permissions >> and selinux policies to be properly configured. > > Hi Aaron! > > I merged this yesterday (with slight alphabetical reshuffling of > the config options). The pmtu.sh test is solid now, which is great! :) Thanks! That's great to see. > I also added the OvS tests themselves, and those are not passing, yet: > https://netdev.bots.linux.dev/contest.html?test=openvswitch-sh > Could you take a look and LMK if these are likely env issues or > something bad in the test itself? I saw that. I was looking for a place in the nipa repository where I could submit a small fix, because I noticed in the stdout: make -C tools/testing/selftests TARGETS="net/openvswitch" TEST_PROGS=openvvswitch.sh TEST_GEN_PROGS="" run_tests and I think the TEST_PROGS=openvvswitch.sh is misspelled (but it seems to not matter too much for the run_test target). >From what I understand, there are two things causing it to be flaky. First, the module detection is a bit flaky (and that's why it results is some 'skip' reports). Additionally, the connection oriented tests include negative cases and those hit timeouts. The default is to declare failure after 45s. That can be seen in: https://netdev-3.bots.linux.dev/vmksft-net/results/659601/91-openvswitch-sh/stdout ... # timeout set to 45 ... # TEST: nat_connect_v4[START] # Terminated # Terminated This is showing that the timeout is too short. I have patches ready for these issues, but I didn't know if you would like me to submit config and settings files to go under net/openvswitch, or if you would prefer to see the openvswitch.sh script, and ovs-dpctl.py utilities move out of their net/openvswitch/ directory. If the latter, I can submit patches quickly with config and settings (and a small change to the script itself) that addresses these. If you'd prefer the former (moving around the files), I'll need to spend some additional time modifying pmtu and doing a larger test. I don't have a strong opinion on either approach. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v6 00/10] net: openvswitch: Add sample multicasting.
On Fri, Jun 28, 2024 at 01:05:36PM GMT, Adrian Moreno wrote: > ** Background ** > Currently, OVS supports several packet sampling mechanisms (sFlow, > per-bridge IPFIX, per-flow IPFIX). These end up being translated into a > userspace action that needs to be handled by ovs-vswitchd's handler > threads only to be forwarded to some third party application that > will somehow process the sample and provide observability on the > datapath. > > A particularly interesting use-case is controller-driven > per-flow IPFIX sampling where the OpenFlow controller can add metadata > to samples (via two 32bit integers) and this metadata is then available > to the sample-collecting system for correlation. > > ** Problem ** > The fact that sampled traffic share netlink sockets and handler thread > time with upcalls, apart from being a performance bottleneck in the > sample extraction itself, can severely compromise the datapath, > yielding this solution unfit for highly loaded production systems. > > Users are left with little options other than guessing what sampling > rate will be OK for their traffic pattern and system load and dealing > with the lost accuracy. > > Looking at available infrastructure, an obvious candidated would be > to use psample. However, it's current state does not help with the > use-case at stake because sampled packets do not contain user-defined > metadata. > > ** Proposal ** > This series is an attempt to fix this situation by extending the > existing psample infrastructure to carry a variable length > user-defined cookie. > > The main existing user of psample is tc's act_sample. It is also > extended to forward the action's cookie to psample. > > Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created. > It accepts a group and an optional cookie and uses psample to > multicast the packet and the metadata. > > -- > v5 -> v6: > - Renamed emit_sample -> psample > - Addressed unused variable and conditionally compilation of function. > > v4 -> v5: > - Rebased. > - Removed lefover enum value and wrapped some long lines in selftests. > > v3 -> v4: > - Rebased. > - Addressed Jakub's comment on private and unused nla attributes. > > v2 -> v3: > - Addressed comments from Simon, Aaron and Ilya. > - Dropped probability propagation in nested sample actions. > - Dropped patch v2's 7/9 in favor of a userspace implementation and > consume skb if emit_sample is the last action, same as we do with > userspace. > - Split ovs-dpctl.py features in independent patches. > > v1 -> v2: > - Create a new action ("emit_sample") rather than reuse existing > "sample" one. > - Add probability semantics to psample's sampling rate. > - Store sampling probability in skb's cb area and use it in emit_sample. > - Test combining "emit_sample" with "trunc" > - Drop group_id filtering and tracepoint in psample. > > rfc_v2 -> v1: > - Accommodate Ilya's comments. > - Split OVS's attribute in two attributes and simplify internal > handling of psample arguments. > - Extend psample and tc with a user-defined cookie. > - Add a tracepoint to psample to facilitate troubleshooting. > > rfc_v1 -> rfc_v2: > - Use psample instead of a new OVS-only multicast group. > - Extend psample and tc with a user-defined cookie. > > > Adrian Moreno (10): > net: psample: add user cookie > net: sched: act_sample: add action cookie to sample > net: psample: skip packet copy if no listeners > net: psample: allow using rate as probability > net: openvswitch: add psample action > net: openvswitch: store sampling probability in cb. > selftests: openvswitch: add psample action > selftests: openvswitch: add userspace parsing > selftests: openvswitch: parse trunc action > selftests: openvswitch: add psample test > > Documentation/netlink/specs/ovs_flow.yaml | 17 ++ > include/net/psample.h | 5 +- > include/uapi/linux/openvswitch.h | 31 +- > include/uapi/linux/psample.h | 11 +- > net/openvswitch/Kconfig | 1 + > net/openvswitch/actions.c | 65 - > net/openvswitch/datapath.h| 3 + > net/openvswitch/flow_netlink.c| 32 ++- > net/openvswitch/vport.c | 1 + > net/psample/psample.c | 16 +- > net/sched/act_sample.c| 12 + > .../selftests/net/openvswitch/openvswitch.sh | 115 +++- > .../selftests/net/openvswitch/ovs-dpctl.py| 272 +- > 13 files changed, 565 insertions(+), 16 deletions(-) > > -- > 2.45.2 > Patchwork says this patch is not applying on net-next. I'll wait for some reviews and rebase+resubmit it later tonight or tomorrow. Thanks. Adrián ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [RFC v2 5/9] ofproto-dpif-xlate: Use emit_sample for local sample.
On Wed, Jun 26, 2024 at 02:11:51PM GMT, Eelco Chaudron wrote: > On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > > > Use the newly added emit_sample to implement OpenFlow sample() actions > > with local sampling configuration. > > See some comments below. > > Cheers, > > Eelco > > > Signed-off-by: Adrian Moreno > > --- > > ofproto/ofproto-dpif-lsample.c | 17 > > ofproto/ofproto-dpif-lsample.h | 6 ++ > > ofproto/ofproto-dpif-xlate.c | 163 - > > ofproto/ofproto-dpif-xlate.h | 3 +- > > ofproto/ofproto-dpif.c | 2 +- > > 5 files changed, 144 insertions(+), 47 deletions(-) > > > > diff --git a/ofproto/ofproto-dpif-lsample.c b/ofproto/ofproto-dpif-lsample.c > > index 7bdafac19..2c0b5da89 100644 > > --- a/ofproto/ofproto-dpif-lsample.c > > +++ b/ofproto/ofproto-dpif-lsample.c > > @@ -139,6 +139,23 @@ dpif_lsample_set_options(struct dpif_lsample *lsample, > > return changed; > > } > > > > +/* Returns the group_id of the exporter with the given collector_set_id, > > if it > > + * exists. */ > > nit: The below will fit on one line > > /* Returns the exporter group_id for a given collector_set_id, if it exists. > */ > Ack. > > +bool > > +dpif_lsample_get_group_id(struct dpif_lsample *ps, uint32_t > > collector_set_id, > > + uint32_t *group_id) > > +{ > > +struct lsample_exporter_node *node; > > +bool found = false; > > + > > +node = dpif_lsample_find_exporter_node(ps, collector_set_id); > > +if (node) { > > +found = true; > > +*group_id = node->exporter.options.group_id; > > +} > > +return found; > > +} > > + > > struct dpif_lsample * > > dpif_lsample_create(void) > > { > > diff --git a/ofproto/ofproto-dpif-lsample.h b/ofproto/ofproto-dpif-lsample.h > > index c23ea8372..f13425575 100644 > > --- a/ofproto/ofproto-dpif-lsample.h > > +++ b/ofproto/ofproto-dpif-lsample.h > > @@ -19,6 +19,7 @@ > > > > #include > > #include > > +#include > > Maybe add in alphabetical order. > Ack. > > struct dpif_lsample; > > struct ofproto_lsample_options; > > @@ -31,4 +32,9 @@ bool dpif_lsample_set_options(struct dpif_lsample *, > >const struct ofproto_lsample_options *, > >size_t n_opts); > > > > +bool dpif_lsample_get_group_id(struct dpif_lsample *, > > + uint32_t collector_set_id, > > + uint32_t *group_id); > > + > > #endif /* OFPROTO_DPIF_LSAMPLE_H */ > > + > > Accedantely added a newline? > Ack. > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > > index 7c4950895..5bd215d31 100644 > > --- a/ofproto/ofproto-dpif-xlate.c > > +++ b/ofproto/ofproto-dpif-xlate.c > > @@ -47,6 +47,7 @@ > > #include "ofproto/ofproto-dpif-ipfix.h" > > #include "ofproto/ofproto-dpif-mirror.h" > > #include "ofproto/ofproto-dpif-monitor.h" > > +#include "ofproto/ofproto-dpif-lsample.h" > > #include "ofproto/ofproto-dpif-sflow.h" > > #include "ofproto/ofproto-dpif-trace.h" > > #include "ofproto/ofproto-dpif-xlate-cache.h" > > @@ -117,6 +118,7 @@ struct xbridge { > > struct dpif_sflow *sflow; /* SFlow handle, or null. */ > > struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */ > > struct netflow *netflow; /* Netflow handle, or null. */ > > +struct dpif_lsample *lsample; /* Local sample handle, or null. */ > > struct stp *stp; /* STP or null if disabled. */ > > struct rstp *rstp;/* RSTP or null if disabled. */ > > > > @@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, struct > > dpif *, > >const struct dpif_sflow *, > >const struct dpif_ipfix *, > >const struct netflow *, > > + const struct dpif_lsample *, > >bool forward_bpdu, bool has_in_band, > >const struct dpif_backer_support *, > >const struct xbridge_addr *); > > @@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge, > >const struct dpif_sflow *sflow, > >const struct dpif_ipfix *ipfix, > >const struct netflow *netflow, > > + const struct dpif_lsample *lsample, > > nit: I would move above netflow, as you also do the actual init/unref before > netflow. Ack. > >bool forward_bpdu, bool has_in_band, > >const struct dpif_backer_support *support, > >const struct xbridge_addr *addr) > > @@ -1099,6 +1103,11 @@ xlate_xbridge_set(struct xbridge *xbridge, > > xbridge->ipfix = dpif_ipfix_ref(ipfix); > > } > > > > +if (xbridge->lsample != lsample) { > > +dpif_lsample_unref(xbridge->lsample); > > +xbridge->lsample = dpif_lsample_re
Re: [ovs-dev] RFC OVN: fabric integration
On 6/28/24 17:38, Dumitru Ceara wrote: > On 6/28/24 15:05, Ilya Maximets wrote: >> On 6/28/24 11:03, Ales Musil wrote: >>> Hi Frode, >>> >>> looking forward to the RFC. AFAIU it means that the routes would be exposed >>> on >>> LR, more specifically GW router. Would it make sense to allow this behavior >>> for >>> provider networks (LS with localnet port)? In that case we could advertise >>> chassis-local information from logical routers attached to BGP-enabled >>> switches. E.g.: FIPs, LBs. It would cover the use case for distributed >>> routers. To achieve that we should have BGP peers for each chassis that the >>> LS >>> is local on. >> >> I haven't read the whole thing yet, but can we, please, stop adding routing >> features >> to switches? :) If someone wants routing, they should use a router, IMO. >> > > I'm fairly certain that there are precedents in "classic" network > appliances: switches that can do a certain amount of routing (even run BGP). > > In this case we could add a logical router but I'm not sure that > simplifies things. > "classic" network appliances are a subject for restrictions of a physical material world. It's just way easier and cheaper to acquire and install a single physical box instead of N. This is not a problem for a virtual network. AP+router+switch+modem combo boxes are also "classic" last mile network appliances that we just call a "router". It doesn't mean we should implement one. The distinction between OVN logical routers and switches is there for a reason. That is so you can look at the logical topology and understand how it works more or less without diving into configuration of every single part of it. If switches do routing and routers do switching, what's the point of differentiation? It only brings more confusion. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC OVN: fabric integration
On 6/28/24 15:05, Ilya Maximets wrote: > On 6/28/24 11:03, Ales Musil wrote: >> Hi Frode, >> >> looking forward to the RFC. AFAIU it means that the routes would be exposed >> on >> LR, more specifically GW router. Would it make sense to allow this behavior >> for >> provider networks (LS with localnet port)? In that case we could advertise >> chassis-local information from logical routers attached to BGP-enabled >> switches. E.g.: FIPs, LBs. It would cover the use case for distributed >> routers. To achieve that we should have BGP peers for each chassis that the >> LS >> is local on. > > I haven't read the whole thing yet, but can we, please, stop adding routing > features > to switches? :) If someone wants routing, they should use a router, IMO. > I'm fairly certain that there are precedents in "classic" network appliances: switches that can do a certain amount of routing (even run BGP). In this case we could add a logical router but I'm not sure that simplifies things. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v3 0/7] selftests: net: Switch pmtu.sh to use the internal ovs script.
On Tue, 25 Jun 2024 13:22:38 -0400 Aaron Conole wrote: > Currently, if a user wants to run pmtu.sh and cover all the provided test > cases, they need to install the Open vSwitch userspace utilities. This > dependency is difficult for users as well as CI environments, because the > userspace build and setup may require lots of support and devel packages > to be installed, system setup to be correct, and things like permissions > and selinux policies to be properly configured. Hi Aaron! I merged this yesterday (with slight alphabetical reshuffling of the config options). The pmtu.sh test is solid now, which is great! I also added the OvS tests themselves, and those are not passing, yet: https://netdev.bots.linux.dev/contest.html?test=openvswitch-sh Could you take a look and LMK if these are likely env issues or something bad in the test itself? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] vswitchd: Only lock pages that are faulted in.
On 14 Jun 2024, at 14:22, Ilya Maximets wrote: > The main purpose of locking the memory is to ensure that OVS can keep > doing what it did before in case of increased memory pressure, e.g., > during VM ingest / migration. Fulfilling this requirement can be > achieved without locking all the allocated memory, but only the pages > already accessed in the past (faulted in). Processing of the new > traffic involves new memory allocations. Latency on these operations > can't be guaranteed by the locking. The main difference would be > the pre-faulting of the stack memory. However, in order to revalidate > or process upcalls on the same traffic, the same amount of stack is > likely needed, so all the necessary memory will already be faulted in. > > Switch 'mlockall' to MCL_ONFAULT to avoid consuming unnecessarily > large amounts of RAM on systems with high core counts. For example, > in a densely populated OVN cluster this saves about 650 MB of RAM per > node on a system with 64 cores. This equates to 320 GB of allocated > but unused RAM in a 500 node cluster. > > This also makes OVS better suited by default for small systems with > limited amount of memory. > > The MCL_ONFAULT flag was introduced in Linux kernel 4.4 and wasn't > available at the time of '--mlockall' introduction, but we can use it > now. Falling back to an old way of locking in case we're running on > an older kernel just in case. > > Only locking the faulted in pages also makes locking compatible with > vhost post-copy live migration by default, because we'll no longer > pre-fault all the guest's memory. Post-copy relies on userfaultfd > to work on shared huge pages, which is only available in 4.11+ kernels. > So, technically, it should not be possible for MCL_ONFAULT to fail and > the call without it to succeed. But keeping the check just in case > for now. > > Signed-off-by: Ilya Maximets This change looks good. I did some performance testing and I noticed not negative impacts. Acked-by: Eelco Chaudron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH net-next v5 05/10] net: openvswitch: add emit_sample action
Adrián Moreno writes: > On Thu, Jun 27, 2024 at 09:30:08AM GMT, Aaron Conole wrote: >> Ilya Maximets writes: >> >> > On 6/27/24 12:15, Adrián Moreno wrote: >> >> On Thu, Jun 27, 2024 at 11:31:41AM GMT, Eelco Chaudron wrote: >> >>> >> >>> >> >>> On 27 Jun 2024, at 11:23, Ilya Maximets wrote: >> >>> >> On 6/27/24 11:14, Eelco Chaudron wrote: >> > >> > >> > On 27 Jun 2024, at 10:36, Ilya Maximets wrote: >> > >> >> On 6/27/24 09:52, Adrián Moreno wrote: >> >>> On Thu, Jun 27, 2024 at 09:06:46AM GMT, Eelco Chaudron wrote: >> >> >> On 26 Jun 2024, at 22:34, Adrián Moreno wrote: >> >> > On Wed, Jun 26, 2024 at 04:28:17PM GMT, Eelco Chaudron wrote: >> >> >> >> >> >> On 25 Jun 2024, at 22:51, Adrian Moreno wrote: >> >> >> >>> Add support for a new action: emit_sample. >> >>> >> >>> This action accepts a u32 group id and a variable-length >> >>> cookie and uses >> >>> the psample multicast group to make the packet available for >> >>> observability. >> >>> >> >>> The maximum length of the user-defined cookie is set to >> >>> 16, same as >> >>> tc_cookie, to discourage using cookies that will not be >> >>> offloadable. >> >> >> >> I’ll add the same comment as I had in the user space part, and >> >> that >> >> is that I feel from an OVS perspective this action should be >> >> called >> >> emit_local() instead of emit_sample() to make it Datapath >> >> independent. >> >> Or quoting the earlier comment: >> >> >> >> >> >> “I’ll start the discussion again on the naming. The name >> >> "emit_sample()" >> >> does not seem appropriate. This function's primary role >> >> is to copy the >> >> packet and send it to a local collector, which varies >> >> depending on the >> >> datapath. For the kernel datapath, this collector is >> >> psample, while for >> >> userspace, it will likely be some kind of probe. This >> >> action is distinct >> >> from the sample() action by design; it is a standalone >> >> action that can >> >> be combined with others. >> >> >> >> Furthermore, the action itself does not involve taking a sample; >> >> it >> >> consistently pushes the packet to the local collector. Therefore, >> >> I >> >> suggest renaming "emit_sample()" to "emit_local()". This >> >> same goes for >> >> all the derivative ATTR naming.” >> >> >> > >> > This is a blurry semantic area. >> > IMO, "sample" is the act of extracting (potentially a piece of) >> > someting, in this case, a packet. It is common to only >> > take some packets >> > as samples, so this action usually comes with some kind of >> > "rate", but >> > even if the rate is 1, it's still sampling in this context. >> > >> > OTOH, OVS kernel design tries to be super-modular and define small >> > combinable actions, so the rate or probability generation >> > is done with >> > another action which is (IMHO unfortunately) named "sample". >> > >> > With that interpretation of the term it would actually >> > make more sense >> > to rename "sample" to something like "random" (of course I'm not >> > suggestion we do it). "sample" without any nested action >> > that actually >> > sends the packet somewhere is not sampling, it's just >> > doing something or >> > not based on a probability. Where as "emit_sample" is >> > sampling even if >> > it's not nested inside a "sample". >> >> You're assuming we are extracting a packet for sampling, >> but this function >> can be used for various other purposes. For instance, it >> could handle the >> packet outside of the OVS pipeline through an eBPF program >> (so we are not >> taking a sample, but continue packet processing outside of the OVS >> pipeline). Calling it emit_sampling() in such cases could be very >> confusing. >> >> >> >> We can't change the implementation of the action once it is >> >> part of uAPI. >> >> We have to document where users can find these packets and we >> >> can't just >> >> change the destination later. >> > >> > I'm not suggesting we change the uAPI implementation, but we >> > could use the >> > emit_xxx() action with an eBPF probe on the action to perform >> > other tasks. >> > This is just an example. >> >> Yeah, but as Adrian said below, you could do that with any action and >> this doesn't change the semantics of the actio
Re: [ovs-dev] [RFC v2 9/9] tests: Test local sampling.
On Wed, Jun 26, 2024 at 02:15:02PM GMT, Eelco Chaudron wrote: > On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > > > Test simultaneous IPFIX and local sampling including slow-path. > > I guess Ilya's comments on this patch summarize most of my comments. I had > one small additional question. See below. > > //Eelco > > > Signed-off-by: Adrian Moreno > > --- > > tests/system-common-macros.at | 4 ++ > > tests/system-traffic.at | 105 ++ > > 2 files changed, 109 insertions(+) > > > > diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at > > index 2a68cd664..22b8885e4 100644 > > --- a/tests/system-common-macros.at > > +++ b/tests/system-common-macros.at > > @@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION], > > # OVS_CHECK_DROP_ACTION() > > m4_define([OVS_CHECK_DROP_ACTION], > > [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" > > ovs-vswitchd.log])]) > > + > > +# OVS_CHECK_EMIT_SAMPLE() > > +m4_define([OVS_CHECK_EMIT_SAMPLE], > > +[AT_SKIP_IF([! grep -q "Datapath supports emit_sample" > > ovs-vswitchd.log])]) > > diff --git a/tests/system-traffic.at b/tests/system-traffic.at > > index bd7647cbe..babc56b56 100644 > > --- a/tests/system-traffic.at > > +++ b/tests/system-traffic.at > > @@ -8977,3 +8977,108 @@ OVS_WAIT_UNTIL([cat p2.pcap | grep -E "0x0050: > > * * *5002 *2000 *b85e *00 > > > > OVS_TRAFFIC_VSWITCHD_STOP > > AT_CLEANUP > > + > > +AT_SETUP([emit_sample]) > > +OVS_TRAFFIC_VSWITCHD_START() > > +OVS_CHECK_EMIT_SAMPLE() > > + > > +ADD_NAMESPACES(at_ns0, at_ns1) > > +NS_CHECK_EXEC([at_ns0], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], [0], > > [ignore]) > > +NS_CHECK_EXEC([at_ns1], [sysctl -w net.ipv6.conf.all.disable_ipv6=1], [0], > > [ignore]) > > + > > +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "e4:11:22:33:44:55") > > +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "e4:11:22:33:44:66") > > + > > +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e4:11:22:33:44:66]) > > +NS_CHECK_EXEC([at_ns1], [arp -s 10.1.1.1 e4:11:22:33:44:55]) > > + > > + > > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg > > ofproto_dpif_upcall:dbg]) > > + > > +AT_CHECK([ovs-vsctl -- --id=@br0 get Bridge br0 \ > > +-- --id=@ipfix create IPFIX targets=\"127.0.0.1:4739\" > > \ > > +-- create Flow_Sample_Collector_Set id=1 bridge=@br0 > > ipfix=@ipfix, local_sample_group=10 \ > > +-- create Flow_Sample_Collector_Set id=2 bridge=@br0 > > ipfix=@ipfix, local_sample_group=12], > > + [0], [ignore]) > > + > > +AT_DATA([flows.txt], [dnl > > +in_port=ovs-p0,ip > > actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),output(port=ovs-p1,max_len=100) > > +in_port=ovs-p1,ip > > actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),output(port=ovs-p0,max_len=100) > > +]) > > + > > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) > > + > > +OVS_DAEMONIZE([ovstest test-psample > psample.out], [psample1.pid]) > > + > > +NS_CHECK_EXEC([at_ns0], [ping -q -c 1 10.1.1.2 | FORMAT_PING], [0], [dnl > > +1 packets transmitted, 1 received, 0% packet loss, time 0ms > > +]) > > + > > +m4_define([SAMPLE1], [group_id=0xa > > obs_domain=0x,obs_point=0x > > .*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2]) > > +m4_define([SAMPLE2], [group_id=0xc > > obs_domain=0x,obs_point=0x > > .*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1]) > > +AT_CHECK([grep -E 'SAMPLE1' psample.out >/dev/null]) > > +AT_CHECK([grep -E 'SAMPLE2' psample.out >/dev/null]) > > + > > +AT_CHECK([ovs-ofctl dump-ipfix-flow br0 | sed 's/tx pkts=[[0-9]]*/tx > > pkts=24/' | sed 's/tx errs=[[0-9]]*/tx errs=0/'], [0], [dnl > > Why are we making tx pkts always 24, and are we waiving any potential tx > errors? > Is this something you have seen not being consistent, if so any idea why? > Yes. By the time we request the statistics, IPFIX cache should contain the flows but they might not have been sent yet. When they are sent they will fail (because IPFIX target is localhost and it returns the socket returns an error if the listening socket does not exist) so the number of errors might vary. I can look deeper into making it more consistent but it looked like a quick workaround. > > +NXST_IPFIX_FLOW reply (xid=0x2): 2 ids > > + id 1: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 ok=0, > > tx pkts=24 > > + pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0 > > + id 2: flows=1, current flows=0, sampled pkts=1, ipv4 ok=1, ipv6 ok=0, > > tx pkts=24 > > + pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0 > > +]) > > + > > +AT_CHECK([ovs-appctl lsample/show br0], [0], [dnl > > +local sample statistics for bridge "br0": > > +- Collector Set ID: 1 > > + Local Sample Group: 10 > > + Total number of bytes: 98 > > + Total number of packets: 1 > > + > > +- Collector Set ID: 2 > > + Local S
Re: [ovs-dev] [RFC v2 8/9] tests: Add test-psample testing utility.
On Wed, Jun 26, 2024 at 02:14:27PM GMT, Eelco Chaudron wrote: > On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > > > This simple program reads from psample and prints the packets to stdout. > > Some comments below. > > //Eelco > > > Signed-off-by: Adrian Moreno > > --- > > include/linux/automake.mk | 1 + > > include/linux/psample.h | 68 + > > tests/automake.mk | 3 +- > > tests/test-psample.c | 282 ++ > > 4 files changed, 353 insertions(+), 1 deletion(-) > > create mode 100644 include/linux/psample.h > > create mode 100644 tests/test-psample.c > > > > diff --git a/include/linux/automake.mk b/include/linux/automake.mk > > index cdae5eedc..ac306b53c 100644 > > --- a/include/linux/automake.mk > > +++ b/include/linux/automake.mk > > @@ -3,6 +3,7 @@ noinst_HEADERS += \ > > include/linux/netfilter/nf_conntrack_sctp.h \ > > include/linux/openvswitch.h \ > > include/linux/pkt_cls.h \ > > + include/linux/psample.h \ > > include/linux/gen_stats.h \ > > include/linux/tc_act/tc_mpls.h \ > > include/linux/tc_act/tc_pedit.h \ > > diff --git a/include/linux/psample.h b/include/linux/psample.h > > new file mode 100644 > > index 0..d5761b730 > > --- /dev/null > > +++ b/include/linux/psample.h > > @@ -0,0 +1,68 @@ > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ > > +#ifndef __LINUX_PSAMPLE_H > > +#define __LINUX_PSAMPLE_H > > + > > +enum { > > + PSAMPLE_ATTR_IIFINDEX, > > + PSAMPLE_ATTR_OIFINDEX, > > + PSAMPLE_ATTR_ORIGSIZE, > > + PSAMPLE_ATTR_SAMPLE_GROUP, > > + PSAMPLE_ATTR_GROUP_SEQ, > > + PSAMPLE_ATTR_SAMPLE_RATE, > > + PSAMPLE_ATTR_DATA, > > + PSAMPLE_ATTR_GROUP_REFCOUNT, > > + PSAMPLE_ATTR_TUNNEL, > > + > > + PSAMPLE_ATTR_PAD, > > + PSAMPLE_ATTR_OUT_TC,/* u16 */ > > + PSAMPLE_ATTR_OUT_TC_OCC,/* u64, bytes */ > > + PSAMPLE_ATTR_LATENCY, /* u64, nanoseconds */ > > + PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ > > + PSAMPLE_ATTR_PROTO, /* u16 */ > > + PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ > > + PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in > > +* PSAMPLE_ATTR_SAMPLE_RATE as a > > +* probability scaled 0 - U32_MAX. > > +*/ > > + > > + __PSAMPLE_ATTR_MAX > > +}; > > + > > +enum psample_command { > > + PSAMPLE_CMD_SAMPLE, > > + PSAMPLE_CMD_GET_GROUP, > > + PSAMPLE_CMD_NEW_GROUP, > > + PSAMPLE_CMD_DEL_GROUP, > > + PSAMPLE_CMD_SAMPLE_FILTER_SET, > > +}; > > + > > +enum psample_tunnel_key_attr { > > + PSAMPLE_TUNNEL_KEY_ATTR_ID, /* be64 Tunnel ID */ > > + PSAMPLE_TUNNEL_KEY_ATTR_IPV4_SRC, /* be32 src IP address. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_IPV4_DST, /* be32 dst IP address. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_TOS,/* u8 Tunnel IP ToS. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_TTL,/* u8 Tunnel IP TTL. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_DONT_FRAGMENT, /* No argument, set DF. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_CSUM, /* No argument. CSUM > > packet. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_OAM,/* No argument. OAM frame. > > */ > > + PSAMPLE_TUNNEL_KEY_ATTR_GENEVE_OPTS,/* Array of Geneve options. > > */ > > + PSAMPLE_TUNNEL_KEY_ATTR_TP_SRC, /* be16 src Transport Port. > > */ > > + PSAMPLE_TUNNEL_KEY_ATTR_TP_DST, /* be16 dst Transport Port. > > */ > > + PSAMPLE_TUNNEL_KEY_ATTR_VXLAN_OPTS, /* Nested VXLAN opts* */ > > + PSAMPLE_TUNNEL_KEY_ATTR_IPV6_SRC, /* struct in6_addr src IPv6 > > address. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_IPV6_DST, /* struct in6_addr dst IPv6 > > address. */ > > + PSAMPLE_TUNNEL_KEY_ATTR_PAD, > > + PSAMPLE_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* struct erspan_metadata */ > > + PSAMPLE_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE, /* No argument. > > IPV4_INFO_BRIDGE mode.*/ > > + __PSAMPLE_TUNNEL_KEY_ATTR_MAX > > +}; > > + > > +/* Can be overridden at runtime by module option */ > > +#define PSAMPLE_ATTR_MAX (__PSAMPLE_ATTR_MAX - 1) > > + > > +#define PSAMPLE_NL_MCGRP_CONFIG_NAME "config" > > +#define PSAMPLE_NL_MCGRP_SAMPLE_NAME "packets" > > +#define PSAMPLE_GENL_NAME "psample" > > +#define PSAMPLE_GENL_VERSION 1 > > + > > +#endif > > diff --git a/tests/automake.mk b/tests/automake.mk > > index 04f48f2d8..edfc2cb33 100644 > > --- a/tests/automake.mk > > +++ b/tests/automake.mk > > @@ -499,7 +499,8 @@ endif > > if LINUX > > tests_ovstest_SOURCES += \ > > tests/test-netlink-conntrack.c \ > > - tests/test-netlink-policy.c > > + tests/test-netlink-policy.c \ > > + tests/test-psample.c > > endif > > > > tests_ovstest_LDADD = lib/libopenvswitch.la > > diff --git a/tests/test-psample.c b/tests/test-psample.c > > new file mode 100644 > > index 0..f04d903
Re: [ovs-dev] [RFC v2 7/9] ofproto-dpif-lsample: Show stats via unixctl.
On Wed, Jun 26, 2024 at 02:13:40PM GMT, Eelco Chaudron wrote: > On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > > > Add a command to dump statistics per exporter. > > Some comments below. > > //Eelco > > > Signed-off-by: Adrian Moreno > > --- > > NEWS | 2 + > > ofproto/ofproto-dpif-lsample.c | 113 + > > ofproto/ofproto-dpif-lsample.h | 1 + > > ofproto/ofproto-dpif.c | 1 + > > 4 files changed, 117 insertions(+) > > > > diff --git a/NEWS b/NEWS > > index 1c05a7120..a443f9fe1 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -11,6 +11,8 @@ Post-v3.3.0 > > allows samples to be emitted locally (instead of via IPFIX) in a > > datapath-specific manner via the new datapath action called > > "emit_sample". > > Linux kernel datapath is the first to support this feature. > > + A new unixctl command 'lsample/show' shows packet and bytes statistics > > + per local sample exporter. > > > > > > v3.3.0 - 16 Feb 2024 > > diff --git a/ofproto/ofproto-dpif-lsample.c b/ofproto/ofproto-dpif-lsample.c > > index 0c71e354d..e31dabac4 100644 > > --- a/ofproto/ofproto-dpif-lsample.c > > +++ b/ofproto/ofproto-dpif-lsample.c > > @@ -21,7 +21,10 @@ > > #include "dpif.h" > > #include "hash.h" > > #include "ofproto.h" > > +#include "ofproto-dpif.h" > > +#include "openvswitch/dynamic-string.h" > > #include "openvswitch/thread.h" > > +#include "unixctl.h" > > > > /* Dpif local sampling. > > * > > @@ -218,3 +221,113 @@ dpif_lsample_unref(struct dpif_lsample *lsample) > > dpif_lsample_destroy(lsample); > > } > > } > > + > > +static int > > +compare_exporter_list(const void *a_, const void *b_) > > We are not comparing a list here, so we should choose a more appropriate name, > maybe something like; compare_exporter_collector_set_id(), or if you want it > shorter, maybe comp_exporter_collector_id(), use you fantasy (but not to > much ;). > > > +{ > > +const struct lsample_exporter_node *a, *b; > > + > > +a = *(struct lsample_exporter_node **)a_; > > +b = *(struct lsample_exporter_node **)b_; > > Fix space around casting. > Ack. > > + > > +if (a->exporter.options.collector_set_id > > > +b->exporter.options.collector_set_id) { > > +return 1; > > +} > > +if (a->exporter.options.collector_set_id < > > +b->exporter.options.collector_set_id) { > > +return -1; > > +} > > +return 0; > > +} > > + > > +static void > > +lsample_exporter_list(struct dpif_lsample *lsample, > > + struct lsample_exporter_node ***list, > > + size_t *num_exporters) > > +{ > > +struct lsample_exporter_node *node; > > +struct lsample_exporter_node **exporter_list; > > Reverse Christmas tree order. > Ack. > > +size_t k = 0, n; > > + > > +n = cmap_count(&lsample->exporters); > > + > > +exporter_list = xcalloc(n, sizeof *exporter_list); > > + > > +CMAP_FOR_EACH (node, node, &lsample->exporters) { > > +if (k >= n) { > > +break; > > +} > > +exporter_list[k++] = node; > > +} > > + > > +qsort(exporter_list, k, sizeof *exporter_list, compare_exporter_list); > > + > > +*list = exporter_list; > > +*num_exporters = k; > > +} > > + > > +static void > > +lsample_unixctl_show(struct unixctl_conn *conn, int argc OVS_UNUSED, > > + const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED) > > +{ > > +struct lsample_exporter_node **node_list = NULL; > > +struct lsample_exporter_node *node; > > +struct ds ds = DS_EMPTY_INITIALIZER; > > Reverse Christmas tree? Or even better move the *node definition inside the > for loop below. > Ack. > > +const struct ofproto_dpif *ofproto; > > +size_t i, num; > > + > > +ofproto = ofproto_dpif_lookup_by_name(argv[1]); > > +if (!ofproto) { > > +unixctl_command_reply_error(conn, "no such bridge"); > > +return; > > +} > > + > > +if (!ofproto->lsample) { > > +unixctl_command_reply_error(conn, "no local sampling exporters " > > +"configured"); > > unixctl_command_reply_error(conn, > "no local sampling exporters configured"); > > > > +return; > > +} > > + > > +ds_put_format(&ds, "local sample statistics for bridge \"%s\":\n", > > Maybe capital L for Local? > Ack. > > + argv[1]); > > + > > +lsample_exporter_list(ofproto->lsample, &node_list, &num); > > + > > +for (i = 0; i < num; i++) { > > +uint64_t n_bytes; > > +uint64_t n_packets; > > Reverse Christmas tree > Ack. > > + > > +node = node_list[i]; > > + > > +atomic_read_relaxed(&node->exporter.n_packets, &n_packets); > > +atomic_read_relaxed(&node->exporter.n_bytes, &n_bytes); > > + > > +if (i) { > > +ds_put_cstr(&ds, "\n"); > > +} > > + > > +
Re: [ovs-dev] [RFC v2 4/9] vswitchd: Add local sampling to vswitchd schema.
On Thu, Jun 27, 2024 at 03:57:03PM GMT, Eelco Chaudron wrote: > > > On 27 Jun 2024, at 13:37, Adrián Moreno wrote: > > > On Wed, Jun 26, 2024 at 02:11:00PM GMT, Eelco Chaudron wrote: > >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > >> > >>> Add as new column in the Flow_Sample_Collector_Set table named > >>> "local_sample_group" which enables this feature. > >>> > >>> Signed-off-by: Adrian Moreno > >>> --- > >>> NEWS | 4 ++ > >>> vswitchd/bridge.c | 78 +++--- > >>> vswitchd/vswitch.ovsschema | 9 - > >>> vswitchd/vswitch.xml | 39 +-- > >>> 4 files changed, 119 insertions(+), 11 deletions(-) > >>> > >>> diff --git a/NEWS b/NEWS > >>> index b92cec532..1c05a7120 100644 > >>> --- a/NEWS > >>> +++ b/NEWS > >>> @@ -7,6 +7,10 @@ Post-v3.3.0 > >>> - The primary development branch has been renamed from 'master' to > >>> 'main'. > >>> The OVS tree remains hosted on GitHub. > >>>https://github.com/openvswitch/ovs.git > >>> + - Local sampling is introduced. It reuses the OpenFlow sample action > >>> and > >>> + allows samples to be emitted locally (instead of via IPFIX) in a > >>> + datapath-specific manner via the new datapath action called > >>> "emit_sample". > >>> + Linux kernel datapath is the first to support this feature. > >>> > >>> > >>> v3.3.0 - 16 Feb 2024 > >>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > >>> index 95a65fcdc..cd7dc6646 100644 > >>> --- a/vswitchd/bridge.c > >>> +++ b/vswitchd/bridge.c > >>> @@ -288,6 +288,7 @@ static void bridge_configure_mac_table(struct bridge > >>> *); > >>> static void bridge_configure_mcast_snooping(struct bridge *); > >>> static void bridge_configure_sflow(struct bridge *, int > >>> *sflow_bridge_number); > >>> static void bridge_configure_ipfix(struct bridge *); > >>> +static void bridge_configure_lsample(struct bridge *); > >>> static void bridge_configure_spanning_tree(struct bridge *); > >>> static void bridge_configure_tables(struct bridge *); > >>> static void bridge_configure_dp_desc(struct bridge *); > >>> @@ -989,6 +990,7 @@ bridge_reconfigure(const struct ovsrec_open_vswitch > >>> *ovs_cfg) > >>> bridge_configure_netflow(br); > >>> bridge_configure_sflow(br, &sflow_bridge_number); > >>> bridge_configure_ipfix(br); > >>> +bridge_configure_lsample(br); > >>> bridge_configure_spanning_tree(br); > >>> bridge_configure_tables(br); > >>> bridge_configure_dp_desc(br); > >>> @@ -1537,10 +1539,11 @@ ovsrec_ipfix_is_valid(const struct ovsrec_ipfix > >>> *ipfix) > >>> return ipfix && ipfix->n_targets > 0; > >>> } > >>> > >>> -/* Returns whether a Flow_Sample_Collector_Set row is valid. */ > >>> +/* Returns whether a Flow_Sample_Collector_Set row constains valid IPFIX > >> > >> constains -> 'contains a' > >> > > > > Ack. > > > >>> + * configuration. */ > >>> static bool > >>> -ovsrec_fscs_is_valid(const struct ovsrec_flow_sample_collector_set *fscs, > >>> - const struct bridge *br) > >>> +ovsrec_fscs_is_valid_ipfix(const struct ovsrec_flow_sample_collector_set > >>> *fscs, > >>> + const struct bridge *br) > >>> { > >>> return ovsrec_ipfix_is_valid(fscs->ipfix) && fscs->bridge == br->cfg; > >>> } > >>> @@ -1558,7 +1561,7 @@ bridge_configure_ipfix(struct bridge *br) > >>> const char *virtual_obs_id; > >>> > >>> OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) { > >>> -if (ovsrec_fscs_is_valid(fe_cfg, br)) { > >>> +if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) { > >>> n_fe_opts++; > >>> } > >>> } > >>> @@ -1621,7 +1624,7 @@ bridge_configure_ipfix(struct bridge *br) > >>> fe_opts = xcalloc(n_fe_opts, sizeof *fe_opts); > >>> opts = fe_opts; > >>> OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) { > >>> -if (ovsrec_fscs_is_valid(fe_cfg, br)) { > >>> +if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) { > >>> opts->collector_set_id = fe_cfg->id; > >>> sset_init(&opts->targets); > >>> sset_add_array(&opts->targets, fe_cfg->ipfix->targets, > >>> @@ -1667,6 +1670,71 @@ bridge_configure_ipfix(struct bridge *br) > >>> } > >>> } > >>> > >>> +/* Returns whether a Flow_Sample_Collector_Set row contains valid local > >>> + * sampling configuraiton. */ > >> > >> ...row contains a valid local... > >>++ > >> > >> configuraiton -> configuration > > > > Ack. > > > >> > >>> +static bool > >>> +ovsrec_fscs_is_valid_local(const struct ovsrec_flow_sample_collector_set > >>> *fscs, > >>> + const struct bridge *br) > >>> +{ > >>> +return fscs->local_sample_group && fscs->n_local_sample_group == 1 && > >>> + fscs->bridge == br->cfg; > >>> +} > >>> + > >>> +/* Set local sample configuration on 'br'. */ > >>> +sta
Re: [ovs-dev] [RFC v2 1/9] odp-util: Add support OVS_ACTION_ATTR_EMIT_SAMPLE.
On Thu, Jun 27, 2024 at 03:42:38PM GMT, Eelco Chaudron wrote: > > > On 27 Jun 2024, at 12:45, Adrián Moreno wrote: > > > On Wed, Jun 26, 2024 at 02:08:27PM GMT, Eelco Chaudron wrote: > >> On 5 Jun 2024, at 22:23, Adrian Moreno wrote: > >> > >>> Add support for parsing and formatting the new action. > >>> > >>> Also, flag OVS_ACTION_ATTR_SAMPLE as requiring datapath assistance if it > >>> contains a nested OVS_ACTION_ATTR_EMIT_SAMPLE. The reason is that the > >>> sampling rate form the parent "sample" is made available to the nested > >>> "emit_sample" by the kernel. > >> > >> Hi Adrian, > >> > >> Thanks for this series! This email kicks off my review of the series, > >> see comments below and in the other patches. > >> > >> Cheers, > >> > >> Eelco > >> > >>> Signed-off-by: Adrian Moreno > >>> --- > >>> include/linux/openvswitch.h | 25 + > >>> lib/dpif-netdev.c| 1 + > >>> lib/dpif.c | 1 + > >>> lib/odp-execute.c| 25 - > >>> lib/odp-util.c | 103 +++ > >>> lib/odp-util.h | 3 + > >>> ofproto/ofproto-dpif-ipfix.c | 1 + > >>> ofproto/ofproto-dpif-sflow.c | 1 + > >>> python/ovs/flow/odp.py | 8 +++ > >>> tests/odp.at | 16 ++ > >>> 10 files changed, 183 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h > >>> index d9fb991ef..b4e0647bd 100644 > >>> --- a/include/linux/openvswitch.h > >>> +++ b/include/linux/openvswitch.h > >>> @@ -992,6 +992,30 @@ struct check_pkt_len_arg { > >>> }; > >>> #endif > >>> > >>> +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16 > >>> +/** > >>> + * enum ovs_emit_sample_attr - Attributes for > >>> %OVS_ACTION_ATTR_EMIT_SAMPLE > >>> + * action. > >>> + * > >>> + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of > >>> the > >>> + * sample. > >>> + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that > >>> contains > >>> + * user-defined metadata. The maximum length is 16 bytes. > >>> + * > >>> + * Sends the packet to the psample multicast group with the specified > >>> group and > >>> + * cookie. It is possible to combine this action with the > >>> + * %OVS_ACTION_ATTR_TRUNC to limit the size of the packet being emitted. > >> > >> I'll start the discussion again on the naming. The name "emit_sample()" > >> does not seem appropriate. This function's primary role is to copy the > >> packet and send it to a local collector, which varies depending on the > >> datapath. For the kernel datapath, this collector is psample, while for > >> userspace, it will likely be some kind of probe. This action is distinct > >> from the sample() action by design; it is a standalone action that can > >> be combined with others. > >> > >> Furthermore, the action itself does not involve taking a sample; it > >> consistently pushes the packet to the local collector. Therefore, I > >> suggest renaming "emit_sample()" to "emit_local()". This same goes for > >> all the derivative ATTR naming. > >> > > > > Let's discuss this in the kernel ml. > > > >>> + */ > >>> +enum ovs_emit_sample_attr { > >>> + OVS_EMIT_SAMPLE_ATTR_UNPSEC, > >>> + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */ > >>> + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ > >> > >> Fix comment alignment? Maybe also change the order to be alphabetical? > >> > > > > What do you mean by comment alignment? > > Well you are using tabs to index which might result in it looking like this: > > + OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */ > + OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ > > (Most) of the other comments use spaces after the definition. > > >>> + __OVS_EMIT_SAMPLE_ATTR_MAX > >>> +}; > >>> + > >>> +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1) > >>> + > >>> + > >>> /** > >>> * enum ovs_action_attr - Action types. > >>> * > >>> @@ -1087,6 +,7 @@ enum ovs_action_attr { > >>> OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ > >>> OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ > >>> OVS_ACTION_ATTR_DROP, /* u32 xlate_error. */ > >>> + OVS_ACTION_ATTR_EMIT_SAMPLE, /* Nested OVS_EMIT_SAMPLE_ATTR_*. */ > >>> > >>> #ifndef __KERNEL__ > >>> OVS_ACTION_ATTR_TUNNEL_PUSH, /* struct ovs_action_push_tnl*/ > >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > >>> index c7f9e1490..c1171890c 100644 > >>> --- a/lib/dpif-netdev.c > >>> +++ b/lib/dpif-netdev.c > >>> @@ -9519,6 +9519,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch > >>> *packets_, > >>> case OVS_ACTION_ATTR_DROP: > >>> case OVS_ACTION_ATTR_ADD_MPLS: > >>> case OVS_ACTION_ATTR_DEC_TTL: > >>> +case OVS_ACTION_ATTR_EMIT_SAMPLE: > >>> case __OVS_ACTION_ATTR_MAX: > >>> OVS_NOT_REACHED(); > >>> } > >>> diff --git a/lib/dpif.c b/lib/dpif.c > >>> index 23e
Re: [ovs-dev] RFC OVN: fabric integration
On 6/28/24 11:03, Ales Musil wrote: > Hi Frode, > > looking forward to the RFC. AFAIU it means that the routes would be exposed on > LR, more specifically GW router. Would it make sense to allow this behavior > for > provider networks (LS with localnet port)? In that case we could advertise > chassis-local information from logical routers attached to BGP-enabled > switches. E.g.: FIPs, LBs. It would cover the use case for distributed > routers. To achieve that we should have BGP peers for each chassis that the LS > is local on. I haven't read the whole thing yet, but can we, please, stop adding routing features to switches? :) If someone wants routing, they should use a router, IMO. Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 10/10] selftests: openvswitch: add psample test
Add a test to verify sampling packets via psample works. In order to do that, create a subcommand in ovs-dpctl.py to listen to on the psample multicast group and print samples. Signed-off-by: Adrian Moreno --- .../selftests/net/openvswitch/openvswitch.sh | 115 +- .../selftests/net/openvswitch/ovs-dpctl.py| 73 ++- 2 files changed, 182 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh index 15bca0708717..2ee770281a08 100755 --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh @@ -20,7 +20,8 @@ tests=" nat_related_v4 ip4-nat-related: ICMP related matches work with SNAT netlink_checks ovsnl: validate netlink attrs and settings upcall_interfaces ovs: test the upcall interfaces - drop_reason drop: test drop reasons are emitted" + drop_reason drop: test drop reasons are emitted + psample psample: Sampling packets with psample" info() { [ $VERBOSE = 0 ] || echo $* @@ -102,12 +103,21 @@ ovs_netns_spawn_daemon() { shift netns=$1 shift - info "spawning cmd: $*" - ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & + if [ "$netns" == "_default" ]; then + $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & + else + ip netns exec $netns $* >> $ovs_dir/stdout 2>> $ovs_dir/stderr & + fi pid=$! ovs_sbx "$sbx" on_exit "kill -TERM $pid 2>/dev/null" } +ovs_spawn_daemon() { + sbx=$1 + shift + ovs_netns_spawn_daemon $sbx "_default" $* +} + ovs_add_netns_and_veths () { info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}" ovs_sbx "$1" ip netns add "$3" || return 1 @@ -170,6 +180,19 @@ ovs_drop_reason_count() return `echo "$perf_output" | grep "$pattern" | wc -l` } +ovs_test_flow_fails () { + ERR_MSG="Flow actions may not be safe on all matching packets" + + PRE_TEST=$(dmesg | grep -c "${ERR_MSG}") + ovs_add_flow $@ &> /dev/null $@ && return 1 + POST_TEST=$(dmesg | grep -c "${ERR_MSG}") + + if [ "$PRE_TEST" == "$POST_TEST" ]; then + return 1 + fi + return 0 +} + usage() { echo echo "$0 [OPTIONS] [TEST]..." @@ -184,6 +207,92 @@ usage() { exit 1 } + +# psample test +# - use psample to observe packets +test_psample() { + sbx_add "test_psample" || return $? + + # Add a datapath with per-vport dispatching. + ovs_add_dp "test_psample" psample -V 2:1 || return 1 + + info "create namespaces" + ovs_add_netns_and_veths "test_psample" "psample" \ + client c0 c1 172.31.110.10/24 -u || return 1 + ovs_add_netns_and_veths "test_psample" "psample" \ + server s0 s1 172.31.110.20/24 -u || return 1 + + # Check if psample actions can be configured. + ovs_add_flow "test_psample" psample \ + 'in_port(1),eth(),eth_type(0x0806),arp()' 'psample(group=1)' + if [ $? == 1 ]; then + info "no support for psample - skipping" + ovs_exit_sig + return $ksft_skip + fi + + ovs_del_flows "test_psample" psample + + # Test action verification. + OLDIFS=$IFS + IFS='*' + min_key='in_port(1),eth(),eth_type(0x0800),ipv4()' + for testcase in \ + "cookie to large"*"psample(group=1,cookie=1615141312111009080706050403020100)" \ + "no group with cookie"*"psample(cookie=abcd)" \ + "no group"*"sample()"; + do + set -- $testcase; + ovs_test_flow_fails "test_psample" psample $min_key $2 + if [ $? == 1 ]; then + info "failed - $1" + return 1 + fi + done + IFS=$OLDIFS + + ovs_del_flows "test_psample" psample + # Allow ARP + ovs_add_flow "test_psample" psample \ + 'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1 + ovs_add_flow "test_psample" psample \ + 'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1 + + # Sample first 14 bytes of all traffic. + ovs_add_flow "test_psample" psample \ + "in_port(1),eth(),eth_type(0x0800),ipv4()" \ +"trunc(14),psample(group=1,cookie=c0ffee),2" + + # Sample all traffic. In this case, use a sample() action with both + # psample and an upcall emulating simultaneous local sampling and + # sFlow / IPFIX. + nlpid=$(grep -E "listening on upcall packet handler" \ +$ovs_dir/s0.out | cut -d ":" -f 2 | tr -d ' ') + + ovs_add_flow "tes
[ovs-dev] [PATCH net-next v6 09/10] selftests: openvswitch: parse trunc action
The trunc action was supported decode-able but not parse-able. Add support for parsing the action string. Reviewed-by: Aaron Conole Signed-off-by: Adrian Moreno --- .../testing/selftests/net/openvswitch/ovs-dpctl.py | 13 + 1 file changed, 13 insertions(+) diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index fa73f82639fe..558d12b0d39d 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -817,6 +817,19 @@ class ovsactions(nla): self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact]) parsed = True +elif parse_starts_block(actstr, "trunc(", False): +parencount += 1 +actstr, val = parse_extract_field( +actstr, +"trunc(", +r"([0-9]+)", +int, +False, +None, +) +self["attrs"].append(["OVS_ACTION_ATTR_TRUNC", val]) +parsed = True + actstr = actstr[strspn(actstr, ", ") :] while parencount > 0: parencount -= 1 -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 08/10] selftests: openvswitch: add userspace parsing
The userspace action lacks parsing support plus it contains a bug in the name of one of its attributes. This patch makes userspace action work. Reviewed-by: Aaron Conole Signed-off-by: Adrian Moreno --- .../selftests/net/openvswitch/ovs-dpctl.py| 24 +-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 4dc6ceca7e1e..fa73f82639fe 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -575,13 +575,27 @@ class ovsactions(nla): print_str += "userdata=" for f in self.get_attr("OVS_USERSPACE_ATTR_USERDATA"): print_str += "%x." % f -if self.get_attr("OVS_USERSPACE_ATTR_TUN_PORT") is not None: +if self.get_attr("OVS_USERSPACE_ATTR_EGRESS_TUN_PORT") is not None: print_str += "egress_tun_port=%d" % self.get_attr( -"OVS_USERSPACE_ATTR_TUN_PORT" +"OVS_USERSPACE_ATTR_EGRESS_TUN_PORT" ) print_str += ")" return print_str +def parse(self, actstr): +attrs_desc = ( +("pid", "OVS_USERSPACE_ATTR_PID", int), +("userdata", "OVS_USERSPACE_ATTR_USERDATA", +lambda x: list(bytearray.fromhex(x))), +("egress_tun_port", "OVS_USERSPACE_ATTR_EGRESS_TUN_PORT", int) +) + +attrs, actstr = parse_attrs(actstr, attrs_desc) +for attr in attrs: +self["attrs"].append(attr) + +return actstr + def dpstr(self, more=False): print_str = "" @@ -797,6 +811,12 @@ class ovsactions(nla): self["attrs"].append(["OVS_ACTION_ATTR_PSAMPLE", psampleact]) parsed = True +elif parse_starts_block(actstr, "userspace(", False): +uact = self.userspace() +actstr = uact.parse(actstr[len("userspace(") : ]) +self["attrs"].append(["OVS_ACTION_ATTR_USERSPACE", uact]) +parsed = True + actstr = actstr[strspn(actstr, ", ") :] while parencount > 0: parencount -= 1 -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 07/10] selftests: openvswitch: add psample action
Add sample and psample action support to ovs-dpctl.py. Refactor common attribute parsing logic into an external function. Signed-off-by: Adrian Moreno --- .../selftests/net/openvswitch/ovs-dpctl.py| 162 +- 1 file changed, 161 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py index 9f8dec2f6539..4dc6ceca7e1e 100644 --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py @@ -8,6 +8,7 @@ import argparse import errno import ipaddress import logging +import math import multiprocessing import re import struct @@ -58,6 +59,7 @@ OVS_FLOW_CMD_DEL = 2 OVS_FLOW_CMD_GET = 3 OVS_FLOW_CMD_SET = 4 +UINT32_MAX = 0x def macstr(mac): outstr = ":".join(["%02X" % i for i in mac]) @@ -267,6 +269,75 @@ def parse_extract_field( return str_skipped, data +def parse_attrs(actstr, attr_desc): +"""Parses the given action string and returns a list of netlink +attributes based on a list of attribute descriptions. + +Each element in the attribute description list is a tuple such as: +(name, attr_name, parse_func) +where: +name: is the string representing the attribute +attr_name: is the name of the attribute as defined in the uAPI. +parse_func: is a callable accepting a string and returning either +a single object (the parsed attribute value) or a tuple of +two values (the parsed attribute value and the remaining string) + +Returns a list of attributes and the remaining string. +""" +def parse_attr(actstr, key, func): +actstr = actstr[len(key) :] + +if not func: +return None, actstr + +delim = actstr[0] +actstr = actstr[1:] + +if delim == "=": +pos = strcspn(actstr, ",)") +ret = func(actstr[:pos]) +else: +ret = func(actstr) + +if isinstance(ret, tuple): +(datum, actstr) = ret +else: +datum = ret +actstr = actstr[strcspn(actstr, ",)"):] + +if delim == "(": +if not actstr or actstr[0] != ")": +raise ValueError("Action contains unbalanced parentheses") + +actstr = actstr[1:] + +actstr = actstr[strspn(actstr, ", ") :] + +return datum, actstr + +attrs = [] +attr_desc = list(attr_desc) +while actstr and actstr[0] != ")" and attr_desc: +found = False +for i, (key, attr, func) in enumerate(attr_desc): +if actstr.startswith(key): +datum, actstr = parse_attr(actstr, key, func) +attrs.append([attr, datum]) +found = True +del attr_desc[i] + +if not found: +raise ValueError("Unknown attribute: '%s'" % actstr) + +actstr = actstr[strspn(actstr, ", ") :] + +if actstr[0] != ")": +raise ValueError("Action string contains extra garbage or has " + "unbalanced parenthesis: '%s'" % actstr) + +return attrs, actstr[1:] + + class ovs_dp_msg(genlmsg): # include the OVS version # We need a custom header rather than just being able to rely on @@ -285,7 +356,7 @@ class ovsactions(nla): ("OVS_ACTION_ATTR_SET", "none"), ("OVS_ACTION_ATTR_PUSH_VLAN", "none"), ("OVS_ACTION_ATTR_POP_VLAN", "flag"), -("OVS_ACTION_ATTR_SAMPLE", "none"), +("OVS_ACTION_ATTR_SAMPLE", "sample"), ("OVS_ACTION_ATTR_RECIRC", "uint32"), ("OVS_ACTION_ATTR_HASH", "none"), ("OVS_ACTION_ATTR_PUSH_MPLS", "none"), @@ -304,8 +375,85 @@ class ovsactions(nla): ("OVS_ACTION_ATTR_ADD_MPLS", "none"), ("OVS_ACTION_ATTR_DEC_TTL", "none"), ("OVS_ACTION_ATTR_DROP", "uint32"), +("OVS_ACTION_ATTR_PSAMPLE", "psample"), ) +class psample(nla): +nla_flags = NLA_F_NESTED + +nla_map = ( +("OVS_PSAMPLE_ATTR_UNSPEC", "none"), +("OVS_PSAMPLE_ATTR_GROUP", "uint32"), +("OVS_PSAMPLE_ATTR_COOKIE", "array(uint8)"), +) + +def dpstr(self, more=False): +args = "group=%d" % self.get_attr("OVS_PSAMPLE_ATTR_GROUP") + +cookie = self.get_attr("OVS_PSAMPLE_ATTR_COOKIE") +if cookie: +args += ",cookie(%s)" % \ +"".join(format(x, "02x") for x in cookie) + +return "psample(%s)" % args + +def parse(self, actstr): +desc = ( +("group", "OVS_PSAMPLE_ATTR_GROUP", int), +("cookie", "OVS_PSAMPLE_ATTR_COOKIE", +lambda x: list(bytearray.fromhex(x))) +) + +attrs, actstr = parse_attrs(actstr, desc) + +for attr in attrs: +self["attrs"].append(attr) + +return acts
[ovs-dev] [PATCH net-next v6 05/10] net: openvswitch: add psample action
Add support for a new action: psample. This action accepts a u32 group id and a variable-length cookie and uses the psample multicast group to make the packet available for observability. The maximum length of the user-defined cookie is set to 16, same as tc_cookie, to discourage using cookies that will not be offloadable. Signed-off-by: Adrian Moreno --- Documentation/netlink/specs/ovs_flow.yaml | 17 include/uapi/linux/openvswitch.h | 28 ++ net/openvswitch/Kconfig | 1 + net/openvswitch/actions.c | 47 +++ net/openvswitch/flow_netlink.c| 32 ++- 5 files changed, 124 insertions(+), 1 deletion(-) diff --git a/Documentation/netlink/specs/ovs_flow.yaml b/Documentation/netlink/specs/ovs_flow.yaml index 4fdfc6b5cae9..46f5d1cd8a5f 100644 --- a/Documentation/netlink/specs/ovs_flow.yaml +++ b/Documentation/netlink/specs/ovs_flow.yaml @@ -727,6 +727,12 @@ attribute-sets: name: dec-ttl type: nest nested-attributes: dec-ttl-attrs + - +name: psample +type: nest +nested-attributes: psample-attrs +doc: | + Sends a packet sample to psample for external observation. - name: tunnel-key-attrs enum-name: ovs-tunnel-key-attr @@ -938,6 +944,17 @@ attribute-sets: - name: gbp type: u32 + - +name: psample-attrs +enum-name: ovs-psample-attr +name-prefix: ovs-psample-attr- +attributes: + - +name: group +type: u32 + - +name: cookie +type: binary operations: name-prefix: ovs-flow-cmd- diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index efc82c318fa2..07086759556b 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -914,6 +914,31 @@ struct check_pkt_len_arg { }; #endif +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 +/** + * enum ovs_pample_attr - Attributes for %OVS_ACTION_ATTR_PSAMPLE + * action. + * + * @OVS_PSAMPLE_ATTR_GROUP: 32-bit number to identify the source of the + * sample. + * @OVS_PSAMPLE_ATTR_COOKIE: An optional variable-length binary cookie that + * contains user-defined metadata. The maximum length is + * OVS_PSAMPLE_COOKIE_MAX_SIZE bytes. + * + * Sends the packet to the psample multicast group with the specified group and + * cookie. It is possible to combine this action with the + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the sample. + */ +enum ovs_psample_attr { + OVS_PSAMPLE_ATTR_GROUP = 1, /* u32 number. */ + OVS_PSAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */ + + /* private: */ + __OVS_PSAMPLE_ATTR_MAX +}; + +#define OVS_PSAMPLE_ATTR_MAX (__OVS_PSAMPLE_ATTR_MAX - 1) + /** * enum ovs_action_attr - Action types. * @@ -966,6 +991,8 @@ struct check_pkt_len_arg { * of l3 tunnel flag in the tun_flags field of OVS_ACTION_ATTR_ADD_MPLS * argument. * @OVS_ACTION_ATTR_DROP: Explicit drop action. + * @OVS_ACTION_ATTR_PSAMPLE: Send a sample of the packet to external observers + * via psample. * * Only a single header can be set with a single %OVS_ACTION_ATTR_SET. Not all * fields within a header are modifiable, e.g. the IPv4 protocol and fragment @@ -1004,6 +1031,7 @@ enum ovs_action_attr { OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */ OVS_ACTION_ATTR_DEC_TTL, /* Nested OVS_DEC_TTL_ATTR_*. */ OVS_ACTION_ATTR_DROP, /* u32 error code. */ + OVS_ACTION_ATTR_PSAMPLE, /* Nested OVS_PSAMPLE_ATTR_*. */ __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted * from userspace. */ diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig index 29a7081858cd..2535f3f9f462 100644 --- a/net/openvswitch/Kconfig +++ b/net/openvswitch/Kconfig @@ -10,6 +10,7 @@ config OPENVSWITCH (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \ (!NF_NAT || NF_NAT) && \ (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT))) + depends on PSAMPLE || !PSAMPLE select LIBCRC32C select MPLS select NET_MPLS_GSO diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 964225580824..a035b7e677dd 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -24,6 +24,11 @@ #include #include #include + +#if IS_ENABLED(CONFIG_PSAMPLE) +#include +#endif + #include #include "datapath.h" @@ -1299,6 +1304,39 @@ static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key) return 0; } +#if IS_ENABLED(CONFIG_PSAMPLE) +static void execute_psample(struct datapath *dp, struct sk_buff *skb, + const struct nlattr *attr) +{ + struct psample_group psample_group = {}; + struct psample_metadata md = {}; +
[ovs-dev] [PATCH net-next v6 03/10] net: psample: skip packet copy if no listeners
If nobody is listening on the multicast group, generating the sample, which involves copying packet data, seems completely unnecessary. Return fast in this case. Acked-by: Eelco Chaudron Reviewed-by: Ido Schimmel Reviewed-by: Simon Horman Signed-off-by: Adrian Moreno --- net/psample/psample.c | 4 1 file changed, 4 insertions(+) diff --git a/net/psample/psample.c b/net/psample/psample.c index b37488f426bc..1c76f3e48dcd 100644 --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -376,6 +376,10 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, void *data; int ret; + if (!genl_has_listeners(&psample_nl_family, group->net, + PSAMPLE_NL_MCGRP_SAMPLE)) + return; + meta_len = (in_ifindex ? nla_total_size(sizeof(u16)) : 0) + (out_ifindex ? nla_total_size(sizeof(u16)) : 0) + (md->out_tc_valid ? nla_total_size(sizeof(u16)) : 0) + -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 06/10] net: openvswitch: store sampling probability in cb.
When a packet sample is observed, the sampling rate that was used is important to estimate the real frequency of such event. Store the probability of the parent sample action in the skb's cb area and use it in psample action to pass it down to psample module. Acked-by: Eelco Chaudron Reviewed-by: Ilya Maximets Signed-off-by: Adrian Moreno --- include/uapi/linux/openvswitch.h | 3 ++- net/openvswitch/actions.c| 20 +--- net/openvswitch/datapath.h | 3 +++ net/openvswitch/vport.c | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h index 07086759556b..8a0cf8060c37 100644 --- a/include/uapi/linux/openvswitch.h +++ b/include/uapi/linux/openvswitch.h @@ -649,7 +649,8 @@ enum ovs_flow_attr { * Actions are passed as nested attributes. * * Executes the specified actions with the given probability on a per-packet - * basis. + * basis. Nested actions will be able to access the probability value of the + * parent @OVS_ACTION_ATTR_SAMPLE. */ enum ovs_sample_attr { OVS_SAMPLE_ATTR_UNSPEC, diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index a035b7e677dd..34af6bce4085 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff *skb, struct nlattr *sample_arg; int rem = nla_len(attr); const struct sample_arg *arg; + u32 init_probability; bool clone_flow_key; + int err; /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */ sample_arg = nla_data(attr); arg = nla_data(sample_arg); actions = nla_next(sample_arg, &rem); + init_probability = OVS_CB(skb)->probability; if ((arg->probability != U32_MAX) && (!arg->probability || get_random_u32() > arg->probability)) { @@ -1062,9 +1065,16 @@ static int sample(struct datapath *dp, struct sk_buff *skb, return 0; } + OVS_CB(skb)->probability = arg->probability; + clone_flow_key = !arg->exec; - return clone_execute(dp, skb, key, 0, actions, rem, last, -clone_flow_key); + err = clone_execute(dp, skb, key, 0, actions, rem, last, + clone_flow_key); + + if (!last) + OVS_CB(skb)->probability = init_probability; + + return err; } /* When 'last' is true, clone() should always consume the 'skb'. @@ -1311,6 +1321,7 @@ static void execute_psample(struct datapath *dp, struct sk_buff *skb, struct psample_group psample_group = {}; struct psample_metadata md = {}; const struct nlattr *a; + u32 rate; int rem; nla_for_each_attr(a, nla_data(attr), nla_len(attr), rem) { @@ -1329,8 +1340,11 @@ static void execute_psample(struct datapath *dp, struct sk_buff *skb, psample_group.net = ovs_dp_get_net(dp); md.in_ifindex = OVS_CB(skb)->input_vport->dev->ifindex; md.trunc_size = skb->len - OVS_CB(skb)->cutlen; + md.rate_as_probability = 1; + + rate = OVS_CB(skb)->probability ? OVS_CB(skb)->probability : U32_MAX; - psample_sample_packet(&psample_group, skb, 0, &md); + psample_sample_packet(&psample_group, skb, rate, &md); } #else static inline void execute_psample(struct datapath *dp, struct sk_buff *skb, diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 0cd29971a907..9ca6231ea647 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -115,12 +115,15 @@ struct datapath { * fragmented. * @acts_origlen: The netlink size of the flow actions applied to this skb. * @cutlen: The number of bytes from the packet end to be removed. + * @probability: The sampling probability that was applied to this skb; 0 means + * no sampling has occurred; U32_MAX means 100% probability. */ struct ovs_skb_cb { struct vport*input_vport; u16 mru; u16 acts_origlen; u32 cutlen; + u32 probability; }; #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb) diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c index 972ae01a70f7..8732f6e51ae5 100644 --- a/net/openvswitch/vport.c +++ b/net/openvswitch/vport.c @@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb, OVS_CB(skb)->input_vport = vport; OVS_CB(skb)->mru = 0; OVS_CB(skb)->cutlen = 0; + OVS_CB(skb)->probability = 0; if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) { u32 mark; -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 04/10] net: psample: allow using rate as probability
Although not explicitly documented in the psample module itself, the definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample. Quoting tc-sample(8): "RATE of 100 will lead to an average of one sampled packet out of every 100 observed." With this semantics, the rates that we can express with an unsigned 32-bits number are very unevenly distributed and concentrated towards "sampling few packets". For example, we can express a probability of 2.32E-8% but we cannot express anything between 100% and 50%. For sampling applications that are capable of sampling a decent amount of packets, this sampling rate semantics is not very useful. Add a new flag to the uAPI that indicates that the sampling rate is expressed in scaled probability, this is: - 0 is 0% probability, no packets get sampled. - U32_MAX is 100% probability, all packets get sampled. Acked-by: Eelco Chaudron Reviewed-by: Ido Schimmel Signed-off-by: Adrian Moreno --- include/net/psample.h| 3 ++- include/uapi/linux/psample.h | 10 +- net/psample/psample.c| 3 +++ 3 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/net/psample.h b/include/net/psample.h index 2ac71260a546..c52e9ebd88dd 100644 --- a/include/net/psample.h +++ b/include/net/psample.h @@ -24,7 +24,8 @@ struct psample_metadata { u8 out_tc_valid:1, out_tc_occ_valid:1, latency_valid:1, - unused:5; + rate_as_probability:1, + unused:4; const u8 *user_cookie; u32 user_cookie_len; }; diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h index e80637e1d97b..b765f0e81f20 100644 --- a/include/uapi/linux/psample.h +++ b/include/uapi/linux/psample.h @@ -8,7 +8,11 @@ enum { PSAMPLE_ATTR_ORIGSIZE, PSAMPLE_ATTR_SAMPLE_GROUP, PSAMPLE_ATTR_GROUP_SEQ, - PSAMPLE_ATTR_SAMPLE_RATE, + PSAMPLE_ATTR_SAMPLE_RATE, /* u32, ratio between observed and +* sampled packets or scaled probability +* if PSAMPLE_ATTR_SAMPLE_PROBABILITY +* is set. +*/ PSAMPLE_ATTR_DATA, PSAMPLE_ATTR_GROUP_REFCOUNT, PSAMPLE_ATTR_TUNNEL, @@ -20,6 +24,10 @@ enum { PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ PSAMPLE_ATTR_PROTO, /* u16 */ PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ + PSAMPLE_ATTR_SAMPLE_PROBABILITY,/* no argument, interpret rate in +* PSAMPLE_ATTR_SAMPLE_RATE as a +* probability scaled 0 - U32_MAX. +*/ __PSAMPLE_ATTR_MAX }; diff --git a/net/psample/psample.c b/net/psample/psample.c index 1c76f3e48dcd..f48b5b9cd409 100644 --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -497,6 +497,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, md->user_cookie)) goto error; + if (md->rate_as_probability) + nla_put_flag(skb, PSAMPLE_ATTR_SAMPLE_PROBABILITY); + genlmsg_end(nl_skb, data); genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0, PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC); -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 01/10] net: psample: add user cookie
Add a user cookie to the sample metadata so that sample emitters can provide more contextual information to samples. If present, send the user cookie in a new attribute: PSAMPLE_ATTR_USER_COOKIE. Acked-by: Eelco Chaudron Reviewed-by: Simon Horman Reviewed-by: Ido Schimmel Signed-off-by: Adrian Moreno --- include/net/psample.h| 2 ++ include/uapi/linux/psample.h | 1 + net/psample/psample.c| 9 - 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/net/psample.h b/include/net/psample.h index 0509d2d6be67..2ac71260a546 100644 --- a/include/net/psample.h +++ b/include/net/psample.h @@ -25,6 +25,8 @@ struct psample_metadata { out_tc_occ_valid:1, latency_valid:1, unused:5; + const u8 *user_cookie; + u32 user_cookie_len; }; struct psample_group *psample_group_get(struct net *net, u32 group_num); diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h index e585db5bf2d2..e80637e1d97b 100644 --- a/include/uapi/linux/psample.h +++ b/include/uapi/linux/psample.h @@ -19,6 +19,7 @@ enum { PSAMPLE_ATTR_LATENCY, /* u64, nanoseconds */ PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */ PSAMPLE_ATTR_PROTO, /* u16 */ + PSAMPLE_ATTR_USER_COOKIE, /* binary, user provided data */ __PSAMPLE_ATTR_MAX }; diff --git a/net/psample/psample.c b/net/psample/psample.c index a5d9b8446f77..b37488f426bc 100644 --- a/net/psample/psample.c +++ b/net/psample/psample.c @@ -386,7 +386,9 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, nla_total_size(sizeof(u32)) +/* group_num */ nla_total_size(sizeof(u32)) +/* seq */ nla_total_size_64bit(sizeof(u64)) + /* timestamp */ - nla_total_size(sizeof(u16)); /* protocol */ + nla_total_size(sizeof(u16)) +/* protocol */ + (md->user_cookie_len ? + nla_total_size(md->user_cookie_len) : 0); /* user cookie */ #ifdef CONFIG_INET tun_info = skb_tunnel_info(skb); @@ -486,6 +488,11 @@ void psample_sample_packet(struct psample_group *group, struct sk_buff *skb, } #endif + if (md->user_cookie && md->user_cookie_len && + nla_put(nl_skb, PSAMPLE_ATTR_USER_COOKIE, md->user_cookie_len, + md->user_cookie)) + goto error; + genlmsg_end(nl_skb, data); genlmsg_multicast_netns(&psample_nl_family, group->net, nl_skb, 0, PSAMPLE_NL_MCGRP_SAMPLE, GFP_ATOMIC); -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 02/10] net: sched: act_sample: add action cookie to sample
If the action has a user_cookie, pass it along to the sample so it can be easily identified. Acked-by: Eelco Chaudron Reviewed-by: Ido Schimmel Signed-off-by: Adrian Moreno --- net/sched/act_sample.c | 12 1 file changed, 12 insertions(+) diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c index a69b53d54039..2ceb4d141b71 100644 --- a/net/sched/act_sample.c +++ b/net/sched/act_sample.c @@ -167,7 +167,9 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, { struct tcf_sample *s = to_sample(a); struct psample_group *psample_group; + u8 cookie_data[TC_COOKIE_MAX_SIZE]; struct psample_metadata md = {}; + struct tc_cookie *user_cookie; int retval; tcf_lastuse_update(&s->tcf_tm); @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff *skb, if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev)) skb_push(skb, skb->mac_len); + rcu_read_lock(); + user_cookie = rcu_dereference(a->user_cookie); + if (user_cookie) { + memcpy(cookie_data, user_cookie->data, + user_cookie->len); + md.user_cookie = cookie_data; + md.user_cookie_len = user_cookie->len; + } + rcu_read_unlock(); + md.trunc_size = s->truncate ? s->trunc_size : skb->len; psample_sample_packet(psample_group, skb, s->rate, &md); -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH net-next v6 00/10] net: openvswitch: Add sample multicasting.
** Background ** Currently, OVS supports several packet sampling mechanisms (sFlow, per-bridge IPFIX, per-flow IPFIX). These end up being translated into a userspace action that needs to be handled by ovs-vswitchd's handler threads only to be forwarded to some third party application that will somehow process the sample and provide observability on the datapath. A particularly interesting use-case is controller-driven per-flow IPFIX sampling where the OpenFlow controller can add metadata to samples (via two 32bit integers) and this metadata is then available to the sample-collecting system for correlation. ** Problem ** The fact that sampled traffic share netlink sockets and handler thread time with upcalls, apart from being a performance bottleneck in the sample extraction itself, can severely compromise the datapath, yielding this solution unfit for highly loaded production systems. Users are left with little options other than guessing what sampling rate will be OK for their traffic pattern and system load and dealing with the lost accuracy. Looking at available infrastructure, an obvious candidated would be to use psample. However, it's current state does not help with the use-case at stake because sampled packets do not contain user-defined metadata. ** Proposal ** This series is an attempt to fix this situation by extending the existing psample infrastructure to carry a variable length user-defined cookie. The main existing user of psample is tc's act_sample. It is also extended to forward the action's cookie to psample. Finally, a new OVS action (OVS_SAMPLE_ATTR_EMIT_SAMPLE) is created. It accepts a group and an optional cookie and uses psample to multicast the packet and the metadata. -- v5 -> v6: - Renamed emit_sample -> psample - Addressed unused variable and conditionally compilation of function. v4 -> v5: - Rebased. - Removed lefover enum value and wrapped some long lines in selftests. v3 -> v4: - Rebased. - Addressed Jakub's comment on private and unused nla attributes. v2 -> v3: - Addressed comments from Simon, Aaron and Ilya. - Dropped probability propagation in nested sample actions. - Dropped patch v2's 7/9 in favor of a userspace implementation and consume skb if emit_sample is the last action, same as we do with userspace. - Split ovs-dpctl.py features in independent patches. v1 -> v2: - Create a new action ("emit_sample") rather than reuse existing "sample" one. - Add probability semantics to psample's sampling rate. - Store sampling probability in skb's cb area and use it in emit_sample. - Test combining "emit_sample" with "trunc" - Drop group_id filtering and tracepoint in psample. rfc_v2 -> v1: - Accommodate Ilya's comments. - Split OVS's attribute in two attributes and simplify internal handling of psample arguments. - Extend psample and tc with a user-defined cookie. - Add a tracepoint to psample to facilitate troubleshooting. rfc_v1 -> rfc_v2: - Use psample instead of a new OVS-only multicast group. - Extend psample and tc with a user-defined cookie. Adrian Moreno (10): net: psample: add user cookie net: sched: act_sample: add action cookie to sample net: psample: skip packet copy if no listeners net: psample: allow using rate as probability net: openvswitch: add psample action net: openvswitch: store sampling probability in cb. selftests: openvswitch: add psample action selftests: openvswitch: add userspace parsing selftests: openvswitch: parse trunc action selftests: openvswitch: add psample test Documentation/netlink/specs/ovs_flow.yaml | 17 ++ include/net/psample.h | 5 +- include/uapi/linux/openvswitch.h | 31 +- include/uapi/linux/psample.h | 11 +- net/openvswitch/Kconfig | 1 + net/openvswitch/actions.c | 65 - net/openvswitch/datapath.h| 3 + net/openvswitch/flow_netlink.c| 32 ++- net/openvswitch/vport.c | 1 + net/psample/psample.c | 16 +- net/sched/act_sample.c| 12 + .../selftests/net/openvswitch/openvswitch.sh | 115 +++- .../selftests/net/openvswitch/ovs-dpctl.py| 272 +- 13 files changed, 565 insertions(+), 16 deletions(-) -- 2.45.2 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.
Thank you all! regards, Vladislav Odintsov -Original Message- From: dev on behalf of Dumitru Ceara Date: Friday, 28 June 2024 at 12:03 To: Vladislav Odintsov , "d...@openvswitch.org" Subject: Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode. On 6/7/24 15:54, Vladislav Odintsov wrote: > v6: > - Addressed Mark's review comments: > 1. Removed global variable "vxlan_mode" change from "global" engine node. > 2. Configuration knob "disable_vxlan_mode" was renamed to "vxlan_mode" > v5: > - Addressed Ihar's review comments: > 1. fixed errors after incorrect conflicts solving on rebase; > 2. changed VXLAN mode naming to capitalized; > 3. clarified VXLAN mode in ovn-architecture man page. > v4: > - Addressed Dumitru's and Ihar's review comments; > - single patch was split into two: > 1. function call replaced with a global variable `vxlan_mode`; > 2. introduced `disable_vxlan_mode` configuration knob; > - rebased onto latest main branch. > v3: > - Removed accidental ovs submodule change. > v2: > - Added NEWS item. > > Vladislav Odintsov (2): > northd: Make `vxlan_mode` a global variable. > northd: Add support for disabling vxlan mode. > Thanks, Vladislav, Ihar and Ales! I applied this series to main. Best regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 2/2] tests: Skip memory error in DPDK tests.
On 6/28/24 11:16, David Marchand wrote: > On Thu, Jun 27, 2024 at 10:38 AM Ales Musil wrote: >> >> Add skip for memory error produced by DPDK on ARM with 2MB hugepages >> which isn't harmful [0]. >> >> [0] http://mails.dpdk.org/archives/dev/2024-June/296764.html >> >> Suggested-by: David Marchand >> Signed-off-by: Ales Musil > > I would mention ARM in the commit title (maybe it can be tweaked when > applying), but otherwise the patch lgtm. I changed it to: tests: Skip memory error triggered on ARM in DPDK tests. > Reviewed-by: David Marchand > Thanks, Ales and David! Applied to main. Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 1/2] ci: Save some DPDK compilation time.
On 6/28/24 11:15, David Marchand wrote: > On Thu, Jun 27, 2024 at 10:38 AM Ales Musil wrote: >> >> Add more options that will shave some compilation time off DPDK by >> skipping unused parts of the code. >> >> Suggested-by: David Marchand >> Signed-off-by: Ales Musil > > Reviewed-by: David Marchand > Thanks, Ales and David! Applied to main. Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] controller: Add random delay during fdb learning.
> On 28 Jun 2024, at 2:45 PM, Dumitru Ceara wrote: > > !---| > CAUTION: External Email > > |---! > > On 6/28/24 11:02, Dumitru Ceara wrote: > > [...] > >> >> This needs to be "struct fdb *fdb". I fixed it up and applied the patch >> to main. >> > > I also added you to the AUTHORS file, Naveen. I'm not sure why you > weren't there already. I apologize for that, should be fixed now. Thank you, Dumitru. Thanks, Naveen > > Regards, > Dumitru > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovsdb: raft: Don't forward more than one command to the leader.
On 6/28/24 09:20, Dumitru Ceara wrote: > On 6/27/24 00:02, Ilya Maximets wrote: >> Every transaction has RAFT log prerequisites. Even if transactions >> are not related (because RAFT doesn't actually know what data it is >> handling). When leader writes a new record to a RAFT storage, it is >> getting appended to the log right away and changes current 'eid', >> i.e., changes prerequisites. The leader will not try to write new >> records until the current one is committed, because until then the >> pre-check will be failing. >> >> However, that is different for the follower. Followers do not add >> records to the RAFT log until the leader sends an append request back. >> So, if there are multiple transactions pending on a follower, it will >> create a command for each of them and prerequisites will be set to the >> same values. All these commands will be sent to the leader, but only >> one can succeed at a time, because accepting one command immediately >> changes prerequisites and all other commands become non-applicable. >> So, out of N commands, 1 will succeed and N - 1 will fail. The cluster >> failure is a transient failure, so the follower will re-process all the >> failed transactions and send them again. 1 will succeed and N - 2 will >> fail. And so on, until there are no more transactions. In the end, >> instead of processing N transactions, the follower is performing >> N * (N - 1) / 2 transaction processing iterations. That is consuming >> a huge amount of CPU resources completely unnecessarily. >> >> Since there is no real chance for multiple transactions from the same >> follower to succeed, it's better to not send them in the first place. >> This also eliminates prerequisite mismatch messages on a leader in >> this particular case. >> > > Makes sense! > >> In a test with 30 parallel shell threads executing 12K transactions >> total with separate ovsdb-client calls through the same follower there >> is about 60% performance improvement. The test takes ~100 seconds to >> complete without this change and ~40 seconds with this change applied. >> The new time is very close to what it takes to execute the same test >> through the cluster leader. >> > > Maybe a public link to the test (if possible) would be a nice reference > to have in the future? I'm not sure where to post it, so I'll just leave it here, and I can add a link to this post into the commit message. The script is adding 12K ports across 120 logical switches in a style of ovn-kubernetes (sets up port security, some external IDs and adds to an address set): --- $ cat ../transaction-benchmark-parallel.sh #set -x set -o errexit # Creating 120 logical switches for i in $(seq 120); do ovn-nbctl ls-add ip-10-10-177-$i.us-west-2.compute.internal done IFS=$'\r\n' GLOBIGNORE='*' \ command eval 'uuids=($(ovn-nbctl --bare --columns _uuid list logical_switch | sed "/^$/d"))' for i in $(seq 120); do echo "uuid #${i}: ${uuids[$(($i - 1))]}" done # Creating one addres set as_name="a0123456789012345678" as_uuid=$(ovn-nbctl create address_set name=${as_name}) # Adding ports function add_400() { rm -rf txn_results$1.txt for i in $(seq $1 $(($1 + 399)) ); do echo $i port_name="node-density-----_node-density-$i" namespace="node-density-----" ls_name="ip-10-10-177-$((${i} % 120 + 1)).us-west-2.compute.internal" uuid_name="row____" mac=$(printf '0a:58:0a:9b:%02x:%02x' $((${i} >> 8)) $((${i} & 255))) ip=$(printf "10.11.%d.%d\n" $((${i} / 255)) $((${i} % 255 + 1))) time ovsdb-client transact unix:$(pwd)/sandbox/nb2.ovsdb "[\"OVN_Northbound\", { \"uuid-name\":\"${uuid_name}\", \"row\":{ \"name\":\"${port_name}\", \"options\":[\"map\",[[\"requested-chassis\",\"${ls_name}\"]]], \"addresses\":[\"set\",[\"${mac} ${ip}\"]], \"external_ids\":[\"map\",[[\"pod\",\"true\"],[\"namespace\",\"${namespace}\"]]], \"port_security\":[\"set\",[\"${mac} ${ip}\"]] }, \"op\":\"insert\", \"table\":\"Logical_Switch_Port\" }, { \"where\":[[\"_uuid\",\"==\",[\"uuid\",\"${uuids[$(($i % 120))]}\"]]], \"mutations\":[[\"ports\",\"insert\",[\"set\",[[\"named-uuid\",\"${uuid_name}\"], \"op\":\"mutate\", \"table\":\"Logical_Switch\" }, { \"where\":[[\"_uuid\",\"==\",[\"uuid\",\"${as_uuid}\"]]], \"mutations\":[[\"addresses\",\"insert\",[\"set\",[\"${ip}\", \"op\":\"mutate\", \"table\":\"Address_Set\" }]" >> txn_results$1.txt done echo "done" >> txn_results$1.txt } rm -f txn_results*.txt n=0 for i in $(seq 0 400 11600); do n=$(($n + 1)) add_400 $i & done while [ $(grep 'done' txn_results*.txt | wc -l) != $n ]; do sleep 1 done --- It is not th
Re: [ovs-dev] OVN technical community meeting - June 24th
On 6/25/24 19:03, Frode Nordahl wrote: > Hello, > > On Tue, Jun 25, 2024 at 6:04 PM Dumitru Ceara wrote: >> >> Hi, >> >> Thanks to everyone who attended the OVN meeting yesterday! >> >> The recording is available here: >> https://www.youtube.com/watch?v=bonLhUj2oGg >> >> Transcripts: >> https://drive.google.com/file/d/1ioVXcHiuD-xr5RQTh5OSGbpmhUtIDiO_/view?usp=sharing >> >> Notes: >> https://docs.google.com/document/d/1dG4GwcYOSs4uArPGtOoaP5tH4KCto-GH_C3tIXSnZZ8/ > > Thank you for organizing, it was a good discussion! As promised I also > sent out an e-mail detailing some of the points we spoke about, I > suggest we continue the discussion there [0]. > Awesome, thanks! I see Ales already shared some thoughts there. >> The main discussion topic was BGP integration in OVN and mostly about >> where to run the BGP control plane (e.g., FRR on the side vs an in-tree >> BGP implementation) and how to get packets to it. I'll try to list some >> points here: >> >> Frode suggested a "L2 load balancer" feature to be added to OVN to >> "sniff" BGP control packets reaching a logical router and send them out >> (through a VIF?) to FRR running alongside OVN on the host. >> >> Ilya also provided an alternative of using a new type of logical router >> port, similar to a loopback interface and bind BGP configuration to that >> one. >> >> I had an off-list discussion with Tim Rozet (in CC) from the OpenShift >> networking team in Red Hat and he shared an idea that might be >> interesting because it kind of combines the two above. That is: >> >> - configure a TCP load balancer on the logical router we want FRR to run >> BGP for >> - this load balancer will "balance" (DNAT) traffic to a single backend >> IP which can be any tenant internal IP reserved for FRR to bind to >> - this IP could be configured on a regular OVN VIF that's further >> plugged into the namespace/container/VM where FRR runs >> >> Frode, would this be a good alternative for your use case? > > Using a loopback address is indeed common in many configurations, > however as I see it there are two requirements that conflict with > that. One being the goal of minimizing configuration overhead through > the use of IPv6 LLA ("BGP Unnumbered ''), which inherently are point > to point, the other being support for BGP authentication, which is at > odds with NATing the BGP connection. An indirect blocker for this > would also be how popular BGP implementations do not allow setting a > next hop for routes transmitted over a IPv6 LLA. > > See more detail in [0]. > Ack. >> In any case, if we end up having to add a new feature to explicitly >> handle BGP configuration (the "L2 load balancer" discussed in the >> meeting), I think my preference would be in the end to have an explicit >> "bgp" configuration in the northbound database. We already do that for >> other features: DHCP, IGMP, multicast routing. > > Ack, thank you for stating your preference here! > >> Finally, I'd like to propose a new technical meeting instance in >> approximately 5 weeks. I'm looking at: >> >> Monday, July 29th, 3PM UTC. >> >> I'll wait a few days before sending out the invite in case people raise >> objections. > > Works for me! > I went ahead and sent a new invite for: Date/Time: Monday, July 29h, 3PM UTC Link: https://meet.google.com/zns-gqsd-jdn Agenda: https://docs.google.com/document/d/1dG4GwcYOSs4uArPGtOoaP5tH4KCto-GH_C3tIXSnZZ8/ Best regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] datapath-windows: Fix parsing of split buffers in OvsGetTcpHeader.
Hi Alin, llya, I have tried the patch, unfortunately, we hit BSOD after patching the change. So I think it doesn't work. Best regards, Wenying On Fri, Jun 7, 2024 at 8:58 AM Wenying Dong wrote: > Sure, happy to help if we have a patch. > > Best, > Wenying > > On Thu, Jun 6, 2024 at 7:19 PM Alin Serdean wrote: > >> Hi Ilya, >> >> Thanks for the patch. Will try to get a setup and test it. >> >> In theory if we have a new analyze target of a newer visual studio it >> should flag those types of issues. >> >> Will try to send a patch for the aforementioned target. >> >> Wenying can you help in testing the patch? >> >> Thank you, >> Alin. >> -- >> *From:* Ilya Maximets >> *Sent:* Thursday, June 6, 2024 12:52 PM >> *To:* ovs-dev@openvswitch.org >> *Cc:* Alin Serdean ; Wenying Dong ; >> Ilya Maximets >> *Subject:* [PATCH] datapath-windows: Fix parsing of split buffers in >> OvsGetTcpHeader. >> >> NdisGetDataBuffer() is called without providing a buffer to copy packet >> data in case it is not contiguous. So, it fails in some scenarios >> where the packet is handled by the general network stack before OVS >> and headers become split in multiple buffers. >> >> Use existing helpers to retrieve the headers instead, they are using >> OvsGetPacketBytes() which should be able to handle split data. >> >> It might be a bit slower than getting direct pointers that may be >> provided by NdisGetDataBuffer(), but it's better to optimize commonly >> used OvsGetPacketBytes() helper in the future instead of optimizing >> every single caller separately. And we're still copying the TCP >> header anyway. >> >> Fixes: 9726a016d9d6 ("datapath-windows: Implement locking in conntrack >> NAT.") >> Reported-at: https://github.com/openvswitch/ovs-issues/issues/323 >> Signed-off-by: Ilya Maximets >> --- >> >> WARNING: I beleive this code is correct, but I did not test it with real >> traffic, I only verified that it compiles. Should not be applied >> unless someone tests it in an actual Windows setup. >> >> datapath-windows/ovsext/Conntrack.c | 45 ++--- >> 1 file changed, 21 insertions(+), 24 deletions(-) >> >> diff --git a/datapath-windows/ovsext/Conntrack.c >> b/datapath-windows/ovsext/Conntrack.c >> index 39ba5cc10..4649805dd 100644 >> --- a/datapath-windows/ovsext/Conntrack.c >> +++ b/datapath-windows/ovsext/Conntrack.c >> @@ -678,46 +678,43 @@ OvsGetTcpHeader(PNET_BUFFER_LIST nbl, >> VOID *storage, >> UINT32 *tcpPayloadLen) >> { >> -IPHdr *ipHdr; >> -IPv6Hdr *ipv6Hdr; >> -TCPHdr *tcp; >> +IPv6Hdr ipv6HdrStorage; >> +IPHdr ipHdrStorage; >> +const IPv6Hdr *ipv6Hdr; >> +const IPHdr *ipHdr; >> +const TCPHdr *tcp; >> VOID *dest = storage; >> uint16_t ipv6ExtLength = 0; >> >> if (layers->isIPv6) { >> -ipv6Hdr = NdisGetDataBuffer(NET_BUFFER_LIST_FIRST_NB(nbl), >> -layers->l4Offset + sizeof(TCPHdr), >> -NULL, 1, 0); >> +ipv6Hdr = OvsGetPacketBytes(nbl, sizeof *ipv6Hdr, >> +layers->l3Offset, &ipv6HdrStorage); >> if (ipv6Hdr == NULL) { >> return NULL; >> } >> >> -tcp = (TCPHdr *)((PCHAR)ipv6Hdr + layers->l4Offset); >> -ipv6Hdr = (IPv6Hdr *)((PCHAR)ipv6Hdr + layers->l3Offset); >> -if (tcp->doff * 4 >= sizeof *tcp) { >> -NdisMoveMemory(dest, tcp, sizeof(TCPHdr)); >> -ipv6ExtLength = layers->l4Offset - layers->l3Offset - >> sizeof(IPv6Hdr); >> -*tcpPayloadLen = (ntohs(ipv6Hdr->payload_len) - >> ipv6ExtLength - TCP_HDR_LEN(tcp)); >> -return storage; >> +tcp = OvsGetTcp(nbl, layers->l4Offset, dest); >> +if (tcp == NULL) { >> +return NULL; >> } >> + >> +ipv6ExtLength = layers->l4Offset - layers->l3Offset - >> sizeof(IPv6Hdr); >> +*tcpPayloadLen = (ntohs(ipv6Hdr->payload_len) - ipv6ExtLength - >> TCP_HDR_LEN(tcp)); >> } else { >> -ipHdr = NdisGetDataBuffer(NET_BUFFER_LIST_FIRST_NB(nbl), >> - layers->l4Offset + sizeof(TCPHdr), >> - NULL, 1 /*no align*/, 0); >> +ipHdr = OvsGetIp(nbl, layers->l3Offset, &ipHdrStorage); >> if (ipHdr == NULL) { >> return NULL; >> } >> >> -ipHdr = (IPHdr *)((PCHAR)ipHdr + layers->l3Offset); >> -tcp = (TCPHdr *)((PCHAR)ipHdr + ipHdr->ihl * 4); >> - >> -if (tcp->doff * 4 >= sizeof *tcp) { >> -NdisMoveMemory(dest, tcp, sizeof(TCPHdr)); >> -*tcpPayloadLen = TCP_DATA_LENGTH(ipHdr, tcp); >> -return storage; >> +tcp = OvsGetTcp(nbl, layers->l4Offset, dest); >> +if (tcp == NULL) { >> +return NULL; >> } >> + >> +*tcpPayloadLen = TCP_DATA_LENGTH(ipHdr, tcp); >> } >> -r
Re: [ovs-dev] [PATCH ovn v2 2/2] tests: Skip memory error in DPDK tests.
On Thu, Jun 27, 2024 at 10:38 AM Ales Musil wrote: > > Add skip for memory error produced by DPDK on ARM with 2MB hugepages > which isn't harmful [0]. > > [0] http://mails.dpdk.org/archives/dev/2024-June/296764.html > > Suggested-by: David Marchand > Signed-off-by: Ales Musil I would mention ARM in the commit title (maybe it can be tweaked when applying), but otherwise the patch lgtm. Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2 1/2] ci: Save some DPDK compilation time.
On Thu, Jun 27, 2024 at 10:38 AM Ales Musil wrote: > > Add more options that will shave some compilation time off DPDK by > skipping unused parts of the code. > > Suggested-by: David Marchand > Signed-off-by: Ales Musil Reviewed-by: David Marchand -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] controller: Add random delay during fdb learning.
On 6/28/24 11:02, Dumitru Ceara wrote: [...] > > This needs to be "struct fdb *fdb". I fixed it up and applied the patch > to main. > I also added you to the AUTHORS file, Naveen. I'm not sure why you weren't there already. I apologize for that, should be fixed now. Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 ovn] Do not reply on unicast arps for IPv4 targets.
On 6/28/24 11:02, Dumitru Ceara wrote: [...] > > And applied the patch to main. > I also added Vasyl to the AUTHORS list. Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] RFC OVN: fabric integration
On Tue, Jun 25, 2024 at 6:52 PM Frode Nordahl wrote: > Hello, > > We are increasingly seeing requests for integration between OVN > powered CMSs/workloads and the fabric. > > As a side note, this is a very interesting topic to me personally, and > I think there are opportunities in the long term for this class of > software to potentially fill a void for more automated and SDN-like > ways of managing the physical network, as previously closed physical > switch hardware is increasingly opening up to programmatic extension > and control. > > While very exciting, it will take a while, both in terms of evolving > how networking teams are organized, in terms of the longevity of > networking gear making entity wide refresh cycles very long, not to > mention gathering agreement and momentum to build such a thing from > the pieces we have. > > > So to be pragmatic, we need to integrate with something that fabric > network engineers are comfortable with, and already available on most > networking hardware, be it closed or open, today. > > The most ubiquitous routing protocol, which has prevailed in modern > layer 3 only data center designs [0], is BGP. > > Use cases: > * Allow fabric to locate and direct traffic to reroutable resources > such as IPv4/IPv6 prefixes, Floating IPs (FIPs) and Load Balancer > VIPs. > > * Use the fabric as a load balancer, announcing the same service IP on > multiple hosts (anycast). > > * Aggregate announcements from stacked CMSes (i.e. Kubernetes running > on top of OpenStack). > > > Requirements: > * Data path must be hardware offloaded, i.e. the next hop address the > peer resolves for announcements of OVN resources needs to be an LRP > IP. > > * Minimize configuration overhead through the use of IPv6 LLAs for > peering routing both IPv4 and IPv6 prefixes over a IPv6 BGP session > [1] (aka. “BGP Unnumbered”). > > * Support ECMP out of the host, i.e. use L3 interfaces potentially > connecting to two different ToRs, instead of bonds, avoiding the > additional complexity of multi-chassis bonds. > > * Support BGP authentication [2][3], i.e. the source, destination > address and ports in packet headers can not be changed. > > * Compatibility >* Running a BGP protocol suite on the host is becoming a thing in > its own right, and our users may have requirements of their own that > influence their choice of implementation. We need to take this into > account and choose integration methods that allow OVN to work with > multiple protocol suite implementations. > >* While we have the power to change and fix issues in popular > routing protocol suites, such as FRR, we need to be able to integrate > with versions that exist on networking hardware out there today. > > Limitations that influence/dictate implementation choices: > * Peering with IPv6 LLAs to meet the configuration overhead > requirement makes the peering relationship point to point. > > * Popular BGP implementations, such as FRR which is used as routing > protocol suite by many ToR open source NOSes, does not accept > sending/receiving IPv6 LLA next hop with the route, so the BGP peer > address will be used as next hop. (There are even mentions of 3rd > party nexthop currently not being supported, but not sure if that is > accurate [4]). > > * As mentioned above, BGP authentication requires IP headers to be > unchanged for the BGP TCP packets going to/from the BGP speaker. > > > Proposed implementation: > > We are in the process of preparing some RFC/PoC patches that at a high > level will: > * Manage a VRF in the system serving two purposes: >* Leaking of route information from ovn-controller to the VRF > routing table, which a routing protocol suite can redistribute subject > to configuration. > >* Provide an IP endpoint that a VRF aware application, such as FRR, > can bind to serving as a BGP speaker on behalf of a OVN LRP IP. > > * We will attach a OVN VIF to this VRF that has data path rules that: > >* Forward required traffic destined to the OVN LRP IP to the VRF. > >* Forward required traffic from the application bound to the VRF as > if it originated from the OVN LRP IP. > > > Hopefully we'll have something up on the list before the end of this > week, which makes it real and easier to reason about for further > discussion. > > > Prior art: > > We recognize that there already exists a third party approach to this > in the ovn-bgp-agent [5] governed by OpenStack, and our goal with this > work is to provide a tighter integration that might cater generically > for other CMSes and use cases. > > > 0: https://datatracker.ietf.org/doc/html/rfc7938 > 1: https://datatracker.ietf.org/doc/html/rfc5549 > 2: https://datatracker.ietf.org/doc/html/rfc2385 > 3: https://datatracker.ietf.org/doc/html/rfc5925 > 4: > https://github.com/FRRouting/frr/blob/cc3519f3e6eaa06f762e0d447202df32df66e129/bgpd/bgp_route.c#L2719 > 5: https://docs.openstack.org/ovn-bgp-agent/latest/ > > Hi Frode, looking forward to the RFC. AFAIU
Re: [ovs-dev] [PATCH ovn v6 0/2] Add support to disable VXLAN mode.
On 6/7/24 15:54, Vladislav Odintsov wrote: > v6: > - Addressed Mark's review comments: > 1. Removed global variable "vxlan_mode" change from "global" engine node. > 2. Configuration knob "disable_vxlan_mode" was renamed to "vxlan_mode" > v5: > - Addressed Ihar's review comments: > 1. fixed errors after incorrect conflicts solving on rebase; > 2. changed VXLAN mode naming to capitalized; > 3. clarified VXLAN mode in ovn-architecture man page. > v4: > - Addressed Dumitru's and Ihar's review comments; > - single patch was split into two: > 1. function call replaced with a global variable `vxlan_mode`; > 2. introduced `disable_vxlan_mode` configuration knob; > - rebased onto latest main branch. > v3: > - Removed accidental ovs submodule change. > v2: > - Added NEWS item. > > Vladislav Odintsov (2): > northd: Make `vxlan_mode` a global variable. > northd: Add support for disabling vxlan mode. > Thanks, Vladislav, Ihar and Ales! I applied this series to main. Best regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] utilties: Allow ovn-detrace to run on ovs-ofctl dump-flows output.
On 6/21/24 16:29, Dumitru Ceara wrote: > On 6/3/24 10:23, Ales Musil wrote: >> The ovs-ofctl dump-flows output is slightly different from oproto/trace >> cookie 0xXXX vs. cookie=0xXXX. Update the regex that it also matches >> on the equals case. This allows us to run ovn-detrace against the >> ovs-ofctl dump-flows output. >> >> Also provide simple, partially hardcoded test case for ovn-detrace. >> >> Signed-off-by: Ales Musil >> --- > > Looks good to me, thanks! > > Acked-by: Dumitru Ceara > I went ahead and applied this to main. Thanks, Ales! Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 ovn] Do not reply on unicast arps for IPv4 targets.
On 6/27/24 16:18, Numan Siddique wrote: > On Fri, May 24, 2024 at 4:00 AM Vasyl Saienko wrote: >> >> Reply only if target ethernet address is broadcast, if >> address is specified explicitly do noting to let target >> reply by itself. This technique allows to monitor target >> aliveness with arping. >> >> Closes #239 >> >> Signed-off-by: Vasyl Saienko > > Sorry for the late reviews. > > Acked-by: Numan Siddique > > Numan > >> --- >> northd/northd.c | 11 +-- >> northd/ovn-northd.8.xml | 7 --- >> tests/ovn-northd.at | 4 ++-- >> 3 files changed, 15 insertions(+), 7 deletions(-) >> >> diff --git a/northd/northd.c b/northd/northd.c >> index 37f443e70..e80e1885d 100644 >> --- a/northd/northd.c >> +++ b/northd/northd.c >> @@ -8832,8 +8832,15 @@ build_lswitch_arp_nd_responder_known_ips(struct >> ovn_port *op, >> for (size_t i = 0; i < op->n_lsp_addrs; i++) { >> for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) { >> ds_clear(match); >> -ds_put_format(match, "arp.tpa == %s && arp.op == 1", >> -op->lsp_addrs[i].ipv4_addrs[j].addr_s); >> +/* NOTE(vsaienko): Do not reply on unicast ARPs, forward >> + * them to the target to have ability to monitor target >> + * aliveness via ARPs. >> +*/ Thanks, Vasyl and Numan! I changed this to: /* Do not reply on unicast ARPs, forward them to the target * to have ability to monitor target liveness via unicast * ARP requests. */ And applied the patch to main. Best regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] controller: Add random delay during fdb learning.
On 6/28/24 09:20, Ales Musil wrote: > On Wed, Jun 26, 2024 at 8:20 AM Naveen Yerramneni < > naveen.yerramn...@nutanix.com> wrote: > >> This change reduces the probability of conflicts when >> multiple nodes tries to add the same FDB entry to SB at the >> same time. When conflict occurs, OVN controller does full >> recompute which is heavy weight on the scale setup. >> >> Signed-off-by: Naveen Yerramneni >> Suggested-by: Dumitru Ceara >> --- >> controller/mac-cache.c | 4 +++- >> controller/mac-cache.h | 4 +++- >> controller/ovn-controller.c | 2 +- >> controller/pinctrl.c| 28 >> tests/ovn.at| 6 +++--- >> 5 files changed, 30 insertions(+), 14 deletions(-) >> >> diff --git a/controller/mac-cache.c b/controller/mac-cache.c >> index d8c4e2aed..de45d7a6a 100644 >> --- a/controller/mac-cache.c >> +++ b/controller/mac-cache.c >> @@ -245,7 +245,8 @@ mac_binding_lookup(struct ovsdb_idl_index >> *sbrec_mac_binding_by_lport_ip, >> >> /* FDB. */ >> struct fdb * >> -fdb_add(struct hmap *map, struct fdb_data fdb_data) { >> +fdb_add(struct hmap *map, struct fdb_data fdb_data, long long timestamp) >> +{ >> struct fdb *fdb = fdb_find(map, &fdb_data); >> >> if (!fdb) { >> @@ -255,6 +256,7 @@ fdb_add(struct hmap *map, struct fdb_data fdb_data) { >> } >> >> fdb->data = fdb_data; >> +fdb->timestamp = timestamp; >> >> return fdb; >> } >> diff --git a/controller/mac-cache.h b/controller/mac-cache.h >> index 3c78f9440..b94e7a1cf 100644 >> --- a/controller/mac-cache.h >> +++ b/controller/mac-cache.h >> @@ -83,6 +83,7 @@ struct fdb { >> struct fdb_data data; >> /* Reference to the SB FDB record. */ >> const struct sbrec_fdb *sbrec_fdb; >> +long long timestamp; >> }; >> >> struct bp_packet_data { >> @@ -149,7 +150,8 @@ mac_binding_lookup(struct ovsdb_idl_index >> *sbrec_mac_binding_by_lport_ip, >> const char *logical_port, const char *ip); >> >> /* FDB. */ >> -struct fdb *fdb_add(struct hmap *map, struct fdb_data fdb_data); >> +struct fdb *fdb_add(struct hmap *map, struct fdb_data fdb_data, >> +long long timestamp); >> >> void fdb_remove(struct hmap *map, struct fdb *fdb); >> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 6874f99a3..e99bdb3ef 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -3322,7 +3322,7 @@ fdb_add_sb(struct mac_cache_data *data, const struct >> sbrec_fdb *sfdb) >> return; >> } >> >> -struct fdb *fdb = fdb_add(&data->fdbs, fdb_data); >> +struct fdb *fdb = fdb_add(&data->fdbs, fdb_data, 0); >> >> fdb->sbrec_fdb = sfdb; >> } >> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >> index f2e382a44..e3ded793b 100644 >> --- a/controller/pinctrl.c >> +++ b/controller/pinctrl.c >> @@ -4739,6 +4739,7 @@ pinctrl_destroy(void) >> /* Buffered "put_mac_binding" operation. */ >> >> #define MAX_MAC_BINDING_DELAY_MSEC 50 >> +#define MAX_FDB_DELAY_MSEC 50 >> #define MAX_MAC_BINDINGS 1000 >> >> /* Contains "struct mac_binding"s. */ >> @@ -8961,21 +8962,30 @@ run_put_fdbs(struct ovsdb_idl_txn *ovnsb_idl_txn, >> return; >> } >> >> +long long now = time_msec(); >> const struct fdb *fdb; This needs to be "struct fdb *fdb". I fixed it up and applied the patch to main. Thanks, Naveen and Ales! Regards, Dumitru ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] Userspace: Software fallback for UDP encapsulated TCP segmentation.
On Thu, Jun 27, 2024 at 4:04 PM David Marchand wrote: > @@ -926,6 +926,17 @@ netdev_send(struct netdev *netdev, int qid, > struct dp_packet_batch *batch, > return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > } > } > +} else if (!(netdev_flags & NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM)) { > +DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > +if (!dp_packet_hwol_is_tso(packet) && > +!dp_packet_hwol_is_tunnel_vxlan(packet) && > +!dp_packet_hwol_is_tunnel_geneve(packet)) { > +continue; Erm, less buggy: +if (!dp_packet_hwol_is_tso(packet) || +(!dp_packet_hwol_is_tunnel_vxlan(packet) && + !dp_packet_hwol_is_tunnel_geneve(packet))) { +continue; +} > +} > +if (dp_packet_hwol_is_outer_udp_cksum(packet)) { > +return netdev_send_tso(netdev, qid, batch, > concurrent_txq); > +} > +} > } > } -- David Marchand ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v6 2/2] northd: Add support for disabling vxlan mode.
On Fri, Jun 7, 2024 at 3:54 PM Vladislav Odintsov wrote: > Commit [1] introduced a "VXLAN mode" concept. It brought a limitation > for available tunnel IDs because of lack of space in VXLAN VNI. > In VXLAN mode OVN is limited by 4095 datapaths (LRs or non-transit LSs) > and 2047 logical ports per datapath. > > Prior to this patch VXLAN mode was enabled automatically if at least one > chassis had encap of VXLAN type. In scenarios where one want to use > VXLAN only for HW VTEP (RAMP) switch, such limitation makes no sence. > > This patch adds support for explicit disabling of VXLAN mode via > Northbound database. > > 1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068 > > Acked-By: Ihar Hrachyshka > Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings") > Signed-off-by: Vladislav Odintsov > --- > NEWS | 4 > northd/en-global-config.c | 8 +++- > northd/northd.c | 10 -- > northd/northd.h | 3 ++- > ovn-architecture.7.xml| 6 ++ > ovn-nb.xml| 10 ++ > tests/ovn-northd.at | 29 + > 7 files changed, 66 insertions(+), 4 deletions(-) > > diff --git a/NEWS b/NEWS > index 3bdc55172..aa1669d9c 100644 > --- a/NEWS > +++ b/NEWS > @@ -31,6 +31,10 @@ Post v24.03.0 > has been renamed to "options:ic-route-denylist" in order to comply > with > inclusive language guidelines. The previous name is still recognized > to > aid with backwards compatibility. > + - Added new global config option NB_Global:options:vxlan_mode to support > +ability to disable "VXLAN mode" to extend available tunnel IDs space > for > +datapaths from 4095 to 16711680. For more details see man ovn-nb(5) > for > +mentioned option. > > OVN v24.03.0 - 01 Mar 2024 > -- > diff --git a/northd/en-global-config.c b/northd/en-global-config.c > index df0f8e58c..784538a14 100644 > --- a/northd/en-global-config.c > +++ b/northd/en-global-config.c > @@ -117,7 +117,8 @@ en_global_config_run(struct engine_node *node , void > *data) > > char *max_tunid = xasprintf("%d", > get_ovn_max_dp_key_local( > -is_vxlan_mode(sbrec_chassis_table))); > +is_vxlan_mode(&nb->options, > + sbrec_chassis_table))); > smap_replace(options, "max_tunid", max_tunid); > free(max_tunid); > > @@ -534,6 +535,11 @@ check_nb_options_out_of_sync(const struct > nbrec_nb_global *nb, > return true; > } > > +if (config_out_of_sync(&nb->options, &config_data->nb_options, > + "vxlan_mode", false)) { > +return true; > +} > + > return false; > } > > diff --git a/northd/northd.c b/northd/northd.c > index 6d118a19a..a4937b472 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -886,8 +886,13 @@ join_datapaths(const struct > nbrec_logical_switch_table *nbrec_ls_table, > } > > bool > -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) > +is_vxlan_mode(const struct smap *nb_options, > + const struct sbrec_chassis_table *sbrec_chassis_table) > { > +if (!smap_get_bool(nb_options, "vxlan_mode", true)) { > +return false; > +} > + > const struct sbrec_chassis *chassis; > SBREC_CHASSIS_TABLE_FOR_EACH (chassis, sbrec_chassis_table) { > for (int i = 0; i < chassis->n_encaps; i++) { > @@ -17605,7 +17610,8 @@ ovnnb_db_run(struct northd_input *input_data, > use_common_zone = smap_get_bool(input_data->nb_options, > "use_common_zone", > false); > > -vxlan_mode = is_vxlan_mode(input_data->sbrec_chassis_table); > +vxlan_mode = is_vxlan_mode(input_data->nb_options, > + input_data->sbrec_chassis_table); > > build_datapaths(ovnsb_txn, > input_data->nbrec_logical_switch_table, > diff --git a/northd/northd.h b/northd/northd.h > index 987f82954..2f2fdb673 100644 > --- a/northd/northd.h > +++ b/northd/northd.h > @@ -790,7 +790,8 @@ lr_has_multiple_gw_ports(const struct ovn_datapath *od) > } > > bool > -is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table); > +is_vxlan_mode(const struct smap *nb_options, > + const struct sbrec_chassis_table *sbrec_chassis_table); > > uint32_t get_ovn_max_dp_key_local(bool _vxlan_mode); > > diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml > index e32d1a9f7..640944faf 100644 > --- a/ovn-architecture.7.xml > +++ b/ovn-architecture.7.xml > @@ -2920,4 +2920,10 @@ > the future, gateways that do not support encapsulations with large > amounts > of metadata may continue to have a reduced feature set. > > + > +VXLAN mode is recommended to be disabled if VXLAN encap > at > +hypervisors is needed only to support HW VTEP L2 Gateway > f
Re: [ovs-dev] [PATCH ovn v6 1/2] northd: Make `vxlan_mode` a global variable.
On Fri, Jun 7, 2024 at 3:54 PM Vladislav Odintsov wrote: > This simplifies code and subsequent commit to explicitely disable VXLAN > mode is based on these changes. > > Also "VXLAN mode" term is introduced in ovn-architecture man page. > > Signed-off-by: Vladislav Odintsov > --- > northd/en-global-config.c | 3 +- > northd/northd.c | 76 --- > northd/northd.h | 5 ++- > ovn-architecture.7.xml| 10 +++--- > 4 files changed, 41 insertions(+), 53 deletions(-) > > diff --git a/northd/en-global-config.c b/northd/en-global-config.c > index 28c78a12c..df0f8e58c 100644 > --- a/northd/en-global-config.c > +++ b/northd/en-global-config.c > @@ -116,7 +116,8 @@ en_global_config_run(struct engine_node *node , void > *data) > } > > char *max_tunid = xasprintf("%d", > -get_ovn_max_dp_key_local(sbrec_chassis_table)); > +get_ovn_max_dp_key_local( > +is_vxlan_mode(sbrec_chassis_table))); > smap_replace(options, "max_tunid", max_tunid); > free(max_tunid); > > diff --git a/northd/northd.c b/northd/northd.c > index 9f81afccb..6d118a19a 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -90,6 +90,10 @@ static bool use_ct_inv_match = true; > */ > static bool default_acl_drop; > > +/* If this option is 'true' northd will use limited 24-bit space for > datapath > + * and ports tunnel key allocation (12 bits for each instead of default > 16). */ > +static bool vxlan_mode; > + > #define MAX_OVN_TAGS 4096 > > > @@ -881,7 +885,7 @@ join_datapaths(const struct nbrec_logical_switch_table > *nbrec_ls_table, > } > } > > -static bool > +bool > is_vxlan_mode(const struct sbrec_chassis_table *sbrec_chassis_table) > { > const struct sbrec_chassis *chassis; > @@ -896,25 +900,22 @@ is_vxlan_mode(const struct sbrec_chassis_table > *sbrec_chassis_table) > } > > uint32_t > -get_ovn_max_dp_key_local(const struct sbrec_chassis_table > *sbrec_chassis_table) > +get_ovn_max_dp_key_local(bool _vxlan_mode) > { > -if (is_vxlan_mode(sbrec_chassis_table)) { > -/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for vxlan mode. */ > +if (_vxlan_mode) { > +/* OVN_MAX_DP_GLOBAL_NUM doesn't apply for VXLAN mode. */ > return OVN_MAX_DP_VXLAN_KEY; > } > return OVN_MAX_DP_KEY - OVN_MAX_DP_GLOBAL_NUM; > } > > static void > -ovn_datapath_allocate_key(const struct sbrec_chassis_table > *sbrec_ch_table, > - struct hmap *datapaths, struct hmap *dp_tnlids, > +ovn_datapath_allocate_key(struct hmap *datapaths, struct hmap *dp_tnlids, >struct ovn_datapath *od, uint32_t *hint) > { > if (!od->tunnel_key) { > od->tunnel_key = ovn_allocate_tnlid(dp_tnlids, "datapath", > -OVN_MIN_DP_KEY_LOCAL, > - > get_ovn_max_dp_key_local(sbrec_ch_table), > -hint); > +OVN_MIN_DP_KEY_LOCAL, get_ovn_max_dp_key_local(vxlan_mode), > hint); > if (!od->tunnel_key) { > if (od->sb) { > sbrec_datapath_binding_delete(od->sb); > @@ -927,7 +928,6 @@ ovn_datapath_allocate_key(const struct > sbrec_chassis_table *sbrec_ch_table, > > static void > ovn_datapath_assign_requested_tnl_id( > -const struct sbrec_chassis_table *sbrec_chassis_table, > struct hmap *dp_tnlids, struct ovn_datapath *od) > { > const struct smap *other_config = (od->nbs > @@ -936,8 +936,7 @@ ovn_datapath_assign_requested_tnl_id( > uint32_t tunnel_key = smap_get_int(other_config, "requested-tnl-key", > 0); > if (tunnel_key) { > const char *interconn_ts = smap_get(other_config, "interconn-ts"); > -if (!interconn_ts && is_vxlan_mode(sbrec_chassis_table) && > -tunnel_key >= 1 << 12) { > +if (!interconn_ts && vxlan_mode && tunnel_key >= 1 << 12) { > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); > VLOG_WARN_RL(&rl, "Tunnel key %"PRIu32" for datapath %s is " > "incompatible with VXLAN", tunnel_key, > @@ -985,7 +984,6 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, > const struct nbrec_logical_switch_table *nbrec_ls_table, > const struct nbrec_logical_router_table *nbrec_lr_table, > const struct sbrec_datapath_binding_table *sbrec_dp_table, > -const struct sbrec_chassis_table *sbrec_chassis_table, > struct ovn_datapaths *ls_datapaths, > struct ovn_datapaths *lr_datapaths, > struct ovs_list *lr_list) > @@ -1000,12 +998,11 @@ build_datapaths(struct ovsdb_idl_txn *ovnsb_txn, > struct hmap dp_tnlids = HMAP_INITIALIZER(&dp_tnlids); > struct ovn_datapath *od; > LIST_FOR_EACH (od, list, &both) { > -ovn_datapath_assign_requested_tnl_id(sbrec_chassis_table, > &dp_tnlids, > -
Re: [ovs-dev] [PATCH ovn] controller: Add random delay during fdb learning.
On Wed, Jun 26, 2024 at 8:20 AM Naveen Yerramneni < naveen.yerramn...@nutanix.com> wrote: > This change reduces the probability of conflicts when > multiple nodes tries to add the same FDB entry to SB at the > same time. When conflict occurs, OVN controller does full > recompute which is heavy weight on the scale setup. > > Signed-off-by: Naveen Yerramneni > Suggested-by: Dumitru Ceara > --- > controller/mac-cache.c | 4 +++- > controller/mac-cache.h | 4 +++- > controller/ovn-controller.c | 2 +- > controller/pinctrl.c| 28 > tests/ovn.at| 6 +++--- > 5 files changed, 30 insertions(+), 14 deletions(-) > > diff --git a/controller/mac-cache.c b/controller/mac-cache.c > index d8c4e2aed..de45d7a6a 100644 > --- a/controller/mac-cache.c > +++ b/controller/mac-cache.c > @@ -245,7 +245,8 @@ mac_binding_lookup(struct ovsdb_idl_index > *sbrec_mac_binding_by_lport_ip, > > /* FDB. */ > struct fdb * > -fdb_add(struct hmap *map, struct fdb_data fdb_data) { > +fdb_add(struct hmap *map, struct fdb_data fdb_data, long long timestamp) > +{ > struct fdb *fdb = fdb_find(map, &fdb_data); > > if (!fdb) { > @@ -255,6 +256,7 @@ fdb_add(struct hmap *map, struct fdb_data fdb_data) { > } > > fdb->data = fdb_data; > +fdb->timestamp = timestamp; > > return fdb; > } > diff --git a/controller/mac-cache.h b/controller/mac-cache.h > index 3c78f9440..b94e7a1cf 100644 > --- a/controller/mac-cache.h > +++ b/controller/mac-cache.h > @@ -83,6 +83,7 @@ struct fdb { > struct fdb_data data; > /* Reference to the SB FDB record. */ > const struct sbrec_fdb *sbrec_fdb; > +long long timestamp; > }; > > struct bp_packet_data { > @@ -149,7 +150,8 @@ mac_binding_lookup(struct ovsdb_idl_index > *sbrec_mac_binding_by_lport_ip, > const char *logical_port, const char *ip); > > /* FDB. */ > -struct fdb *fdb_add(struct hmap *map, struct fdb_data fdb_data); > +struct fdb *fdb_add(struct hmap *map, struct fdb_data fdb_data, > +long long timestamp); > > void fdb_remove(struct hmap *map, struct fdb *fdb); > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 6874f99a3..e99bdb3ef 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -3322,7 +3322,7 @@ fdb_add_sb(struct mac_cache_data *data, const struct > sbrec_fdb *sfdb) > return; > } > > -struct fdb *fdb = fdb_add(&data->fdbs, fdb_data); > +struct fdb *fdb = fdb_add(&data->fdbs, fdb_data, 0); > > fdb->sbrec_fdb = sfdb; > } > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index f2e382a44..e3ded793b 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -4739,6 +4739,7 @@ pinctrl_destroy(void) > /* Buffered "put_mac_binding" operation. */ > > #define MAX_MAC_BINDING_DELAY_MSEC 50 > +#define MAX_FDB_DELAY_MSEC 50 > #define MAX_MAC_BINDINGS 1000 > > /* Contains "struct mac_binding"s. */ > @@ -8961,21 +8962,30 @@ run_put_fdbs(struct ovsdb_idl_txn *ovnsb_idl_txn, > return; > } > > +long long now = time_msec(); > const struct fdb *fdb; > -HMAP_FOR_EACH (fdb, hmap_node, &put_fdbs) { > -run_put_fdb(ovnsb_idl_txn, sbrec_fdb_by_dp_key_mac, > -sbrec_port_binding_by_key, > -sbrec_datapath_binding_by_key, fdb); > +HMAP_FOR_EACH_SAFE (fdb, hmap_node, &put_fdbs) { > +if (now >= fdb->timestamp) { > +run_put_fdb(ovnsb_idl_txn, sbrec_fdb_by_dp_key_mac, > +sbrec_port_binding_by_key, > +sbrec_datapath_binding_by_key, fdb); > +fdb_remove(&put_fdbs, fdb); > +} > } > -fdbs_clear(&put_fdbs); > } > > > static void > wait_put_fdbs(struct ovsdb_idl_txn *ovnsb_idl_txn) > +OVS_REQUIRES(pinctrl_mutex) > { > -if (ovnsb_idl_txn && !hmap_is_empty(&put_fdbs)) { > -poll_immediate_wake(); > +if (!ovnsb_idl_txn) { > +return; > +} > + > +struct fdb *fdb; > +HMAP_FOR_EACH (fdb, hmap_node, &put_fdbs) { > +poll_timer_wait_until(fdb->timestamp); > } > } > > @@ -8995,6 +9005,8 @@ pinctrl_handle_put_fdb(const struct flow *md, const > struct flow *headers) > .mac = headers->dl_src, > }; > > -fdb_add(&put_fdbs, fdb_data); > +uint32_t delay = random_range(MAX_FDB_DELAY_MSEC) + 1; > +long long timestamp = time_msec() + delay; > +fdb_add(&put_fdbs, fdb_data, timestamp); > notify_pinctrl_main(); > } > diff --git a/tests/ovn.at b/tests/ovn.at > index af0eaeed3..7b199d3f5 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -34382,21 +34382,21 @@ AT_CHECK([ovn-nbctl --wait=hv sync]) > # Learning is disabled, the table should be empty > send_packet 20 > AT_CHECK([ovn-nbctl --wait=hv sync]) > -AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:20") = 0]) > +OVS_WAIT_UNTIL([test $(ovn-sbctl lis
Re: [ovs-dev] [PATCH] ovsdb: raft: Don't forward more than one command to the leader.
On 6/27/24 00:02, Ilya Maximets wrote: > Every transaction has RAFT log prerequisites. Even if transactions > are not related (because RAFT doesn't actually know what data it is > handling). When leader writes a new record to a RAFT storage, it is > getting appended to the log right away and changes current 'eid', > i.e., changes prerequisites. The leader will not try to write new > records until the current one is committed, because until then the > pre-check will be failing. > > However, that is different for the follower. Followers do not add > records to the RAFT log until the leader sends an append request back. > So, if there are multiple transactions pending on a follower, it will > create a command for each of them and prerequisites will be set to the > same values. All these commands will be sent to the leader, but only > one can succeed at a time, because accepting one command immediately > changes prerequisites and all other commands become non-applicable. > So, out of N commands, 1 will succeed and N - 1 will fail. The cluster > failure is a transient failure, so the follower will re-process all the > failed transactions and send them again. 1 will succeed and N - 2 will > fail. And so on, until there are no more transactions. In the end, > instead of processing N transactions, the follower is performing > N * (N - 1) / 2 transaction processing iterations. That is consuming > a huge amount of CPU resources completely unnecessarily. > > Since there is no real chance for multiple transactions from the same > follower to succeed, it's better to not send them in the first place. > This also eliminates prerequisite mismatch messages on a leader in > this particular case. > Makes sense! > In a test with 30 parallel shell threads executing 12K transactions > total with separate ovsdb-client calls through the same follower there > is about 60% performance improvement. The test takes ~100 seconds to > complete without this change and ~40 seconds with this change applied. > The new time is very close to what it takes to execute the same test > through the cluster leader. > Maybe a public link to the test (if possible) would be a nice reference to have in the future? > Note: prerequisite failures on a leader are still possible, but mostly > in a case of simultaneous transactions from different followers. It's > a normal thing for a distributed database due to its nature. > > Signed-off-by: Ilya Maximets > --- > ovsdb/raft.c| 45 - > ovsdb/raft.h| 2 +- > ovsdb/storage.c | 9 + > ovsdb/storage.h | 5 - > ovsdb/transaction.c | 6 +- > 5 files changed, 55 insertions(+), 12 deletions(-) > > diff --git a/ovsdb/raft.c b/ovsdb/raft.c > index ac3d37ac4..116924058 100644 > --- a/ovsdb/raft.c > +++ b/ovsdb/raft.c > @@ -2307,12 +2307,55 @@ raft_get_eid(const struct raft *raft, uint64_t index) > return &raft->snap.eid; > } > > -const struct uuid * > +static const struct uuid * > raft_current_eid(const struct raft *raft) > { > return raft_get_eid(raft, raft->log_end - 1); > } > > +bool > +raft_precheck_prereq(const struct raft *raft, const struct uuid *prereq) > +{ > +if (!uuid_equals(raft_current_eid(raft), prereq)) { > +VLOG_DBG("%s: prerequisites (" UUID_FMT ") " > + "do not match current eid (" UUID_FMT ")", > + __func__, UUID_ARGS(prereq), > + UUID_ARGS(raft_current_eid(raft))); > +return false; > +} > + > +/* Having incomplete commands on a follower means that the leader has > + * these commands and they will change the prerequisites once added to > + * the leader's log. > + * > + * There is a chance that all these commands will actually fail and the > + * record with current prerequisites will in fact succeed, but, since > + * these are our own commands, the chances are low. > + * > + * Incomplete commands on a leader will not change the leader's current > + * 'eid' on commit as they are already part of the leader's log. */ > +if (raft->role != RAFT_LEADER && hmap_count(&raft->commands)) { > +struct raft_command *cmd; > + > +HMAP_FOR_EACH (cmd, hmap_node, &raft->commands) { Nit: I'd rewrite this as: if (raft->role == RAFT_LEADER) { return true; } HMAP_FOR_EACH (cmd, hmap_node, &raft->commands) { ... } > +/* Skip commands that are already part of the log (have non-zero > + * index) and ones that do not carry any data (have zero 'eid'), > + * as they can't change prerequisites. > + * > + * Database will not re-run triggers unless the data changes or > + * one of the data-carrying triggers completes. So, pre-check > must > + * not fail if there are no outstanding data-carrying commands. > */ > +if (!cmd->index && !uuid_is_zero(&cmd->eid)) { > +