Re: [ovs-dev] [PATCH] net: openvswitch: Fix Use-After-Free in ovs_ct_exit

2024-04-24 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski :

On Mon, 22 Apr 2024 05:37:17 -0400 you wrote:
> Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
> of ovs_ct_limit_exit, is not part of the RCU read critical section, it
> is possible that the RCU grace period will pass during the traversal and
> the key will be free.
> 
> To prevent this, it should be changed to hlist_for_each_entry_safe.
> 
> [...]

Here is the summary with links:
  - net: openvswitch: Fix Use-After-Free in ovs_ct_exit
https://git.kernel.org/netdev/net/c/5ea7b72d4fac

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Jakub Kicinski
On Wed, 24 Apr 2024 17:44:05 +0100 Simon Horman wrote:
> I have recently been exercising the Open vSwitch kernel selftests,
> using vng,

Speaking of ovs tests, we currently don't run them in CI (and suffer
related skips in pmtu.sh) because Amazon Linux doesn't have ovs
packaged and building it looks pretty hard.

Is there an easy way to build just the CLI tooling or get a pre-built
package somewhere?

Or perhaps you'd be willing to run the OvS tests and we can move 
the part of pmtu.sh into OvS test dir?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] net: openvswitch: Release reference to netdev

2024-04-24 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Tue, 23 Apr 2024 15:37:51 +0800 you wrote:
> dev_get_by_name will provide a reference on the netdev. So ensure that
> the reference of netdev is released after completed.
> 
> Fixes: 2540088b836f ("net: openvswitch: Check vport netdev name")
> Signed-off-by: Jun Gu 
> ---
>  net/openvswitch/vport-netdev.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Here is the summary with links:
  - [net-next] net: openvswitch: Release reference to netdev
https://git.kernel.org/netdev/net-next/c/66270920f90f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v1 2/2] Add support for overlay provider networks.

2024-04-24 Thread Numan Siddique
On Wed, Apr 24, 2024 at 5:14 PM Mark Michelson  wrote:
>
> Hi Numan,
>
> I haven't done a full review of this yet, but I figured I'd give some
> initial feedback from what I had looked at.
>
> At a high level, this is missing documentation in ovn-nb.xml for the new
> "overlay_provider_network" option. There should also be a NEWS entry for
> the new functionality.
>

Thanks for the comments.   Yes.  I missed adding them.  I'll address it in v2.

>
>
> See below for a couple more notes.
>
> On 4/23/24 12:43, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > It is expected that a provider network logical switch has a localnet logical
> > switch port in order to bridge the overlay traffic to the underlay traffic.
> > There can be some usecases where a overlay logical switch (without
> > a localnet port) can act as a provider network and presently NAT doesn't
> > work as expected.  This patch adds this support.  A new config option
> > "overlay_provider_network" is added to support this feature.
> > This feature gets enabled for a logical switch 'P' if:
> >- The above option is set to true in the Logical_Switch.other_config
> >  column.
> >- The logical switch 'P' doesn't have any localnet ports.
> >- The logical router port of a router 'R' connecting to 'P'
> >  is a gateway router port.
> >- And the logical router 'R' has only one gateway router port.
> >
> > If all the above conditions are met, ovn-northd creates a chassisredirect
> > port for the logical switch port (of type router) connecting to the
> > router 'R'.  For example, if the logical port is named as "P-R" and its
> > peer router port is "R-P", then chassisredirect port cr-P-R is created
> > along with cr-R-P.  Gateway chassis binding the cr-R-P also binds cr-P-R.
> > This ensures that the routing is centralized on this gateway chassis for
> > the traffic coming from switch "P" towards the router or vice versa.
> > This centralization is required in order to support NAT (both SNAT and
> > DNAT).  Distributed NAT (i.e if external_mac and router_port is set) is
> > not supported and instead the router port mac is used for such traffic.
> >
> > Reported-at: https://issues.redhat.com/browse/FDP-364
> > Signed-off-by: Numan Siddique 
> > ---
> >   controller/physical.c |   4 +
> >   northd/northd.c   | 226 +
> >   northd/northd.h   |   1 +
> >   tests/multinode-macros.at |   2 +-
> >   tests/multinode.at| 177 +
> >   tests/ovn-northd.at   | 520 +-
> >   tests/ovn.at  |   8 +-
> >   7 files changed, 886 insertions(+), 52 deletions(-)
> >
> > diff --git a/controller/physical.c b/controller/physical.c
> > index 7ee3086940..625e37e8a7 100644
> > --- a/controller/physical.c
> > +++ b/controller/physical.c
> > @@ -1587,6 +1587,10 @@ consider_port_binding(struct ovsdb_idl_index 
> > *sbrec_port_binding_by_name,
> >   ct_zones);
> >   put_zones_ofpacts(_ids, ofpacts_p);
> >
> > +/* Clear the MFF_INPORT.  Its possible that the same packet may
> > + * go out from the same tunnel inport. */
> > +put_load(ofp_to_u16(OFPP_NONE), MFF_IN_PORT, 0, 16, ofpacts_p);
> > +
> >   /* Resubmit to table 41. */
> >   put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
> >   }
> > diff --git a/northd/northd.c b/northd/northd.c
> > index ead54235c8..1942f9f7a1 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -2063,6 +2063,35 @@ parse_lsp_addrs(struct ovn_port *op)
> >   }
> >   }
> >
> > +static struct ovn_port *
> > +create_cr_port(struct ovn_port *op, struct hmap *ports)
> > +{
> > +char *redirect_name = ovn_chassis_redirect_name(
> > +op->nbsp ? op->nbsp->name : op->nbrp->name);
> > +
> > +struct ovn_datapath *od = op->od;
> > +struct ovn_port *crp = ovn_port_find(ports, redirect_name);
> > +if (!(crp && crp->sb && crp->sb->datapath == od->sb)) {
> > +crp = ovn_port_create(ports, redirect_name,
> > +  op->nbsp, op->nbrp, NULL);
> > +}
> > +
> > +crp->l3dgw_port = op;
> > +op->cr_port = crp;
> > +crp->od = od;
> > +free(redirect_name);
> > +
> > +/* Add to l3dgw_ports in od, for later use during flow creation. */
> > +if (od->n_l3dgw_ports == od->n_allocated_l3dgw_ports) {
> > +od->l3dgw_ports = x2nrealloc(od->l3dgw_ports,
> > + >n_allocated_l3dgw_ports,
> > + sizeof *od->l3dgw_ports);
> > +}
> > +od->l3dgw_ports[od->n_l3dgw_ports++] = op;
> > +
> > +return crp;
> > +}
> > +
> >   static void
> >   join_logical_ports(const struct sbrec_port_binding_table *sbrec_pb_table,
> >  struct hmap *ls_datapaths, struct hmap *lr_datapaths,
> > @@ -2170,9 +2199,10 @@ join_logical_ports(const struct 
> > 

Re: [ovs-dev] [PATCH OVN v6 3/3] northd, tests: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread Numan Siddique
On Wed, Apr 24, 2024 at 5:57 AM Naveen Yerramneni <
naveen.yerramn...@nutanix.com> wrote:

> NB SCHEMA CHANGES
> -
>   1. New DHCP_Relay table
>   "DHCP_Relay": {
> "columns": {
> "name": {"type": "string"},
> "servers": {"type": {"key": "string",
>"min": 0,
>"max": 1}},
> "external_ids": {
> "type": {"key": "string", "value": "string",
> "min": 0, "max": "unlimited"}}},
> "options": {"type": {"key": "string", "value": "string",
> "min": 0, "max": "unlimited"}},
> "isRoot": true},
>   2. New column to Logical_Router_Port table
>   "dhcp_relay": {"type": {"key": {"type": "uuid",
> "refTable": "DHCP_Relay",
> "refType": "strong"},
> "min": 0,
> "max": 1}},
>
> NEW PIPELINE STAGES
> ---
> Following stage is added for DHCP relay feature.
> Some of the flows are fitted into the existing pipeline tages.
>   1. lr_in_dhcp_relay_req
>- This stage process the DHCP request packets coming from DHCP
> clients.
>- DHCP request packets for which dhcp_relay_req_chk action
>  (which gets applied in ip input stage) is successful are
> forwarded to DHCP server.
>- DHCP request packets for which dhcp_relay_req_chk action is
> unsuccessful gets dropped.
>   2. lr_in_dhcp_relay_resp_chk
>- This stage applied the dhcp_relay_resp_chk action for  DHCP
> response packets coming
>  from the DHCP server.
>   3. lr_in_dhcp_relay_resp
>- DHCP response packets for which dhcp_relay_resp_chk is sucessful
> are forwarded
>  to the DHCP clients.
>- DHCP response packets for which dhcp_relay_resp_chk is
> unsucessful gets dropped.
>
> REGISTRY USAGE
> ---
>   - reg9[7] : To store the result of dhcp_relay_req_chk action.
>   - reg9[8] : To store the result of dhcp_relay_resp_chk action.
>   - reg2 : To store the original dest ip for DHCP response packets.
>This is required to properly match the packets in
>lr_in_dhcp_relay_resp stage since dhcp_relay_resp_chk action
>changes the dest ip.
>
> FLOWS
> -
>
> Following are the flows added when DHCP Relay is configured on one overlay
> subnet,
> one additonal flow is added in ls_in_l2_lkup table for each VM part of the
> subnet.
>
>   1. table=27(ls_in_l2_lkup  ), priority=100  , match=(inport ==
>  && eth.src ==  && ip4.src == 0.0.0.0 && ip4.dst ==
> 255.255.255.255 && udp.src == 68 && udp.dst == 67),
>  action=(eth.dst=;outport=;next;/* DHCP_RELAY_REQ */)
>   2. table=3 (lr_in_ip_input ), priority=110  , match=(inport == 
> && ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && ip.frag == 0 &&
> udp.src == 68 && udp.dst == 67),
>  action=(reg9[7] = dhcp_relay_req_chk(,
> );next; /* DHCP_RELAY_REQ */)
>   3. table=3 (lr_in_ip_input ), priority=110  , match=(ip4.src ==
>  && ip4.dst ==  && udp.src == 67 && udp.dst == 67),
> action=(next;/* DHCP_RELAY_RESP */)
>   4. table=4 (lr_in_dhcp_relay_req), priority=100  , match=(inport ==
> "lrp1" && ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68
> && udp.dst == 67 && reg9[7]),
>  action=(ip4.src=;ip4.dst=;udp.src=67;next; /*
> DHCP_RELAY_REQ */)
>   5. table=4 (lr_in_dhcp_relay_req), priority=1, match=(inport ==
>  && ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68
> && udp.dst == 67 && reg9[7] == 0),
>  action=(drop; /* DHCP_RELAY_REQ */)
>   6. table=18(lr_in_dhcp_relay_resp_chk), priority=100  , match=(ip4.src
> ==  && ip4.dst ==  && ip.frag == 0 && udp.src == 67 &&
> udp.dst == 67),
>  action=(reg2 = ip4.dst;reg9[8] = dhcp_relay_resp_chk(,
> );next;/* DHCP_RELAY_RESP */)
>   7. table=19(lr_in_dhcp_relay_resp), priority=100  , match=(ip4.src ==
>  && reg2 ==  && udp.src == 67 && udp.dst == 67 &&
> reg9[8]),
>  action=(ip4.src=;udp.dst=68;outport=;output; /*
> DHCP_RELAY_RESP */)
>   8. table=19(lr_in_dhcp_relay_resp), priority=1, match=(ip4.src ==
>  && reg2 ==  && udp.src == 67 && udp.dst == 67 &&
> reg9[8] == 0), action=(drop; /* DHCP_RELAY_RESP */)
>
> Commands to enable the feature
> --
>   ovn-nbctl create DHCP_Relay name= servers=
>   ovn-nbctl set Logical_Router_port  dhcp_relay=
>   ovn-nbctl set Logical_Switch 
> other_config:dhcp_relay_port=
>
> Limitations:
> 
>   - All OVN features that needs IP address to be configured on logical
> port (like proxy arp, etc)
> will not be supported for overlay subnets on which DHCP relay is
> enabled.
>
> Signed-off-by: Naveen Yerramneni 
> Co-authored-by: Huzaifa Calcuttawala 
> Signed-off-by: Huzaifa Calcuttawala 
> CC: Mary Manohar 
>

Thanks.  I applied this patch to 

Re: [ovs-dev] [PATCH OVN v6 2/3] controller: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread Numan Siddique
On Wed, Apr 24, 2024 at 5:57 AM Naveen Yerramneni <
naveen.yerramn...@nutanix.com> wrote:

> Added changes in pinctrl to process DHCP Relay opcodes:
>   - ACTION_OPCODE_DHCP_RELAY_REQ_CHK: For request packets
>   - ACTION_OPCODE_DHCP_RELAY_RESP_CHK: For response packet
>
> Signed-off-by: Naveen Yerramneni 
>

Thanks.  I applied this patch to main with the below changes.

I did some changes to the way dhcp options are returned in the function
dhcp_parse_options()



diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index 50e090cd25..0ee6d8fa85 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -2023,7 +2023,7 @@ static const char *dhcp_msg_str_get(uint8_t msg_type)

 static const struct dhcp_header *
 dhcp_get_hdr_from_pkt(struct dp_packet *pkt_in, const char **in_dhcp_pptr,
-  const char *end)
+  const char *end)
 {
 /* Validate the DHCP request packet.
  * Format of the DHCP packet is
@@ -2079,13 +2079,20 @@ dhcp_get_hdr_from_pkt(struct dp_packet *pkt_in,
const char **in_dhcp_pptr,
 return dhcp_hdr;
 }

-static void
-dhcp_parse_options(const char **in_dhcp_pptr, const char *end,
-   const uint8_t **dhcp_msg_type_pptr,
-   ovs_be32 *request_ip_ptr,
-   bool *ipxe_req_ptr, ovs_be32 *server_id_ptr,
-   ovs_be32 *netmask_ptr, ovs_be32 *router_ip_ptr)
+/* Parsed DHCP option values which we are interested in. */
+struct parsed_dhcp_options {
+uint8_t dhcp_msg_type;
+ovs_be32 request_ip;
+ovs_be32 server_id;
+ovs_be32 netmask;
+ovs_be32 router_ip;
+bool ipxe_req;
+};
+
+static struct parsed_dhcp_options
+dhcp_parse_options(const char **in_dhcp_pptr, const char *end)
 {
+struct parsed_dhcp_options parsed_dhcp_opts = {0};
 while ((*in_dhcp_pptr) < end) {
 const struct dhcp_opt_header *in_dhcp_opt =
 (const struct dhcp_opt_header *) *in_dhcp_pptr;
@@ -2107,52 +2114,53 @@ dhcp_parse_options(const char **in_dhcp_pptr, const
char *end,

 switch (in_dhcp_opt->code) {
 case DHCP_OPT_MSG_TYPE:
-if (dhcp_msg_type_pptr && in_dhcp_opt->len == 1) {
-*dhcp_msg_type_pptr = DHCP_OPT_PAYLOAD(in_dhcp_opt);
+if (in_dhcp_opt->len == 1) {
+const uint8_t *dhcp_msg_type =
DHCP_OPT_PAYLOAD(in_dhcp_opt);
+parsed_dhcp_opts.dhcp_msg_type = *dhcp_msg_type;
 }
 break;
 case DHCP_OPT_REQ_IP:
-if (request_ip_ptr && in_dhcp_opt->len == 4) {
-*request_ip_ptr = get_unaligned_be32(
-  DHCP_OPT_PAYLOAD(in_dhcp_opt));
+if (in_dhcp_opt->len == 4) {
+parsed_dhcp_opts.request_ip = get_unaligned_be32(
+DHCP_OPT_PAYLOAD(in_dhcp_opt));
 }
 break;
 case OVN_DHCP_OPT_CODE_SERVER_ID:
-if (server_id_ptr && in_dhcp_opt->len == 4) {
-*server_id_ptr = get_unaligned_be32(
-  DHCP_OPT_PAYLOAD(in_dhcp_opt));
+if (in_dhcp_opt->len == 4) {
+parsed_dhcp_opts.server_id = get_unaligned_be32(
+DHCP_OPT_PAYLOAD(in_dhcp_opt));
 }
 break;
 case OVN_DHCP_OPT_CODE_NETMASK:
-if (netmask_ptr && in_dhcp_opt->len == 4) {
-*netmask_ptr = get_unaligned_be32(
-  DHCP_OPT_PAYLOAD(in_dhcp_opt));
+if (in_dhcp_opt->len == 4) {
+parsed_dhcp_opts.netmask = get_unaligned_be32(
+DHCP_OPT_PAYLOAD(in_dhcp_opt));
 }
 break;
 case OVN_DHCP_OPT_CODE_ROUTER_IP:
-if (router_ip_ptr && in_dhcp_opt->len == 4) {
-*router_ip_ptr = get_unaligned_be32(
-  DHCP_OPT_PAYLOAD(in_dhcp_opt));
+if (in_dhcp_opt->len == 4) {
+parsed_dhcp_opts.router_ip = get_unaligned_be32(
+DHCP_OPT_PAYLOAD(in_dhcp_opt));
 }
 break;
 case DHCP_OPT_ETHERBOOT:
-if (ipxe_req_ptr) {
-*ipxe_req_ptr = true;
-}
+parsed_dhcp_opts.ipxe_req = true;
 break;
 default:
 break;
 }
 }
+
+return parsed_dhcp_opts;
 }

 /* Called with in the pinctrl_handler thread context. */
 static void
-pinctrl_handle_dhcp_relay_req_chk(
-struct rconn *swconn,
-struct dp_packet *pkt_in, struct ofputil_packet_in *pin,
-struct ofpbuf *userdata,
-struct ofpbuf *continuation)
+pinctrl_handle_dhcp_relay_req_chk(struct rconn *swconn,
+  struct dp_packet *pkt_in,
+  struct ofputil_packet_in *pin,
+  struct ofpbuf *userdata,
+   

Re: [ovs-dev] [PATCH OVN v6 1/3] actions: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread Numan Siddique
On Wed, Apr 24, 2024 at 5:56 AM Naveen Yerramneni <
naveen.yerramn...@nutanix.com> wrote:

> NEW OVN ACTIONS
> ---
>   1. dhcp_relay_req_chk(, )
>- This action executes on the source node on which the DHCP request
> originated.
>- This action relays the DHCP request coming from client to the
> server.
>  Relay-ip is used to update GIADDR in the DHCP header.
>   2. dhcp_relay_resp_chk(, )
>- This action executes on the first node (RC node) which processes
>  the DHCP response from the server.
>- This action updates  the destination MAC and destination IP so
> that the response
>  can be forwarded to the appropriate node from which request was
> originated.
>- Relay-ip, server-ip are used to validate GIADDR and SERVER ID in
> the DHCP payload.
>
> Signed-off-by: Naveen Yerramneni 
>

Thanks.

I applied this patch to main with the below changes.  Changes are mostly
cosmetic.


diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index b30cb96b29..ae0864fdd7 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -396,7 +396,6 @@ struct ovnact_put_opts {
 /* OVNACT_DHCP_RELAY. */
 struct ovnact_dhcp_relay {
 struct ovnact ovnact;
-int family;
 struct expr_field dst;  /* 1-bit destination field. */
 ovs_be32 relay_ipv4;
 ovs_be32 server_ipv4;
diff --git a/lib/actions.c b/lib/actions.c
index 940209a7b7..94751d0597 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -2692,12 +2692,10 @@ parse_dhcp_relay_chk(struct action_context *ctx,

 /* Parse relay ip and server ip. */
 if (ctx->lexer->token.format == LEX_F_IPV4) {
-dhcp_relay->family = AF_INET;
 dhcp_relay->relay_ipv4 = ctx->lexer->token.value.ipv4;
 lexer_get(ctx->lexer);
 lexer_match(ctx->lexer, LEX_T_COMMA);
 if (ctx->lexer->token.format == LEX_F_IPV4) {
-dhcp_relay->family = AF_INET;
 dhcp_relay->server_ipv4 = ctx->lexer->token.value.ipv4;
 lexer_get(ctx->lexer);
 } else {
@@ -2706,7 +2704,7 @@ parse_dhcp_relay_chk(struct action_context *ctx,
 }
 } else {
 lexer_syntax_error(ctx->lexer, "expecting IPv4 dhcp relay "
-"and server ips");
+   "and server ips");
 return;
 }
 lexer_force_match(ctx->lexer, LEX_T_RPAREN);
@@ -2726,9 +2724,9 @@ encode_dhcpv4_relay_chk(const struct
ovnact_dhcp_relay *dhcp_relay,
 ovs_be32 ofs = htonl(dst.ofs);
 ofpbuf_put(ofpacts, , sizeof ofs);
 ofpbuf_put(ofpacts, _relay->relay_ipv4,
-sizeof(dhcp_relay->relay_ipv4));
+   sizeof(dhcp_relay->relay_ipv4));
 ofpbuf_put(ofpacts, _relay->server_ipv4,
-sizeof(dhcp_relay->server_ipv4));
+   sizeof(dhcp_relay->server_ipv4));
 encode_finish_controller_op(oc_offset, ofpacts);
 }

@@ -2736,7 +2734,7 @@ static void
 format_DHCPV4_RELAY_REQ_CHK(const struct ovnact_dhcp_relay *dhcp_relay,
 struct ds *s)
 {
-format_dhcpv4_relay_chk("dhcp_relay_req_chk",dhcp_relay, s);
+format_dhcpv4_relay_chk("dhcp_relay_req_chk", dhcp_relay, s);
 }

 static void
@@ -2752,7 +2750,7 @@ static void
 format_DHCPV4_RELAY_RESP_CHK(const struct ovnact_dhcp_relay *dhcp_relay,
  struct ds *s)
 {
-format_dhcpv4_relay_chk("dhcp_relay_resp_chk",dhcp_relay, s);
+format_dhcpv4_relay_chk("dhcp_relay_resp_chk", dhcp_relay, s);
 }

 static void
@@ -2764,8 +2762,7 @@ encode_DHCPV4_RELAY_RESP_CHK(const struct
ovnact_dhcp_relay *dhcp_relay,
 ACTION_OPCODE_DHCP_RELAY_RESP_CHK);
 }

-static void ovnact_dhcp_relay_free(
-  struct ovnact_dhcp_relay *dhcp_relay OVS_UNUSED)
+static void ovnact_dhcp_relay_free(struct ovnact_dhcp_relay *dr OVS_UNUSED)
 {
 }

@@ -5472,11 +5469,11 @@ parse_set_action(struct action_context *ctx)
 parse_chk_lb_aff(ctx, ,
 ovnact_put_CHK_LB_AFF(ctx->ovnacts));
 } else if (lexer_match_id(ctx->lexer, "dhcp_relay_req_chk")) {
-parse_dhcp_relay_chk(ctx, ,
-ovnact_put_DHCPV4_RELAY_REQ_CHK(ctx->ovnacts));
+parse_dhcp_relay_chk(
+ctx, , ovnact_put_DHCPV4_RELAY_REQ_CHK(ctx->ovnacts));
 } else if (lexer_match_id(ctx->lexer, "dhcp_relay_resp_chk")) {
-parse_dhcp_relay_chk(ctx, ,
-ovnact_put_DHCPV4_RELAY_RESP_CHK(ctx->ovnacts));
+parse_dhcp_relay_chk(
+ctx, , ovnact_put_DHCPV4_RELAY_RESP_CHK(ctx->ovnacts));
 } else {
 parse_assignment_action(ctx, false, );
 }
diff --git a/utilities/ovn-trace.c b/utilities/ovn-trace.c
index 49b2ab9545..13a4ea490a 100644
--- a/utilities/ovn-trace.c
+++ b/utilities/ovn-trace.c
@@ -2360,9 +2360,9 @@ execute_dhcpv4_relay_resp_chk(const struct
ovnact_dhcp_relay *dr,
 ovntrace_node_append(

Re: [ovs-dev] [PATCH OVN v5 0/4] DHCP Relay Agent support for overlay subnets.

2024-04-24 Thread Numan Siddique
On Wed, Apr 24, 2024 at 9:21 AM Naveen Yerramneni <
naveen.yerramn...@nutanix.com> wrote:

>
>
> > On 05-Apr-2024, at 9:08 PM, Numan Siddique  wrote:
> >
> > CAUTION: External Email
> >
> >
> > On Wed, Mar 20, 2024 at 10:40 AM Naveen Yerramneni <
> naveen.yerramn...@nutanix.com> wrote:
> > >
> > > This patch contains changes to enable DHCP Relay Agent support for
> overlay subnets.
> > >
> > > USE CASE:
> > > --
> > >   - Enable IP address assignment for overlay subnets from the
> centralized DHCP server present in the underlay network.
> > >
> > > PREREQUISITES
> > > --
> > >   - Logical Router Port IP should be assigned (statically) from
> the same overlay subnet which is managed by DHCP server.
> > >   - LRP IP is used for GIADRR field when relaying the DHCP packets
> and also same IP needs to be configured as default gateway for the overlay
> subnet.
> > >   - Overlay subnets managed by external DHCP server are expected
> to be directly reachable from the underlay network.
> > >
> > > EXPECTED PACKET FLOW:
> > > --
> > > Following is the expected packet flow inorder to support DHCP
> rleay functionality in OVN.
> > >   1. DHCP client originates DHCP discovery (broadcast).
> > >   2. DHCP relay (running on the OVN) receives the broadcast and
> forwards the packet to the DHCP server by converting it to unicast.
> > >  While forwarding the packet, it updates the GIADDR in DHCP
> header to its interface IP on which DHCP packet is received and increments
> hop count.
> > >   3. DHCP server uses GIADDR field to decide the IP address pool
> from which IP has to be assigned and DHCP offer is sent to the same IP
> (GIADDR).
> > >   4. DHCP relay agent forwards the offer to the client.
> > >   5. DHCP client sends DHCP request (broadcast) packet.
> > >   6. DHCP relay (running on the OVN) receives the broadcast and
> forwards the packet to the DHCP server by converting it to unicast.
> > >  While forwarding the packet, it updates the GIADDR in DHCP
> header to its interface IP on which DHCP packet is received.
> > >   7. DHCP Server sends the ACK packet.
> > >   8. DHCP relay agent forwards the ACK packet to the client.
> > >   9. All the future renew/release packets are directly exchanged
> between DHCP client and DHCP server.
> > >
> > > OVN DHCP RELAY PACKET FLOW:
> > > 
> > > To add DHCP Relay support on OVN, we need to replicate all the
> behavior described above using distributed logical switch and logical
> router.
> > > At, highlevel packet flow is distributed among Logical Switch and
> Logical Router on source node (where VM is deployed) and redirect
> chassis(RC) node.
> > >   1. Request packet gets processed on the source node where VM is
> deployed and relays the packet to DHCP server.
> > >   2. Response packet is first processed on RC node (which first
> recieves the packet from underlay network). RC node forwards the packet to
> the right node by filling in the dest MAC and IP.
> > >
> > > OVN Packet flow with DHCP relay is explained below.
> > >   1. DHCP client (VM) sends the DHCP discover packet (broadcast).
> > >   2. Logical switch converts the packet to L2 unicast by setting
> the destination MAC to LRP's MAC
> > >   3. Logical Router receives the packet and redirects it to the
> OVN controller.
> > >   4. OVN controller updates the required information(GIADDR, HOP
> count) in the DHCP payload after doing the required checks. If any check
> fails, packet is dropped.
> > >   5. Logical Router converts the packet to L3 unicast and forwards
> it to the server. This packets gets routed like any other packet (via RC
> node).
> > >   6. Server replies with DHCP offer.
> > >   7. RC node processes the DHCP offer and forwards it to the OVN
> controller.
> > >   8. OVN controller does sanity checks and  updates the
> destination MAC (available in DHCP header), destination IP (available in
> DHCP header) and reinjects the packet to datapath.
> > >  If any check fails, packet is dropped.
> > >   9. Logical router updates the source IP and port and forwards
> the packet to logical switch.
> > >   10. Logical switch delivers the packet to the DHCP client.
> > >   11. Similar steps are performed for Request and Ack packets.
> > >   12. All the future renew/release packets are directly exchanged
> between DHCP client and DHCP server
> > >
> > > NEW OVN ACTIONS
> > > ---
> > >   1. dhcp_relay_req_chk(, )
> > >   - This action executes on the source node on which the DHCP
> request originated.
> > >   - This action relays the DHCP request coming from client to
> the server. Relay-ip is used to update GIADDR in the DHCP header.
> > >   2. dhcp_relay_resp_chk(, )
> > >   - This action executes on the first node 

Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Benjamin Poirier
On 2024-04-24 18:37 +0100, Simon Horman wrote:
> On Wed, Apr 24, 2024 at 05:44:05PM +0100, Simon Horman wrote:
> > Hi Aaron, Jakub, all,
> > 
> > I have recently been exercising the Open vSwitch kernel selftests,
> > using vng, something like this:
> > 
> > TESTDIR="tools/testing/selftests/net/openvswitch"
> > 
> > vng -v --run . --user root --cpus 2 \
> > --overlay-rwdir "$PWD" -- \
> > "modprobe openvswitch && \
> >  echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
> >  make -C \"$TESTDIR\" run_tests"
> > 
> > And I have some observations that I'd like to ask about.
> > 
> > 1. Building the kernel using the following command does not
> >build the openvswitch kernel module.
> > 
> > vng -v --build \
> > --config tools/testing/selftests/net/config
> > 
> >All that seems to be missing is CONFIG_OPENVSWITCH=m
> >and I am wondering what the best way of resolving this is.
> > 
> >Perhaps I am doing something wrong.
> >Or perhaps tools/testing/selftests/net/openvswitch/config
> >should be created? If so, should it include (most of?) what is in
> >tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?

I noticed something similar when testing Jiri's virtio_net selftests
patchset [1].

drivers/net/virtio_net/config includes virtio options but the
test also needs at least CONFIG_NET_VRF=y which is part of net/config.

Whatever the answer to your question, all config files should be
coherent on this matter.

[1] https://lore.kernel.org/netdev/20240424104049.3935572-1-j...@resnulli.us/

[...]
> 
>   5. openvswitch.sh starts with "#!/bin/sh".
>  But substitutions such as "${ns:0:1}0"  fail if /bin/sh is dash.
>  Perhaps we should change openvswitch.sh to use bash?

I think so. A similar change was done in
c2518da8e6b0 selftests: bonding: Change script interpreter (v6.8-rc1)
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v1 2/2] Add support for overlay provider networks.

2024-04-24 Thread Mark Michelson

Hi Numan,

I haven't done a full review of this yet, but I figured I'd give some 
initial feedback from what I had looked at.


At a high level, this is missing documentation in ovn-nb.xml for the new 
"overlay_provider_network" option. There should also be a NEWS entry for 
the new functionality.


See below for a couple more notes.

On 4/23/24 12:43, num...@ovn.org wrote:

From: Numan Siddique 

It is expected that a provider network logical switch has a localnet logical
switch port in order to bridge the overlay traffic to the underlay traffic.
There can be some usecases where a overlay logical switch (without
a localnet port) can act as a provider network and presently NAT doesn't
work as expected.  This patch adds this support.  A new config option
"overlay_provider_network" is added to support this feature.
This feature gets enabled for a logical switch 'P' if:
   - The above option is set to true in the Logical_Switch.other_config
 column.
   - The logical switch 'P' doesn't have any localnet ports.
   - The logical router port of a router 'R' connecting to 'P'
 is a gateway router port.
   - And the logical router 'R' has only one gateway router port.

If all the above conditions are met, ovn-northd creates a chassisredirect
port for the logical switch port (of type router) connecting to the
router 'R'.  For example, if the logical port is named as "P-R" and its
peer router port is "R-P", then chassisredirect port cr-P-R is created
along with cr-R-P.  Gateway chassis binding the cr-R-P also binds cr-P-R.
This ensures that the routing is centralized on this gateway chassis for
the traffic coming from switch "P" towards the router or vice versa.
This centralization is required in order to support NAT (both SNAT and
DNAT).  Distributed NAT (i.e if external_mac and router_port is set) is
not supported and instead the router port mac is used for such traffic.

Reported-at: https://issues.redhat.com/browse/FDP-364
Signed-off-by: Numan Siddique 
---
  controller/physical.c |   4 +
  northd/northd.c   | 226 +
  northd/northd.h   |   1 +
  tests/multinode-macros.at |   2 +-
  tests/multinode.at| 177 +
  tests/ovn-northd.at   | 520 +-
  tests/ovn.at  |   8 +-
  7 files changed, 886 insertions(+), 52 deletions(-)

diff --git a/controller/physical.c b/controller/physical.c
index 7ee3086940..625e37e8a7 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -1587,6 +1587,10 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
  ct_zones);
  put_zones_ofpacts(_ids, ofpacts_p);
  
+/* Clear the MFF_INPORT.  Its possible that the same packet may

+ * go out from the same tunnel inport. */
+put_load(ofp_to_u16(OFPP_NONE), MFF_IN_PORT, 0, 16, ofpacts_p);
+
  /* Resubmit to table 41. */
  put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
  }
diff --git a/northd/northd.c b/northd/northd.c
index ead54235c8..1942f9f7a1 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -2063,6 +2063,35 @@ parse_lsp_addrs(struct ovn_port *op)
  }
  }
  
+static struct ovn_port *

+create_cr_port(struct ovn_port *op, struct hmap *ports)
+{
+char *redirect_name = ovn_chassis_redirect_name(
+op->nbsp ? op->nbsp->name : op->nbrp->name);
+
+struct ovn_datapath *od = op->od;
+struct ovn_port *crp = ovn_port_find(ports, redirect_name);
+if (!(crp && crp->sb && crp->sb->datapath == od->sb)) {
+crp = ovn_port_create(ports, redirect_name,
+  op->nbsp, op->nbrp, NULL);
+}
+
+crp->l3dgw_port = op;
+op->cr_port = crp;
+crp->od = od;
+free(redirect_name);
+
+/* Add to l3dgw_ports in od, for later use during flow creation. */
+if (od->n_l3dgw_ports == od->n_allocated_l3dgw_ports) {
+od->l3dgw_ports = x2nrealloc(od->l3dgw_ports,
+ >n_allocated_l3dgw_ports,
+ sizeof *od->l3dgw_ports);
+}
+od->l3dgw_ports[od->n_l3dgw_ports++] = op;
+
+return crp;
+}
+
  static void
  join_logical_ports(const struct sbrec_port_binding_table *sbrec_pb_table,
 struct hmap *ls_datapaths, struct hmap *lr_datapaths,
@@ -2170,9 +2199,10 @@ join_logical_ports(const struct sbrec_port_binding_table 
*sbrec_pb_table,
  tag_alloc_add_existing_tags(tag_alloc_table, nbsp);
  }
  }
+
+struct hmapx gw_ports = HMAPX_INITIALIZER(_ports);
  HMAP_FOR_EACH (od, key_node, lr_datapaths) {
  ovs_assert(od->nbr);
-size_t n_allocated_l3dgw_ports = 0;
  for (size_t i = 0; i < od->nbr->n_ports; i++) {
  const struct nbrec_logical_router_port *nbrp
  = od->nbr->ports[i];
@@ -2236,10 +2266,7 @@ join_logical_ports(const struct 

[ovs-dev] [RFC 11/11] tests: Add test for sample offloading.

2024-04-24 Thread Adrian Moreno
Signed-off-by: Adrian Moreno 
---
 tests/system-common-macros.at|  4 +++
 tests/system-offloads-traffic.at | 53 
 2 files changed, 57 insertions(+)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 2a68cd664..860d6a8c9 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION],
 # OVS_CHECK_DROP_ACTION()
 m4_define([OVS_CHECK_DROP_ACTION],
 [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" 
ovs-vswitchd.log])])
+
+# OVS_CHECK_PSAMPLE()
+m4_define([OVS_CHECK_PSAMPLE],
+[AT_SKIP_IF([! grep -q "Datapath supports psample" ovs-vswitchd.log])])
diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index d1da33d96..f4d0ccab5 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at
@@ -933,3 +933,56 @@ OVS_WAIT_UNTIL([grep -q "Datapath does not support 
explicit drop action" ovs-vsw
 
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
+
+
+AT_SETUP([offloads - sample action])
+OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . 
other_config:hw-offload=true])
+OVS_CHECK_PSAMPLE()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+# Choosing numbers whose hex representation is the same for big and little 
endian.
+AT_DATA([flows.txt], [dnl
+arp action=NORMAL
+in_port=ovs-p0 
actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),NORMAL
+in_port=ovs-p1 
actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),NORMAL
+])
+
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-vsctl --id=@br get Bridge br0 \
+-- create FLow_Sample_Collector_Set bridge=@br id=1 
psample_group=10 \
+-- create FLow_Sample_Collector_Set bridge=@br id=2 
psample_group=12],
+ [0], [ignore])
+
+OVS_DAEMONIZE([ovs-psample > psample.out], [psample.pid])
+on_exit 'kill `cat psample.pid`'
+sleep 1
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -W 2 10.1.1.2 | FORMAT_PING], 
[0], [dnl
+10 packets transmitted, 10 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep "eth_type(0x0800)" | 
DUMP_CLEAN_SORTED], [0], [dnl
+in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=10,cookie=),3
+in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=12,cookie=),2
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=ovs | grep "eth_type(0x0800)" | 
DUMP_CLEAN_SORTED], [0], [])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep 
"eth_type(0x0800)" | DUMP_CLEAN_SORTED], [0], [dnl
+in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=10,cookie=),3
+in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=12,cookie=),2
+])
+
+AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], 
[ignore])
+
+AT_CHECK([grep -E "group_id=0xa obs_domain=0x,obs_point=0x 
.*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2" psample.out >/dev/null])
+AT_CHECK([grep -E "group_id=0xc obs_domain=0x,obs_point=0x 
.*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1" psample.out >/dev/null])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC 08/11] netdev-offload-tc: Add sample support.

2024-04-24 Thread Adrian Moreno
Offload the sample action if it contains psample information by creating
a tc "sample" action with the user cookie inside the action's cookie.

Avoid using the "sample" action's cookie to store the ufid.

Signed-off-by: Adrian Moreno 
---
 include/linux/automake.mk|  5 +-
 include/linux/pkt_cls.h  |  2 +
 include/linux/tc_act/tc_sample.h | 26 ++
 lib/netdev-offload-tc.c  | 67 +
 lib/tc.c | 84 ++--
 lib/tc.h |  7 +++
 utilities/ovs-psample.c  |  8 +--
 7 files changed, 190 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/tc_act/tc_sample.h

diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index ac306b53c..c2a270152 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -5,9 +5,10 @@ noinst_HEADERS += \
include/linux/pkt_cls.h \
include/linux/psample.h \
include/linux/gen_stats.h \
+   include/linux/tc_act/tc_ct.h \
include/linux/tc_act/tc_mpls.h \
include/linux/tc_act/tc_pedit.h \
+   include/linux/tc_act/tc_sample.h \
include/linux/tc_act/tc_skbedit.h \
include/linux/tc_act/tc_tunnel_key.h \
-   include/linux/tc_act/tc_vlan.h \
-   include/linux/tc_act/tc_ct.h
+   include/linux/tc_act/tc_vlan.h
diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index fb4a7ecea..c566ac1b5 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -8,6 +8,8 @@
 #include 
 #include 
 
+#define TC_COOKIE_MAX_SIZE 16
+
 /* Action attributes */
 enum {
TCA_ACT_UNSPEC,
diff --git a/include/linux/tc_act/tc_sample.h b/include/linux/tc_act/tc_sample.h
new file mode 100644
index 0..398f32761
--- /dev/null
+++ b/include/linux/tc_act/tc_sample.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct tc_sample {
+   tc_gen;
+};
+
+enum {
+   TCA_SAMPLE_UNSPEC,
+   TCA_SAMPLE_TM,
+   TCA_SAMPLE_PARMS,
+   TCA_SAMPLE_RATE,
+   TCA_SAMPLE_TRUNC_SIZE,
+   TCA_SAMPLE_PSAMPLE_GROUP,
+   TCA_SAMPLE_PAD,
+   __TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
+
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 36e814bee..0e7273431 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1038,6 +1038,21 @@ parse_tc_flower_to_actions__(struct tc_flower *flower, 
struct ofpbuf *buf,
 nl_msg_end_nested(buf, offset);
 }
 break;
+case TC_ACT_SAMPLE: {
+size_t offset;
+
+offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_SAMPLE);
+nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PROBABILITY,
+   UINT32_MAX / action->sample.rate);
+nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
+   action->sample.group_id);
+nl_msg_put_unspec(buf, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
+  >sample.cookie[0],
+  action->sample.cookie_len);
+
+nl_msg_end_nested(buf, offset);
+}
+break;
 }
 
 if (action->jump_action && action->type != TC_ACT_POLICE_MTU) {
@@ -2054,6 +2069,53 @@ parse_check_pkt_len_action(struct netdev *netdev, struct 
tc_flower *flower,
 return 0;
 }
 
+static int
+parse_sample_action(struct tc_flower *flower, const struct nlattr *nl_act,
+struct tc_action *action)
+{
+/* Only offloadable if it's psample only. Use the policy to enforce it by
+ * making psample arguments mandatory and omitting actions. */
+static const struct nl_policy ovs_sample_policy[] = {
+[OVS_SAMPLE_ATTR_PROBABILITY] = { .type = NL_A_U32 },
+[OVS_SAMPLE_ATTR_PSAMPLE_GROUP] = { .type = NL_A_U32, },
+[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE] = { .type = NL_A_UNSPEC,
+ .optional = true,
+ .max_len = TC_COOKIE_MAX_SIZE }
+};
+struct nlattr *a[ARRAY_SIZE(ovs_sample_policy)];
+uint32_t probability;
+
+if (!nl_parse_nested(nl_act, ovs_sample_policy, a, ARRAY_SIZE(a))) {
+return EOPNOTSUPP;
+}
+
+action->type = TC_ACT_SAMPLE;
+/* OVS probability and TC sampling rate have different semantics.
+ * The former represents the number of sampled packets out of UINT32_MAX
+ * while the other represents the ratio between observed and sampled
+ * packets. */
+probability = nl_attr_get_u32(a[OVS_SAMPLE_ATTR_PROBABILITY]);
+if (!probability) {
+return EINVAL;
+}
+action->sample.rate = UINT32_MAX / probability;
+
+action->sample.group_id =
+nl_attr_get_u32(a[OVS_SAMPLE_ATTR_PSAMPLE_GROUP]);
+
+if (a[OVS_SAMPLE_ATTR_PSAMPLE_COOKIE]) {
+

[ovs-dev] [RFC 10/11] ofproto-dpif-psample: Add command to show stats.

2024-04-24 Thread Adrian Moreno
Signed-off-by: Adrian Moreno 
---
 ofproto/ofproto-dpif-psample.c | 59 ++
 ofproto/ofproto-dpif-psample.h |  1 +
 ofproto/ofproto-dpif.c |  1 +
 3 files changed, 61 insertions(+)

diff --git a/ofproto/ofproto-dpif-psample.c b/ofproto/ofproto-dpif-psample.c
index ea4926eb2..2d73a4ded 100644
--- a/ofproto/ofproto-dpif-psample.c
+++ b/ofproto/ofproto-dpif-psample.c
@@ -20,9 +20,12 @@
 #include "dpif.h"
 #include "hash.h"
 #include "ofproto.h"
+#include "ofproto-dpif.h"
+#include "openvswitch/dynamic-string.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/thread.h"
 #include "openvswitch/vlog.h"
+#include "unixctl.h"
 
 VLOG_DEFINE_THIS_MODULE(psample);
 
@@ -205,3 +208,59 @@ dpif_psample_unref(struct dpif_psample *ps) 
OVS_EXCLUDED(mutex)
 dpif_psample_destroy(ps);
 }
 }
+
+/* Unix commands. */
+static void
+psample_unixctl_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
+ const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
+{
+struct psample_exporter_map_node *node;
+struct ds ds = DS_EMPTY_INITIALIZER;
+const struct ofproto_dpif *ofproto;
+bool first = true;
+
+ofproto = ofproto_dpif_lookup_by_name(argv[1]);
+if (!ofproto) {
+unixctl_command_reply_error(conn, "no such bridge");
+return;
+}
+
+if (!ofproto->psample) {
+unixctl_command_reply_error(conn, "no psample exporters configured");
+return;
+}
+
+ds_put_format(, "Psample statistics for bridge \"%s\":\n", argv[1]);
+
+ovs_mutex_lock();
+HMAP_FOR_EACH (node, node, >psample->exporters_map) {
+if (!first) {
+ds_put_cstr(, "\n");
+} else {
+first = false;
+}
+
+ds_put_format(, "- Collector Set ID: %"PRIu32"\n",
+node->exporter.collector_set_id);
+ds_put_format(, "  Psample Group ID: %"PRIu32"\n",
+node->exporter.group_id);
+ds_put_format(, "  Total number of bytes: %"PRIu64"\n",
+node->exporter.n_bytes);
+ds_put_format(, "  Total number of packets: %"PRIu64"\n",
+node->exporter.n_packets);
+}
+ovs_mutex_unlock();
+
+unixctl_command_reply(conn, ds_cstr());
+ds_destroy();
+}
+
+void dpif_psample_init(void)
+{
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+if (ovsthread_once_start()) {
+unixctl_command_register("psample/show", "bridge", 1, 1,
+ psample_unixctl_show, NULL);
+ovsthread_once_done();
+}
+}
diff --git a/ofproto/ofproto-dpif-psample.h b/ofproto/ofproto-dpif-psample.h
index 763fbd30b..f264ad4c2 100644
--- a/ofproto/ofproto-dpif-psample.h
+++ b/ofproto/ofproto-dpif-psample.h
@@ -34,5 +34,6 @@ bool dpif_psample_get_group_id(struct dpif_psample *, 
uint32_t, uint32_t *);
 
 void dpif_psample_credit_stats(struct dpif_psample *, uint32_t,
const struct dpif_flow_stats *);
+void dpif_psample_init(void);
 
 #endif // OFPROTO_DPIF_PSAMPLE_H
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index f1efdd482..ebb399307 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -287,6 +287,7 @@ init(const struct shash *iface_hints)
 ofproto_unixctl_init();
 ofproto_dpif_trace_init();
 udpif_init();
+dpif_psample_init();
 }
 
 static void
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-04-24 Thread Adrian Moreno
This simple program reads from psample and prints the packets to stdout.
It's useful for quickly collecting sampled packets.

Signed-off-by: Adrian Moreno 
---
 Documentation/automake.mk   |   1 +
 Documentation/conf.py   |   2 +
 Documentation/ref/index.rst |   1 +
 Documentation/ref/ovs-psample.8.rst |  47 
 include/linux/automake.mk   |   1 +
 include/linux/psample.h |  64 ++
 rhel/openvswitch-fedora.spec.in |   2 +
 rhel/openvswitch.spec.in|   1 +
 utilities/automake.mk   |   9 +
 utilities/ovs-psample.c | 330 
 10 files changed, 458 insertions(+)
 create mode 100644 Documentation/ref/ovs-psample.8.rst
 create mode 100644 include/linux/psample.h
 create mode 100644 utilities/ovs-psample.c

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 47d2e336a..c22facfd6 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -165,6 +165,7 @@ RST_MANPAGES = \
ovs-l3ping.8.rst \
ovs-parse-backtrace.8.rst \
ovs-pki.8.rst \
+   ovs-psample.8.rst \
ovs-tcpdump.8.rst \
ovs-tcpundump.1.rst \
ovs-test.8.rst \
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 15785605a..75efed2fc 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -134,6 +134,8 @@ _man_pages = [
  u'convert "tcpdump -xx" output to hex strings'),
 ('ovs-test.8',
  u'Check Linux drivers for performance, vlan and L3 tunneling problems'),
+('ovs-psample.8',
+ u'Print packets sampled by psample'),
 ('ovs-vlan-test.8',
  u'Check Linux drivers for problems with vlan traffic'),
 ('ovsdb-server.7',
diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
index 03ada932f..d1076435a 100644
--- a/Documentation/ref/index.rst
+++ b/Documentation/ref/index.rst
@@ -46,6 +46,7 @@ time:
ovs-pki.8
ovs-sim.1
ovs-parse-backtrace.8
+   ovs-psample.8
ovs-tcpdump.8
ovs-tcpundump.1
ovs-test.8
diff --git a/Documentation/ref/ovs-psample.8.rst 
b/Documentation/ref/ovs-psample.8.rst
new file mode 100644
index 0..c849c83d8
--- /dev/null
+++ b/Documentation/ref/ovs-psample.8.rst
@@ -0,0 +1,47 @@
+==
+ovs-sample
+==
+
+Synopsis
+
+
+``ovs-sample``
+[``--group=`` | ``-g`` ]
+
+``ovs-sample --help``
+
+``ovs-sample --version``
+
+Description
+===
+
+Open vSwitch per-flow sampling can be configured to emit the samples
+through the ``psample`` netlink multicast group.
+
+Such sampled traffic contains, apart from the packet, some metadata that
+gives further information about the packet sample. More specifically, OVS
+inserts the ``observation_domain_id`` and the ``observation_point_id`` that
+where provided in the sample action (see ``ovs-actions(7)``).
+
+the ``ovs-sample`` program provides a simple way of joining the psample
+multicast group and printing the sampled packets.
+
+
+Options
+===
+
+.. option:: ``-g``  or ``--group`` 
+
+  Tells ``ovs-sample`` to filter out samples that don't belong to that group.
+
+  Different ``Flow_Sample_Collector_Set`` entries can be configured with
+  different ``group_id`` values (see ``ovs-vswitchd.conf.db(5)``). This option
+  helps focusing the output on the relevant samples.
+
+.. option:: -h, --help
+
+Prints a brief help message to the console.
+
+.. option:: -V, --version
+
+Prints version information to the console.
diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index cdae5eedc..ac306b53c 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -3,6 +3,7 @@ noinst_HEADERS += \
include/linux/netfilter/nf_conntrack_sctp.h \
include/linux/openvswitch.h \
include/linux/pkt_cls.h \
+   include/linux/psample.h \
include/linux/gen_stats.h \
include/linux/tc_act/tc_mpls.h \
include/linux/tc_act/tc_pedit.h \
diff --git a/include/linux/psample.h b/include/linux/psample.h
new file mode 100644
index 0..eb642f875
--- /dev/null
+++ b/include/linux/psample.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_PSAMPLE_H
+#define __LINUX_PSAMPLE_H
+
+enum {
+   PSAMPLE_ATTR_IIFINDEX,
+   PSAMPLE_ATTR_OIFINDEX,
+   PSAMPLE_ATTR_ORIGSIZE,
+   PSAMPLE_ATTR_SAMPLE_GROUP,
+   PSAMPLE_ATTR_GROUP_SEQ,
+   PSAMPLE_ATTR_SAMPLE_RATE,
+   PSAMPLE_ATTR_DATA,
+   PSAMPLE_ATTR_GROUP_REFCOUNT,
+   PSAMPLE_ATTR_TUNNEL,
+
+   PSAMPLE_ATTR_PAD,
+   PSAMPLE_ATTR_OUT_TC,/* u16 */
+   PSAMPLE_ATTR_OUT_TC_OCC,/* u64, bytes */
+   PSAMPLE_ATTR_LATENCY,   /* u64, nanoseconds */
+   PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
+   PSAMPLE_ATTR_PROTO, /* u16 */
+   PSAMPLE_ATTR_USER_COOKIE,
+
+   __PSAMPLE_ATTR_MAX
+};
+
+enum psample_command {
+   

[ovs-dev] [RFC 09/11] ofproto-dpif-xlate-cache: Add psample to xcache.

2024-04-24 Thread Adrian Moreno
Add a cache entry type for psample objects.
Store both the dpif_psample reference and the collector_set_id so we can
quickly find the particular exporter.

Using that mechanism, account for packet and byte statistics.

Signed-off-by: Adrian Moreno 
---
 ofproto/ofproto-dpif-psample.c | 20 
 ofproto/ofproto-dpif-psample.h |  4 
 ofproto/ofproto-dpif-xlate-cache.c | 11 ++-
 ofproto/ofproto-dpif-xlate-cache.h |  6 ++
 ofproto/ofproto-dpif-xlate.c   |  9 +
 ofproto/ofproto-dpif.c |  1 +
 6 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-psample.c b/ofproto/ofproto-dpif-psample.c
index 1e4f4bf48..ea4926eb2 100644
--- a/ofproto/ofproto-dpif-psample.c
+++ b/ofproto/ofproto-dpif-psample.c
@@ -17,6 +17,7 @@
 #include 
 #include "ofproto-dpif-psample.h"
 
+#include "dpif.h"
 #include "hash.h"
 #include "ofproto.h"
 #include "openvswitch/hmap.h"
@@ -30,6 +31,8 @@ static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
 struct psample_exporter {
 uint32_t group_id;
 uint32_t collector_set_id;
+uint64_t n_packets;
+uint64_t n_bytes;
 };
 
 struct psample_exporter_map_node {
@@ -145,6 +148,23 @@ dpif_psample_get_group_id(struct dpif_psample *ps, 
uint32_t collector_set_id,
 return found;
 }
 
+void
+dpif_psample_credit_stats(struct dpif_psample *ps, uint32_t collector_set_id,
+  const struct dpif_flow_stats *stats)
+OVS_EXCLUDED(mutex)
+{
+struct psample_exporter_map_node *node;
+
+ovs_mutex_lock();
+node = dpif_psample_find_exporter_node(ps, collector_set_id);
+if (node) {
+node->exporter.n_packets += stats->n_packets;
+node->exporter.n_bytes += stats->n_bytes;
+}
+ovs_mutex_unlock();
+}
+
+
 /* Creation and destruction. */
 struct dpif_psample *
 dpif_psample_create(void)
diff --git a/ofproto/ofproto-dpif-psample.h b/ofproto/ofproto-dpif-psample.h
index b9f2584af..763fbd30b 100644
--- a/ofproto/ofproto-dpif-psample.h
+++ b/ofproto/ofproto-dpif-psample.h
@@ -20,6 +20,7 @@
 #include 
 #include 
 
+struct dpif_flow_stats;
 struct dpif_psample;
 struct ovs_list;
 
@@ -31,4 +32,7 @@ bool dpif_psample_set_options(struct dpif_psample *, const 
struct ovs_list *);
 
 bool dpif_psample_get_group_id(struct dpif_psample *, uint32_t, uint32_t *);
 
+void dpif_psample_credit_stats(struct dpif_psample *, uint32_t,
+   const struct dpif_flow_stats *);
+
 #endif // OFPROTO_DPIF_PSAMPLE_H
diff --git a/ofproto/ofproto-dpif-xlate-cache.c 
b/ofproto/ofproto-dpif-xlate-cache.c
index 2e1fcb3a6..0fe76e5fa 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -35,9 +35,10 @@
 #include "learn.h"
 #include "mac-learning.h"
 #include "netdev-vport.h"
+#include "ofproto/ofproto-dpif.h"
 #include "ofproto/ofproto-dpif-mirror.h"
+#include "ofproto/ofproto-dpif-psample.h"
 #include "ofproto/ofproto-dpif-xlate.h"
-#include "ofproto/ofproto-dpif.h"
 #include "ofproto/ofproto-provider.h"
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
@@ -162,6 +163,11 @@ xlate_push_stats_entry(struct xc_entry *entry,
 }
 
 break;
+case XC_PSAMPLE:
+dpif_psample_credit_stats(entry->psample.psample,
+  entry->psample.collector_set_id,
+  stats);
+break;
 default:
 OVS_NOT_REACHED();
 }
@@ -245,6 +251,9 @@ xlate_cache_clear_entry(struct xc_entry *entry)
 break;
 case XC_TUNNEL_HEADER:
 break;
+case XC_PSAMPLE:
+dpif_psample_unref(entry->psample.psample);
+break;
 default:
 OVS_NOT_REACHED();
 }
diff --git a/ofproto/ofproto-dpif-xlate-cache.h 
b/ofproto/ofproto-dpif-xlate-cache.h
index 0fc6d2ea6..fa707889d 100644
--- a/ofproto/ofproto-dpif-xlate-cache.h
+++ b/ofproto/ofproto-dpif-xlate-cache.h
@@ -29,6 +29,7 @@
 struct bfd;
 struct bond;
 struct dpif_flow_stats;
+struct dpif_psample;
 struct flow;
 struct group_dpif;
 struct mbridge;
@@ -53,6 +54,7 @@ enum xc_type {
 XC_GROUP,
 XC_TNL_NEIGH,
 XC_TUNNEL_HEADER,
+XC_PSAMPLE,
 };
 
 /* xlate_cache entries hold enough information to perform the side effects of
@@ -126,6 +128,10 @@ struct xc_entry {
 } operation;
 uint16_t hdr_size;
 } tunnel_hdr;
+struct {
+struct dpif_psample *psample;
+uint32_t collector_set_id;
+} psample;
 };
 };
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 1dcf86856..a9856e358 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5989,6 +5989,15 @@ xlate_sample_action(struct xlate_ctx *ctx,
sizeof(os->obs_domain_id));
 ofpbuf_put(_args->cookie, >obs_point_id,
sizeof(os->obs_point_id));
+
+if (ctx->xin->xcache) {
+struct 

[ovs-dev] [RFC 03/11] ofproto: Add ofproto-dpif-psample.

2024-04-24 Thread Adrian Moreno
Add a new resource in ofproto-dpif and the corresponding API in
ofproto_provider.h to represent and change psample configuration.

Signed-off-by: Adrian Moreno 
---
 ofproto/automake.mk|   2 +
 ofproto/ofproto-dpif-psample.c | 167 +
 ofproto/ofproto-dpif-psample.h |  31 ++
 ofproto/ofproto-dpif.c |  33 +++
 ofproto/ofproto-dpif.h |   1 +
 ofproto/ofproto-provider.h |   9 ++
 ofproto/ofproto.c  |  10 ++
 ofproto/ofproto.h  |   8 ++
 8 files changed, 261 insertions(+)
 create mode 100644 ofproto/ofproto-dpif-psample.c
 create mode 100644 ofproto/ofproto-dpif-psample.h

diff --git a/ofproto/automake.mk b/ofproto/automake.mk
index 7c08b563b..340003e12 100644
--- a/ofproto/automake.mk
+++ b/ofproto/automake.mk
@@ -34,6 +34,8 @@ ofproto_libofproto_la_SOURCES = \
ofproto/ofproto-dpif-mirror.h \
ofproto/ofproto-dpif-monitor.c \
ofproto/ofproto-dpif-monitor.h \
+   ofproto/ofproto-dpif-psample.c \
+   ofproto/ofproto-dpif-psample.h \
ofproto/ofproto-dpif-rid.c \
ofproto/ofproto-dpif-rid.h \
ofproto/ofproto-dpif-sflow.c \
diff --git a/ofproto/ofproto-dpif-psample.c b/ofproto/ofproto-dpif-psample.c
new file mode 100644
index 0..a83530ed8
--- /dev/null
+++ b/ofproto/ofproto-dpif-psample.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright (c) 2024 Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include "ofproto-dpif-psample.h"
+
+#include "hash.h"
+#include "ofproto.h"
+#include "openvswitch/hmap.h"
+#include "openvswitch/thread.h"
+#include "openvswitch/vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(psample);
+
+static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
+
+struct psample_exporter {
+uint32_t group_id;
+uint32_t collector_set_id;
+};
+
+struct psample_exporter_map_node {
+struct hmap_node node;
+struct psample_exporter exporter;
+};
+
+struct dpif_psample {
+struct hmap exporters_map; /* Contains psample_exporter_map_node. */
+
+struct ovs_refcount ref_cnt;
+};
+
+/* Exporter handling */
+static void
+dpif_psample_clear(struct dpif_psample *ps) OVS_REQUIRES(mutex)
+{
+struct psample_exporter_map_node *node;
+
+HMAP_FOR_EACH_POP (node, node, >exporters_map) {
+free(node);
+}
+}
+
+static struct psample_exporter_map_node*
+dpif_psample_new_exporter_node(struct dpif_psample *ps,
+   const struct ofproto_psample_options *options)
+OVS_REQUIRES(mutex)
+{
+struct psample_exporter_map_node *node;
+node = xzalloc(sizeof *node);
+node->exporter.collector_set_id = options->collector_set_id;
+node->exporter.group_id = options->group_id;
+hmap_insert(>exporters_map, >node,
+hash_int(options->collector_set_id, 0));
+return node;
+}
+
+static struct psample_exporter_map_node*
+dpif_psample_find_exporter_node(const struct dpif_psample *ps,
+const uint32_t collector_set_id)
+OVS_REQUIRES(mutex)
+{
+struct psample_exporter_map_node *node;
+HMAP_FOR_EACH_WITH_HASH (node, node,
+ hash_int(collector_set_id, 0),
+ >exporters_map) {
+if (node->exporter.collector_set_id == collector_set_id) {
+return node;
+}
+}
+return NULL;
+}
+
+/* Configuration. */
+
+/* Sets the psample configuration.
+ * Returns true if the configuration has changed. */
+bool
+dpif_psample_set_options(struct dpif_psample *ps,
+ const struct ovs_list *options_list)
+OVS_EXCLUDED(mutex)
+{
+struct ofproto_psample_options *options;
+struct psample_exporter_map_node *node;
+bool changed = false;
+
+ovs_mutex_lock();
+
+/* psample exporters do not hold any runtime memory so we do not need to
+ * be extra careful at detecting which exporter changed and which did
+ * not. As soon as we detect any change we can just recreate them all. */
+LIST_FOR_EACH(options, list_node, options_list) {
+node = dpif_psample_find_exporter_node(ps, options->collector_set_id);
+if (!node ||
+node->exporter.collector_set_id != options->collector_set_id ||
+node->exporter.group_id != options->group_id) {
+changed = true;
+break;
+}
+}
+changed |= (hmap_count(>exporters_map) != ovs_list_size(options_list));
+
+if 

[ovs-dev] [RFC 05/11] ofproto_dpif_xlate: Use psample for OFP samples.

2024-04-24 Thread Adrian Moreno
When a OFP_SAMPLE action is xlated and a dpif_psample object has been
configured (via Flow_Sample_Collector_Set table) with the same
collector_set_id, add psample information to the odp sample action.

Signed-off-by: Adrian Moreno 
---
 ofproto/ofproto-dpif-psample.c |  20 +
 ofproto/ofproto-dpif-psample.h |   3 +
 ofproto/ofproto-dpif-xlate.c   | 149 +
 ofproto/ofproto-dpif-xlate.h   |   3 +-
 ofproto/ofproto-dpif.c |   2 +-
 5 files changed, 138 insertions(+), 39 deletions(-)

diff --git a/ofproto/ofproto-dpif-psample.c b/ofproto/ofproto-dpif-psample.c
index a83530ed8..1e4f4bf48 100644
--- a/ofproto/ofproto-dpif-psample.c
+++ b/ofproto/ofproto-dpif-psample.c
@@ -125,6 +125,26 @@ OVS_EXCLUDED(mutex)
 return changed;
 }
 
+/* Returns the group_id of the exporter with the given collector_set_id, if it
+ * exists. */
+bool
+dpif_psample_get_group_id(struct dpif_psample *ps, uint32_t collector_set_id,
+  uint32_t *group_id) OVS_EXCLUDED(mutex)
+{
+
+struct psample_exporter_map_node *node;
+bool found = false;
+
+ovs_mutex_lock();
+node = dpif_psample_find_exporter_node(ps, collector_set_id);
+if (node) {
+found = true;
+*group_id = node->exporter.group_id;
+}
+ovs_mutex_unlock();
+return found;
+}
+
 /* Creation and destruction. */
 struct dpif_psample *
 dpif_psample_create(void)
diff --git a/ofproto/ofproto-dpif-psample.h b/ofproto/ofproto-dpif-psample.h
index 80ba44fb9..b9f2584af 100644
--- a/ofproto/ofproto-dpif-psample.h
+++ b/ofproto/ofproto-dpif-psample.h
@@ -18,6 +18,7 @@
 #define OFPROTO_DPIF_PSAMPLE_H 1
 
 #include 
+#include 
 
 struct dpif_psample;
 struct ovs_list;
@@ -28,4 +29,6 @@ struct dpif_psample* dpif_psample_ref(const struct 
dpif_psample *);
 
 bool dpif_psample_set_options(struct dpif_psample *, const struct ovs_list *);
 
+bool dpif_psample_get_group_id(struct dpif_psample *, uint32_t, uint32_t *);
+
 #endif // OFPROTO_DPIF_PSAMPLE_H
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 7c4950895..1dcf86856 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -47,6 +47,7 @@
 #include "ofproto/ofproto-dpif-ipfix.h"
 #include "ofproto/ofproto-dpif-mirror.h"
 #include "ofproto/ofproto-dpif-monitor.h"
+#include "ofproto/ofproto-dpif-psample.h"
 #include "ofproto/ofproto-dpif-sflow.h"
 #include "ofproto/ofproto-dpif-trace.h"
 #include "ofproto/ofproto-dpif-xlate-cache.h"
@@ -117,6 +118,7 @@ struct xbridge {
 struct dpif_sflow *sflow; /* SFlow handle, or null. */
 struct dpif_ipfix *ipfix; /* Ipfix handle, or null. */
 struct netflow *netflow;  /* Netflow handle, or null. */
+struct dpif_psample *psample; /* Psample handle, or null. */
 struct stp *stp;  /* STP or null if disabled. */
 struct rstp *rstp;/* RSTP or null if disabled. */
 
@@ -687,6 +689,7 @@ static void xlate_xbridge_set(struct xbridge *, struct dpif 
*,
   const struct dpif_sflow *,
   const struct dpif_ipfix *,
   const struct netflow *,
+  const struct dpif_psample *,
   bool forward_bpdu, bool has_in_band,
   const struct dpif_backer_support *,
   const struct xbridge_addr *);
@@ -1070,6 +1073,7 @@ xlate_xbridge_set(struct xbridge *xbridge,
   const struct dpif_sflow *sflow,
   const struct dpif_ipfix *ipfix,
   const struct netflow *netflow,
+  const struct dpif_psample *psample,
   bool forward_bpdu, bool has_in_band,
   const struct dpif_backer_support *support,
   const struct xbridge_addr *addr)
@@ -1099,6 +1103,11 @@ xlate_xbridge_set(struct xbridge *xbridge,
 xbridge->ipfix = dpif_ipfix_ref(ipfix);
 }
 
+if (xbridge->psample != psample) {
+dpif_psample_unref(xbridge->psample);
+xbridge->psample = dpif_psample_ref(psample);
+}
+
 if (xbridge->stp != stp) {
 stp_unref(xbridge->stp);
 xbridge->stp = stp_ref(stp);
@@ -1214,8 +1223,9 @@ xlate_xbridge_copy(struct xbridge *xbridge)
   xbridge->dpif, xbridge->ml, xbridge->stp,
   xbridge->rstp, xbridge->ms, xbridge->mbridge,
   xbridge->sflow, xbridge->ipfix, xbridge->netflow,
-  xbridge->forward_bpdu, xbridge->has_in_band,
-  >support, xbridge->addr);
+  xbridge->psample, xbridge->forward_bpdu,
+  xbridge->has_in_band, >support,
+  xbridge->addr);
 LIST_FOR_EACH (xbundle, list_node, >xbundles) {
 xlate_xbundle_copy(new_xbridge, xbundle);
 }
@@ -1373,6 +1383,7 @@ xlate_ofproto_set(struct ofproto_dpif *ofproto, const 

[ovs-dev] [RFC 07/11] netlink-offload-tc: Rename act_cookie->flow_cookie.

2024-04-24 Thread Adrian Moreno
In preparation to allowing certain actions to have a cookie that does
not represent the entire flow, rename flower->act_cookie to
flower->flow_cookie.

This patch does not introduce any behavioral change, it's just a
variable renaming.

Signed-off-by: Adrian Moreno 
---
 lib/netdev-offload-tc.c |  8 
 lib/tc.c| 14 +++---
 lib/tc.h|  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 921d52317..36e814bee 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1322,8 +1322,8 @@ netdev_tc_flow_dump_next(struct netdev_flow_dump *dump,
 continue;
 }
 
-if (flower.act_cookie.len >= sizeof *ufid) {
-*ufid = get_32aligned_u128(flower.act_cookie.data);
+if (flower.flow_cookie.len >= sizeof *ufid) {
+*ufid = get_32aligned_u128(flower.flow_cookie.data);
 } else if (!find_ufid(netdev, , ufid)) {
 continue;
 }
@@ -2537,8 +2537,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
*match,
 return ENOSPC;
 }
 
-flower.act_cookie.data = ufid;
-flower.act_cookie.len = sizeof *ufid;
+flower.flow_cookie.data = ufid;
+flower.flow_cookie.len = sizeof *ufid;
 
 block_id = get_block_id_from_netdev(netdev);
 id = tc_make_tcf_id_chain(ifindex, block_id, chain, prio, hook);
diff --git a/lib/tc.c b/lib/tc.c
index e9bcae4e4..7176991c3 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2088,8 +2088,8 @@ nl_parse_single_action(struct nlattr *action, struct 
tc_flower *flower,
 }
 
 if (act_cookie) {
-flower->act_cookie.data = nl_attr_get(act_cookie);
-flower->act_cookie.len = nl_attr_get_size(act_cookie);
+flower->flow_cookie.data = nl_attr_get(act_cookie);
+flower->flow_cookie.len = nl_attr_get_size(act_cookie);
 }
 
 return nl_parse_action_stats(action_attrs[TCA_ACT_STATS],
@@ -3458,7 +3458,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
   TCA_EGRESS_MIRROR);
 }
 }
-nl_msg_put_act_cookie(request, >act_cookie);
+nl_msg_put_act_cookie(request, >flow_cookie);
 nl_msg_put_act_flags(request);
 nl_msg_end_nested(request, act_offset);
 }
@@ -3476,14 +3476,14 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
 
 act_offset = nl_msg_start_nested(request, act_index++);
 nl_msg_put_act_gact(request, action->chain);
-nl_msg_put_act_cookie(request, >act_cookie);
+nl_msg_put_act_cookie(request, >flow_cookie);
 nl_msg_end_nested(request, act_offset);
 }
 break;
 case TC_ACT_CT: {
 act_offset = nl_msg_start_nested(request, act_index++);
 nl_msg_put_act_ct(request, action, action_pc);
-nl_msg_put_act_cookie(request, >act_cookie);
+nl_msg_put_act_cookie(request, >flow_cookie);
 nl_msg_end_nested(request, act_offset);
 }
 break;
@@ -3501,7 +3501,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
   released)) {
 return -EOPNOTSUPP;
 }
-nl_msg_put_act_cookie(request, >act_cookie);
+nl_msg_put_act_cookie(request, >flow_cookie);
 nl_msg_put_act_flags(request);
 nl_msg_end_nested(request, act_offset);
 }
@@ -3515,7 +3515,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
 if (!flower->action_count) {
 act_offset = nl_msg_start_nested(request, act_index++);
 nl_msg_put_act_gact(request, 0);
-nl_msg_put_act_cookie(request, >act_cookie);
+nl_msg_put_act_cookie(request, >flow_cookie);
 nl_msg_put_act_flags(request);
 nl_msg_end_nested(request, act_offset);
 }
diff --git a/lib/tc.h b/lib/tc.h
index fdbcf4b7c..40ea3d816 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -386,7 +386,7 @@ struct tc_flower {
 
 bool tunnel;
 
-struct tc_cookie act_cookie;
+struct tc_cookie flow_cookie; /* Cookie to help identify the flow. */
 
 bool needs_full_ip_proto_mask;
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC 01/11] odp-util: Add support to new psample uAPI.

2024-04-24 Thread Adrian Moreno
The new odp sample attributes allow userspace to specify a group_id and
user-defined cookie to be passed down to psample.

Add support for parsing and formatting such action.

Signed-off-by: Adrian Moreno 
---
 include/linux/openvswitch.h  |  49 +---
 lib/odp-execute.c|   3 +
 lib/odp-util.c   | 150 ++-
 ofproto/ofproto-dpif-ipfix.c |   2 +
 python/ovs/flow/odp.py   |   2 +
 tests/odp.at |   5 ++
 6 files changed, 163 insertions(+), 48 deletions(-)

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index d9fb991ef..3e748405c 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -696,6 +696,7 @@ enum ovs_flow_attr {
 #define OVS_UFID_F_OMIT_MASK (1 << 1)
 #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
 
+#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
 /**
  * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
  * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
@@ -703,15 +704,27 @@ enum ovs_flow_attr {
  * %UINT32_MAX samples all packets and intermediate values sample intermediate
  * fractions of packets.
  * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
- * Actions are passed as nested attributes.
+ * Actions are passed as nested attributes. Optional if
+ * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
+ * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
+ * provided, will be copied to the psample cookie.
  *
- * Executes the specified actions with the given probability on a per-packet
- * basis.
+ * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
+ * specified.
+ *
+ * Executes the specified actions and/or sends the packet to psample
+ * with the given probability on a per-packet basis.
  */
 enum ovs_sample_attr {
OVS_SAMPLE_ATTR_UNSPEC,
-   OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
-   OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* attributes. */
+   OVS_SAMPLE_ATTR_PROBABILITY,/* u32 number */
+   OVS_SAMPLE_ATTR_ACTIONS,/* Nested OVS_ACTION_ATTR_
+* attributes.
+*/
+   OVS_SAMPLE_ATTR_PSAMPLE_GROUP,  /* u32 number */
+   OVS_SAMPLE_ATTR_PSAMPLE_COOKIE, /* binary */
__OVS_SAMPLE_ATTR_MAX,
 
 #ifdef __KERNEL__
@@ -722,13 +735,27 @@ enum ovs_sample_attr {
 #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
 
 #ifdef __KERNEL__
+
+/* Definition for flags in struct sample_arg. */
+enum {
+   /* When actions in sample will not change the flows. */
+   OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
+   /* When set, the packet will be sent to psample. */
+   OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
+};
+
 struct sample_arg {
-   bool exec;   /* When true, actions in sample will not
- * change flow keys. False otherwise.
- */
-   u32  probability;/* Same value as
- * 'OVS_SAMPLE_ATTR_PROBABILITY'.
- */
+   u16 flags; /* Flags that modify the behavior of the
+   * action. See SAMPLE_ARG_FLAG_*
+   */
+   u32  probability;   /* Same value as
+* 'OVS_SAMPLE_ATTR_PROBABILITY'.
+*/
+   u32  group_id;  /* Same value as
+* 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
+*/
+   u8  cookie_len; /* Length of psample cookie */
+   char cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE]; /* psample cookie data. */
 };
 #endif
 
diff --git a/lib/odp-execute.c b/lib/odp-execute.c
index 081e4d432..d8ee93429 100644
--- a/lib/odp-execute.c
+++ b/lib/odp-execute.c
@@ -716,6 +716,9 @@ odp_execute_sample(void *dp, struct dp_packet *packet, bool 
steal,
 subactions = a;
 break;
 
+/* Ignored in userspace datapath. */
+case OVS_SAMPLE_ATTR_PSAMPLE_GROUP:
+case OVS_SAMPLE_ATTR_PSAMPLE_COOKIE:
 case OVS_SAMPLE_ATTR_UNSPEC:
 case __OVS_SAMPLE_ATTR_MAX:
 default:
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 21f34d955..611b5229e 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -227,12 +227,16 @@ format_odp_sample_action(struct ds *ds, const struct 
nlattr *attr,
 {
 static const struct nl_policy ovs_sample_policy[] = {
 [OVS_SAMPLE_ATTR_PROBABILITY] = { .type = NL_A_U32 },
-[OVS_SAMPLE_ATTR_ACTIONS] = { .type = NL_A_NESTED }
+[OVS_SAMPLE_ATTR_ACTIONS] = { .type = NL_A_NESTED,
+  .optional = true },
+

[ovs-dev] [RFC 04/11] vswitchd: Add psample to schema and configure it.

2024-04-24 Thread Adrian Moreno
Add a psample_group field to the Flow Sample Collector Set table and use
it to configure the psample ofproto layer.

Signed-off-by: Adrian Moreno 
---
 vswitchd/bridge.c  | 54 ++
 vswitchd/vswitch.ovsschema |  7 -
 vswitchd/vswitch.xml   | 32 +++---
 3 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 95a65fcdc..474eb1650 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -288,6 +288,7 @@ static void bridge_configure_mac_table(struct bridge *);
 static void bridge_configure_mcast_snooping(struct bridge *);
 static void bridge_configure_sflow(struct bridge *, int *sflow_bridge_number);
 static void bridge_configure_ipfix(struct bridge *);
+static void bridge_configure_psample(struct bridge *);
 static void bridge_configure_spanning_tree(struct bridge *);
 static void bridge_configure_tables(struct bridge *);
 static void bridge_configure_dp_desc(struct bridge *);
@@ -989,6 +990,7 @@ bridge_reconfigure(const struct ovsrec_open_vswitch 
*ovs_cfg)
 bridge_configure_netflow(br);
 bridge_configure_sflow(br, _bridge_number);
 bridge_configure_ipfix(br);
+bridge_configure_psample(br);
 bridge_configure_spanning_tree(br);
 bridge_configure_tables(br);
 bridge_configure_dp_desc(br);
@@ -1537,10 +1539,11 @@ ovsrec_ipfix_is_valid(const struct ovsrec_ipfix *ipfix)
 return ipfix && ipfix->n_targets > 0;
 }
 
-/* Returns whether a Flow_Sample_Collector_Set row is valid. */
+/* Returns whether a Flow_Sample_Collector_Set row constains valid IPFIX
+ * configuration. */
 static bool
-ovsrec_fscs_is_valid(const struct ovsrec_flow_sample_collector_set *fscs,
- const struct bridge *br)
+ovsrec_fscs_is_valid_ipfix(const struct ovsrec_flow_sample_collector_set *fscs,
+   const struct bridge *br)
 {
 return ovsrec_ipfix_is_valid(fscs->ipfix) && fscs->bridge == br->cfg;
 }
@@ -1558,7 +1561,7 @@ bridge_configure_ipfix(struct bridge *br)
 const char *virtual_obs_id;
 
 OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) {
-if (ovsrec_fscs_is_valid(fe_cfg, br)) {
+if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) {
 n_fe_opts++;
 }
 }
@@ -1621,7 +1624,7 @@ bridge_configure_ipfix(struct bridge *br)
 fe_opts = xcalloc(n_fe_opts, sizeof *fe_opts);
 opts = fe_opts;
 OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fe_cfg, idl) {
-if (ovsrec_fscs_is_valid(fe_cfg, br)) {
+if (ovsrec_fscs_is_valid_ipfix(fe_cfg, br)) {
 opts->collector_set_id = fe_cfg->id;
 sset_init(>targets);
 sset_add_array(>targets, fe_cfg->ipfix->targets,
@@ -1667,6 +1670,47 @@ bridge_configure_ipfix(struct bridge *br)
 }
 }
 
+/* Set psample configuration on 'br'. */
+static void
+bridge_configure_psample(struct bridge *br)
+{
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+const struct ovsrec_flow_sample_collector_set *fscs;
+struct ofproto_psample_options *ps_options;
+struct ovs_list options_list;
+int ret;
+
+ovs_list_init(_list);
+
+OVSREC_FLOW_SAMPLE_COLLECTOR_SET_FOR_EACH(fscs, idl) {
+if (!fscs->psample_group || fscs->n_psample_group != 1
+|| fscs->bridge != br->cfg)
+continue;
+
+ps_options = xzalloc(sizeof *ps_options);
+ps_options->collector_set_id = fscs->id;
+ps_options->group_id = *fscs->psample_group;
+
+ovs_list_insert(_list, _options->list_node);
+}
+
+ret = ofproto_set_psample(br->ofproto, _list);
+
+if (ret == ENOTSUP) {
+if (!ovs_list_is_empty(_list)) {
+VLOG_WARN_RL(, "bridge %s: ignoring psample configuration: "
+  "not supported by this datapath", br->name);
+}
+} else if (ret) {
+VLOG_ERR_RL(, "bridge %s: error configuring psample: %s",
+ br->name, ovs_strerror(ret));
+}
+
+LIST_FOR_EACH_POP(ps_options, list_node, _list) {
+free(ps_options);
+}
+}
+
 static void
 port_configure_stp(const struct ofproto *ofproto, struct port *port,
struct ofproto_port_stp_settings *port_s,
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index e2d5e2e85..a7ad9bcaa 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
  "version": "8.5.0",
- "cksum": "4040946650 27557",
+ "cksum": "1366215760 27764",
  "tables": {
"Open_vSwitch": {
  "columns": {
@@ -562,6 +562,11 @@
  "type": {"key": {"type": "uuid",
   "refTable": "IPFIX"},
   "min": 0, "max": 1}},
+   "psample_group": {
+ "type": {"key": {"type": "integer",
+  "minInteger": 0,
+  "maxInteger": 

[ovs-dev] [RFC 00/11] Add psample support to NXAST_SAMPLE action.

2024-04-24 Thread Adrian Moreno
This is the userspace counterpart of the work being done in the kernel
[1]. Sending it as RFC to get some early feedback on the overall
solution.

** Problem description **
Currently, OVS supports several observability features, such as
per-bridge IPFIX, per-flow IPFIX and sFlow. However, given the
complexity of OVN-generated pipelines, a single sample is typically not
enough to troubleshoot what is OVN/OVS is doing with the packet. We need
highler level metadata alongside the packet sample.

This can be achieved by the means of per-flow IPFIX sampling and
NXAST_SAMPLE action, through which OVN can add two arbitrary 32-bit
values (observation_domain_id and observation_point_id) that are sent
alongside the packet information in the IPFIX frames.

However, there is a fundamental limitation of the existing sampling
infrastructure, specially in the kernel datapath: samples are handled by
ovs-vswitchd, forcing the need for an upcall (userspace action). The
fact that samples share the same netlink sockets and handler thread cpu
time with actual packet misse, can easily cause packet drops making this
solution unfit for use in highly loaded production systems.

Users are left with little option than guessing what sampling rate will
be OK for their traffic pattern and dealing with the lost accuracy. 

** Feature Description **
In order to solve this situation and enable this feature to be safely
turned on in production, this RFC uses the psample support proposed in
[1] to send NXAST_SAMPLE samples to psample adding the observability
domain and point information as metadata.

~~ API ~~
The API is simply a new field called "psample_group" in the
Flow_Sample_Collector_Set (FSCS) table. Configuring this value is
orthogonal to also associating the FSCS entry to an entry in the IPFIX
table.

Apart from that configuration, the controller needs to add NXAST_SAMPLE
actions that refer the entry's id.

~~ HW Offload ~~
psample is already supported by tc using the act_sample action. The
metadata is taken from the actions' cookie.
This series also adds support for offloading the odp sample action,
only when it includes psample information but not nested actions.

A bit of special care has to be taken in the tc layer to not store the
ufid in the sample's cookie as it's used to carry action-specific data.

~~ Datapath support ~~
This new behavior of the datapth sample action is only supported on the
kernel datapath. A more detailed analysis is needed (and planned) to
find a way to also optimize the userspace datapath. However, although
potentially inefficient, there is not that much risk of dropping packets
with the existing IPFIX infrastructure.

~~ Testing ~~
The series includes an utility program called "ovs-psample" that listens
to packets multicasted by the psample module and prints them (also
printing the obs_domain and obs_point ids). In addition the kernel
series includes a tracepoint for easy testing and troubleshooting.

[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=847473


Adrian Moreno (11):
  odp-util: Add support to new psample uAPI.
  ofproto_dpif: Check for psample support.
  ofproto: Add ofproto-dpif-psample.
  vswitchd: Add psample to schema and configure it.
  ofproto_dpif_xlate: Use psample for OFP samples.
  utilities: Add ovs-psample.
  netlink-offload-tc: Rename act_cookie->flow_cookie.
  netdev-offload-tc: Add sample support.
  ofproto-dpif-xlate-cache: Add psample to xcache.
  ofproto-dpif-psample: Add command to show stats.
  tests: Add test for sample offloading.

 Documentation/automake.mk   |   1 +
 Documentation/conf.py   |   2 +
 Documentation/ref/index.rst |   1 +
 Documentation/ref/ovs-psample.8.rst |  47 
 include/linux/automake.mk   |   6 +-
 include/linux/openvswitch.h |  49 -
 include/linux/pkt_cls.h |   2 +
 include/linux/psample.h |  64 ++
 include/linux/tc_act/tc_sample.h|  26 +++
 lib/netdev-offload-tc.c |  75 ++-
 lib/odp-execute.c   |   3 +
 lib/odp-util.c  | 150 +
 lib/tc.c|  98 -
 lib/tc.h|   9 +-
 ofproto/automake.mk |   2 +
 ofproto/ofproto-dpif-ipfix.c|   2 +
 ofproto/ofproto-dpif-psample.c  | 266 ++
 ofproto/ofproto-dpif-psample.h  |  39 
 ofproto/ofproto-dpif-xlate-cache.c  |  11 +-
 ofproto/ofproto-dpif-xlate-cache.h  |   6 +
 ofproto/ofproto-dpif-xlate.c| 158 +
 ofproto/ofproto-dpif-xlate.h|   3 +-
 ofproto/ofproto-dpif.c  |  96 +++-
 ofproto/ofproto-dpif.h  |   7 +-
 ofproto/ofproto-provider.h  |   9 +
 ofproto/ofproto.c   |  10 +
 ofproto/ofproto.h   |   8 +
 python/ovs/flow/odp.py  |   2 +
 rhel/openvswitch-fedora.spec.in |   2 +
 rhel/openvswitch.spec.in|   1 +
 

[ovs-dev] [RFC 02/11] ofproto_dpif: Check for psample support.

2024-04-24 Thread Adrian Moreno
Only kernel datapath supports psample so check that the datapath is not
userspace and that it accepts the new attributes.

Signed-off-by: Adrian Moreno 
---
 ofproto/ofproto-dpif.c | 59 ++
 ofproto/ofproto-dpif.h |  6 -
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 32d037be6..3cee2795a 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -25,6 +25,7 @@
 #include "coverage.h"
 #include "cfm.h"
 #include "ct-dpif.h"
+#include "dpif-netdev.h"
 #include "fail-open.h"
 #include "guarded-list.h"
 #include "hmapx.h"
@@ -873,6 +874,12 @@ ovs_lb_output_action_supported(struct ofproto_dpif 
*ofproto)
 return ofproto->backer->rt_support.lb_output_action;
 }
 
+bool
+ovs_psample_supported(struct ofproto_dpif *ofproto)
+{
+return ofproto->backer->rt_support.psample;
+}
+
 /* Tests whether 'backer''s datapath supports recirculation.  Only newer
  * datapaths support OVS_KEY_ATTR_RECIRC_ID in keys.  We need to disable some
  * features on older datapaths that don't support this feature.
@@ -1440,6 +1447,14 @@ dpif_supports_ct_zero_snat(struct dpif_backer *backer)
 return supported;
 }
 
+static bool check_psample(struct dpif_backer *backer);
+
+static bool
+dpif_supports_psample(struct dpif_backer *backer)
+{
+return !dpif_is_netdev(backer->dpif) && check_psample(backer);
+}
+
 /* Tests whether 'backer''s datapath supports the
  * OVS_ACTION_ATTR_CHECK_PKT_LEN action. */
 static bool
@@ -1609,6 +1624,49 @@ check_add_mpls(struct dpif_backer *backer)
 return supported;
 }
 
+/* Tests whether 'backer''s datapath supports the OVS_SAMPLE_ATTR_PSAMPLE
+ * attribute. */
+static bool
+check_psample(struct dpif_backer *backer)
+{
+uint8_t cookie[OVS_PSAMPLE_COOKIE_MAX_SIZE];
+struct odputil_keybuf keybuf;
+struct ofpbuf actions;
+struct ofpbuf key;
+struct flow flow;
+bool supported;
+size_t offset;
+
+struct odp_flow_key_parms odp_parms = {
+.flow = ,
+.probe = true,
+};
+
+memset(, 0, sizeof flow);
+ofpbuf_use_stack(, , sizeof keybuf);
+odp_flow_key_from_flow(_parms, );
+ofpbuf_init(, 64);
+
+/* Generate a random max-size cookie. */
+random_bytes([0], sizeof(cookie));
+
+offset = nl_msg_start_nested(, OVS_ACTION_ATTR_SAMPLE);
+nl_msg_put_u32(, OVS_SAMPLE_ATTR_PROBABILITY, 1);
+nl_msg_put_u32(, OVS_SAMPLE_ATTR_PSAMPLE_GROUP, 10);
+nl_msg_put_unspec(, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE, [0],
+  sizeof(cookie));
+nl_msg_end_nested(, offset);
+
+supported = dpif_probe_feature(backer->dpif, "psample", ,
+   , NULL);
+ofpbuf_uninit();
+VLOG_INFO("%s: Datapath %s psample",
+  dpif_name(backer->dpif),
+  supported ? "supports" : "does not support");
+return supported;
+}
+
+
 #define CHECK_FEATURE__(NAME, SUPPORT, FIELD, VALUE, ETHTYPE)   \
 static bool \
 check_##NAME(struct dpif_backer *backer)\
@@ -1698,6 +1756,7 @@ check_support(struct dpif_backer *backer)
 dpif_supports_lb_output_action(backer->dpif);
 backer->rt_support.ct_zero_snat = dpif_supports_ct_zero_snat(backer);
 backer->rt_support.add_mpls = check_add_mpls(backer);
+backer->rt_support.psample = dpif_supports_psample(backer);
 
 /* Flow fields. */
 backer->rt_support.odp.ct_state = check_ct_state(backer);
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index d33f73df8..3db4263c7 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -213,7 +213,10 @@ struct group_dpif *group_dpif_lookup(struct ofproto_dpif *,
 DPIF_SUPPORT_FIELD(bool, ct_zero_snat, "Conntrack all-zero IP SNAT")\
 \
 /* True if the datapath supports add_mpls action. */\
-DPIF_SUPPORT_FIELD(bool, add_mpls, "MPLS Label add")
+DPIF_SUPPORT_FIELD(bool, add_mpls, "MPLS Label add")\
+\
+/* True if the datapath supports psample. */\
+DPIF_SUPPORT_FIELD(bool, psample, "Psample")
 
 
 /* Stores the various features which the corresponding backer supports. */
@@ -411,5 +414,6 @@ bool ofproto_dpif_ct_zone_timeout_policy_get_name(
 uint8_t nw_proto, char **tp_name, bool *unwildcard);
 
 bool ovs_explicit_drop_action_supported(struct ofproto_dpif *);
+bool ovs_psample_supported(struct ofproto_dpif *);
 
 #endif /* ofproto-dpif.h */
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Aaron Conole
Benjamin Poirier  writes:

> On 2024-04-24 18:37 +0100, Simon Horman wrote:
>> On Wed, Apr 24, 2024 at 05:44:05PM +0100, Simon Horman wrote:
>> > Hi Aaron, Jakub, all,
>> > 
>> > I have recently been exercising the Open vSwitch kernel selftests,
>> > using vng, something like this:
>> > 
>> >TESTDIR="tools/testing/selftests/net/openvswitch"
>> > 
>> > vng -v --run . --user root --cpus 2 \
>> > --overlay-rwdir "$PWD" -- \
>> > "modprobe openvswitch && \
>> > echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
>> >  make -C \"$TESTDIR\" run_tests"
>> > 
>> > And I have some observations that I'd like to ask about.
>> > 
>> > 1. Building the kernel using the following command does not
>> >build the openvswitch kernel module.
>> > 
>> >vng -v --build \
>> >--config tools/testing/selftests/net/config
>> > 
>> >All that seems to be missing is CONFIG_OPENVSWITCH=m
>> >and I am wondering what the best way of resolving this is.
>> > 
>> >Perhaps I am doing something wrong.
>> >Or perhaps tools/testing/selftests/net/openvswitch/config
>> >should be created? If so, should it include (most of?) what is in
>> >tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?
>
> I noticed something similar when testing Jiri's virtio_net selftests
> patchset [1].
>
> drivers/net/virtio_net/config includes virtio options but the
> test also needs at least CONFIG_NET_VRF=y which is part of net/config.
>
> Whatever the answer to your question, all config files should be
> coherent on this matter.
>
> [1] https://lore.kernel.org/netdev/20240424104049.3935572-1-j...@resnulli.us/
>
> [...]
>> 
>>   5. openvswitch.sh starts with "#!/bin/sh".
>>  But substitutions such as "${ns:0:1}0"  fail if /bin/sh is dash.
>>  Perhaps we should change openvswitch.sh to use bash?
>
> I think so. A similar change was done in
> c2518da8e6b0 selftests: bonding: Change script interpreter (v6.8-rc1)

+1 - I'm okay with it.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Aaron Conole
Simon Horman  writes:

> Hi Aaron, Jakub, all,
>
> I have recently been exercising the Open vSwitch kernel selftests,
> using vng, something like this:
>
>   TESTDIR="tools/testing/selftests/net/openvswitch"
>
> vng -v --run . --user root --cpus 2 \
> --overlay-rwdir "$PWD" -- \
> "modprobe openvswitch && \
>echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
>  make -C \"$TESTDIR\" run_tests"
>
> And I have some observations that I'd like to ask about.
>
> 1. Building the kernel using the following command does not
>build the openvswitch kernel module.
>
>   vng -v --build \
>   --config tools/testing/selftests/net/config
>
>All that seems to be missing is CONFIG_OPENVSWITCH=m
>and I am wondering what the best way of resolving this is.
>
>Perhaps I am doing something wrong.
>Or perhaps tools/testing/selftests/net/openvswitch/config
>should be created? If so, should it include (most of?) what is in
>tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?

I have a series that I need to get back to fixing:

https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411917.html

which does include the config listed, and some of the fixes for things
you've noted.

I think it makes sense to get back to it.

> 2. As per my example above, it seems that a modprobe openvswitch is
>required (if openvswitch is a module).
>
>Again, perhaps I am doing something wrong. But if not, should this be
>incorporated into tools/testing/selftests/net/openvswitch/openvswitch.sh
>or otherwise automated?
>
> 3. I have observed that the last test fails (yesterday, but not today!),
>because the namespace it tries to create already exists. I believe this
>is because it is pending deletion.
>
>My work-around is as follows:
>
>  ovs_add_netns_and_veths () {
>   info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
> + for i in $(seq 10); do
> + ovs_sbx "$1" test -e "/var/run/netns/$3" || break
> + info "Namespace $3 still exists (attempt $i)"
> + ovs_sbx "$1" ip netns del "$3"
> + sleep "$i"
> + done
>   ovs_sbx "$1" ip netns add "$3" || return 1
>   on_exit "ovs_sbx $1 ip netns del $3"
>   ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
>
>N.B.: the "netns del" part is probably not needed,
>but I'm not able to exercise it effectively right now.
>
>I am wondering if a loop like this is appropriate to add, perhaps also
>to namespace deletion. Or if it would be appropriate to port
>openvswitch.sh to use ./tools/testing/selftests/net/lib.sh, which I
>believe handles this.
>
> 4. I am observing timeouts whith the default value of 45s.
>Bumping this to 90s seems to help.
>Are there any objections to a patch to bump the timeout?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Simon Horman
On Wed, Apr 24, 2024 at 05:44:05PM +0100, Simon Horman wrote:
> Hi Aaron, Jakub, all,
> 
> I have recently been exercising the Open vSwitch kernel selftests,
> using vng, something like this:
> 
>   TESTDIR="tools/testing/selftests/net/openvswitch"
> 
> vng -v --run . --user root --cpus 2 \
> --overlay-rwdir "$PWD" -- \
> "modprobe openvswitch && \
>echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
>  make -C \"$TESTDIR\" run_tests"
> 
> And I have some observations that I'd like to ask about.
> 
> 1. Building the kernel using the following command does not
>build the openvswitch kernel module.
> 
>   vng -v --build \
>   --config tools/testing/selftests/net/config
> 
>All that seems to be missing is CONFIG_OPENVSWITCH=m
>and I am wondering what the best way of resolving this is.
> 
>Perhaps I am doing something wrong.
>Or perhaps tools/testing/selftests/net/openvswitch/config
>should be created? If so, should it include (most of?) what is in
>tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?
> 
> 2. As per my example above, it seems that a modprobe openvswitch is
>required (if openvswitch is a module).
> 
>Again, perhaps I am doing something wrong. But if not, should this be
>incorporated into tools/testing/selftests/net/openvswitch/openvswitch.sh
>or otherwise automated?
> 
> 3. I have observed that the last test fails (yesterday, but not today!),
>because the namespace it tries to create already exists. I believe this
>is because it is pending deletion.
> 
>My work-around is as follows:
> 
>  ovs_add_netns_and_veths () {
>   info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
> + for i in $(seq 10); do
> + ovs_sbx "$1" test -e "/var/run/netns/$3" || break
> + info "Namespace $3 still exists (attempt $i)"
> + ovs_sbx "$1" ip netns del "$3"
> + sleep "$i"
> + done
>   ovs_sbx "$1" ip netns add "$3" || return 1
>   on_exit "ovs_sbx $1 ip netns del $3"
>   ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1
> 
>N.B.: the "netns del" part is probably not needed,
>but I'm not able to exercise it effectively right now.
> 
>I am wondering if a loop like this is appropriate to add, perhaps also
>to namespace deletion. Or if it would be appropriate to port
>openvswitch.sh to use ./tools/testing/selftests/net/lib.sh, which I
>believe handles this.
> 
> 4. I am observing timeouts whith the default value of 45s.
>Bumping this to 90s seems to help.
>Are there any objections to a patch to bump the timeout?

  5. openvswitch.sh starts with "#!/bin/sh".
 But substitutions such as "${ns:0:1}0"  fail if /bin/sh is dash.
 Perhaps we should change openvswitch.sh to use bash?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] selftests: openvswitch: Questions about possible enhancements

2024-04-24 Thread Simon Horman
Hi Aaron, Jakub, all,

I have recently been exercising the Open vSwitch kernel selftests,
using vng, something like this:

TESTDIR="tools/testing/selftests/net/openvswitch"

vng -v --run . --user root --cpus 2 \
--overlay-rwdir "$PWD" -- \
"modprobe openvswitch && \
 echo \"timeout=90\" >> \"${TESTDIR}/settings\" && \
 make -C \"$TESTDIR\" run_tests"

And I have some observations that I'd like to ask about.

1. Building the kernel using the following command does not
   build the openvswitch kernel module.

vng -v --build \
--config tools/testing/selftests/net/config

   All that seems to be missing is CONFIG_OPENVSWITCH=m
   and I am wondering what the best way of resolving this is.

   Perhaps I am doing something wrong.
   Or perhaps tools/testing/selftests/net/openvswitch/config
   should be created? If so, should it include (most of?) what is in
   tools/testing/selftests/net/config, or just CONFIG_OPENVSWITCH=m?

2. As per my example above, it seems that a modprobe openvswitch is
   required (if openvswitch is a module).

   Again, perhaps I am doing something wrong. But if not, should this be
   incorporated into tools/testing/selftests/net/openvswitch/openvswitch.sh
   or otherwise automated?

3. I have observed that the last test fails (yesterday, but not today!),
   because the namespace it tries to create already exists. I believe this
   is because it is pending deletion.

   My work-around is as follows:

 ovs_add_netns_and_veths () {
info "Adding netns attached: sbx:$1 dp:$2 {$3, $4, $5}"
+   for i in $(seq 10); do
+   ovs_sbx "$1" test -e "/var/run/netns/$3" || break
+   info "Namespace $3 still exists (attempt $i)"
+   ovs_sbx "$1" ip netns del "$3"
+   sleep "$i"
+   done
ovs_sbx "$1" ip netns add "$3" || return 1
on_exit "ovs_sbx $1 ip netns del $3"
ovs_sbx "$1" ip link add "$4" type veth peer name "$5" || return 1

   N.B.: the "netns del" part is probably not needed,
   but I'm not able to exercise it effectively right now.

   I am wondering if a loop like this is appropriate to add, perhaps also
   to namespace deletion. Or if it would be appropriate to port
   openvswitch.sh to use ./tools/testing/selftests/net/lib.sh, which I
   believe handles this.

4. I am observing timeouts whith the default value of 45s.
   Bumping this to 90s seems to help.
   Are there any objections to a patch to bump the timeout?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt

2024-04-24 Thread aginwala
Update:

Seems in upstream 5.4 linux, it only clears vlan_present vs old 4.15 kernel
https://github.com/torvalds/linux/blob/v5.4/net/core/skbuff.c#L5408
int skb_vlan_pop(struct sk_buff *skb)
{
  u16 vlan_tci;
  __be16 vlan_proto;
  int err;

  if (likely(skb_vlan_tag_present(skb))) {
__vlan_hwaccel_clear_tag(skb);
  } else {
...

static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
  skb->vlan_present = 0;  only clears 'present' flag
}


Hence, we patched stt on branch 2.16 ovs t
## update __push_stt_header on ovs 2.16
diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index 39a294764..ad1f0aa39 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
@@ -622,7 +622,9 @@ static int __push_stt_header(struct sk_buff *skb,
__be64 tun_id,
stth->flags |= STT_CSUM_VERIFIED;
}

-   stth->vlan_tci = htons(skb->vlan_tci);
+   if (skb_vlan_tag_present(skb)) {
+stth->vlan_tci = htons(skb->vlan_tci);
+}
skb->vlan_tci = 0;
put_unaligned(tun_id, >key);

Looks like part of linux change, stt side it was either not called out or
missed. Hence, let us know for any further amendments on above changes if
any as issue is mitigated with this patch and workaround is needed no more.
We will do some more tests and call out for any other failures.


Regards,
Aliasgar

On Tue, Apr 23, 2024 at 10:35 AM aginwala  wrote:

> Hi:
>
> Data plane restores when cleaning up flows using ovs-dpctl del-flows and
> eventually all the flows catch up as flows added by ovn are intact.
> However, not sure what flow caused this as the issue pops up on
> ovs-vswitchd restarts and needs to be  workaround by dpctl del-flows. Not
> sure if it's due to version compatibility with 2.11 ovn and 2.16 ovs or any
> particular patch in ovs/ovn that already has this fix . Will keep looking
> in parallel as the workaround unblocks this for now. Any additional
> pointers would be good too vs this workaround.
>
> Regards,
> Aliasgar
>
>
> On Fri, Apr 19, 2024 at 4:24 PM aginwala  wrote:
>
>> Hi All:
>>
>> Part of upgrading OVN north south gateway to the new 5.4 kernel , VMs
>> connectivity is lost when setting chassis for provider network lrp to this
>> new gateway. For interconnection gateways and hypervisors its not an issue/
>> lrp
>> _uuid   : 387a735d-fc11-4e90-8655-07785aa024af
>> chassis : b80a285b-586a-42d9-b189-69d641f143b1
>> datapath: d9219b69-5961-4f24-8414-1d4054b23169
>> external_ids: {}
>> gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50,
>> 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf,
>> d1b42374-c475-4745-abdb-36e72140c5b5]
>> logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
>> mac : ["74:db:d1:80:d3:af 10.169.247.140/24"]
>> nat_addresses   : []
>> options :
>> {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
>> parent_port : []
>> tag : []
>> tunnel_key  : 2
>> type: chassisredirect
>>
>> provider network
>> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
>> type: localnet
>> tag: 20
>> addresses: ["unknown"]
>> ## encap ip for ovn is on eth0
>>
>> ## gw interfaces brens2f0 hosts uplink provider network
>> ovs-vsctl list-br
>> br-int
>> brens2f0
>> ovs-vsctl list-ports brens2f0
>> ens2f0
>> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
>> ## fail mode secure
>> ovs-vsctl get-fail-mode br-int
>> secure
>> ## set chassis
>> ovn-nbctl lrp-set-gateway-chassis
>> lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e
>> cee81be9-f782-4c82-800e-c5c5327531e4 101
>>
>> ovn-controller is running as a container on the new gateway
>> ovn-controller --version
>> ovn-controller (Open vSwitch) 2.11.1-13
>> OpenFlow versions 0x4:0x4
>>
>> ## ovs on the host 5.4 kernel
>> ovs-vsctl --version
>> ovs-vsctl (Open vSwitch) 2.16.0
>> DB Schema 8.3.0
>>
>> ovs-ofctl --version
>> ovs-ofctl (Open vSwitch) 2.16.0
>> OpenFlow versions 0x1:0x6
>>
>>
>> Digging further with tcpdump on the destination vm interface shows vlan
>> being present causing connectivity failure and no reply packet
>> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
>> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
>> 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
>> 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64
>> 20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
>> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
>> 36269, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
>> 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64
>>
>> openflow rules for atrip vlan 20 is correct that are programmed with ovn
>> on new/old gw :
>> ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
>> 

[ovs-dev] [PATCH v4 ovn] northd: Fix pmtud for non routed traffic.

2024-04-24 Thread Lorenzo Bianconi
Similar to what is already implemented for routed e/w traffic,
introduce pmtud support for e/w traffic between two logical switch ports
connected to the same logical switch, but running on two different
hypervisors.

Acked-by: Mark Michelson 
Reported-at: https://issues.redhat.com/browse/FDP-524
Reported-at: https://issues.redhat.com/browse/FDP-362
Signed-off-by: Lorenzo Bianconi 
---
Changes since v3:
- fix M_NS_DAEMONIZE macro
Changes since v2:
- minor changes
Changes since v1:
- move logic in consider_port_binding
- add more self-test
- fix typos
---
 controller/lflow.h|   1 +
 controller/physical.c |  30 +++-
 northd/northd.c   |  33 ++---
 northd/ovn-northd.8.xml   |  16 +++--
 tests/multinode-macros.at |   7 ++
 tests/multinode.at| 147 +-
 tests/ovn-controller.at   |  63 
 tests/ovn-macros.at   |   1 +
 tests/ovn-northd.at   |  24 +--
 tests/ovn.at  |   5 +-
 10 files changed, 304 insertions(+), 23 deletions(-)

diff --git a/controller/lflow.h b/controller/lflow.h
index 295d004f4..1d20cae35 100644
--- a/controller/lflow.h
+++ b/controller/lflow.h
@@ -94,6 +94,7 @@ struct uuid;
 #define OFTABLE_ECMP_NH  77
 #define OFTABLE_CHK_LB_AFFINITY  78
 #define OFTABLE_MAC_CACHE_USE79
+#define OFTABLE_CT_ZONE_LOOKUP   80
 
 struct lflow_ctx_in {
 struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
diff --git a/controller/physical.c b/controller/physical.c
index 7ee308694..25da789f0 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -1498,6 +1498,26 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
 return;
 }
 
+if (get_lport_type(binding) == LP_VIF) {
+/* Table 80, priority 100.
+ * ===
+ *
+ * Process ICMP{4,6} error packets too big locally generated from the
+ * kernel in order to lookup proper ct_zone. */
+struct match match = MATCH_CATCHALL_INITIALIZER;
+match_set_metadata(, htonll(dp_key));
+match_set_reg(, MFF_LOG_INPORT - MFF_REG0, port_key);
+
+struct zone_ids icmp_zone_ids = get_zone_ids(binding, ct_zones);
+ofpbuf_clear(ofpacts_p);
+put_zones_ofpacts(_zone_ids, ofpacts_p);
+put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
+ofctrl_add_flow(flow_table, OFTABLE_CT_ZONE_LOOKUP, 100,
+binding->header_.uuid.parts[0], ,
+ofpacts_p, >header_.uuid);
+ofpbuf_clear(ofpacts_p);
+}
+
 struct match match;
 if (!strcmp(binding->type, "patch")
 || (!strcmp(binding->type, "l3gateway")
@@ -2464,6 +2484,14 @@ physical_run(struct physical_ctx *p_ctx,
   flow_table, );
 }
 
+/* Default flow for CT_ZONE_LOOKUP Table. */
+struct match ct_look_def_match;
+match_init_catchall(_look_def_match);
+ofpbuf_clear();
+put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, );
+ofctrl_add_flow(flow_table, OFTABLE_CT_ZONE_LOOKUP, 0, 0,
+_look_def_match, , hc_uuid);
+
 /* Handle output to multicast groups, in tables 40 and 41. */
 const struct sbrec_multicast_group *mc;
 SBREC_MULTICAST_GROUP_TABLE_FOR_EACH (mc, p_ctx->mc_group_table) {
@@ -2522,7 +2550,7 @@ physical_run(struct physical_ctx *p_ctx,
 /* Add specif flows for E/W ICMPv{4,6} packets if tunnelled packets
  * do not fit path MTU.
  */
-put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, );
+put_resubmit(OFTABLE_CT_ZONE_LOOKUP, );
 
 /* IPv4 */
 match_init_catchall();
diff --git a/northd/northd.c b/northd/northd.c
index d30ff9da5..ec5f44c16 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -8667,7 +8667,7 @@ build_lswitch_lflows_admission_control(struct 
ovn_datapath *od,
 ovs_assert(od->nbs);
 
 /* Default action for recirculated ICMP error 'packet too big'. */
-ovn_lflow_add(lflows, od, S_SWITCH_IN_CHECK_PORT_SEC, 110,
+ovn_lflow_add(lflows, od, S_SWITCH_IN_CHECK_PORT_SEC, 105,
   "((ip4 && icmp4.type == 3 && icmp4.code == 4) ||"
   " (ip6 && icmp6.type == 2 && icmp6.code == 0)) &&"
   " flags.tunnel_rx == 1", debug_drop_action(), lflow_ref);
@@ -11863,7 +11863,22 @@ build_lswitch_icmp_packet_toobig_admin_flows(
 {
 ovs_assert(op->nbsp);
 
+ds_clear(match);
 if (!lsp_is_router(op->nbsp)) {
+if (!op->n_lsp_addrs) {
+return;
+}
+
+ds_put_format(match,
+  "((ip4 && icmp4.type == 3 && icmp4.code == 4) ||"
+  " (ip6 && icmp6.type == 2 && icmp6.code == 0)) &&"
+  " eth.src == "ETH_ADDR_FMT" && outport == %s &&"
+  " !is_chassis_resident(%s) && flags.tunnel_rx == 1",
+  ETH_ADDR_ARGS(op->lsp_addrs[0].ea), op->json_key,

Re: [ovs-dev] [PATCH net-next] net: openvswitch: Release reference to netdev

2024-04-24 Thread Aaron Conole
Jun Gu  writes:

> dev_get_by_name will provide a reference on the netdev. So ensure that
> the reference of netdev is released after completed.
>
> Fixes: 2540088b836f ("net: openvswitch: Check vport netdev name")
> Signed-off-by: Jun Gu 
> ---

Thanks!

Reviewed-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] net: openvswitch: Fix Use-After-Free in ovs_ct_exit

2024-04-24 Thread Aaron Conole
Hyunwoo Kim  writes:

> Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
> of ovs_ct_limit_exit, is not part of the RCU read critical section, it
> is possible that the RCU grace period will pass during the traversal and
> the key will be free.
>
> To prevent this, it should be changed to hlist_for_each_entry_safe.
>
> Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
> Signed-off-by: Hyunwoo Kim 
> ---

Reviewed-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH net-next 8/8] selftests: openvswitch: add psample test

2024-04-24 Thread Adrian Moreno
Add a test to verify sampling packets via psample works.

In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.

In order to also test simultaneous sFlow and psample actions, add
missing parsing support for "userspace" action (via refactoring the one
in sample).

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/openvswitch.sh  |  97 +-
 .../selftests/net/openvswitch/ovs-dpctl.py| 167 ++
 2 files changed, 231 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 5cae53543849..7a2307a384a9 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -20,7 +20,8 @@ tests="
nat_related_v4  ip4-nat-related: ICMP related 
matches work with SNAT
netlink_checks  ovsnl: validate netlink attrs 
and settings
upcall_interfaces   ovs: test the upcall interfaces
-   drop_reason drop: test drop reasons are 
emitted"
+   drop_reason drop: test drop reasons are 
emitted
+   psample psample: Sampling packets with 
psample"
 
 info() {
 [ $VERBOSE = 0 ] || echo $*
@@ -170,6 +171,19 @@ ovs_drop_reason_count()
return `echo "$perf_output" | grep "$pattern" | wc -l`
 }
 
+ovs_test_flow_fails () {
+   ERR_MSG="Flow actions may not be safe on all matching packets"
+
+   PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+ovs_add_flow $@ &> /dev/null $@ && return 1
+   POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+   if [ "$PRE_TEST" == "$POST_TEST" ]; then
+   return 1
+   fi
+return 0
+}
+
 usage() {
echo
echo "$0 [OPTIONS] [TEST]..."
@@ -184,6 +198,87 @@ usage() {
exit 1
 }
 
+
+# psample test
+# - samples packets with psample
+test_psample() {
+   sbx_add "test_psample" || return $?
+
+   # Add a datapath with per-vport dispatching.
+   ovs_add_dp "test_psample" psample -V 2:1 || return 1
+
+   info "create namespaces"
+   ovs_add_netns_and_veths "test_psample" "psample" \
+   client c0 c1 172.31.110.10/24 -u || return 1
+   ovs_add_netns_and_veths "test_psample" "psample" \
+   server s0 s1 172.31.110.20/24 -u || return 1
+
+   # Check if psample actions can be configured.
+   ovs_add_flow "test_psample" psample \
+   'in_port(1),eth(),eth_type(0x0806),arp()' 
'sample(sample=100%,group_id=1,cookie=0102)'
+   if [ $? == 1 ]; then
+   info "no support for psample - skipping"
+   ovs_exit_sig
+   return $ksft_skip
+   fi
+
+   ovs_del_flows "test_psample" psample
+
+   # Allow ARP
+   ovs_add_flow "test_psample" psample \
+   'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
+   ovs_add_flow "test_psample" psample \
+   'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
+
+# Test action verification.
+   OLDIFS=$IFS
+   IFS='*'
+   min_key='in_port(1),eth(),eth_type(0x800),ipv4()'
+   for testcase in \
+   "cookie to 
large"*"sample(sample=100%,group_id=1,cookie=1615141312111009080706050403020100)"
 \
+   "no group or action"*"sample(sample=100%)" \
+   "no group or action with 
cookie"*"sample(sample=100%,cookie=deadbeef)";
+   do
+   set -- $testcase;
+   ovs_test_flow_fails "test_psample" psample $min_key $2
+   if [ $? == 1 ]; then
+   info "failed - $1"
+   return 1
+   fi
+   done
+   IFS=$OLDIFS
+
+   # Sample all traffic. In this case the sample action only has psample
+   # arguments.
+   ovs_add_flow "test_psample" psample \
+   
"in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()" 
"sample(sample=100%,group_id=1,cookie=c0ffee),2"
+
+   # Sample all traffic. In this case the sample action has both psample
+   # arguments and an upcall emulating simultaneous psample and
+   # sFlow / IPFIX.
+   nlpid=$(grep -E "listening on upcall packet handler" $ovs_dir/s0.out | 
cut -d ":" -f 2 | tr -d ' ')
+   ovs_add_flow "test_psample" psample \
+   
"in_port(2),eth(),eth_type(0x0800),ipv4(src=172.31.110.20,proto=1),icmp()" 
"sample(sample=100%,group_id=2,cookie=eeff0c,actions(userspace(pid=${nlpid},userdata=eeff0c))),1"
+
+   # Record psample data.
+   python3 $ovs_base/ovs-dpctl.py psample  >$ovs_dir/psample.out 
2>$ovs_dir/psample.err &
+   pid=$!
+   on_exit "ovs_sbx test_psample kill -TERM $pid 2>/dev/null"
+
+   # Send a single ping.
+   sleep 1
+   ovs_sbx "test_psample" ip netns exec 

[ovs-dev] [PATCH net-next 6/8] net:openvswitch: add psample support

2024-04-24 Thread Adrian Moreno
Add support for psample sampling via two new attributes to the
OVS_ACTION_ATTR_SAMPLE action.

OVS_SAMPLE_ATTR_PSAMPLE_GROUP used to pass an integer psample group_id.
OVS_SAMPLE_ATTR_PSAMPLE_COOKIE used to pass a variable-length binary
cookie that will be forwared to psample.

The maximum length of the user-defined cookie is set to 16, same as
tc_cookie, to discourage using cookies that will not be offloadable.

In order to simplify the internal processing of the action and given the
maximum size of the cookie is relatively small, add both fields to the
internal-only struct sample_arg.

The presence of a group_id mandates that the action shall called the
psample module to multicast the packet with such group_id and the
user-provided cookie if present. This behavior is orthonogal to
also executing the nested actions if present.

Signed-off-by: Adrian Moreno 
---
 Documentation/netlink/specs/ovs_flow.yaml |  6 ++
 include/uapi/linux/openvswitch.h  | 49 ++
 net/openvswitch/actions.c | 51 +--
 net/openvswitch/flow_netlink.c| 80 ++-
 4 files changed, 153 insertions(+), 33 deletions(-)

diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
b/Documentation/netlink/specs/ovs_flow.yaml
index 4fdfc6b5cae9..5543c2937225 100644
--- a/Documentation/netlink/specs/ovs_flow.yaml
+++ b/Documentation/netlink/specs/ovs_flow.yaml
@@ -825,6 +825,12 @@ attribute-sets:
 name: actions
 type: nest
 nested-attributes: action-attrs
+  -
+name: psample_group
+type: u32
+  -
+name: psample_cookie
+type: binary
   -
 name: userspace-attrs
 enum-name: ovs-userspace-attr
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index efc82c318fa2..e9cd6f3a952d 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -639,6 +639,7 @@ enum ovs_flow_attr {
 #define OVS_UFID_F_OMIT_MASK (1 << 1)
 #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
 
+#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
 /**
  * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
  * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
@@ -646,15 +647,27 @@ enum ovs_flow_attr {
  * %UINT32_MAX samples all packets and intermediate values sample intermediate
  * fractions of packets.
  * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
- * Actions are passed as nested attributes.
+ * Actions are passed as nested attributes. Optional if
+ * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
+ * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
+ * provided, will be copied to the psample cookie.
  *
- * Executes the specified actions with the given probability on a per-packet
- * basis.
+ * Either OVS_SAMPLE_ATTR_PSAMPLE_GROUP or OVS_SAMPLE_ATTR_ACTIONS must be
+ * specified.
+ *
+ * Executes the specified actions and/or sends the packet to psample
+ * with the given probability on a per-packet basis.
  */
 enum ovs_sample_attr {
OVS_SAMPLE_ATTR_UNSPEC,
-   OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
-   OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* attributes. */
+   OVS_SAMPLE_ATTR_PROBABILITY,/* u32 number */
+   OVS_SAMPLE_ATTR_ACTIONS,/* Nested OVS_ACTION_ATTR_
+* attributes.
+*/
+   OVS_SAMPLE_ATTR_PSAMPLE_GROUP,  /* u32 number */
+   OVS_SAMPLE_ATTR_PSAMPLE_COOKIE, /* binary */
__OVS_SAMPLE_ATTR_MAX,
 
 #ifdef __KERNEL__
@@ -665,13 +678,27 @@ enum ovs_sample_attr {
 #define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
 
 #ifdef __KERNEL__
+
+/* Definition for flags in struct sample_arg. */
+enum {
+   /* When set, actions in sample will not change the flows. */
+   OVS_SAMPLE_ARG_FLAG_EXEC = 1 << 0,
+   /* When set, the packet will be sent to psample. */
+   OVS_SAMPLE_ARG_FLAG_PSAMPLE = 1 << 1,
+};
+
 struct sample_arg {
-   bool exec;   /* When true, actions in sample will not
- * change flow keys. False otherwise.
- */
-   u32  probability;/* Same value as
- * 'OVS_SAMPLE_ATTR_PROBABILITY'.
- */
+   u16 flags;  /* Flags that modify the behavior of the
+* action. See SAMPLE_ARG_FLAG_*.
+*/
+   u32  probability;   /* Same value as
+* 'OVS_SAMPLE_ATTR_PROBABILITY'.
+*/
+   u32  group_id;  /* Same value as
+* 'OVS_SAMPLE_ATTR_PSAMPLE_GROUP'.
+ 

[ovs-dev] [PATCH net-next 7/8] selftests: openvswitch: add sample action.

2024-04-24 Thread Adrian Moreno
Add sample action support to ovs-dpctl.py.

Signed-off-by: Adrian Moreno 
---
 .../selftests/net/openvswitch/ovs-dpctl.py| 96 ++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1dd057afd3fb..3a2dddc57e42 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -8,6 +8,7 @@ import argparse
 import errno
 import ipaddress
 import logging
+import math
 import multiprocessing
 import re
 import struct
@@ -58,6 +59,7 @@ OVS_FLOW_CMD_DEL = 2
 OVS_FLOW_CMD_GET = 3
 OVS_FLOW_CMD_SET = 4
 
+UINT32_MAX = 0x
 
 def macstr(mac):
 outstr = ":".join(["%02X" % i for i in mac])
@@ -285,7 +287,7 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_SET", "none"),
 ("OVS_ACTION_ATTR_PUSH_VLAN", "none"),
 ("OVS_ACTION_ATTR_POP_VLAN", "flag"),
-("OVS_ACTION_ATTR_SAMPLE", "none"),
+("OVS_ACTION_ATTR_SAMPLE", "sample"),
 ("OVS_ACTION_ATTR_RECIRC", "uint32"),
 ("OVS_ACTION_ATTR_HASH", "none"),
 ("OVS_ACTION_ATTR_PUSH_MPLS", "none"),
@@ -306,6 +308,91 @@ class ovsactions(nla):
 ("OVS_ACTION_ATTR_DROP", "uint32"),
 )
 
+class sample(nla):
+nla_flags = NLA_F_NESTED
+
+nla_map = (
+("OVS_SAMPLE_ATTR_UNSPEC", "none"),
+("OVS_SAMPLE_ATTR_PROBABILITY", "uint32"),
+("OVS_SAMPLE_ATTR_ACTIONS", "ovsactions"),
+("OVS_SAMPLE_ATTR_PSAMPLE_GROUP", "uint32"),
+("OVS_SAMPLE_ATTR_PSAMPLE_COOKIE", "array(uint8)"),
+)
+
+def dpstr(self, more=False):
+args = []
+
+args.append("sample={:.2f}%".format(
+100 * self.get_attr("OVS_SAMPLE_ATTR_PROBABILITY") /
+UINT32_MAX))
+
+group = self.get_attr("OVS_SAMPLE_ATTR_PSAMPLE_GROUP")
+cookie = self.get_attr("OVS_SAMPLE_ATTR_PSAMPLE_COOKIE")
+actions = self.get_attr("OVS_SAMPLE_ATTR_ACTIONS")
+
+if group:
+args.append("group_id=%d" % group)
+if cookie:
+args.append("cookie=%s" %
+"".join(format(x, "02x") for x in cookie))
+if actions:
+args.append("actions(%s)" % actions.dpstr(more))
+
+return "sample(%s)" % ",".join(args)
+
+def parse(self, actstr):
+""" Parses the input action string and populates the internal
+attributes. The input string must start with "sample("
+
+Returns the remaining action string.
+Raises ValueError if the action string has invalid content.
+"""
+
+def parse_nested_actions(actstr):
+subacts = ovsactions()
+parsed_len = subacts.parse(actstr)
+return subacts, parsed_len
+
+def percent_to_rate(percent):
+percent = float(percent.strip('%'))
+return int(math.floor(UINT32_MAX * (percent / 100.0) + .5))
+
+for (key, attr, func) in (
+("sample", "OVS_SAMPLE_ATTR_PROBABILITY", percent_to_rate),
+("group_id", "OVS_SAMPLE_ATTR_PSAMPLE_GROUP", int),
+("cookie", "OVS_SAMPLE_ATTR_PSAMPLE_COOKIE",
+lambda x: list(bytearray.fromhex(x))),
+("actions", "OVS_SAMPLE_ATTR_ACTIONS", parse_nested_actions),
+):
+if not actstr.startswith(key):
+continue
+
+actstr = actstr[len(key) :]
+
+if not func:
+self["attrs"].append([attr, None])
+continue
+
+# The length of complex attributes cannot be determined
+# beforehand and must be reported by the parsing func.
+delim = actstr[0]
+actstr = actstr[1:]
+if delim == "=":
+pos = strcspn(actstr, ",)")
+datum = func(actstr[:pos])
+elif delim == "(":
+datum, pos = func(actstr)
+
+self["attrs"].append([attr, datum])
+actstr = actstr[pos:]
+actstr = actstr[strspn(actstr, ", ") :]
+
+if actstr[0] != ")":
+raise ValueError("Action str: '%s' unbalanced" % actstr)
+
+return actstr[1:]
+
+
 class ctact(nla):
 nla_flags = NLA_F_NESTED
 
@@ -637,6 +724,13 @@ class ovsactions(nla):
 self["attrs"].append(["OVS_ACTION_ATTR_CT", ctact])
 parsed = True
 
+elif parse_starts_block(actstr, "sample(", False):
+sampleact = self.sample()
+actstr = sampleact.parse(actstr[len("sample(") : ])
+self["attrs"].append(["OVS_ACTION_ATTR_SAMPLE", sampleact])
+   

[ovs-dev] [PATCH net-next 0/8] net: openvswitch: Add sample multicasting.

2024-04-24 Thread Adrian Moreno
** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic share netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
yielding this solution unfit for highly loaded production systems.

Users are left with little options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidated would be
to use psample. However, it's current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
xtended to forward the action's cookie to psample.

Finally, OVS sample action is extended with a couple of attributes
(OVS_SAMPLE_ATTR_PSAMPLE_{GROUP,COOKIE}) that contain a 32 group_id
and a variable length cookie. When provided, OVS sends the packet
to psample for observability.

In order to make it easier for users to receive samples coming from
a specific source, group_id filtering is added to psample as well
as a tracepoint for troubleshooting.

--
rfc_v2 -> v1:
- Accomodate Ilya's comments.
- Split OVS's attribute in two attributes and simplify internal
handling of psample arguments.
- Extend psample and tc with a user-defined cookie.
- Add a tracepoint to psample to facilitate troubleshooting.

rfc_v1 -> rfc_v2:
- Use psample instead of a new OVS-only multicast group.
- Extend psample and tc with a user-defined cookie.

Adrian Moreno (8):
  net: netlink: export genl private pointer getters
  net: psample: add multicast filtering on group_id
  net: psample: add user cookie
  net: psample: add tracepoint
  net: sched: act_sample: add action cookie to sample
  net:openvswitch: add psample support
  selftests: openvswitch: add sample action.
  selftests: openvswitch: add psample test

 Documentation/netlink/specs/ovs_flow.yaml |   6 +
 include/net/psample.h |   2 +
 include/uapi/linux/openvswitch.h  |  49 -
 include/uapi/linux/psample.h  |   2 +
 net/netlink/genetlink.c   |   2 +
 net/openvswitch/actions.c |  51 -
 net/openvswitch/flow_netlink.c|  80 +--
 net/psample/psample.c | 131 ++-
 net/psample/trace.h   |  62 ++
 net/sched/act_sample.c|  12 +
 .../selftests/net/openvswitch/openvswitch.sh  |  97 +++-
 .../selftests/net/openvswitch/ovs-dpctl.py| 207 +-
 12 files changed, 655 insertions(+), 46 deletions(-)
 create mode 100644 net/psample/trace.h

-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] netdev-dpdk: Fix possible memory leak configuring VF MAC address.

2024-04-24 Thread Simon Horman
On Tue, Apr 23, 2024 at 11:33:15AM +0100, Simon Horman wrote:
> On Wed, Apr 17, 2024 at 10:43:11AM +0300, Roi Dayan wrote:
> > 
> > 
> > On 16/04/2024 18:48, Simon Horman wrote:
> > > On Tue, Apr 16, 2024 at 04:21:48PM +0300, Roi Dayan via dev wrote:
> > >> VLOG_WARN_BUF() is allocating memory for the error string and should
> > >> e used if the configuration cannot continue and error is being returned
> > >> so the caller has indication of releasing the pointer.
> > >> Change to VLOG_WARN() to keep the logic that error is not being
> > >> returned.
> > >>
> > >> Fixes: f4336f504b17 ("netdev-dpdk: Add option to configure VF MAC 
> > >> address.")
> > >> Signed-off-by: Roi Dayan 
> > >> Acked-by: Gaetan Rivet 
> > >> Acked-by: Eli Britstein 
> > >> ---
> > >>  lib/netdev-dpdk.c | 17 -
> > >>  1 file changed, 8 insertions(+), 9 deletions(-)
> > >>
> > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > >> index 2111f776810b..9249b9e9c646 100644
> > >> --- a/lib/netdev-dpdk.c
> > >> +++ b/lib/netdev-dpdk.c
> > >> @@ -2379,17 +2379,16 @@ netdev_dpdk_set_config(struct netdev *netdev, 
> > >> const struct smap *args,
> > >>  struct eth_addr mac;
> > >>  
> > >>  if (!dpdk_port_is_representor(dev)) {
> > >> -VLOG_WARN_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
> > >> -  "but 'options:dpdk-vf-mac' is only supported 
> > >> for "
> > >> -  "VF representors.",
> > >> -  netdev_get_name(netdev), vf_mac);
> > >> +VLOG_WARN("'%s' is trying to set the VF MAC '%s' "
> > >> +  "but 'options:dpdk-vf-mac' is only supported for "
> > >> +  "VF representors.",
> > >> +  netdev_get_name(netdev), vf_mac);
> > >>  } else if (!eth_addr_from_string(vf_mac, )) {
> > >> -VLOG_WARN_BUF(errp, "interface '%s': cannot parse VF MAC 
> > >> '%s'.",
> > >> -  netdev_get_name(netdev), vf_mac);
> > >> +VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.",
> > >> +  netdev_get_name(netdev), vf_mac);
> > >>  } else if (eth_addr_is_multicast(mac)) {
> > >> -VLOG_WARN_BUF(errp,
> > >> -  "interface '%s': cannot set VF MAC to 
> > >> multicast "
> > >> -  "address '%s'.", netdev_get_name(netdev), 
> > >> vf_mac);
> > >> +VLOG_WARN("interface '%s': cannot set VF MAC to multicast "
> > >> +  "address '%s'.", netdev_get_name(netdev), vf_mac);
> > >>  } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> > >>  dev->requested_hwaddr = mac;
> > >>  netdev_request_reconfigure(netdev);
> > > 
> > > Thanks Roi,
> > > 
> > > I agree that this change makes sense as the allocated
> > > value will typically be discarded. And I think if
> > > VLOG_WARN_BUF() is called again in the same invocation of
> > > netdev_dpdk_set_config() a memory leak occurs.
> > > 
> > > Acked-by: Simon Horman 
> > > 
> > > After reviewing your patch I am now wondering if, conversely,
> > > the following change _also_ makes sense:
> > > 
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > index 2111f776810b..32d4193d24af 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -2426,8 +2426,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
> > > struct smap *args,
> > >  }
> > >  err = 0; /* Not fatal. */
> > >  } else {
> > > -VLOG_WARN("%s: Cannot get flow control parameters: %s",
> > > -  netdev_get_name(netdev), rte_strerror(err));
> > > +VLOG_WARN_BUF(errp, "%s: Cannot get flow control parameters: 
> > > %s",
> > > +  netdev_get_name(netdev), rte_strerror(err));
> > >  }
> > >  goto out;
> > >  }
> > 
> > Hi,
> > 
> > thanks for the review. yes it can improve the error to the user in 
> > ovs-vsctl show.
> > 
> > error: "enp8s0f1: could not set configuration (Invalid 
> > argument)"
> > vs
> > 
> > error: "enp8s0f1: Cannot get flow control parameters: 
> > Invalid argument"
> > 
> > 
> > I'll do it in a different commit as its not related to the memleak fix.
> 
> Thanks,
> 
> I have gone ahead and applied this patch to main.
> 
> - netdev-dpdk: Fix possible memory leak configuring VF MAC address.
>   https://github.com/openvswitch/ovs/commit/4f29804f249b
> 
> As a follow-up I will prepare backports back to v2.17.

I now have applied the following backports.

Backport applied to branch-3.3:
- netdev-dpdk: Fix possible memory leak configuring VF MAC address.
  https://github.com/openvswitch/ovs/commit/a6c3b5202c2f

Backport applied to branch-3.2:
- netdev-dpdk: Fix possible memory leak configuring VF MAC address.
  https://github.com/openvswitch/ovs/commit/4e4463ca5539

Backport 

Re: [ovs-dev] [PATCH OVN v5 0/4] DHCP Relay Agent support for overlay subnets.

2024-04-24 Thread Naveen Yerramneni


> On 05-Apr-2024, at 9:08 PM, Numan Siddique  wrote:
> 
> CAUTION: External Email
> 
> 
> On Wed, Mar 20, 2024 at 10:40 AM Naveen Yerramneni 
>  wrote:
> >
> > This patch contains changes to enable DHCP Relay Agent support for 
> > overlay subnets.
> >
> > USE CASE:
> > --
> >   - Enable IP address assignment for overlay subnets from the 
> > centralized DHCP server present in the underlay network.
> >
> > PREREQUISITES
> > --
> >   - Logical Router Port IP should be assigned (statically) from the 
> > same overlay subnet which is managed by DHCP server.
> >   - LRP IP is used for GIADRR field when relaying the DHCP packets and 
> > also same IP needs to be configured as default gateway for the overlay 
> > subnet.
> >   - Overlay subnets managed by external DHCP server are expected to be 
> > directly reachable from the underlay network.
> >
> > EXPECTED PACKET FLOW:
> > --
> > Following is the expected packet flow inorder to support DHCP rleay 
> > functionality in OVN.
> >   1. DHCP client originates DHCP discovery (broadcast).
> >   2. DHCP relay (running on the OVN) receives the broadcast and 
> > forwards the packet to the DHCP server by converting it to unicast.
> >  While forwarding the packet, it updates the GIADDR in DHCP header 
> > to its interface IP on which DHCP packet is received and increments hop 
> > count.
> >   3. DHCP server uses GIADDR field to decide the IP address pool from 
> > which IP has to be assigned and DHCP offer is sent to the same IP (GIADDR).
> >   4. DHCP relay agent forwards the offer to the client.
> >   5. DHCP client sends DHCP request (broadcast) packet.
> >   6. DHCP relay (running on the OVN) receives the broadcast and 
> > forwards the packet to the DHCP server by converting it to unicast.
> >  While forwarding the packet, it updates the GIADDR in DHCP header 
> > to its interface IP on which DHCP packet is received.
> >   7. DHCP Server sends the ACK packet.
> >   8. DHCP relay agent forwards the ACK packet to the client.
> >   9. All the future renew/release packets are directly exchanged 
> > between DHCP client and DHCP server.
> >
> > OVN DHCP RELAY PACKET FLOW:
> > 
> > To add DHCP Relay support on OVN, we need to replicate all the behavior 
> > described above using distributed logical switch and logical router.
> > At, highlevel packet flow is distributed among Logical Switch and 
> > Logical Router on source node (where VM is deployed) and redirect 
> > chassis(RC) node.
> >   1. Request packet gets processed on the source node where VM is 
> > deployed and relays the packet to DHCP server.
> >   2. Response packet is first processed on RC node (which first 
> > recieves the packet from underlay network). RC node forwards the packet to 
> > the right node by filling in the dest MAC and IP.
> >
> > OVN Packet flow with DHCP relay is explained below.
> >   1. DHCP client (VM) sends the DHCP discover packet (broadcast).
> >   2. Logical switch converts the packet to L2 unicast by setting the 
> > destination MAC to LRP's MAC
> >   3. Logical Router receives the packet and redirects it to the OVN 
> > controller.
> >   4. OVN controller updates the required information(GIADDR, HOP count) 
> > in the DHCP payload after doing the required checks. If any check fails, 
> > packet is dropped.
> >   5. Logical Router converts the packet to L3 unicast and forwards it 
> > to the server. This packets gets routed like any other packet (via RC node).
> >   6. Server replies with DHCP offer.
> >   7. RC node processes the DHCP offer and forwards it to the OVN 
> > controller.
> >   8. OVN controller does sanity checks and  updates the destination MAC 
> > (available in DHCP header), destination IP (available in DHCP header) and 
> > reinjects the packet to datapath.
> >  If any check fails, packet is dropped.
> >   9. Logical router updates the source IP and port and forwards the 
> > packet to logical switch.
> >   10. Logical switch delivers the packet to the DHCP client.
> >   11. Similar steps are performed for Request and Ack packets.
> >   12. All the future renew/release packets are directly exchanged 
> > between DHCP client and DHCP server
> >
> > NEW OVN ACTIONS
> > ---
> >   1. dhcp_relay_req_chk(, )
> >   - This action executes on the source node on which the DHCP 
> > request originated.
> >   - This action relays the DHCP request coming from client to the 
> > server. Relay-ip is used to update GIADDR in the DHCP header.
> >   2. dhcp_relay_resp_chk(, )
> >   - This action executes on the first node (RC node) which 
> > processes the DHCP response from the server.
> >   - This action updates  the destination MAC and destination IP so 
> > that 

[ovs-dev] [PATCH v2 2/2] conntrack: Key connections by zone.

2024-04-24 Thread Felix Huettner via dev
Currently conntrack uses a single large cmap for all connections stored.
This cmap contains all connections for all conntrack zones which are
completely separate from each other. By separating each zone to its own
cmap we can significantly optimize the performance when using multiple
zones.

The change fixes a similar issue as [1] where slow conntrack zone flush
operations significantly slow down OVN router failover. The difference is
just that this fix is used whith dpdk, while [1] was when using the ovs
kernel module.

As we now need to store more cmap's the memory usage of struct conntrack
increases by 524280 bytes. Additionally we need 65535 cmaps with 128
bytes each. This leads to a total memory increase of around 10MB.

Running "./ovstest test-conntrack benchmark 4 33554432 32 1" shows no
real difference in the multithreading behaviour against a single zone.

Running the new "./ovstest test-conntrack benchmark-zones" show
significant speedups as shown below. The values for "ct execute" are for
acting on the complete zone with all its entries in total (so in the
first case adding 10,000 new conntrack entries). All tests are run 1000
times.

When running with 1,000 zones with 10,000 entries each we see the
following results (all in microseconds):
"./ovstest test-conntrack benchmark-zones 1 1000 1000"

 +--++-+-+
 |  Min |   Max  |  95%ile |   Avg   |
++--++-+-+
| ct execute (commit)|  || | |
|with commit | 2266 |   3505 | 2707.06 | 2592.06 |
| without commit | 2411 |  12730 | 4432.50 | 2736.78 |
++--++-+-+
| ct execute (no commit) |  || | |
|with commit |  699 |   1238 |  886.15 |  722.67 |
| without commit |  700 |   3377 | 1934.42 |  803.53 |
++--++-+-+
| flush full zone|  || | |
|with commit |  619 |   1122 |  901.36 |  679.15 |
| without commit |  618 | 105078 |   64591 | 2886.46 |
++--++-+-+
| flush empty zone   |  || | |
|with commit |0 |  5 |1.00 |0.64 |
| without commit |   54 |  87469 |   64520 | 2172.25 |
++--++-+-+

When running with 10,000 zones with 1,000 entries each we see the
following results (all in microseconds):
"./ovstest test-conntrack benchmark-zones 1000 1 1000"

 +--++-+-+
 |  Min |   Max  |  95%ile |   Avg   |
++--++-+-+
| ct execute (commit)|  || | |
|with commit |  215 |287 |  231.88 |  222.30 |
| without commit |  214 |   1692 |  569.18 |  285.83 |
++--++-+-+
| ct execute (no commit) |  || | |
|with commit |   68 | 97 |   74.69 |   70.09 |
| without commit |   68 |300 |  158.40 |   82.06 |
++--++-+-+
| flush full zone|  || | |
|with commit |   47 |211 |   56.34 |   50.34 |
| without commit |   48 |  96330 |   63392 |   63923 |
++--++-+-+
| flush empty zone   |  || | |
|with commit |0 |  1 |1.00 |0.44 |
| without commit |3 | 109728 |   63923 | 3629.44 |
++--++-+-+

Comparing the averages we see:
* a moderate performance improvement for conntrack_execute with or
  without commiting of around 6% to 23%
* a significant performance improvement for flushing a full zone of
  around 75% to 99%
* an even more significant improvement for flushing empty zones since we
  no longer need to check any unrelated connections

[1] 9ec849e8aa869b646c372fac552ae2609a4b5f66

Signed-off-by: Felix Huettner 
---
v1->v2: fix formatting

 lib/conntrack-private.h |  2 +-
 lib/conntrack.c | 81 +++--
 lib/conntrack.h |  1 +
 3 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 3fd5fccd3..71367f211 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -200,7 +200,7 @@ enum ct_ephemeral_range {
 
 struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
-struct cmap conns OVS_GUARDED;
+struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
 struct rculist exp_lists[N_EXP_LISTS];
 struct cmap zone_limits 

[ovs-dev] [PATCH v2 1/2] test-conntrack: Add per zone benchmark tool.

2024-04-24 Thread Felix Huettner via dev
The current test-conntrack benchmark command runs with multiple threads
against a single conntrack zone. We now add a new benchmark-zones
command that allows us to check the performance between multiple zones.

We in there test the following scenarios for one zone while other zones
also contain entries:
1. Flushing a single full zone
2. Flushing a single empty zone
3. Commiting new conntrack entries against a single zone
4. Running conntrack_execute without commit against the entries of a
   single zone

Signed-off-by: Felix Huettner 
---
v1->v2: fix formatting

 tests/test-conntrack.c | 181 +
 1 file changed, 166 insertions(+), 15 deletions(-)

diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c
index 292b6c048..dc8d6cff9 100644
--- a/tests/test-conntrack.c
+++ b/tests/test-conntrack.c
@@ -25,36 +25,48 @@
 #include "ovstest.h"
 #include "pcap-file.h"
 #include "timeval.h"
+#include "stopwatch.h"
+
+#define STOPWATCH_CT_EXECUTE_COMMIT "ct-execute-commit"
+#define STOPWATCH_CT_EXECUTE_NO_COMMIT "ct-execute-no-commit"
+#define STOPWATCH_FLUSH_FULL_ZONE "full-zone"
+#define STOPWATCH_FLUSH_EMPTY_ZONE "empty-zone"
 
 static const char payload[] = "5054000a505400090800451c00"
   "11a4cd0a0101010a010102000100020008";
 
+static struct dp_packet *
+build_packet(uint16_t udp_src, uint16_t udp_dst, ovs_be16 *dl_type)
+{
+struct udp_header *udp;
+struct flow flow;
+struct dp_packet *pkt = dp_packet_new(sizeof payload / 2);
+
+dp_packet_put_hex(pkt, payload, NULL);
+flow_extract(pkt, );
+
+udp = dp_packet_l4(pkt);
+udp->udp_src = htons(udp_src);
+udp->udp_dst = htons(udp_dst);
+
+*dl_type = flow.dl_type;
+
+return pkt;
+}
+
 static struct dp_packet_batch *
 prepare_packets(size_t n, bool change, unsigned tid, ovs_be16 *dl_type)
 {
 struct dp_packet_batch *pkt_batch = xzalloc(sizeof *pkt_batch);
-struct flow flow;
 size_t i;
 
 ovs_assert(n <= ARRAY_SIZE(pkt_batch->packets));
 
 dp_packet_batch_init(pkt_batch);
 for (i = 0; i < n; i++) {
-struct udp_header *udp;
-struct dp_packet *pkt = dp_packet_new(sizeof payload/2);
-
-dp_packet_put_hex(pkt, payload, NULL);
-flow_extract(pkt, );
-
-udp = dp_packet_l4(pkt);
-udp->udp_src = htons(ntohs(udp->udp_src) + tid);
-
-if (change) {
-udp->udp_dst = htons(ntohs(udp->udp_dst) + i);
-}
-
+uint16_t udp_dst = change ? 2+1 : 2;
+struct dp_packet *pkt = build_packet(1 + tid, udp_dst, dl_type);
 dp_packet_batch_add(pkt_batch, pkt);
-*dl_type = flow.dl_type;
 }
 
 return pkt_batch;
@@ -153,6 +165,140 @@ test_benchmark(struct ovs_cmdl_context *ctx)
 free(threads);
 }
 
+static void
+test_benchmark_zones(struct ovs_cmdl_context *ctx)
+{
+unsigned long n_conns, n_zones, iterations;
+long long start;
+unsigned i, j;
+ovs_be16 dl_type;
+long long now = time_msec();
+
+fatal_signal_init();
+
+/* Parse arguments */
+n_conns = strtoul(ctx->argv[1], NULL, 0);
+if (n_conns == 0 || n_conns >= UINT32_MAX) {
+ovs_fatal(0, "n_conns must be between 1 and 2^32");
+}
+n_zones = strtoul(ctx->argv[2], NULL, 0);
+if (n_zones == 0 || n_zones >= UINT16_MAX) {
+ovs_fatal(0, "n_zones must be between 1 and 2^16");
+}
+iterations = strtoul(ctx->argv[3], NULL, 0);
+if (iterations == 0) {
+ovs_fatal(0, "iterations must be greater than 0");
+}
+
+ct = conntrack_init();
+
+/* Create initial connection entries */
+start = time_msec();
+struct dp_packet_batch **pkt_batch = xzalloc(n_conns * sizeof *pkt_batch);
+for (i = 0; i < n_conns; i++) {
+pkt_batch[i] = xzalloc(sizeof(struct dp_packet_batch));
+dp_packet_batch_init(pkt_batch[i]);
+uint16_t udp_src = (i & 0x) >> 16;
+if (udp_src == 0) {
+udp_src = UINT16_MAX;
+}
+uint16_t udp_dst = i & 0x;
+if (udp_dst == 0) {
+udp_dst = UINT16_MAX;
+}
+struct dp_packet *pkt = build_packet(udp_src, udp_dst, _type);
+dp_packet_batch_add(pkt_batch[i], pkt);
+}
+printf("initial packet generation time: %lld ms\n", time_msec() - start);
+
+/* Put initial entries to each zone */
+start = time_msec();
+for (i = 0; i < n_zones; i++) {
+for (j = 0; j < n_conns; j++) {
+conntrack_execute(ct, pkt_batch[j], dl_type, false, true, i,
+  NULL, NULL, NULL, NULL, now, 0);
+pkt_metadata_init_conn(_batch[j]->packets[0]->md);
+}
+}
+printf("initial insert time: %lld ms\n", time_msec() - start);
+
+/* Actually run the tests */
+stopwatch_create(STOPWATCH_CT_EXECUTE_COMMIT, SW_US);
+stopwatch_create(STOPWATCH_CT_EXECUTE_NO_COMMIT, SW_US);
+stopwatch_create(STOPWATCH_FLUSH_FULL_ZONE, SW_US);

Re: [ovs-dev] [PATCH 2/2] conntrack: Key connections by zone.

2024-04-24 Thread 0-day Robot
Bleep bloop.  Greetings Felix Huettner, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
#104 FILE: lib/conntrack-private.h:203:
struct cmap conns[UINT16_MAX+1] OVS_GUARDED;

Lines checked: 283, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] test-conntrack: add per zone benchmark tool

2024-04-24 Thread 0-day Robot
Bleep bloop.  Greetings Felix Huettner, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject summary should start with a capital.
WARNING: The subject summary should end with a dot.
Subject: test-conntrack: add per zone benchmark tool
WARNING: Line lacks whitespace around operator
#46 FILE: tests/test-conntrack.c:43:
struct dp_packet *pkt = dp_packet_new(sizeof payload/2);

WARNING: Line is 84 characters long (recommended limit is 79)
#196 FILE: tests/test-conntrack.c:269:
stopwatch_get_stats(STOPWATCH_CT_EXECUTE_NO_COMMIT, 
_ct_execute_nocommit);

WARNING: Line has trailing whitespace
#207 FILE: tests/test-conntrack.c:280:
printf("| Min| %16llu us | %19llu us | %12llu us | %13llu us |\n", 

WARNING: Line has trailing whitespace
#210 FILE: tests/test-conntrack.c:283:
printf("| Max| %16llu us | %19llu us | %12llu us | %13llu us |\n", 

WARNING: Line has trailing whitespace
#213 FILE: tests/test-conntrack.c:286:
printf("| 95%%ile | %16.2f us | %19.2f us | %12.2f us | %13.2f us |\n", 

WARNING: Line has trailing whitespace
#216 FILE: tests/test-conntrack.c:289:
printf("| Avg| %16.2f us | %19.2f us | %12.2f us | %13.2f us |\n", 

Lines checked: 246, Warnings: 8, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC ovn] northd: Add bfd and bfd_consumer nodes to I-P engine.

2024-04-24 Thread Lorenzo Bianconi
Introduce bfd and bfd_consumer nodes to northd I-P engine to track bfd
connections and northd static_route/policy_route changes.

Signed-off-by: Lorenzo Bianconi 
---
 northd/en-lflow.c|  19 +--
 northd/en-northd.c   |  92 +
 northd/en-northd.h   |   8 ++
 northd/inc-proc-northd.c |  21 ++-
 northd/northd.c  | 274 ---
 northd/northd.h  |  48 ++-
 6 files changed, 344 insertions(+), 118 deletions(-)

diff --git a/northd/en-lflow.c b/northd/en-lflow.c
index c4b927fb8..9efbd5117 100644
--- a/northd/en-lflow.c
+++ b/northd/en-lflow.c
@@ -41,6 +41,9 @@ lflow_get_input_data(struct engine_node *node,
  struct lflow_input *lflow_input)
 {
 struct northd_data *northd_data = engine_get_input_data("northd", node);
+struct bfd_data *bfd_data = engine_get_input_data("bfd", node);
+struct bfd_consumer_data *bfd_consumer_data =
+engine_get_input_data("bfd_consumer", node);
 struct port_group_data *pg_data =
 engine_get_input_data("port_group", node);
 struct sync_meters_data *sync_meters_data =
@@ -78,7 +81,10 @@ lflow_get_input_data(struct engine_node *node,
 lflow_input->meter_groups = _meters_data->meter_groups;
 lflow_input->lb_datapaths_map = _data->lb_datapaths_map;
 lflow_input->svc_monitor_map = _data->svc_monitor_map;
-lflow_input->bfd_connections = NULL;
+lflow_input->bfd_connections = _data->bfd_connections;
+lflow_input->parsed_routes = _consumer_data->parsed_routes;
+lflow_input->route_tables = _consumer_data->route_tables;
+lflow_input->route_policies = _consumer_data->route_policies;
 
 struct ed_type_global_config *global_config =
 engine_get_input_data("global_config", node);
@@ -95,25 +101,14 @@ void en_lflow_run(struct engine_node *node, void *data)
 struct lflow_input lflow_input;
 lflow_get_input_data(node, _input);
 
-struct hmap bfd_connections = HMAP_INITIALIZER(_connections);
-lflow_input.bfd_connections = _connections;
-
 stopwatch_start(BUILD_LFLOWS_STOPWATCH_NAME, time_msec());
 
 struct lflow_data *lflow_data = data;
 lflow_table_clear(lflow_data->lflow_table);
 lflow_reset_northd_refs(_input);
 
-build_bfd_table(eng_ctx->ovnsb_idl_txn,
-lflow_input.nbrec_bfd_table,
-lflow_input.sbrec_bfd_table,
-lflow_input.lr_ports,
-_connections);
 build_lflows(eng_ctx->ovnsb_idl_txn, _input,
  lflow_data->lflow_table);
-bfd_cleanup_connections(lflow_input.nbrec_bfd_table,
-_connections);
-hmap_destroy(_connections);
 stopwatch_stop(BUILD_LFLOWS_STOPWATCH_NAME, time_msec());
 
 engine_set_node_state(node, EN_UPDATED);
diff --git a/northd/en-northd.c b/northd/en-northd.c
index 4479b4aff..2d8c05608 100644
--- a/northd/en-northd.c
+++ b/northd/en-northd.c
@@ -236,6 +236,66 @@ northd_global_config_handler(struct engine_node *node, 
void *data OVS_UNUSED)
 return true;
 }
 
+void
+en_bfd_consumer_run(struct engine_node *node, void *data)
+{
+
+struct northd_data *northd_data = engine_get_input_data("northd", node);
+struct bfd_data *bfd_data = engine_get_input_data("bfd", node);
+const struct nbrec_bfd_table *nbrec_bfd_table =
+EN_OVSDB_GET(engine_get_input("NB_bfd", node));
+struct bfd_consumer_data *bfd_consumer_data = data;
+
+bfd_consumer_destroy(data);
+bfd_consumer_init(data);
+
+struct ovn_datapath *od;
+HMAP_FOR_EACH (od, key_node, _data->lr_datapaths.datapaths) {
+for (int i = 0; i < od->nbr->n_ports; i++) {
+const char *route_table_name =
+smap_get(>nbr->ports[i]->options, "route_table");
+get_route_table_id(_consumer_data->route_tables,
+   route_table_name);
+}
+for (int i = 0; i < od->nbr->n_static_routes; i++) {
+parsed_routes_add(od, _data->lr_ports,
+  _consumer_data->parsed_routes,
+  _consumer_data->route_tables,
+  od->nbr->static_routes[i],
+  _data->bfd_connections);
+}
+build_route_policies(od, _data->lr_ports,
+ _data->bfd_connections,
+ _consumer_data->route_policies);
+}
+bfd_cleanup_connections(nbrec_bfd_table, _data->bfd_connections);
+
+engine_set_node_state(node, EN_UPDATED);
+}
+
+void
+en_bfd_run(struct engine_node *node, void *data)
+{
+struct northd_data *northd_data = engine_get_input_data("northd", node);
+const struct engine_context *eng_ctx = engine_get_context();
+struct bfd_data *bfd_data = data;
+const struct nbrec_bfd_table *nbrec_bfd_table =
+EN_OVSDB_GET(engine_get_input("NB_bfd", node));
+const struct sbrec_bfd_table 

[ovs-dev] [PATCH 2/2] conntrack: Key connections by zone.

2024-04-24 Thread Felix Huettner via dev
Currently conntrack uses a single large cmap for all connections stored.
This cmap contains all connections for all conntrack zones which are
completely separate from each other. By separating each zone to its own
cmap we can significantly optimize the performance when using multiple
zones.

The change fixes a similar issue as [1] where slow conntrack zone flush
operations significantly slow down OVN router failover. The difference is
just that this fix is used whith dpdk, while [1] was when using the ovs
kernel module.

As we now need to store more cmap's the memory usage of struct conntrack
increases by 524280 bytes. Additionally we need 65535 cmaps with 128
bytes each. This leads to a total memory increase of around 10MB.

Running "./ovstest test-conntrack benchmark 4 33554432 32 1" shows no
real difference in the multithreading behaviour against a single zone.

Running the new "./ovstest test-conntrack benchmark-zones" show
significant speedups as shown below. The values for "ct execute" are for
acting on the complete zone with all its entries in total (so in the
first case adding 10,000 new conntrack entries). All tests are run 1000
times.

When running with 1,000 zones with 10,000 entries each we see the
following results (all in microseconds):
"./ovstest test-conntrack benchmark-zones 1 1000 1000"

 +--++-+-+
 |  Min |   Max  |  95%ile |   Avg   |
++--++-+-+
| ct execute (commit)|  || | |
|with commit | 2266 |   3505 | 2707.06 | 2592.06 |
| without commit | 2411 |  12730 | 4432.50 | 2736.78 |
++--++-+-+
| ct execute (no commit) |  || | |
|with commit |  699 |   1238 |  886.15 |  722.67 |
| without commit |  700 |   3377 | 1934.42 |  803.53 |
++--++-+-+
| flush full zone|  || | |
|with commit |  619 |   1122 |  901.36 |  679.15 |
| without commit |  618 | 105078 |   64591 | 2886.46 |
++--++-+-+
| flush empty zone   |  || | |
|with commit |0 |  5 |1.00 |0.64 |
| without commit |   54 |  87469 |   64520 | 2172.25 |
++--++-+-+

When running with 10,000 zones with 1,000 entries each we see the
following results (all in microseconds):
"./ovstest test-conntrack benchmark-zones 1000 1 1000"

 +--++-+-+
 |  Min |   Max  |  95%ile |   Avg   |
++--++-+-+
| ct execute (commit)|  || | |
|with commit |  215 |287 |  231.88 |  222.30 |
| without commit |  214 |   1692 |  569.18 |  285.83 |
++--++-+-+
| ct execute (no commit) |  || | |
|with commit |   68 | 97 |   74.69 |   70.09 |
| without commit |   68 |300 |  158.40 |   82.06 |
++--++-+-+
| flush full zone|  || | |
|with commit |   47 |211 |   56.34 |   50.34 |
| without commit |   48 |  96330 |   63392 |   63923 |
++--++-+-+
| flush empty zone   |  || | |
|with commit |0 |  1 |1.00 |0.44 |
| without commit |3 | 109728 |   63923 | 3629.44 |
++--++-+-+

Comparing the averages we see:
* a moderate performance improvement for conntrack_execute with or
  without commiting of around 6% to 23%
* a significant performance improvement for flushing a full zone of
  around 75% to 99%
* an even more significant improvement for flushing empty zones since we
  no longer need to check any unrelated connections

[1] 9ec849e8aa869b646c372fac552ae2609a4b5f66

Signed-off-by: Felix Huettner 
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack.c | 81 +++--
 lib/conntrack.h |  1 +
 3 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 3fd5fccd3..19f1f42a0 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -200,7 +200,7 @@ enum ct_ephemeral_range {
 
 struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
-struct cmap conns OVS_GUARDED;
+struct cmap conns[UINT16_MAX+1] OVS_GUARDED;
 struct rculist exp_lists[N_EXP_LISTS];
 struct cmap zone_limits OVS_GUARDED;
 struct cmap 

[ovs-dev] [PATCH 1/2] test-conntrack: add per zone benchmark tool

2024-04-24 Thread Felix Huettner via dev
The current test-conntrack benchmark command runs with multiple threads
against a single conntrack zone. We now add a new benchmark-zones
command that allows us to check the performance between multiple zones.

We in there test the following scenarios for one zone while other zones
also contain entries:
1. Flushing a single full zone
2. Flushing a single empty zone
3. Commiting new conntrack entries against a single zone
4. Running conntrack_execute without commit against the entries of a
   single zone

Signed-off-by: Felix Huettner 
---
 tests/test-conntrack.c | 180 +
 1 file changed, 165 insertions(+), 15 deletions(-)

diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c
index 292b6c048..9ec3d0a61 100644
--- a/tests/test-conntrack.c
+++ b/tests/test-conntrack.c
@@ -25,36 +25,48 @@
 #include "ovstest.h"
 #include "pcap-file.h"
 #include "timeval.h"
+#include "stopwatch.h"
+
+#define STOPWATCH_CT_EXECUTE_COMMIT "ct-execute-commit"
+#define STOPWATCH_CT_EXECUTE_NO_COMMIT "ct-execute-no-commit"
+#define STOPWATCH_FLUSH_FULL_ZONE "full-zone"
+#define STOPWATCH_FLUSH_EMPTY_ZONE "empty-zone"
 
 static const char payload[] = "5054000a505400090800451c00"
   "11a4cd0a0101010a010102000100020008";
 
+static struct dp_packet *
+build_packet(uint16_t udp_src, uint16_t udp_dst, ovs_be16 *dl_type)
+{
+struct udp_header *udp;
+struct flow flow;
+struct dp_packet *pkt = dp_packet_new(sizeof payload/2);
+
+dp_packet_put_hex(pkt, payload, NULL);
+flow_extract(pkt, );
+
+udp = dp_packet_l4(pkt);
+udp->udp_src = htons(udp_src);
+udp->udp_dst = htons(udp_dst);
+
+*dl_type = flow.dl_type;
+
+return pkt;
+}
+
 static struct dp_packet_batch *
 prepare_packets(size_t n, bool change, unsigned tid, ovs_be16 *dl_type)
 {
 struct dp_packet_batch *pkt_batch = xzalloc(sizeof *pkt_batch);
-struct flow flow;
 size_t i;
 
 ovs_assert(n <= ARRAY_SIZE(pkt_batch->packets));
 
 dp_packet_batch_init(pkt_batch);
 for (i = 0; i < n; i++) {
-struct udp_header *udp;
-struct dp_packet *pkt = dp_packet_new(sizeof payload/2);
-
-dp_packet_put_hex(pkt, payload, NULL);
-flow_extract(pkt, );
-
-udp = dp_packet_l4(pkt);
-udp->udp_src = htons(ntohs(udp->udp_src) + tid);
-
-if (change) {
-udp->udp_dst = htons(ntohs(udp->udp_dst) + i);
-}
-
+uint16_t udp_dst = change ? 2+1 : 2;
+struct dp_packet *pkt = build_packet(1 + tid, udp_dst, dl_type);
 dp_packet_batch_add(pkt_batch, pkt);
-*dl_type = flow.dl_type;
 }
 
 return pkt_batch;
@@ -153,6 +165,139 @@ test_benchmark(struct ovs_cmdl_context *ctx)
 free(threads);
 }
 
+static void
+test_benchmark_zones(struct ovs_cmdl_context *ctx)
+{
+unsigned long n_conns, n_zones, iterations;
+long long start;
+unsigned i, j;
+ovs_be16 dl_type;
+long long now = time_msec();
+
+fatal_signal_init();
+
+/* Parse arguments */
+n_conns = strtoul(ctx->argv[1], NULL, 0);
+if (n_conns == 0 || n_conns >= UINT32_MAX) {
+ovs_fatal(0, "n_conns must be between 1 and 2^32");
+}
+n_zones = strtoul(ctx->argv[2], NULL, 0);
+if (n_zones == 0 || n_zones >= UINT16_MAX) {
+ovs_fatal(0, "n_zones must be between 1 and 2^16");
+}
+iterations = strtoul(ctx->argv[3], NULL, 0);
+if (iterations == 0) {
+ovs_fatal(0, "iterations must be greater than 0");
+}
+
+ct = conntrack_init();
+
+/* Create initial connection entries */
+start = time_msec();
+struct dp_packet_batch **pkt_batch = xzalloc(n_conns * sizeof *pkt_batch);
+for (i = 0; i < n_conns; i++) {
+pkt_batch[i] = xzalloc(sizeof(struct dp_packet_batch));
+dp_packet_batch_init(pkt_batch[i]);
+uint16_t udp_src = (i & 0x) >> 16;
+if (udp_src == 0) {
+udp_src = UINT16_MAX;
+}
+uint16_t udp_dst = i & 0x;
+if (udp_dst == 0) {
+udp_dst = UINT16_MAX;
+}
+struct dp_packet *pkt = build_packet(udp_src, udp_dst, _type);
+dp_packet_batch_add(pkt_batch[i], pkt);
+}
+printf("initial packet generation time: %lld ms\n", time_msec() - start);
+
+/* Put initial entries to each zone */
+start = time_msec();
+for (i = 0; i < n_zones; i++) {
+for (j = 0; j < n_conns; j++) {
+conntrack_execute(ct, pkt_batch[j], dl_type, false, true, i,
+  NULL, NULL, NULL, NULL, now, 0);
+pkt_metadata_init_conn(_batch[j]->packets[0]->md);
+}
+}
+printf("initial insert time: %lld ms\n", time_msec() - start);
+
+/* Actually run the tests */
+stopwatch_create(STOPWATCH_CT_EXECUTE_COMMIT, SW_US);
+stopwatch_create(STOPWATCH_CT_EXECUTE_NO_COMMIT, SW_US);
+stopwatch_create(STOPWATCH_FLUSH_FULL_ZONE, SW_US);
+

Re: [ovs-dev] [PATCH OVN v6 3/3] northd, tests: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread 0-day Robot
References:  <20240424095607.129155-4-naveen.yerramn...@nutanix.com>
 

Bleep bloop.  Greetings Naveen Yerramneni, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 101 characters long (recommended limit is 79)
#136 FILE: northd/northd.c:237:
 * | R2 REG_DHCP_RELAY_DIP_IPV4  | X | | 0 |
|

Lines checked: 1474, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH OVN v6 3/3] northd, tests: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread Naveen Yerramneni
NB SCHEMA CHANGES
-
  1. New DHCP_Relay table
  "DHCP_Relay": {
"columns": {
"name": {"type": "string"},
"servers": {"type": {"key": "string",
   "min": 0,
   "max": 1}},
"external_ids": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}}},
"options": {"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}},
"isRoot": true},
  2. New column to Logical_Router_Port table
  "dhcp_relay": {"type": {"key": {"type": "uuid",
"refTable": "DHCP_Relay",
"refType": "strong"},
"min": 0,
"max": 1}},

NEW PIPELINE STAGES
---
Following stage is added for DHCP relay feature.
Some of the flows are fitted into the existing pipeline tages.
  1. lr_in_dhcp_relay_req
   - This stage process the DHCP request packets coming from DHCP clients.
   - DHCP request packets for which dhcp_relay_req_chk action
 (which gets applied in ip input stage) is successful are forwarded to 
DHCP server.
   - DHCP request packets for which dhcp_relay_req_chk action is 
unsuccessful gets dropped.
  2. lr_in_dhcp_relay_resp_chk
   - This stage applied the dhcp_relay_resp_chk action for  DHCP response 
packets coming
 from the DHCP server.
  3. lr_in_dhcp_relay_resp
   - DHCP response packets for which dhcp_relay_resp_chk is sucessful are 
forwarded
 to the DHCP clients.
   - DHCP response packets for which dhcp_relay_resp_chk is unsucessful 
gets dropped.

REGISTRY USAGE
---
  - reg9[7] : To store the result of dhcp_relay_req_chk action.
  - reg9[8] : To store the result of dhcp_relay_resp_chk action.
  - reg2 : To store the original dest ip for DHCP response packets.
   This is required to properly match the packets in
   lr_in_dhcp_relay_resp stage since dhcp_relay_resp_chk action
   changes the dest ip.

FLOWS
-

Following are the flows added when DHCP Relay is configured on one overlay 
subnet,
one additonal flow is added in ls_in_l2_lkup table for each VM part of the 
subnet.

  1. table=27(ls_in_l2_lkup  ), priority=100  , match=(inport ==  
&& eth.src ==  && ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && 
udp.src == 68 && udp.dst == 67),
 action=(eth.dst=;outport=;next;/* DHCP_RELAY_REQ */)
  2. table=3 (lr_in_ip_input ), priority=110  , match=(inport ==  && 
ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && ip.frag == 0 && udp.src == 
68 && udp.dst == 67),
 action=(reg9[7] = dhcp_relay_req_chk(, );next; /* 
DHCP_RELAY_REQ */)
  3. table=3 (lr_in_ip_input ), priority=110  , match=(ip4.src == 
 && ip4.dst ==  && udp.src == 67 && udp.dst == 67), 
action=(next;/* DHCP_RELAY_RESP */)
  4. table=4 (lr_in_dhcp_relay_req), priority=100  , match=(inport == "lrp1" && 
ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68 && udp.dst == 
67 && reg9[7]),
 action=(ip4.src=;ip4.dst=;udp.src=67;next; /* 
DHCP_RELAY_REQ */)
  5. table=4 (lr_in_dhcp_relay_req), priority=1, match=(inport ==  && 
ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68 && udp.dst == 
67 && reg9[7] == 0),
 action=(drop; /* DHCP_RELAY_REQ */)
  6. table=18(lr_in_dhcp_relay_resp_chk), priority=100  , match=(ip4.src == 
 && ip4.dst ==  && ip.frag == 0 && udp.src == 67 && udp.dst 
== 67),
 action=(reg2 = ip4.dst;reg9[8] = dhcp_relay_resp_chk(, 
);next;/* DHCP_RELAY_RESP */)
  7. table=19(lr_in_dhcp_relay_resp), priority=100  , match=(ip4.src == 
 && reg2 ==  && udp.src == 67 && udp.dst == 67 && reg9[8]),
 action=(ip4.src=;udp.dst=68;outport=;output; /* DHCP_RELAY_RESP 
*/)
  8. table=19(lr_in_dhcp_relay_resp), priority=1, match=(ip4.src == 
 && reg2 ==  && udp.src == 67 && udp.dst == 67 && reg9[8] 
== 0), action=(drop; /* DHCP_RELAY_RESP */)

Commands to enable the feature
--
  ovn-nbctl create DHCP_Relay name= servers=
  ovn-nbctl set Logical_Router_port  dhcp_relay=
  ovn-nbctl set Logical_Switch  
other_config:dhcp_relay_port=

Limitations:

  - All OVN features that needs IP address to be configured on logical port 
(like proxy arp, etc)
will not be supported for overlay subnets on which DHCP relay is enabled.

Signed-off-by: Naveen Yerramneni 
Co-authored-by: Huzaifa Calcuttawala 
Signed-off-by: Huzaifa Calcuttawala 
CC: Mary Manohar 
---
 northd/northd.c | 271 +++-
 northd/northd.h |  41 +++---
 northd/ovn-northd.8.xml | 211 +++
 ovn-nb.ovsschema|  21 +++-
 ovn-nb.xml  |  39 ++
 tests/atlocal.in|   3 +
 tests/ovn-northd.at |  38 ++

[ovs-dev] [PATCH OVN v6 2/3] controller: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread Naveen Yerramneni
Added changes in pinctrl to process DHCP Relay opcodes:
  - ACTION_OPCODE_DHCP_RELAY_REQ_CHK: For request packets
  - ACTION_OPCODE_DHCP_RELAY_RESP_CHK: For response packet

Signed-off-by: Naveen Yerramneni 
---
 controller/pinctrl.c | 597 ++-
 lib/ovn-l7.h |   2 +
 2 files changed, 530 insertions(+), 69 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index aa73facbf..50e090cd2 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -1993,6 +1993,515 @@ is_dhcp_flags_broadcast(ovs_be16 flags)
 return flags & htons(DHCP_BROADCAST_FLAG);
 }
 
+static const char *dhcp_msg_str[] = {
+[0] = "INVALID",
+[DHCP_MSG_DISCOVER] = "DISCOVER",
+[DHCP_MSG_OFFER] = "OFFER",
+[DHCP_MSG_REQUEST] = "REQUEST",
+[OVN_DHCP_MSG_DECLINE] = "DECLINE",
+[DHCP_MSG_ACK] = "ACK",
+[DHCP_MSG_NAK] = "NAK",
+[OVN_DHCP_MSG_RELEASE] = "RELEASE",
+[OVN_DHCP_MSG_INFORM] = "INFORM"
+};
+
+static bool
+dhcp_relay_is_msg_type_supported(uint8_t msg_type)
+{
+return (msg_type >= DHCP_MSG_DISCOVER && msg_type <= OVN_DHCP_MSG_RELEASE);
+}
+
+static const char *dhcp_msg_str_get(uint8_t msg_type)
+{
+if (!dhcp_relay_is_msg_type_supported(msg_type)) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "Unknown DHCP msg type: %u", msg_type);
+return "UNKNOWN";
+}
+return dhcp_msg_str[msg_type];
+}
+
+static const struct dhcp_header *
+dhcp_get_hdr_from_pkt(struct dp_packet *pkt_in, const char **in_dhcp_pptr,
+  const char *end)
+{
+/* Validate the DHCP request packet.
+ * Format of the DHCP packet is
+ * ---
+ *| UDP HEADER | DHCP HEADER | 4 Byte DHCP Cookie | DHCP OPTIONS(var len) |
+ * ---
+ */
+
+*in_dhcp_pptr = dp_packet_get_udp_payload(pkt_in);
+if (*in_dhcp_pptr == NULL) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "DHCP: Invalid or incomplete DHCP packet received");
+return NULL;
+}
+
+const struct dhcp_header *dhcp_hdr
+= (const struct dhcp_header *) *in_dhcp_pptr;
+(*in_dhcp_pptr) += sizeof *dhcp_hdr;
+if (*in_dhcp_pptr > end) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "DHCP: Invalid or incomplete DHCP packet received, "
+ "bad data length");
+return NULL;
+}
+
+if (dhcp_hdr->htype != 0x1) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "DHCP: Packet is recieved with "
+ "unsupported hardware type");
+return NULL;
+}
+
+if (dhcp_hdr->hlen != 0x6) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "DHCP: Packet is recieved with "
+ "unsupported hardware length");
+return NULL;
+}
+
+/* DHCP options follow the DHCP header. The first 4 bytes of the DHCP
+ * options is the DHCP magic cookie followed by the actual DHCP options.
+ */
+ovs_be32 magic_cookie = htonl(DHCP_MAGIC_COOKIE);
+if ((*in_dhcp_pptr) + sizeof magic_cookie > end ||
+get_unaligned_be32((const void *) (*in_dhcp_pptr)) != magic_cookie) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+VLOG_WARN_RL(, "DHCP: Magic cookie not present in the DHCP packet");
+return NULL;
+}
+
+(*in_dhcp_pptr) += sizeof magic_cookie;
+
+return dhcp_hdr;
+}
+
+static void
+dhcp_parse_options(const char **in_dhcp_pptr, const char *end,
+   const uint8_t **dhcp_msg_type_pptr,
+   ovs_be32 *request_ip_ptr,
+   bool *ipxe_req_ptr, ovs_be32 *server_id_ptr,
+   ovs_be32 *netmask_ptr, ovs_be32 *router_ip_ptr)
+{
+while ((*in_dhcp_pptr) < end) {
+const struct dhcp_opt_header *in_dhcp_opt =
+(const struct dhcp_opt_header *) *in_dhcp_pptr;
+if (in_dhcp_opt->code == DHCP_OPT_END) {
+break;
+}
+if (in_dhcp_opt->code == DHCP_OPT_PAD) {
+(*in_dhcp_pptr) += 1;
+continue;
+}
+(*in_dhcp_pptr) += sizeof *in_dhcp_opt;
+if ((*in_dhcp_pptr) > end) {
+break;
+}
+(*in_dhcp_pptr) += in_dhcp_opt->len;
+if ((*in_dhcp_pptr) > end) {
+break;
+}
+
+switch (in_dhcp_opt->code) {
+case DHCP_OPT_MSG_TYPE:
+if (dhcp_msg_type_pptr && in_dhcp_opt->len == 1) {
+*dhcp_msg_type_pptr = DHCP_OPT_PAYLOAD(in_dhcp_opt);
+}
+break;
+case DHCP_OPT_REQ_IP:
+if (request_ip_ptr && in_dhcp_opt->len == 4) {
+*request_ip_ptr = get_unaligned_be32(
+

[ovs-dev] [PATCH OVN v6 1/3] actions: DHCP Relay Agent support for overlay IPv4 subnets.

2024-04-24 Thread Naveen Yerramneni
NEW OVN ACTIONS
---
  1. dhcp_relay_req_chk(, )
   - This action executes on the source node on which the DHCP request 
originated.
   - This action relays the DHCP request coming from client to the server.
 Relay-ip is used to update GIADDR in the DHCP header.
  2. dhcp_relay_resp_chk(, )
   - This action executes on the first node (RC node) which processes
 the DHCP response from the server.
   - This action updates  the destination MAC and destination IP so that 
the response
 can be forwarded to the appropriate node from which request was 
originated.
   - Relay-ip, server-ip are used to validate GIADDR and SERVER ID in the 
DHCP payload.

Signed-off-by: Naveen Yerramneni 
---
 include/ovn/actions.h |  27 ++
 lib/actions.c | 116 ++
 ovn-sb.xml|  62 ++
 tests/ovn.at  |  34 +
 utilities/ovn-trace.c |  67 
 5 files changed, 306 insertions(+)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index f697dff39..ab2f3856c 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -96,6 +96,8 @@ struct collector_set_ids;
 OVNACT(LOOKUP_ND_IP,  ovnact_lookup_mac_bind_ip) \
 OVNACT(PUT_DHCPV4_OPTS,   ovnact_put_opts)\
 OVNACT(PUT_DHCPV6_OPTS,   ovnact_put_opts)\
+OVNACT(DHCPV4_RELAY_REQ_CHK,  ovnact_dhcp_relay)  \
+OVNACT(DHCPV4_RELAY_RESP_CHK, ovnact_dhcp_relay)  \
 OVNACT(SET_QUEUE, ovnact_set_queue)   \
 OVNACT(DNS_LOOKUP,ovnact_result)  \
 OVNACT(LOG,   ovnact_log) \
@@ -389,6 +391,15 @@ struct ovnact_put_opts {
 size_t n_options;
 };
 
+/* OVNACT_DHCP_RELAY. */
+struct ovnact_dhcp_relay {
+struct ovnact ovnact;
+int family;
+struct expr_field dst;  /* 1-bit destination field. */
+ovs_be32 relay_ipv4;
+ovs_be32 server_ipv4;
+};
+
 /* Valid arguments to SET_QUEUE action.
  *
  * QDISC_MIN_QUEUE_ID is the default queue, so user-defined queues should
@@ -765,6 +776,22 @@ enum action_opcode {
 
 /* multicast group split buffer action. */
 ACTION_OPCODE_MG_SPLIT_BUF,
+
+/* "dhcp_relay_req_chk(relay_ip, server_ip)".
+ *
+ * Arguments follow the action_header, in this format:
+ *   - The 32-bit DHCP relay IP.
+ *   - The 32-bit DHCP server IP.
+ */
+ACTION_OPCODE_DHCP_RELAY_REQ_CHK,
+
+/* "dhcp_relay_resp_chk(relay_ip, server_ip)".
+ *
+ * Arguments follow the action_header, in this format:
+ *   - The 32-bit DHCP relay IP.
+ *   - The 32-bit DHCP server IP.
+ */
+ACTION_OPCODE_DHCP_RELAY_RESP_CHK,
 };
 
 /* Header. */
diff --git a/lib/actions.c b/lib/actions.c
index 361d55009..6cd60366a 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -1869,6 +1869,8 @@ is_paused_nested_action(enum action_opcode opcode)
 case ACTION_OPCODE_BFD_MSG:
 case ACTION_OPCODE_ACTIVATION_STRATEGY_RARP:
 case ACTION_OPCODE_MG_SPLIT_BUF:
+case ACTION_OPCODE_DHCP_RELAY_REQ_CHK:
+case ACTION_OPCODE_DHCP_RELAY_RESP_CHK:
 default:
 return false;
 }
@@ -2610,6 +2612,114 @@ ovnact_controller_event_free(struct 
ovnact_controller_event *event)
 free_gen_options(event->options, event->n_options);
 }
 
+static void
+format_dhcpv4_relay_chk(const char *name,
+const struct ovnact_dhcp_relay *dhcp_relay,
+struct ds *s)
+{
+expr_field_format(_relay->dst, s);
+ds_put_format(s, " = %s("IP_FMT", "IP_FMT");",
+  name,
+  IP_ARGS(dhcp_relay->relay_ipv4),
+  IP_ARGS(dhcp_relay->server_ipv4));
+}
+
+static void
+parse_dhcp_relay_chk(struct action_context *ctx,
+ const struct expr_field *dst,
+ struct ovnact_dhcp_relay *dhcp_relay)
+{
+/* Skip dhcp_relay_req_chk/dhcp_relay_resp_chk( */
+lexer_force_match(ctx->lexer, LEX_T_LPAREN);
+
+/* Validate that the destination is a 1-bit, modifiable field. */
+char *error = expr_type_check(dst, 1, true, ctx->scope);
+if (error) {
+lexer_error(ctx->lexer, "%s", error);
+free(error);
+return;
+}
+dhcp_relay->dst = *dst;
+
+/* Parse relay ip and server ip. */
+if (ctx->lexer->token.format == LEX_F_IPV4) {
+dhcp_relay->family = AF_INET;
+dhcp_relay->relay_ipv4 = ctx->lexer->token.value.ipv4;
+lexer_get(ctx->lexer);
+lexer_match(ctx->lexer, LEX_T_COMMA);
+if (ctx->lexer->token.format == LEX_F_IPV4) {
+dhcp_relay->family = AF_INET;
+dhcp_relay->server_ipv4 = ctx->lexer->token.value.ipv4;
+lexer_get(ctx->lexer);
+} else {
+lexer_syntax_error(ctx->lexer, "expecting IPv4 dhcp server ip");
+return;
+}
+} else {
+lexer_syntax_error(ctx->lexer, "expecting IPv4 dhcp 

[ovs-dev] [PATCH OVN v6 0/3] DHCP Relay Agent support for overlay subnets.

2024-04-24 Thread Naveen Yerramneni
This patch contains changes to enable DHCP Relay Agent support for overlay 
subnets.

USE CASE:
--
  - Enable IP address assignment for overlay subnets from the centralized 
DHCP server present in the underlay network.

PREREQUISITES
--
  - Logical Router Port IP should be assigned (statically) from the same 
overlay subnet which is managed by DHCP server.
  - LRP IP is used for GIADRR field when relaying the DHCP packets and also 
same IP needs to be configured as default gateway for the overlay subnet.
  - Overlay subnets managed by external DHCP server are expected to be 
directly reachable from the underlay network.

EXPECTED PACKET FLOW:
--
Following is the expected packet flow inorder to support DHCP rleay 
functionality in OVN.
  1. DHCP client originates DHCP discovery (broadcast).
  2. DHCP relay (running on the OVN) receives the broadcast and forwards 
the packet to the DHCP server by converting it to unicast.
 While forwarding the packet, it updates the GIADDR in DHCP header to 
its interface IP on which DHCP packet is received and increments hop count.
  3. DHCP server uses GIADDR field to decide the IP address pool from which 
IP has to be assigned and DHCP offer is sent to the same IP (GIADDR).
  4. DHCP relay agent forwards the offer to the client.
  5. DHCP client sends DHCP request (broadcast) packet.
  6. DHCP relay (running on the OVN) receives the broadcast and forwards 
the packet to the DHCP server by converting it to unicast.
 While forwarding the packet, it updates the GIADDR in DHCP header to 
its interface IP on which DHCP packet is received.
  7. DHCP Server sends the ACK packet.
  8. DHCP relay agent forwards the ACK packet to the client.
  9. All the future renew/release packets are directly exchanged between 
DHCP client and DHCP server.

OVN DHCP RELAY PACKET FLOW:

To add DHCP Relay support on OVN, we need to replicate all the behavior 
described above using distributed logical switch and logical router.
At, highlevel packet flow is distributed among Logical Switch and Logical 
Router on source node (where VM is deployed) and redirect chassis(RC) node.
  1. Request packet gets processed on the source node where VM is deployed 
and relays the packet to DHCP server.
  2. Response packet is first processed on RC node (which first recieves 
the packet from underlay network). RC node forwards the packet to the right 
node by filling in the dest MAC and IP.

OVN Packet flow with DHCP relay is explained below.
  1. DHCP client (VM) sends the DHCP discover packet (broadcast).
  2. Logical switch converts the packet to L2 unicast by setting the 
destination MAC to LRP's MAC
  3. Logical Router receives the packet and redirects it to the OVN 
controller.
  4. OVN controller updates the required information(GIADDR, HOP count) in 
the DHCP payload after doing the required checks. If any check fails, packet is 
dropped.
  5. Logical Router converts the packet to L3 unicast and forwards it to 
the server. This packets gets routed like any other packet (via RC node).
  6. Server replies with DHCP offer.
  7. RC node processes the DHCP offer and forwards it to the OVN controller.
  8. OVN controller does sanity checks and  updates the destination MAC 
(available in DHCP header), destination IP (available in DHCP header) and 
reinjects the packet to datapath.
 If any check fails, packet is dropped.
  9. Logical router updates the source IP and port and forwards the packet 
to logical switch.
  10. Logical switch delivers the packet to the DHCP client.
  11. Similar steps are performed for Request and Ack packets.
  12. All the future renew/release packets are directly exchanged between 
DHCP client and DHCP server

NEW OVN ACTIONS
---
  1. dhcp_relay_req_chk(, )
  - This action executes on the source node on which the DHCP request 
originated.
  - This action relays the DHCP request coming from client to the 
server. Relay-ip is used to update GIADDR in the DHCP header.
  2. dhcp_relay_resp_chk(, )
  - This action executes on the first node (RC node) which processes 
the DHCP response from the server.
  - This action updates  the destination MAC and destination IP so that 
the response can be forwarded to the appropriate node from which request was 
originated.
  - Relay-ip, server-ip are used to validate GIADDR and SERVER ID in 
the DHCP payload.

FLOWS
-
Following are the flows added when DHCP Relay is configured on one overlay 
subnet, one additonal flow is added in ls_in_l2_lkup table for each VM part of 
the subnet.

  1. table=27(ls_in_l2_lkup  ), priority=100  , match=(inport == 
 && eth.src ==  && ip4.src == 0.0.0.0 && ip4.dst == 

[ovs-dev] [PATCH ovn] northd: Add lsp option force_fdb_lookup.

2024-04-24 Thread shibir-basak
This option is applicable only if the lsp is of default 'type'
i.e. type=empty_string (which is a VM (or VIF) interface) and the
lsp also has 'unknown' addresses configured.
If lsp option 'force_fdb_lookup' is set to true, mac addresses
of the lsp (if configured) are not installed in the l2 lookup
table of the Logical Switch pipeline. However, MAC addresses
are learnt using the FDB Table.

Usecase:
=
This option is required to reduce packet loss when VM is being
migrated across AZs (via VXLAN tunnel). lsp is configured in both
AZs on different logical switches which are sharing the same IP
subnet. If the port has unknown address set along with MAC, IP
then, any packet destined to VM's MAC on destination AZ will get
forwarded locally, which may lead to packet loss during VM migration.

This option is useful to reduce packet loss during VM migration
by forcing the logical switch to learn MAC using FDB.

The other way to address this issue is to use pkt_clone_type
option but this causes too much of packet duplication when there
are multiple ports with unknown address in the logical switch.

Signed-off-by: shibir-basak 
---
 northd/northd.c | 16 
 ovn-nb.xml  | 11 +++
 tests/ovn-northd.at | 38 ++
 3 files changed, 65 insertions(+)

diff --git a/northd/northd.c b/northd/northd.c
index 37f443e70..c29396716 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -1380,6 +1380,16 @@ lrport_is_enabled(const struct nbrec_logical_router_port 
*lrport)
 return !lrport->enabled || *lrport->enabled;
 }
 
+static bool
+lsp_force_fdb_lookup(const struct ovn_port *op)
+{
+/* To enable FDB Table lookup on a logical switch port, it has to be
+ * of 'type' empty_string and "addresses" must have "unknown".
+ */
+return !op->nbsp->type[0] && op->has_unknown &&
+smap_get_bool(>nbsp->options, "force_fdb_lookup", false);
+}
+
 static struct ovn_port *
 ovn_port_get_peer(const struct hmap *lr_ports, struct ovn_port *op)
 {
@@ -9479,6 +9489,12 @@ build_lswitch_ip_unicast_lookup(struct ovn_port *op,
 
 if (ovs_scan(op->nbsp->addresses[i],
 ETH_ADDR_SCAN_FMT, ETH_ADDR_SCAN_ARGS(mac))) {
+/* Skip adding flows related to the MAC address
+ * as force FDB Lookup is enabled on the lsp.
+ */
+if (lsp_force_fdb_lookup(op)) {
+continue;
+}
 ds_clear(match);
 ds_put_format(match, "eth.dst == "ETH_ADDR_FMT,
   ETH_ADDR_ARGS(mac));
diff --git a/ovn-nb.xml b/ovn-nb.xml
index b652046a7..4724d0537 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1262,6 +1262,17 @@
   
 
 
+
+  This option is supported only if the Logical Switch Port is of
+  default  (i.e. type set to empty_string) and also
+   column contains unknown.
+  If set to true, MAC addresses (if configured)
+  are not installed in the l2 lookup table but the MAC addresses are
+  learnt and stored in the FDB table.
+  The default value is false.
+
+
 
   If set to mc_unknown, packets going to this VIF get cloned to all
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index be006fb32..82e7a6574 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -6886,6 +6886,44 @@ AT_CLEANUP
 ])
 
 
+OVN_FOR_EACH_NORTHD_NO_HV([
+AT_SETUP([check options: force_fdb_lookup for LSP])
+ovn_start NORTHD_TYPE
+ovn-nbctl ls-add S1
+ovn-nbctl --wait=sb lsp-add S1 S1-vm1
+ovn-nbctl --wait=sb lsp-add S1 S1-localnet
+ovn-nbctl --wait=sb lsp-set-addresses S1-vm1 "50:54:00:00:00:01 192.168.0.1" 
unknown
+ovn-nbctl --wait=sb lsp-set-type S1-localnet localnet
+
+#Verify the flows before setting force_fdb_lookup option
+ovn-sbctl dump-flows S1 > S1flows
+AT_CAPTURE_FILE([S1flows])
+
+AT_CHECK([grep -e "ls_in_l2_lkup.*S1-vm1" S1flows | ovn_strip_lflows], [0], 
[dnl
+  table=??(ls_in_l2_lkup  ), priority=50   , match=(eth.dst == 
50:54:00:00:00:01), action=(outport = "S1-vm1"; output;)
+])
+
+#Set the force_fdb_lookup option and verify the flows
+ovn-nbctl --wait=sb set logical_switch_port S1-vm1 
options:force_fdb_lookup=true
+ovn-nbctl --wait=sb set logical_switch_port S1-localnet 
options:force_fdb_lookup=true
+
+ovn-sbctl dump-flows S1 > S1flows
+AT_CAPTURE_FILE([S1flows])
+
+#Verify the flows for default port type (VM port)
+AT_CHECK([grep -e "ls_in_l2_lkup.*S1-vm1" S1flows | grep -e "match=(eth.dst == 
50:54:00:00:00:01)"], [1], [])
+AT_CHECK([grep -e "ls_in_.*_fdb.*S1-vm1" S1flows | ovn_strip_lflows], [0], [dnl
+  table=??(ls_in_lookup_fdb   ), priority=100  , match=(inport == "S1-vm1"), 
action=(reg0[[11]] = lookup_fdb(inport, eth.src); next;)
+  table=??(ls_in_put_fdb  ), priority=100  , match=(inport == "S1-vm1" && 
reg0[[11]] == 0), action=(put_fdb(inport, eth.src); next;)
+])
+
+#Verify the flows for a non-default port type (localnet