Re: [ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-05-13 Thread Adrian Moreno



On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


This simple program reads from psample and prints the packets to stdout.
It's useful for quickly collecting sampled packets.


See some comments below, did not review the actual sample application in detail.

//Eelco


Signed-off-by: Adrian Moreno 
---
  Documentation/automake.mk   |   1 +
  Documentation/conf.py   |   2 +
  Documentation/ref/index.rst |   1 +
  Documentation/ref/ovs-psample.8.rst |  47 
  include/linux/automake.mk   |   1 +
  include/linux/psample.h |  64 ++
  rhel/openvswitch-fedora.spec.in |   2 +
  rhel/openvswitch.spec.in|   1 +
  utilities/automake.mk   |   9 +
  utilities/ovs-psample.c | 330 
  10 files changed, 458 insertions(+)
  create mode 100644 Documentation/ref/ovs-psample.8.rst
  create mode 100644 include/linux/psample.h
  create mode 100644 utilities/ovs-psample.c

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 47d2e336a..c22facfd6 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -165,6 +165,7 @@ RST_MANPAGES = \
ovs-l3ping.8.rst \
ovs-parse-backtrace.8.rst \
ovs-pki.8.rst \
+   ovs-psample.8.rst \
ovs-tcpdump.8.rst \
ovs-tcpundump.1.rst \
ovs-test.8.rst \
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 15785605a..75efed2fc 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -134,6 +134,8 @@ _man_pages = [
   u'convert "tcpdump -xx" output to hex strings'),
  ('ovs-test.8',
   u'Check Linux drivers for performance, vlan and L3 tunneling problems'),
+('ovs-psample.8',
+ u'Print packets sampled by psample'),
  ('ovs-vlan-test.8',
   u'Check Linux drivers for problems with vlan traffic'),
  ('ovsdb-server.7',
diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
index 03ada932f..d1076435a 100644
--- a/Documentation/ref/index.rst
+++ b/Documentation/ref/index.rst
@@ -46,6 +46,7 @@ time:
 ovs-pki.8
 ovs-sim.1
 ovs-parse-backtrace.8
+   ovs-psample.8
 ovs-tcpdump.8
 ovs-tcpundump.1
 ovs-test.8
diff --git a/Documentation/ref/ovs-psample.8.rst 
b/Documentation/ref/ovs-psample.8.rst
new file mode 100644
index 0..c849c83d8
--- /dev/null
+++ b/Documentation/ref/ovs-psample.8.rst
@@ -0,0 +1,47 @@
+==
+ovs-sample


Interesting, here you call it all ovs-sample here, which is like ;)


Yes, at the begining I called it ovs-sample as I thought to make some generic 
tool, but since it ended up being very psample-specific I added the "p" (and 
missed this one).




You could add options like —local-kernel --local-userspace (--ipfix, --sflow) 
and it will eventually work on each datapath.



You mean implementing an IPFIX and sFlow collector?



If you want to keep this a separate psample utility, I would not have ovs in 
the name, as it's rather general and not OVS specific. Maybe just 
psample/psample_mon, as we also have nlmon.



Well, the way the cookie is decoded into observation_domain_id and 
observation_point_id is ovs-specific.


In fact, for the next iteration of the series I will remove the filtering API 
since its getting removed from the kernel series as well and add some kind of 
BPF code that does the filtering. Also I was  considering looking into the OVSDB 
to ensure that we filter on groups configured in it or else we could wrongly 
interpet a sampled packet that comes from some other subsystem.




+==
+
+Synopsis
+
+
+``ovs-sample``
+[``--group=`` | ``-g`` ]
+
+``ovs-sample --help``
+
+``ovs-sample --version``
+
+Description
+===
+
+Open vSwitch per-flow sampling can be configured to emit the samples
+through the ``psample`` netlink multicast group.
+
+Such sampled traffic contains, apart from the packet, some metadata that
+gives further information about the packet sample. More specifically, OVS
+inserts the ``observation_domain_id`` and the ``observation_point_id`` that
+where provided in the sample action (see ``ovs-actions(7)``).
+
+the ``ovs-sample`` program provides a simple way of joining the psample
+multicast group and printing the sampled packets.
+
+
+Options
+===
+
+.. option:: ``-g``  or ``--group`` 
+
+  Tells ``ovs-sample`` to filter out samples that don't belong to that group.
+
+  Different ``Flow_Sample_Collector_Set`` entries can be configured with
+  different ``group_id`` values (see ``ovs-vswitchd.conf.db(5)``). This option
+  helps focusing the output on the relevant samples.
+
+.. option:: -h, --help
+
+Prints a brief help message to the console.
+
+.. option:: -V, --version
+
+Prints version information to the console.
diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index cdae5eedc..ac306b53c 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automak

Re: [ovs-dev] [RFC 07/11] netlink-offload-tc: Rename act_cookie->flow_cookie.

2024-05-13 Thread Adrian Moreno




On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


In preparation to allowing certain actions to have a cookie that does
not represent the entire flow, rename flower->act_cookie to
flower->flow_cookie.

This patch does not introduce any behavioral change, it's just a
variable renaming.


Small nit on the comment below, the rest looks good.

//Eelco


Signed-off-by: Adrian Moreno 
---
  lib/netdev-offload-tc.c |  8 
  lib/tc.c| 14 +++---
  lib/tc.h|  2 +-
  3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 921d52317..36e814bee 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1322,8 +1322,8 @@ netdev_tc_flow_dump_next(struct netdev_flow_dump *dump,
  continue;
  }

-if (flower.act_cookie.len >= sizeof *ufid) {
-*ufid = get_32aligned_u128(flower.act_cookie.data);
+if (flower.flow_cookie.len >= sizeof *ufid) {
+*ufid = get_32aligned_u128(flower.flow_cookie.data);
  } else if (!find_ufid(netdev, &id, ufid)) {
  continue;
  }
@@ -2537,8 +2537,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match 
*match,
  return ENOSPC;
  }

-flower.act_cookie.data = ufid;
-flower.act_cookie.len = sizeof *ufid;
+flower.flow_cookie.data = ufid;
+flower.flow_cookie.len = sizeof *ufid;

  block_id = get_block_id_from_netdev(netdev);
  id = tc_make_tcf_id_chain(ifindex, block_id, chain, prio, hook);
diff --git a/lib/tc.c b/lib/tc.c
index e9bcae4e4..7176991c3 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2088,8 +2088,8 @@ nl_parse_single_action(struct nlattr *action, struct 
tc_flower *flower,
  }

  if (act_cookie) {
-flower->act_cookie.data = nl_attr_get(act_cookie);
-flower->act_cookie.len = nl_attr_get_size(act_cookie);
+flower->flow_cookie.data = nl_attr_get(act_cookie);
+flower->flow_cookie.len = nl_attr_get_size(act_cookie);
  }

  return nl_parse_action_stats(action_attrs[TCA_ACT_STATS],
@@ -3458,7 +3458,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
TCA_EGRESS_MIRROR);
  }
  }
-nl_msg_put_act_cookie(request, &flower->act_cookie);
+nl_msg_put_act_cookie(request, &flower->flow_cookie);
  nl_msg_put_act_flags(request);
  nl_msg_end_nested(request, act_offset);
  }
@@ -3476,14 +3476,14 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)

  act_offset = nl_msg_start_nested(request, act_index++);
  nl_msg_put_act_gact(request, action->chain);
-nl_msg_put_act_cookie(request, &flower->act_cookie);
+nl_msg_put_act_cookie(request, &flower->flow_cookie);
  nl_msg_end_nested(request, act_offset);
  }
  break;
  case TC_ACT_CT: {
  act_offset = nl_msg_start_nested(request, act_index++);
  nl_msg_put_act_ct(request, action, action_pc);
-nl_msg_put_act_cookie(request, &flower->act_cookie);
+nl_msg_put_act_cookie(request, &flower->flow_cookie);
  nl_msg_end_nested(request, act_offset);
  }
  break;
@@ -3501,7 +3501,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
released)) {
  return -EOPNOTSUPP;
  }
-nl_msg_put_act_cookie(request, &flower->act_cookie);
+nl_msg_put_act_cookie(request, &flower->flow_cookie);
  nl_msg_put_act_flags(request);
  nl_msg_end_nested(request, act_offset);
  }
@@ -3515,7 +3515,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct 
tc_flower *flower)
  if (!flower->action_count) {
  act_offset = nl_msg_start_nested(request, act_index++);
  nl_msg_put_act_gact(request, 0);
-nl_msg_put_act_cookie(request, &flower->act_cookie);
+nl_msg_put_act_cookie(request, &flower->flow_cookie);
  nl_msg_put_act_flags(request);
  nl_msg_end_nested(request, act_offset);
  }
diff --git a/lib/tc.h b/lib/tc.h
index fdbcf4b7c..40ea3d816 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -386,7 +386,7 @@ struct tc_flower {

  bool tunnel;

-struct tc_cookie act_cookie;
+struct tc_cookie flow_cookie; /* Cookie to help identify the flow. */


nit: Cookie that identifies the flow by ufid.



OK.



  bool needs_full_ip_proto_mask;

--
2.44.0




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 4/6] netdev-dpdk: Refactor TSO request code.

2024-05-13 Thread David Marchand
Hello Kevin,

Thanks for reviewing.

On Fri, May 10, 2024 at 11:50 PM Kevin Traynor  wrote:
>
> On 19/04/2024 15:06, David Marchand wrote:
> > Replace check on th == NULL with an assert() because dp_packet_l4(pkt)
> > is priorly used to compute (outer) L3 length.
> >
> > Besides, filling l4_len and tso_segsz only matters to TSO, so there is
> > no need to check for other L4 checksum offloading requests.
> >
> > Signed-off-by: David Marchand 
> > ---
> >  lib/netdev-dpdk.c | 36 +++-
> >  1 file changed, 11 insertions(+), 25 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 8b6a3ed189..661269e4b6 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -2584,7 +2584,6 @@ static bool
> >  netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf 
> > *mbuf)
> >  {
> >  struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
> > -struct tcp_header *th;
> >
> >  const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
> >   RTE_MBUF_F_TX_L4_MASK |
> > @@ -2614,6 +2613,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
> > struct rte_mbuf *mbuf)
> >  return true;
> >  }
> >
> > +ovs_assert(dp_packet_l4(pkt));
>
> I'm not clear why you want to change this from a warning/return
> fail/drop to an assert ?

From this point in the function, there is at least one request for
checksum offloading pending.
Any L3 (or higher) checksum requested by OVS means that the packet has
been parsed/composed as either IP or IPv6 and packet->l4_ofs was set
to point after the l3 header (with miniflow_extract / *_compose()
helpers).

So getting a NULL pointer for l4 here indicates a bug in OVS.
An assert seems better than a warn/return that probably nobody notice(d).

Did I miss a case where l4_ofs can be unset?

>
> Nit: should this be in the previous patch instead ? and I see it is
> removed in a later patch.

It is not supposed to be removed in the series.
The last patch moves it later in the function.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 01/11] odp-util: Add support to new psample uAPI.

2024-05-13 Thread Eelco Chaudron


On 10 May 2024, at 15:06, Ilya Maximets wrote:

> On 5/10/24 14:01, Eelco Chaudron wrote:
>>
>>
>> On 10 May 2024, at 12:45, Adrian Moreno wrote:
>>
>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
 On 24 Apr 2024, at 21:53, Adrian Moreno wrote:

> The new odp sample attributes allow userspace to specify a group_id and
> user-defined cookie to be passed down to psample.
>
> Add support for parsing and formatting such action.
>
> Signed-off-by: Adrian Moreno 

 Hi Adrian,

 See my comments below inline.

 //Eelco

> ---
>   include/linux/openvswitch.h  |  49 +---
>   lib/odp-execute.c|   3 +
>   lib/odp-util.c   | 150 ++-
>   ofproto/ofproto-dpif-ipfix.c |   2 +
>   python/ovs/flow/odp.py   |   2 +
>   tests/odp.at |   5 ++
>   6 files changed, 163 insertions(+), 48 deletions(-)
>
> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index d9fb991ef..3e748405c 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -696,6 +696,7 @@ enum ovs_flow_attr {
>   #define OVS_UFID_F_OMIT_MASK (1 << 1)
>   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>
> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>   /**
>* enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
>* @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample 
> with
> @@ -703,15 +704,27 @@ enum ovs_flow_attr {
>* %UINT32_MAX samples all packets and intermediate values sample 
> intermediate
>* fractions of packets.
>* @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling 
> event.
> - * Actions are passed as nested attributes.
> + * Actions are passed as nested attributes. Optional if
> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample 
> group.
> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie 
> that, if
> + * provided, will be copied to the psample cookie.

 Should we mention any limit on the actual length of the cookie?

 Any reason we explicitly call this psample? From an OVS perspective, this 
 could just be 
 SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE 
 and _GROUP. Other datapaths, do not have PSAMPLE.

>>>
>>> See my response to your comment on the cover-letter for possible name 
>>> alternatives.
>>
>> ACK, we can continue the discussion there.
>>
 Thinking about it more, from an OVS perspective we should probably not 
 even send down a COOKIE, but we should send down an obs_domain_id and 
 obs_point_id, and then the kernel can move this into a cookie.

>>>
>>> I did consider this. However, the opaque cookie fits well with the tc API 
>>> which does not have two 32-bit values but an opaque 128-bit cookie. So if 
>>> the action is offloaded OVS needs to "encode" the two 32-bit values into 
>>> the cookie anyways.
>>
>> Which is fine to me, the OVS API should be clear that we have two u32 
>> entries. The cookie is implementation-specific, and we use the netlink 
>> format as our API for all DPs.
>
>
> I'm not sure about this.  It is a kernel interface and we can't deliver
> two separate values from the psample.  So, passing two separate values
> to the kernel doesn't make a lot of sense.  They are going to be mangled
> into a single cookie anyway and we'll have to document that somehow.
> We may as well always mangle the data from the beginning, document once
> at the top level and save ourselves from documenting a million differences
> between different implementations.  While USDT could report them separately,
> I'm not sure there is a ton of value in that.

Our OVS action API is not tight to psample at all, so passing in a psample 
cookie does not make sense. We should not pass in any psample references to the 
sample action API. If it looks like the below, the API is clean and we can 
mangle it before passing it to the psample function. I see no need to document 
this in the kernel, as all the kernel does is provide the TC action cookie 
(unrelated to the OVS actions API).

+   OVS_SAMPLE_ATTR_OBS_DOMAIN,/* Observation domain id, u32 
number. */
+   OVS_SAMPLE_ATTR_OBS_POINT, /* Observation point id, u32 number. 
*/
+   OVS_SAMPLE_ATTR_SYSTEM_TARGET, /* Flag to additionally sample to system 
target. */
+   OVS_SAMPLE_ATTR_SYSTEM_GROUP,  /* System target group id, u32 number. */

>>
 The collector itself could be called system/local collector, or something 
 like that. This way it can use for example psample for kernel and UDST 
 probes for usespace. Both can pass the group and cookie (obs_domain_id and 

Re: [ovs-dev] [PATCH ovn v4] controller: Track individual address set constants.

2024-05-13 Thread Sri kor
Thanks Ales, This address set is not present in SB DB.

-Srini

On Sun, May 12, 2024 at 10:26 PM Ales Musil  wrote:

>
>
> On Sun, May 12, 2024 at 11:17 PM Sri kor  wrote:
>
>> Hi @num...@ovn.org  @Ales Musil  ,
>> Currently we are on OVN 23.09. We are facing the following syntax
>> error with the address_set.From the diff, looks like you address this fix.
>> Is this something expected?
>>
>> *2024-05-11T17:35:28.694Z|153935|lflow|WARN|error parsing match "reg0[7]
>> == 1 && ((ip4.src ==
>> $a_781ac228_38ae_4f04_9cc1_707543df50c8_addr_set_b8f7bf7f_83b3_48bc_a0d5_bc5d55438e4a)
>> && (172.27.0.0 <= ip4.dst <= 172.27.255.255) && (1 <= tcp.src <= 65535 || 1
>> <= udp.src <= 65535) && (tcp.dst == 22 || udp.dst == 22) && outport ==
>> @a_781ac228_38ae_4f04_9cc1_707543df50c8_pg_0ea55446_0df6_4358_8442_56893854004c)":
>> Syntax error at
>> `$a_781ac228_38ae_4f04_9cc1_707543df50c8_addr_set_b8f7bf7f_83b3_48bc_a0d5_bc5d55438e4a'
>> expecting address set name.*
>>
>>
>> *Here are the transaction details .*
>>
>> *[{Op:insert Table:ACL Row:map[action:allow-related direction:to-lport
>> external_ids:{GoMap:map[crusoe_fw_rule_id:770b6929-88d1-4659-ba3b-ed84ccfd71a0
>> crusoe_vpc_id:0ea55446-0df6-4358-8442-56893854004c
>> customer_id:781ac228-38ae-4f04-9cc1-707543df50c8]} log:false match:(ip4.src
>> ==
>> $a_781ac228_38ae_4f04_9cc1_707543df50c8_addr_set_b8f7bf7f_83b3_48bc_a0d5_bc5d55438e4a)
>> && (172.27.0.0 <= ip4.dst <= 172.27.255.255) && (1 <= tcp.src <= 65535 || 1
>> <= udp.src <= 65535) && (tcp.dst == 22 || udp.dst == 22) && outport ==
>> @a_781ac228_38ae_4f04_9cc1_707543df50c8_pg_0ea55446_0df6_4358_8442_56893854004c
>> name:{GoSet:[a_781ac228_38ae_4f04_9cc1_707543df50c8_acl_770b6929_88d1_4659_b]}
>> priority:2000] Rows:[] Columns:[] Mutations:[] Timeout: Where:[]
>> Until: Durable: Comment: Lock:
>> UUIDName:_641d4ca1_7596_4004_b577_0f34e0bab87d} {Op:mutate Table:Port_Group
>> Row:map[] Rows:[] Columns:[] Mutations:[{Column:acls Mutator:insert
>> Value:{GoSet:[{GoUUID:_641d4ca1_7596_4004_b577_0f34e0bab87d}]}}]
>> Timeout: Where:[where column _uuid ==
>> {00383922-81ce-4491-9115-6ea2dfc79aea}] Until: Durable: Comment:
>> Lock: UUIDName:} {Op:mutate Table:NB_Global Row:map[] Rows:[]
>> Columns:[] Mutations:[{Column:nb_cfg Mutator:+= Value:1}] Timeout:
>> Where:[where column _uuid == {be2921c1-99ae-423b-a69c-a43e67d8424e}] Until:
>> Durable: Comment: Lock: UUIDName:} {Op:select
>> Table:NB_Global Row:map[] Rows:[] Columns:[nb_cfg] Mutations:[]
>> Timeout: Where:[] Until: Durable: Comment: Lock:
>> UUIDName:}]*
>>
>>
>> *Thanks*
>>
>> *Srini*
>>
>>
> Hi,
>
> this seems to be unrelated to the patch. It looks like the controller is
> not able to recognize the address set name and doesn't parse it correctly.
> Is this address set present in the SB DB?
>
> Thanks,
> Ales
>
> On Fri, May 10, 2024 at 9:43 AM Numan Siddique  wrote:
>>
>>> On Fri, May 10, 2024 at 12:39 PM Han Zhou  wrote:
>>> >
>>> > On Fri, May 10, 2024 at 9:23 AM Numan Siddique  wrote:
>>> > >
>>> > > On Fri, May 10, 2024 at 2:37 AM Han Zhou  wrote:
>>> > > >
>>> > > > On Thu, May 9, 2024 at 10:32 AM Mark Michelson <
>>> mmich...@redhat.com>
>>> > wrote:
>>> > > > >
>>> > > > > On 5/7/24 02:12, Han Zhou wrote:
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, May 6, 2024 at 10:37 PM Ales Musil >> > > > > > > wrote:
>>> > > > > >  >
>>> > > > > >  >
>>> > > > > >  >
>>> > > > > >  > On Mon, May 6, 2024 at 8:41 PM Han Zhou >> > > > > > > wrote:
>>> > > > > >  >>
>>> > > > > >  >>
>>> > > > > >  >>
>>> > > > > >  >> On Thu, May 2, 2024 at 10:35 PM Ales Musil <
>>> amu...@redhat.com
>>> > > > > > > wrote:
>>> > > > > >  >> >
>>> > > > > >  >> > On Thu, May 2, 2024 at 6:23 PM Han Zhou >> > > > > > > wrote:
>>> > > > > >  >> > >
>>> > > > > >  >> > >
>>> > > > > >  >> > >
>>> > > > > >  >> > > On Thu, May 2, 2024 at 6:29 AM Ales Musil <
>>> amu...@redhat.com
>>> > > > > > > wrote:
>>> > > > > >  >> > > >
>>> > > > > >  >> > > > Instead of tracking address set per struct
>>> > expr_constant_set
>>> > > > > > track it
>>> > > > > >  >> > > > per individual struct expr_constant. This allows
>>> more fine
>>> > > > grained
>>> > > > > >  >> > > > control for I-P processing of address sets in
>>> controller.
>>> > It
>>> > > > > > helps with
>>> > > > > >  >> > > > scenarios like matching on two address sets in one
>>> > expression
>>> > > > e.g.
>>> > > > > >  >> > > > "ip4.src == {$as1, $as2}". This allows any addition
>>> or
>>> > > > removal of
>>> > > > > >  >> > > > individual adress from the set to be incrementally
>>> > processed
>>> > > > > > instead
>>> > > > > >  >> > > > of reprocessing all the flows.
>>> > > > > >  >> > > >
>>> > > > > >  >> > > > This unfortunately doesn't help with the following
>>> flows:
>>> > > > > >  >> > > > "ip4.src == $as1 && ip4.dst == $as2"
>>> > > > > >  >> > > > "ip4.src == $as1 || ip4.dst == $as2"
>>> >

Re: [ovs-dev] [RFC 03/11] ofproto: Add ofproto-dpif-psample.

2024-05-13 Thread Eelco Chaudron


On 13 May 2024, at 8:45, Adrian Moreno wrote:

> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>
>>> Add a new resource in ofproto-dpif and the corresponding API in
>>> ofproto_provider.h to represent and change psample configuration.
>>
>> See comments below.
>>
>> //Eelco
>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>   ofproto/automake.mk|   2 +
>>>   ofproto/ofproto-dpif-psample.c | 167 +
>>>   ofproto/ofproto-dpif-psample.h |  31 ++
>>>   ofproto/ofproto-dpif.c |  33 +++
>>>   ofproto/ofproto-dpif.h |   1 +
>>>   ofproto/ofproto-provider.h |   9 ++
>>>   ofproto/ofproto.c  |  10 ++
>>>   ofproto/ofproto.h  |   8 ++
>>>   8 files changed, 261 insertions(+)
>>>   create mode 100644 ofproto/ofproto-dpif-psample.c
>>>   create mode 100644 ofproto/ofproto-dpif-psample.h
>>>
>>> diff --git a/ofproto/automake.mk b/ofproto/automake.mk
>>> index 7c08b563b..340003e12 100644
>>> --- a/ofproto/automake.mk
>>> +++ b/ofproto/automake.mk
>>> @@ -34,6 +34,8 @@ ofproto_libofproto_la_SOURCES = \
>>> ofproto/ofproto-dpif-mirror.h \
>>> ofproto/ofproto-dpif-monitor.c \
>>> ofproto/ofproto-dpif-monitor.h \
>>> +   ofproto/ofproto-dpif-psample.c \
>>> +   ofproto/ofproto-dpif-psample.h \
>>> ofproto/ofproto-dpif-rid.c \
>>> ofproto/ofproto-dpif-rid.h \
>>> ofproto/ofproto-dpif-sflow.c \
>>> diff --git a/ofproto/ofproto-dpif-psample.c b/ofproto/ofproto-dpif-psample.c
>>
>> Like all previous patches (and I guess patches to come), we need a better 
>> name than psample :)
>>
>>> new file mode 100644
>>> index 0..a83530ed8
>>> --- /dev/null
>>> +++ b/ofproto/ofproto-dpif-psample.c
>>> @@ -0,0 +1,167 @@
>>> +/*
>>> + * Copyright (c) 2024 Red Hat, Inc.
>>> + *
>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>> + * you may not use this file except in compliance with the License.
>>> + * You may obtain a copy of the License at:
>>> + *
>>> + * http://www.apache.org/licenses/LICENSE-2.0
>>> + *
>>> + * Unless required by applicable law or agreed to in writing, software
>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> + * See the License for the specific language governing permissions and
>>> + * limitations under the License.
>>> + */
>>> +
>>> +#include 
>>> +#include "ofproto-dpif-psample.h"
>>> +
>>> +#include "hash.h"
>>> +#include "ofproto.h"
>>> +#include "openvswitch/hmap.h"
>>> +#include "openvswitch/thread.h"
>>> +#include "openvswitch/vlog.h"
>>> +
>>> +VLOG_DEFINE_THIS_MODULE(psample);
>>
>> There is no logging in this file, so we might as well remove this.
>>
>>> +
>>> +static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
>>
>> Can we give this mutex a meaningful name?
>>
>>> +
>>> +struct psample_exporter {
>>
>>> +uint32_t group_id;
>>> +uint32_t collector_set_id;
>>> +};
>>> +
>>> +struct psample_exporter_map_node {
>>
>> I would call this just ???_exporter_node?
>>
>>> +struct hmap_node node;
>>> +struct psample_exporter exporter;
>>> +};
>>> +
>>> +struct dpif_psample {
>>> +struct hmap exporters_map; /* Contains psample_exporter_map_node. 
>>> */
>>
>> Should we really use a hmap with all this locking? Would a cmap work better, 
>> as multiple threads might be calling into xlate_sample_action, and we could 
>> create lock contention during upcall/revalidator handling?
>>
>
> I agree, in the RFC-level, I just wrote this quick&dirty file to get the 
> feature working. I agree locking can be improved significantly.
>
>
>
>> In addition, using a cmap, might stop us from having to use refcounting, 
>> although updating the cmap might need more work than just swapping it out.
>>
>>> +
>>> +struct ovs_refcount ref_cnt;
>>> +};
>>> +
>>> +/* Exporter handling */
>>> +static void
>>> +dpif_psample_clear(struct dpif_psample *ps) OVS_REQUIRES(mutex)
>>
>> The _clear() name is a bit confusing, maybe _free/delete_exporters()
>>
>
> _free() would indicate it frees the struct which is not true. 
> *_delete_exporter is better.
>
>
>> Also, the ps variable name is confusing to my brain, but hopefully, this 
>> will change when removing psample ;)
>>
>>> +{
>>> +struct psample_exporter_map_node *node;
>>> +
>>> +HMAP_FOR_EACH_POP (node, node, &ps->exporters_map) {
>>> +free(node);
>>> +}
>>> +}
>>> +
>>> +static struct psample_exporter_map_node*
>>> +dpif_psample_new_exporter_node(struct dpif_psample *ps,
>>
>> Maybe _add_ instead of new? With a comment that we copy the options 
>> structure?
>>
>>> +   const struct ofproto_psample_options 
>>> *options)
>>> +OVS_REQUIRES(mutex)
>>> +{
>>> +struct psample_exporter_map_node *node;
>>> +node = xzalloc(sizeof *node);
>>> +node->exporter.collector_set_id = options->collector_set_i

Re: [ovs-dev] [RFC 08/11] netdev-offload-tc: Add sample support.

2024-05-13 Thread Adrian Moreno




On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


Offload the sample action if it contains psample information by creating
a tc "sample" action with the user cookie inside the action's cookie.

Avoid using the "sample" action's cookie to store the ufid.


I have some concerns about the sample action-only solution. What happened to 
the idea of adding an additional cookie to the TC match? Can we add a test case 
for the explicit drop?




I guess we can ask the kernel community. However, I'm not 100% convinced it 
makes a lot of sense form a kernel perspective. There is already a cookie in 
every action so there is plenty of space to store user data. Kernel does not 
enforce any semantics on the cookies, it's up to userspace to interpret them at 
will. Also, you cannot have a flow without an action IIUC.
So having a new cookie added just because it makes OVS code sligthly cleaner 
doesn't seem to be a super strong argument.


Regarding tc <-> ufid correlation. I'm a bit confused by the fact that flow 
replacement seems to work on a tcf_id basis, meaning that each tc flow has it's 
own unique tcf_id (or we would replace some other flow). Also, there is a hmap 
mapping tcf_id to ufids already. Am I missing something?




In general, I think we should apply the sflow patch before this series.



Agree. Are we all OK with the compatibility issues that will introduce?


//Eelco


Signed-off-by: Adrian Moreno 
---
  include/linux/automake.mk|  5 +-
  include/linux/pkt_cls.h  |  2 +
  include/linux/tc_act/tc_sample.h | 26 ++
  lib/netdev-offload-tc.c  | 67 +
  lib/tc.c | 84 ++--
  lib/tc.h |  7 +++
  utilities/ovs-psample.c  |  8 +--
  7 files changed, 190 insertions(+), 9 deletions(-)
  create mode 100644 include/linux/tc_act/tc_sample.h

diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index ac306b53c..c2a270152 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -5,9 +5,10 @@ noinst_HEADERS += \
include/linux/pkt_cls.h \
include/linux/psample.h \
include/linux/gen_stats.h \
+   include/linux/tc_act/tc_ct.h \
include/linux/tc_act/tc_mpls.h \
include/linux/tc_act/tc_pedit.h \
+   include/linux/tc_act/tc_sample.h \
include/linux/tc_act/tc_skbedit.h \
include/linux/tc_act/tc_tunnel_key.h \
-   include/linux/tc_act/tc_vlan.h \
-   include/linux/tc_act/tc_ct.h
+   include/linux/tc_act/tc_vlan.h
diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index fb4a7ecea..c566ac1b5 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -8,6 +8,8 @@
  #include 
  #include 

+#define TC_COOKIE_MAX_SIZE 16
+
  /* Action attributes */
  enum {
TCA_ACT_UNSPEC,
diff --git a/include/linux/tc_act/tc_sample.h b/include/linux/tc_act/tc_sample.h
new file mode 100644
index 0..398f32761
--- /dev/null
+++ b/include/linux/tc_act/tc_sample.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct tc_sample {
+   tc_gen;
+};
+
+enum {
+   TCA_SAMPLE_UNSPEC,
+   TCA_SAMPLE_TM,
+   TCA_SAMPLE_PARMS,
+   TCA_SAMPLE_RATE,
+   TCA_SAMPLE_TRUNC_SIZE,
+   TCA_SAMPLE_PSAMPLE_GROUP,
+   TCA_SAMPLE_PAD,
+   __TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
+
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 36e814bee..0e7273431 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1038,6 +1038,21 @@ parse_tc_flower_to_actions__(struct tc_flower *flower, 
struct ofpbuf *buf,
  nl_msg_end_nested(buf, offset);
  }
  break;
+case TC_ACT_SAMPLE: {
+size_t offset;
+
+offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_SAMPLE);
+nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PROBABILITY,
+   UINT32_MAX / action->sample.rate);
+nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
+   action->sample.group_id);
+nl_msg_put_unspec(buf, OVS_SAMPLE_ATTR_PSAMPLE_COOKIE,
+  &action->sample.cookie[0],
+  action->sample.cookie_len);
+
+nl_msg_end_nested(buf, offset);
+}
+break;
  }

  if (action->jump_action && action->type != TC_ACT_POLICE_MTU) {
@@ -2054,6 +2069,53 @@ parse_check_pkt_len_action(struct netdev *netdev, struct 
tc_flower *flower,
  return 0;
  }

+static int
+parse_sample_action(struct tc_flower *flower, const struct nlattr *nl_act,
+struct tc_action *action)
+{
+/* Only offloadable if it's psample only. Use the policy to enforce it by


Do

Re: [ovs-dev] [RFC 10/11] ofproto-dpif-psample: Add command to show stats.

2024-05-13 Thread Adrian Moreno




On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:

Maybe add a small description to this patch.


Signed-off-by: Adrian Moreno 
---
  ofproto/ofproto-dpif-psample.c | 59 ++
  ofproto/ofproto-dpif-psample.h |  1 +
  ofproto/ofproto-dpif.c |  1 +
  3 files changed, 61 insertions(+)

diff --git a/ofproto/ofproto-dpif-psample.c b/ofproto/ofproto-dpif-psample.c
index ea4926eb2..2d73a4ded 100644
--- a/ofproto/ofproto-dpif-psample.c
+++ b/ofproto/ofproto-dpif-psample.c
@@ -20,9 +20,12 @@
  #include "dpif.h"
  #include "hash.h"
  #include "ofproto.h"
+#include "ofproto-dpif.h"
+#include "openvswitch/dynamic-string.h"
  #include "openvswitch/hmap.h"
  #include "openvswitch/thread.h"
  #include "openvswitch/vlog.h"
+#include "unixctl.h"

  VLOG_DEFINE_THIS_MODULE(psample);

@@ -205,3 +208,59 @@ dpif_psample_unref(struct dpif_psample *ps) 
OVS_EXCLUDED(mutex)
  dpif_psample_destroy(ps);
  }
  }
+
+/* Unix commands. */
+static void
+psample_unixctl_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
+ const char *argv[] OVS_UNUSED, void *aux OVS_UNUSED)
+{
+struct psample_exporter_map_node *node;
+struct ds ds = DS_EMPTY_INITIALIZER;
+const struct ofproto_dpif *ofproto;
+bool first = true;
+
+ofproto = ofproto_dpif_lookup_by_name(argv[1]);
+if (!ofproto) {
+unixctl_command_reply_error(conn, "no such bridge");
+return;
+}
+
+if (!ofproto->psample) {
+unixctl_command_reply_error(conn, "no psample exporters configured");
+return;
+}
+
+ds_put_format(&ds, "Psample statistics for bridge \"%s\":\n", argv[1]);
+
+ovs_mutex_lock(&mutex);


We hold the global mutex for quite a long time, so we need some other solution.


I agree. I'll see how I can reduce the locking.


Also, do we need some form of sorted output?



Good point, I'll sort the outputs.


+HMAP_FOR_EACH (node, node, &ofproto->psample->exporters_map) {
+if (!first) {
+ds_put_cstr(&ds, "\n");
+} else {
+first = false;
+}
+
+ds_put_format(&ds, "- Collector Set ID: %"PRIu32"\n",
+node->exporter.collector_set_id);
+ds_put_format(&ds, "  Psample Group ID: %"PRIu32"\n",
+node->exporter.group_id);
+ds_put_format(&ds, "  Total number of bytes: %"PRIu64"\n",
+node->exporter.n_bytes);
+ds_put_format(&ds, "  Total number of packets: %"PRIu64"\n",
+node->exporter.n_packets);
+}
+ovs_mutex_unlock(&mutex);
+
+unixctl_command_reply(conn, ds_cstr(&ds));
+ds_destroy(&ds);
+}
+
+void dpif_psample_init(void)
+{
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;


New line


+if (ovsthread_once_start(&once)) {
+unixctl_command_register("psample/show", "bridge", 1, 1,
+ psample_unixctl_show, NULL);
+ovsthread_once_done(&once);
+}
+}
diff --git a/ofproto/ofproto-dpif-psample.h b/ofproto/ofproto-dpif-psample.h
index 763fbd30b..f264ad4c2 100644
--- a/ofproto/ofproto-dpif-psample.h
+++ b/ofproto/ofproto-dpif-psample.h
@@ -34,5 +34,6 @@ bool dpif_psample_get_group_id(struct dpif_psample *, 
uint32_t, uint32_t *);

  void dpif_psample_credit_stats(struct dpif_psample *, uint32_t,
 const struct dpif_flow_stats *);
+void dpif_psample_init(void);

  #endif // OFPROTO_DPIF_PSAMPLE_H
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index f1efdd482..ebb399307 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -287,6 +287,7 @@ init(const struct shash *iface_hints)
  ofproto_unixctl_init();
  ofproto_dpif_trace_init();
  udpif_init();
+dpif_psample_init();
  }

  static void
--
2.44.0




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 11/11] tests: Add test for sample offloading.

2024-05-13 Thread Adrian Moreno




On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


Signed-off-by: Adrian Moreno 
---
  tests/system-common-macros.at|  4 +++
  tests/system-offloads-traffic.at | 53 
  2 files changed, 57 insertions(+)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 2a68cd664..860d6a8c9 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -378,3 +378,7 @@ m4_define([OVS_CHECK_GITHUB_ACTION],
  # OVS_CHECK_DROP_ACTION()
  m4_define([OVS_CHECK_DROP_ACTION],
  [AT_SKIP_IF([! grep -q "Datapath supports explicit drop action" 
ovs-vswitchd.log])])
+
+# OVS_CHECK_PSAMPLE()
+m4_define([OVS_CHECK_PSAMPLE],
+[AT_SKIP_IF([! grep -q "Datapath supports psample" ovs-vswitchd.log])])
diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index d1da33d96..f4d0ccab5 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at


I think this should be a general system-traffic.at test and it will be included 
in all datapath supporting this feature.
Maybe we still need a tc specific one for the TC details, maybe you can combine 
it with the explicit drop case?



Yes. I added only the offloading part because that's the thing I wanted to 
double-check. I will add a generic traffic test and an independent tc one.



@@ -933,3 +933,56 @@ OVS_WAIT_UNTIL([grep -q "Datapath does not support explicit 
drop action" ovs-vsw

  OVS_TRAFFIC_VSWITCHD_STOP
  AT_CLEANUP
+
+
+AT_SETUP([offloads - sample action])
+OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . 
other_config:hw-offload=true])
+OVS_CHECK_PSAMPLE()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+# Choosing numbers whose hex representation is the same for big and little 
endian.


I think it's more common to use 'dnl COMMENT'



Ack.


+AT_DATA([flows.txt], [dnl
+arp action=NORMAL
+in_port=ovs-p0 
actions=sample(probability=65535,collector_set_id=1,obs_domain_id=1431655765,obs_point_id=1717986918),NORMAL
+in_port=ovs-p1 
actions=sample(probability=65535,collector_set_id=2,obs_domain_id=2290649224,obs_point_id=2576980377),NORMAL
+])
+
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-vsctl --id=@br get Bridge br0 \
+-- create FLow_Sample_Collector_Set bridge=@br id=1 
psample_group=10 \
+-- create FLow_Sample_Collector_Set bridge=@br id=2 
psample_group=12],
+ [0], [ignore])
+
+OVS_DAEMONIZE([ovs-psample > psample.out], [psample.pid])
+on_exit 'kill `cat psample.pid`'


On exit is part of OVS_DAEMONIZE



Oh, right!


+sleep 1


We should remove the sleep here, instead wait for some "ready" output. If the 
tool does not have it, we should add it.



Actually, I don't think it's needed. sample reading is quite fast. I failed to 
remove it after some failures made me think otherwise. The failures ended up 
being some missing "fflush(stdout);" in the tool.



+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -W 2 10.1.1.2 | FORMAT_PING], 
[0], [dnl
+10 packets transmitted, 10 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep "eth_type(0x0800)" | 
DUMP_CLEAN_SORTED], [0], [dnl
+in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=10,cookie=),3
+in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=12,cookie=),2
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=ovs | grep "eth_type(0x0800)" | 
DUMP_CLEAN_SORTED], [0], [])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep 
"eth_type(0x0800)" | DUMP_CLEAN_SORTED], [0], [dnl
+in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=10,cookie=),3
+in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, 
used:0.001s, actions:sample(sample=100.0%,group_id=12,cookie=),2
+])
+
+AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], 
[ignore])
+
+AT_CHECK([grep -E "group_id=0xa obs_domain=0x,obs_point=0x 
.*icmp.*nw_src=10.1.1.1,nw_dst=10.1.1.2" psample.out >/dev/null])
+AT_CHECK([grep -E "group_id=0xc obs_domain=0x,obs_point=0x 
.*icmp.*nw_src=10.1.1.2,nw_dst=10.1.1.1" psample.out >/dev/null])


Will this work on fast machines, or should we have a waitfor instead?



It will probably work as there should be 10 of each, and stdout is flushed for 
each one of them.

In any case, it doesn't harm to make it a waitfor.


+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
--
2.44.0




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 05/11] ofproto_dpif_xlate: Use psample for OFP samples.

2024-05-13 Thread Eelco Chaudron


On 10 May 2024, at 13:15, Adrian Moreno wrote:

> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>
>>> When a OFP_SAMPLE action is xlated and a dpif_psample object has been
>>> configured (via Flow_Sample_Collector_Set table) with the same
>>> collector_set_id, add psample information to the odp sample action.
>>
>> See comments below.
>>
>> //Eelco



>>> -compose_sample_action(ctx, probability, &cookie, tunnel_out_port, 
>>> false);
>>> +static void
>>> +xlate_sample_action(struct xlate_ctx *ctx,
>>> +const struct ofpact_sample *os)
>>> +{
>>> +struct dpif_psample *psample = ctx->xbridge->psample;
>>> +struct dpif_ipfix *ipfix = ctx->xbridge->ipfix;
>>> +struct user_action_cookie *upcall_cookie = NULL;
>>> +struct psample_args *psample_args = NULL;
>>> +odp_port_t tunnel_out_port = ODPP_NONE;
>>> +uint32_t group_id;
>>> +
>>> +if (!ipfix && !psample) {
>>> +return;
>>> +}
>>> +
>>> +/* Scale the probability from 16-bit to 32-bit while representing
>>> + * the same percentage. */
>>> +uint32_t probability =
>>> +((uint32_t) os->probability << 16) | os->probability;
>>> +
>>> +if (ipfix) {
>>> +upcall_cookie = xzalloc(sizeof *upcall_cookie);
>>> +xlate_fill_ipfix_sample(ctx, os, ipfix, upcall_cookie,
>>> +&tunnel_out_port);
>>> +}
>>> +if (psample) {
>>> +if (dpif_psample_get_group_id(psample,
>>
>> See earlier lock contention concern on taking the mutex() for parallel 
>> upcalls.
>>
>>> +  os->collector_set_id,
>>> +  &group_id)) {
>>> +psample_args = xzalloc(sizeof *psample_args);
>>> +ofpbuf_init(&psample_args->cookie, 
>>> OVS_PSAMPLE_COOKIE_MAX_SIZE);
>>> +
>>> +psample_args->group_id = group_id;
>>> +ofpbuf_put(&psample_args->cookie, &os->obs_domain_id,
>>> +   sizeof(os->obs_domain_id));
>>> +ofpbuf_put(&psample_args->cookie, &os->obs_point_id,
>>> +   sizeof(os->obs_point_id));
>>> +}
>>> +}
>>> +
>>> +compose_sample_action(ctx, probability, upcall_cookie, psample_args,
>>> +  tunnel_out_port, false);
>>> +
>>> +if (upcall_cookie) {
>>> +free(upcall_cookie);
>>> +}
>>> +if (psample_args) {
>>> +ofpbuf_uninit(&psample_args->cookie);
>>> +free(psample_args);
>>> +}
>>
>> Can we avoid these small memory allocations?
>
> Probably. I'll try to rework it. Two of the arguments to 
> compose_sample_action are now optional so having a pointer to the stack will 
> not work unless we add flags to indicate whether they are valid or not. But 
> maybe we can rework it by having something like the follwoing to avoid 
> clogging the argument list;
>
> struct sample_args {
>   struct user_action_cookie *upcall;
>   struct psample_args *psample;
>   bool has_psample;
>   bool has_upcall;
> };

Don’t think we need a structure, we can use your existing bools. Something like 
the below seems simple enough:

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index a9856e358..42100f15b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5958,8 +5958,8 @@ xlate_sample_action(struct xlate_ctx *ctx,
 {
 struct dpif_psample *psample = ctx->xbridge->psample;
 struct dpif_ipfix *ipfix = ctx->xbridge->ipfix;
-struct user_action_cookie *upcall_cookie = NULL;
-struct psample_args *psample_args = NULL;
+struct user_action_cookie upcall_cookie;
+struct psample_args psample_args;
 odp_port_t tunnel_out_port = ODPP_NONE;
 uint32_t group_id;

@@ -5973,21 +5973,20 @@ xlate_sample_action(struct xlate_ctx *ctx,
 ((uint32_t) os->probability << 16) | os->probability;

 if (ipfix) {
-upcall_cookie = xzalloc(sizeof *upcall_cookie);
-xlate_fill_ipfix_sample(ctx, os, ipfix, upcall_cookie,
+xlate_fill_ipfix_sample(ctx, os, ipfix, &upcall_cookie,
 &tunnel_out_port);
 }
 if (psample) {
 if (dpif_psample_get_group_id(psample,
   os->collector_set_id,
   &group_id)) {
-psample_args = xzalloc(sizeof *psample_args);
-ofpbuf_init(&psample_args->cookie, OVS_PSAMPLE_COOKIE_MAX_SIZE);

-psample_args->group_id = group_id;
-ofpbuf_put(&psample_args->cookie, &os->obs_domain_id,
+ofpbuf_init(&psample_args.cookie, OVS_PSAMPLE_COOKIE_MAX_SIZE);
+
+psample_args.group_id = group_id;
+ofpbuf_put(&psample_args.cookie, &os->obs_domain_id,
sizeof(os->obs_domain_id));
-ofpbuf_put(&psample_args->cookie, &os->obs_point_id,
+ofpbuf_put(&psample_

Re: [ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-05-13 Thread Eelco Chaudron


On 13 May 2024, at 9:01, Adrian Moreno wrote:

> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>
>>> This simple program reads from psample and prints the packets to stdout.
>>> It's useful for quickly collecting sampled packets.
>>
>> See some comments below, did not review the actual sample application in 
>> detail.
>>
>> //Eelco
>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>   Documentation/automake.mk   |   1 +
>>>   Documentation/conf.py   |   2 +
>>>   Documentation/ref/index.rst |   1 +
>>>   Documentation/ref/ovs-psample.8.rst |  47 
>>>   include/linux/automake.mk   |   1 +
>>>   include/linux/psample.h |  64 ++
>>>   rhel/openvswitch-fedora.spec.in |   2 +
>>>   rhel/openvswitch.spec.in|   1 +
>>>   utilities/automake.mk   |   9 +
>>>   utilities/ovs-psample.c | 330 
>>>   10 files changed, 458 insertions(+)
>>>   create mode 100644 Documentation/ref/ovs-psample.8.rst
>>>   create mode 100644 include/linux/psample.h
>>>   create mode 100644 utilities/ovs-psample.c
>>>
>>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>>> index 47d2e336a..c22facfd6 100644
>>> --- a/Documentation/automake.mk
>>> +++ b/Documentation/automake.mk
>>> @@ -165,6 +165,7 @@ RST_MANPAGES = \
>>> ovs-l3ping.8.rst \
>>> ovs-parse-backtrace.8.rst \
>>> ovs-pki.8.rst \
>>> +   ovs-psample.8.rst \
>>> ovs-tcpdump.8.rst \
>>> ovs-tcpundump.1.rst \
>>> ovs-test.8.rst \
>>> diff --git a/Documentation/conf.py b/Documentation/conf.py
>>> index 15785605a..75efed2fc 100644
>>> --- a/Documentation/conf.py
>>> +++ b/Documentation/conf.py
>>> @@ -134,6 +134,8 @@ _man_pages = [
>>>u'convert "tcpdump -xx" output to hex strings'),
>>>   ('ovs-test.8',
>>>u'Check Linux drivers for performance, vlan and L3 tunneling 
>>> problems'),
>>> +('ovs-psample.8',
>>> + u'Print packets sampled by psample'),
>>>   ('ovs-vlan-test.8',
>>>u'Check Linux drivers for problems with vlan traffic'),
>>>   ('ovsdb-server.7',
>>> diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
>>> index 03ada932f..d1076435a 100644
>>> --- a/Documentation/ref/index.rst
>>> +++ b/Documentation/ref/index.rst
>>> @@ -46,6 +46,7 @@ time:
>>>  ovs-pki.8
>>>  ovs-sim.1
>>>  ovs-parse-backtrace.8
>>> +   ovs-psample.8
>>>  ovs-tcpdump.8
>>>  ovs-tcpundump.1
>>>  ovs-test.8
>>> diff --git a/Documentation/ref/ovs-psample.8.rst 
>>> b/Documentation/ref/ovs-psample.8.rst
>>> new file mode 100644
>>> index 0..c849c83d8
>>> --- /dev/null
>>> +++ b/Documentation/ref/ovs-psample.8.rst
>>> @@ -0,0 +1,47 @@
>>> +==
>>> +ovs-sample
>>
>> Interesting, here you call it all ovs-sample here, which is like ;)
>
> Yes, at the begining I called it ovs-sample as I thought to make some generic 
> tool, but since it ended up being very psample-specific I added the "p" (and 
> missed this one).
>
>>
>> You could add options like —local-kernel --local-userspace (--ipfix, 
>> --sflow) and it will eventually work on each datapath.

>
> You mean implementing an IPFIX and sFlow collector?
>
?
>> If you want to keep this a separate psample utility, I would not have ovs in 
>> the name, as it's rather general and not OVS specific. Maybe just 
>> psample/psample_mon, as we also have nlmon.
>>
>
> Well, the way the cookie is decoded into observation_domain_id and 
> observation_point_id is ovs-specific.
>
> In fact, for the next iteration of the series I will remove the filtering API 
> since its getting removed from the kernel series as well and add some kind of 
> BPF code that does the filtering. Also I was  considering looking into the 
> OVSDB to ensure that we filter on groups configured in it or else we could 
> wrongly interpet a sampled packet that comes from some other subsystem.

I would prefer to have this as an ova-sample application, which can be extended 
with other sample methods as we see fit. So when we added userspace support we 
can add it here. If we find someone crazy enough to do a simple IPFIX and/or 
sFlow collector it can be added too.

So my request would be to remove the (p) from ovs-psample, and have a switch to 
select --psample (the only supported (MANDATORY) option for now).

>>> +==
>>> +
>>> +Synopsis
>>> +
>>> +
>>> +``ovs-sample``
>>> +[``--group=`` | ``-g`` ]
>>> +
>>> +``ovs-sample --help``
>>> +
>>> +``ovs-sample --version``
>>> +
>>> +Description
>>> +===
>>> +
>>> +Open vSwitch per-flow sampling can be configured to emit the samples
>>> +through the ``psample`` netlink multicast group.
>>> +
>>> +Such sampled traffic contains, apart from the packet, some metadata that
>>> +gives further information about the packet sample. More specifically, OVS
>>> +inserts the ``observation_domain_id`` and the ``observation_point_id`` that
>>> +

Re: [ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-05-13 Thread Adrian Moreno



On 5/13/24 12:44 PM, Eelco Chaudron wrote:



On 13 May 2024, at 9:01, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


This simple program reads from psample and prints the packets to stdout.
It's useful for quickly collecting sampled packets.


See some comments below, did not review the actual sample application in detail.

//Eelco


Signed-off-by: Adrian Moreno 
---
   Documentation/automake.mk   |   1 +
   Documentation/conf.py   |   2 +
   Documentation/ref/index.rst |   1 +
   Documentation/ref/ovs-psample.8.rst |  47 
   include/linux/automake.mk   |   1 +
   include/linux/psample.h |  64 ++
   rhel/openvswitch-fedora.spec.in |   2 +
   rhel/openvswitch.spec.in|   1 +
   utilities/automake.mk   |   9 +
   utilities/ovs-psample.c | 330 
   10 files changed, 458 insertions(+)
   create mode 100644 Documentation/ref/ovs-psample.8.rst
   create mode 100644 include/linux/psample.h
   create mode 100644 utilities/ovs-psample.c

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 47d2e336a..c22facfd6 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -165,6 +165,7 @@ RST_MANPAGES = \
ovs-l3ping.8.rst \
ovs-parse-backtrace.8.rst \
ovs-pki.8.rst \
+   ovs-psample.8.rst \
ovs-tcpdump.8.rst \
ovs-tcpundump.1.rst \
ovs-test.8.rst \
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 15785605a..75efed2fc 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -134,6 +134,8 @@ _man_pages = [
u'convert "tcpdump -xx" output to hex strings'),
   ('ovs-test.8',
u'Check Linux drivers for performance, vlan and L3 tunneling problems'),
+('ovs-psample.8',
+ u'Print packets sampled by psample'),
   ('ovs-vlan-test.8',
u'Check Linux drivers for problems with vlan traffic'),
   ('ovsdb-server.7',
diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
index 03ada932f..d1076435a 100644
--- a/Documentation/ref/index.rst
+++ b/Documentation/ref/index.rst
@@ -46,6 +46,7 @@ time:
  ovs-pki.8
  ovs-sim.1
  ovs-parse-backtrace.8
+   ovs-psample.8
  ovs-tcpdump.8
  ovs-tcpundump.1
  ovs-test.8
diff --git a/Documentation/ref/ovs-psample.8.rst 
b/Documentation/ref/ovs-psample.8.rst
new file mode 100644
index 0..c849c83d8
--- /dev/null
+++ b/Documentation/ref/ovs-psample.8.rst
@@ -0,0 +1,47 @@
+==
+ovs-sample


Interesting, here you call it all ovs-sample here, which is like ;)


Yes, at the begining I called it ovs-sample as I thought to make some generic tool, but 
since it ended up being very psample-specific I added the "p" (and missed this 
one).



You could add options like —local-kernel --local-userspace (--ipfix, --sflow) 
and it will eventually work on each datapath.




You mean implementing an IPFIX and sFlow collector?


?

If you want to keep this a separate psample utility, I would not have ovs in 
the name, as it's rather general and not OVS specific. Maybe just 
psample/psample_mon, as we also have nlmon.



Well, the way the cookie is decoded into observation_domain_id and 
observation_point_id is ovs-specific.

In fact, for the next iteration of the series I will remove the filtering API 
since its getting removed from the kernel series as well and add some kind of 
BPF code that does the filtering. Also I was  considering looking into the 
OVSDB to ensure that we filter on groups configured in it or else we could 
wrongly interpet a sampled packet that comes from some other subsystem.


I would prefer to have this as an ova-sample application, which can be extended 
with other sample methods as we see fit. So when we added userspace support we 
can add it here. If we find someone crazy enough to do a simple IPFIX and/or 
sFlow collector it can be added too.

So my request would be to remove the (p) from ovs-psample, and have a switch to 
select --psample (the only supported (MANDATORY) option for now).



Oh, ok. Now I undestand, thank you. We want to leave room for the userspace 
implementation. Will do.




+==
+
+Synopsis
+
+
+``ovs-sample``
+[``--group=`` | ``-g`` ]
+
+``ovs-sample --help``
+
+``ovs-sample --version``
+
+Description
+===
+
+Open vSwitch per-flow sampling can be configured to emit the samples
+through the ``psample`` netlink multicast group.
+
+Such sampled traffic contains, apart from the packet, some metadata that
+gives further information about the packet sample. More specifically, OVS
+inserts the ``observation_domain_id`` and the ``observation_point_id`` that
+where provided in the sample action (see ``ovs-actions(7)``).
+
+the ``ovs-sample`` program provides a simple way of joining the psample
+multicast group and printing the sampled packets.
+
+
+Options
+==

Re: [ovs-dev] [RFC 08/11] netdev-offload-tc: Add sample support.

2024-05-13 Thread Eelco Chaudron


On 13 May 2024, at 10:44, Adrian Moreno wrote:

> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>
>>> Offload the sample action if it contains psample information by creating
>>> a tc "sample" action with the user cookie inside the action's cookie.
>>>
>>> Avoid using the "sample" action's cookie to store the ufid.
>>
>> I have some concerns about the sample action-only solution. What happened to 
>> the idea of adding an additional cookie to the TC match? Can we add a test 
>> case for the explicit drop?
>>
>
>
> I guess we can ask the kernel community. However, I'm not 100% convinced it 
> makes a lot of sense form a kernel perspective. There is already a cookie in 
> every action so there is plenty of space to store user data. Kernel does not 
> enforce any semantics on the cookies, it's up to userspace to interpret them 
> at will. Also, you cannot have a flow without an action IIUC.
> So having a new cookie added just because it makes OVS code sligthly cleaner 
> doesn't seem to be a super strong argument.

ACK, but I thought this was part of the earlier discussion, so it should be 
explored. I guess we can make the opposite argument, why a per-action cookie if 
we could have a per-flow one? Having a match cookie will free the action cookie 
for action related metadata.

>
> Regarding tc <-> ufid correlation. I'm a bit confused by the fact that flow 
> replacement seems to work on a tcf_id basis, meaning that each tc flow has 
> it's own unique tcf_id (or we would replace some other flow). Also, there is 
> a hmap mapping tcf_id to ufids already. Am I missing something?

You are right, I guess this exists to support kernels where we did not have an 
actions cookie, so we use find_ufid() in the flow dump. So in theory if there 
is no action with a flow cookie it will all still work.

>> In general, I think we should apply the sflow patch before this series.
>>
>
> Agree. Are we all OK with the compatibility issues that will introduce?
>
>> //Eelco
>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>   include/linux/automake.mk|  5 +-
>>>   include/linux/pkt_cls.h  |  2 +
>>>   include/linux/tc_act/tc_sample.h | 26 ++
>>>   lib/netdev-offload-tc.c  | 67 +
>>>   lib/tc.c | 84 ++--
>>>   lib/tc.h |  7 +++
>>>   utilities/ovs-psample.c  |  8 +--
>>>   7 files changed, 190 insertions(+), 9 deletions(-)
>>>   create mode 100644 include/linux/tc_act/tc_sample.h
>>>
>>> diff --git a/include/linux/automake.mk b/include/linux/automake.mk
>>> index ac306b53c..c2a270152 100644
>>> --- a/include/linux/automake.mk
>>> +++ b/include/linux/automake.mk
>>> @@ -5,9 +5,10 @@ noinst_HEADERS += \
>>> include/linux/pkt_cls.h \
>>> include/linux/psample.h \
>>> include/linux/gen_stats.h \
>>> +   include/linux/tc_act/tc_ct.h \
>>> include/linux/tc_act/tc_mpls.h \
>>> include/linux/tc_act/tc_pedit.h \
>>> +   include/linux/tc_act/tc_sample.h \
>>> include/linux/tc_act/tc_skbedit.h \
>>> include/linux/tc_act/tc_tunnel_key.h \
>>> -   include/linux/tc_act/tc_vlan.h \
>>> -   include/linux/tc_act/tc_ct.h
>>> +   include/linux/tc_act/tc_vlan.h
>>> diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
>>> index fb4a7ecea..c566ac1b5 100644
>>> --- a/include/linux/pkt_cls.h
>>> +++ b/include/linux/pkt_cls.h
>>> @@ -8,6 +8,8 @@
>>>   #include 
>>>   #include 
>>>
>>> +#define TC_COOKIE_MAX_SIZE 16
>>> +
>>>   /* Action attributes */
>>>   enum {
>>> TCA_ACT_UNSPEC,
>>> diff --git a/include/linux/tc_act/tc_sample.h 
>>> b/include/linux/tc_act/tc_sample.h
>>> new file mode 100644
>>> index 0..398f32761
>>> --- /dev/null
>>> +++ b/include/linux/tc_act/tc_sample.h
>>> @@ -0,0 +1,26 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>>> +#ifndef __LINUX_TC_SAMPLE_H
>>> +#define __LINUX_TC_SAMPLE_H
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +struct tc_sample {
>>> +   tc_gen;
>>> +};
>>> +
>>> +enum {
>>> +   TCA_SAMPLE_UNSPEC,
>>> +   TCA_SAMPLE_TM,
>>> +   TCA_SAMPLE_PARMS,
>>> +   TCA_SAMPLE_RATE,
>>> +   TCA_SAMPLE_TRUNC_SIZE,
>>> +   TCA_SAMPLE_PSAMPLE_GROUP,
>>> +   TCA_SAMPLE_PAD,
>>> +   __TCA_SAMPLE_MAX
>>> +};
>>> +#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
>>> +
>>> +#endif
>>> +
>>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>>> index 36e814bee..0e7273431 100644
>>> --- a/lib/netdev-offload-tc.c
>>> +++ b/lib/netdev-offload-tc.c
>>> @@ -1038,6 +1038,21 @@ parse_tc_flower_to_actions__(struct tc_flower 
>>> *flower, struct ofpbuf *buf,
>>>   nl_msg_end_nested(buf, offset);
>>>   }
>>>   break;
>>> +case TC_ACT_SAMPLE: {
>>> +size_t offset;
>>> +
>>> +offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_SAMPLE);
>>> +nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PROBABILITY,
>>> +  

Re: [ovs-dev] [Patch] ovsdb-client: Add missing arg to help for 'dump'.

2024-05-13 Thread Simon Horman
On Fri, May 10, 2024 at 01:31:10PM +0200, martin.kal...@canonical.com wrote:
> Thanks for the merge and backports Simon.
> 
> On Tue, 2024-05-07 at 17:26 +0100, Simon Horman wrote:
> > On Tue, May 07, 2024 at 03:16:25PM +0100, Simon Horman wrote:
> > > On Fri, May 03, 2024 at 02:22:27PM +0200,
> > > martin.kal...@canonical.com wrote:
> > 
> > ...
> > 
> > > Thanks,
> > > 
> > > Sorry for the confusion on my part.
> > > Now that I can see it again I have applied it to main.
> > > 
> > > - ovsdb-client: Add missing arg to help for 'dump'.
> > >   https://github.com/openvswitch/ovs/commit/0940a51b1f5a
> > > 
> > > As a follow-up I'll look at backporting this.
> > 
> > Hi Martin,
> > 
> > In preparing the backports I noticed that dump also supports optional
> > TABLES arguments. Could you consider a follow-up patch to add
> > that to the help text too?
> 
> You mean COLUMNS argument, right? Yeah, I didn't noticed this earlier.
> I'll post follow-up patch, thanks.

Yes, I meant COLUMNS.
Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Subject: conntrack: Fully initialize conn struct before insertion.

2024-05-13 Thread Simon Horman
On Fri, May 10, 2024 at 05:43:51PM +0200, Paolo Valerio wrote:
> From: Mike Pattrick 
> 
> In case packets are concurrently received in both directions, there's
> a chance that the ones in the reverse direction get received right
> after the connection gets added to the connection tracker but before
> some of the connection's fields are fully initialized.
> This could cause OVS to access potentially invalid, as the lookup may
> end up retrieving the wrong offsets during CONTAINER_OF(), or
> uninitialized memory.
> 
> This may happen in case of regular NAT or all-zero SNAT.
> 
> Fix it by initializing early the connections fields.
> 
> Fixes: 1116459b3ba8 ("conntrack: Remove nat_conn introducing key 
> directionality.")
> Reported-at: https://issues.redhat.com/browse/FDP-616
> Signed-off-by: Mike Pattrick 
> Co-authored-by: Paolo Valerio 
> Signed-off-by: Paolo Valerio 

Acked-by: Simon Horman 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 08/11] netdev-offload-tc: Add sample support.

2024-05-13 Thread Adrian Moreno



On 5/13/24 1:32 PM, Eelco Chaudron wrote:



On 13 May 2024, at 10:44, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


Offload the sample action if it contains psample information by creating
a tc "sample" action with the user cookie inside the action's cookie.

Avoid using the "sample" action's cookie to store the ufid.


I have some concerns about the sample action-only solution. What happened to 
the idea of adding an additional cookie to the TC match? Can we add a test case 
for the explicit drop?




I guess we can ask the kernel community. However, I'm not 100% convinced it 
makes a lot of sense form a kernel perspective. There is already a cookie in 
every action so there is plenty of space to store user data. Kernel does not 
enforce any semantics on the cookies, it's up to userspace to interpret them at 
will. Also, you cannot have a flow without an action IIUC.
So having a new cookie added just because it makes OVS code sligthly cleaner 
doesn't seem to be a super strong argument.


ACK, but I thought this was part of the earlier discussion, so it should be 
explored. I guess we can make the opposite argument, why a per-action cookie if 
we could have a per-flow one? Having a match cookie will free the action cookie 
for action related metadata.



Regarding tc <-> ufid correlation. I'm a bit confused by the fact that flow 
replacement seems to work on a tcf_id basis, meaning that each tc flow has it's own 
unique tcf_id (or we would replace some other flow). Also, there is a hmap mapping 
tcf_id to ufids already. Am I missing something?


You are right, I guess this exists to support kernels where we did not have an 
actions cookie, so we use find_ufid() in the flow dump. So in theory if there 
is no action with a flow cookie it will all still work.



So, could we just rely on this mechanism which is also backwards compatible and 
just use the action cookie for action information?



In general, I think we should apply the sflow patch before this series.



Agree. Are we all OK with the compatibility issues that will introduce?


//Eelco


Signed-off-by: Adrian Moreno 
---
   include/linux/automake.mk|  5 +-
   include/linux/pkt_cls.h  |  2 +
   include/linux/tc_act/tc_sample.h | 26 ++
   lib/netdev-offload-tc.c  | 67 +
   lib/tc.c | 84 ++--
   lib/tc.h |  7 +++
   utilities/ovs-psample.c  |  8 +--
   7 files changed, 190 insertions(+), 9 deletions(-)
   create mode 100644 include/linux/tc_act/tc_sample.h

diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index ac306b53c..c2a270152 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -5,9 +5,10 @@ noinst_HEADERS += \
include/linux/pkt_cls.h \
include/linux/psample.h \
include/linux/gen_stats.h \
+   include/linux/tc_act/tc_ct.h \
include/linux/tc_act/tc_mpls.h \
include/linux/tc_act/tc_pedit.h \
+   include/linux/tc_act/tc_sample.h \
include/linux/tc_act/tc_skbedit.h \
include/linux/tc_act/tc_tunnel_key.h \
-   include/linux/tc_act/tc_vlan.h \
-   include/linux/tc_act/tc_ct.h
+   include/linux/tc_act/tc_vlan.h
diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index fb4a7ecea..c566ac1b5 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -8,6 +8,8 @@
   #include 
   #include 

+#define TC_COOKIE_MAX_SIZE 16
+
   /* Action attributes */
   enum {
TCA_ACT_UNSPEC,
diff --git a/include/linux/tc_act/tc_sample.h b/include/linux/tc_act/tc_sample.h
new file mode 100644
index 0..398f32761
--- /dev/null
+++ b/include/linux/tc_act/tc_sample.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct tc_sample {
+   tc_gen;
+};
+
+enum {
+   TCA_SAMPLE_UNSPEC,
+   TCA_SAMPLE_TM,
+   TCA_SAMPLE_PARMS,
+   TCA_SAMPLE_RATE,
+   TCA_SAMPLE_TRUNC_SIZE,
+   TCA_SAMPLE_PSAMPLE_GROUP,
+   TCA_SAMPLE_PAD,
+   __TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
+
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 36e814bee..0e7273431 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -1038,6 +1038,21 @@ parse_tc_flower_to_actions__(struct tc_flower *flower, 
struct ofpbuf *buf,
   nl_msg_end_nested(buf, offset);
   }
   break;
+case TC_ACT_SAMPLE: {
+size_t offset;
+
+offset = nl_msg_start_nested(buf, OVS_ACTION_ATTR_SAMPLE);
+nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PROBABILITY,
+   UINT32_MAX / action->sample.rate);
+nl_msg_put_u32(buf, OVS_SAMPLE_ATTR_PSAMPLE_GROUP,
+  

Re: [ovs-dev] [RFC 01/11] odp-util: Add support to new psample uAPI.

2024-05-13 Thread Ilya Maximets
On 5/13/24 09:17, Eelco Chaudron wrote:
> 
> 
> On 10 May 2024, at 15:06, Ilya Maximets wrote:
> 
>> On 5/10/24 14:01, Eelco Chaudron wrote:
>>>
>>>
>>> On 10 May 2024, at 12:45, Adrian Moreno wrote:
>>>
 On 5/10/24 12:06 PM, Eelco Chaudron wrote:
> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>
>> The new odp sample attributes allow userspace to specify a group_id and
>> user-defined cookie to be passed down to psample.
>>
>> Add support for parsing and formatting such action.
>>
>> Signed-off-by: Adrian Moreno 
>
> Hi Adrian,
>
> See my comments below inline.
>
> //Eelco
>
>> ---
>>   include/linux/openvswitch.h  |  49 +---
>>   lib/odp-execute.c|   3 +
>>   lib/odp-util.c   | 150 ++-
>>   ofproto/ofproto-dpif-ipfix.c |   2 +
>>   python/ovs/flow/odp.py   |   2 +
>>   tests/odp.at |   5 ++
>>   6 files changed, 163 insertions(+), 48 deletions(-)
>>
>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>> index d9fb991ef..3e748405c 100644
>> --- a/include/linux/openvswitch.h
>> +++ b/include/linux/openvswitch.h
>> @@ -696,6 +696,7 @@ enum ovs_flow_attr {
>>   #define OVS_UFID_F_OMIT_MASK (1 << 1)
>>   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)
>>
>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
>>   /**
>>* enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE 
>> action.
>>* @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample 
>> with
>> @@ -703,15 +704,27 @@ enum ovs_flow_attr {
>>* %UINT32_MAX samples all packets and intermediate values sample 
>> intermediate
>>* fractions of packets.
>>* @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling 
>> event.
>> - * Actions are passed as nested attributes.
>> + * Actions are passed as nested attributes. Optional if
>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as 
>> psample group.
>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie 
>> that, if
>> + * provided, will be copied to the psample cookie.
>
> Should we mention any limit on the actual length of the cookie?
>
> Any reason we explicitly call this psample? From an OVS perspective, this 
> could just be 
> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE 
> and _GROUP. Other datapaths, do not have PSAMPLE.
>

 See my response to your comment on the cover-letter for possible name 
 alternatives.
>>>
>>> ACK, we can continue the discussion there.
>>>
> Thinking about it more, from an OVS perspective we should probably not 
> even send down a COOKIE, but we should send down an obs_domain_id and 
> obs_point_id, and then the kernel can move this into a cookie.
>

 I did consider this. However, the opaque cookie fits well with the tc API 
 which does not have two 32-bit values but an opaque 128-bit cookie. So if 
 the action is offloaded OVS needs to "encode" the two 32-bit values into 
 the cookie anyways.
>>>
>>> Which is fine to me, the OVS API should be clear that we have two u32 
>>> entries. The cookie is implementation-specific, and we use the netlink 
>>> format as our API for all DPs.
>>
>>
>> I'm not sure about this.  It is a kernel interface and we can't deliver
>> two separate values from the psample.  So, passing two separate values
>> to the kernel doesn't make a lot of sense.  They are going to be mangled
>> into a single cookie anyway and we'll have to document that somehow.
>> We may as well always mangle the data from the beginning, document once
>> at the top level and save ourselves from documenting a million differences
>> between different implementations.  While USDT could report them separately,
>> I'm not sure there is a ton of value in that.
> 
> Our OVS action API is not tight to psample at all, so passing in a psample 
> cookie
> does not make sense. We should not pass in any psample references to the 
> sample
> action API. If it looks like the below, the API is clean and we can mangle it 
> before
> passing it to the psample function. I see no need to document this in the 
> kernel, as
> all the kernel does is provide the TC action cookie (unrelated to the OVS 
> actions API).

We're passing this data directly to psample, not TC.  And some documentation
is needed.  How users supposed to know where to find these observation
domain and point?  It should be specified how they are going to be seen on
the output from psample.  If kernel is doning the mangling, this should be
documented in the kernel uAPI.  If kernel receives opaque cookie and passes
it directly to psample, then we c

Re: [ovs-dev] [PATCH v2] conntrack: Fully initialize conn struct before insertion.

2024-05-13 Thread Simon Horman
On Fri, May 10, 2024 at 05:45:54PM +0200, Paolo Valerio wrote:
> From: Mike Pattrick 
> 
> In case packets are concurrently received in both directions, there's
> a chance that the ones in the reverse direction get received right
> after the connection gets added to the connection tracker but before
> some of the connection's fields are fully initialized.
> This could cause OVS to access potentially invalid, as the lookup may
> end up retrieving the wrong offsets during CONTAINER_OF(), or
> uninitialized memory.
> 
> This may happen in case of regular NAT or all-zero SNAT.
> 
> Fix it by initializing early the connections fields.
> 
> Fixes: 1116459b3ba8 ("conntrack: Remove nat_conn introducing key 
> directionality.")
> Reported-at: https://issues.redhat.com/browse/FDP-616
> Signed-off-by: Mike Pattrick 
> Co-authored-by: Paolo Valerio 
> Signed-off-by: Paolo Valerio 

Acked-by: Simon Horman 

(I accidently sent the above for v1 just now)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 01/11] odp-util: Add support to new psample uAPI.

2024-05-13 Thread Adrian Moreno




On 5/13/24 2:38 PM, Ilya Maximets wrote:

On 5/13/24 09:17, Eelco Chaudron wrote:



On 10 May 2024, at 15:06, Ilya Maximets wrote:


On 5/10/24 14:01, Eelco Chaudron wrote:



On 10 May 2024, at 12:45, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


The new odp sample attributes allow userspace to specify a group_id and
user-defined cookie to be passed down to psample.

Add support for parsing and formatting such action.

Signed-off-by: Adrian Moreno 


Hi Adrian,

See my comments below inline.

//Eelco


---
   include/linux/openvswitch.h  |  49 +---
   lib/odp-execute.c|   3 +
   lib/odp-util.c   | 150 ++-
   ofproto/ofproto-dpif-ipfix.c |   2 +
   python/ovs/flow/odp.py   |   2 +
   tests/odp.at |   5 ++
   6 files changed, 163 insertions(+), 48 deletions(-)

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index d9fb991ef..3e748405c 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -696,6 +696,7 @@ enum ovs_flow_attr {
   #define OVS_UFID_F_OMIT_MASK (1 << 1)
   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)

+#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
   /**
* enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
* @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
@@ -703,15 +704,27 @@ enum ovs_flow_attr {
* %UINT32_MAX samples all packets and intermediate values sample 
intermediate
* fractions of packets.
* @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
- * Actions are passed as nested attributes.
+ * Actions are passed as nested attributes. Optional if
+ * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
+ * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
+ * provided, will be copied to the psample cookie.


Should we mention any limit on the actual length of the cookie?

Any reason we explicitly call this psample? From an OVS perspective, this could 
just be 
SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE and 
_GROUP. Other datapaths, do not have PSAMPLE.



See my response to your comment on the cover-letter for possible name 
alternatives.


ACK, we can continue the discussion there.


Thinking about it more, from an OVS perspective we should probably not even 
send down a COOKIE, but we should send down an obs_domain_id and obs_point_id, 
and then the kernel can move this into a cookie.



I did consider this. However, the opaque cookie fits well with the tc API which does not 
have two 32-bit values but an opaque 128-bit cookie. So if the action is offloaded OVS 
needs to "encode" the two 32-bit values into the cookie anyways.


Which is fine to me, the OVS API should be clear that we have two u32 entries. 
The cookie is implementation-specific, and we use the netlink format as our API 
for all DPs.



I'm not sure about this.  It is a kernel interface and we can't deliver
two separate values from the psample.  So, passing two separate values
to the kernel doesn't make a lot of sense.  They are going to be mangled
into a single cookie anyway and we'll have to document that somehow.
We may as well always mangle the data from the beginning, document once
at the top level and save ourselves from documenting a million differences
between different implementations.  While USDT could report them separately,
I'm not sure there is a ton of value in that.


Our OVS action API is not tight to psample at all, so passing in a psample 
cookie
does not make sense. We should not pass in any psample references to the sample
action API. If it looks like the below, the API is clean and we can mangle it 
before
passing it to the psample function. I see no need to document this in the 
kernel, as
all the kernel does is provide the TC action cookie (unrelated to the OVS 
actions API).


We're passing this data directly to psample, not TC.  And some documentation
is needed.  How users supposed to know where to find these observation
domain and point?  It should be specified how they are going to be seen on
the output from psample.  If kernel is doning the mangling, this should be
documented in the kernel uAPI.  If kernel receives opaque cookie and passes
it directly to psample, then we can get away with only documenting the
mangling process in OVS' docuemntation.



Besides, passing an opaque cookie allows future versions of OVS to add more data 
(in addition to obs_{domain,point}) without changing uAPI.




+   OVS_SAMPLE_ATTR_OBS_DOMAIN,/* Observation domain id, u32 
number. */
+   OVS_SAMPLE_ATTR_OBS_POINT, /* Observation point id, u32 number. 
*/
+   OVS_SAMPLE_ATTR_SYSTEM_TARGET, /* Flag to additionally sample to system 
target. */
+   O

Re: [ovs-dev] [RFC 02/11] ofproto_dpif: Check for psample support.

2024-05-13 Thread Adrian Moreno



On 5/10/24 1:49 PM, Ilya Maximets wrote:

On 5/10/24 13:03, Adrian Moreno wrote:



On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


Only kernel datapath supports psample so check that the datapath is not
userspace and that it accepts the new attributes.

Signed-off-by: Adrian Moreno 
---
   ofproto/ofproto-dpif.c | 59 ++
   ofproto/ofproto-dpif.h |  6 -
   2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 32d037be6..3cee2795a 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -25,6 +25,7 @@
   #include "coverage.h"
   #include "cfm.h"
   #include "ct-dpif.h"
+#include "dpif-netdev.h"
   #include "fail-open.h"
   #include "guarded-list.h"
   #include "hmapx.h"
@@ -873,6 +874,12 @@ ovs_lb_output_action_supported(struct ofproto_dpif 
*ofproto)
   return ofproto->backer->rt_support.lb_output_action;
   }

+bool
+ovs_psample_supported(struct ofproto_dpif *ofproto)


As mentioned in the previous patch, I do not like the psample terminology. It 
refers to an implementation-specific name, we should come up with an agnostic 
name, like ’system/local’, but there are probably better names.


+{
+return ofproto->backer->rt_support.psample;
+}
+
   /* Tests whether 'backer''s datapath supports recirculation.  Only newer
* datapaths support OVS_KEY_ATTR_RECIRC_ID in keys.  We need to disable some
* features on older datapaths that don't support this feature.
@@ -1440,6 +1447,14 @@ dpif_supports_ct_zero_snat(struct dpif_backer *backer)
   return supported;
   }

+static bool check_psample(struct dpif_backer *backer);
+
+static bool
+dpif_supports_psample(struct dpif_backer *backer)
+{
+return !dpif_is_netdev(backer->dpif) && check_psample(backer);
+}


This looks odd, we should not do any datapath specific checks here. We should 
just call check_psample() on the netdev datapath, and it should fail. If we 
ever add support for a similar feature we need to remove this code anyway. 
Guess it also omits the log message for userspace.



I agree. Unfortunately, I didn't see any action verification code in the netdev
datapath that (as the kernel does) would make the action installation fail. I'll
double check but I think it's just not capable of detecting unsupported
attributes on flow installation.


It's expected that userspace datapath is a reference implementation that
supports any actions, because it executes most of the actions via odp-execute
module that supposed to know all the actions.

This one however doesn't make sense for userspace datapath, so we have to
check beforehand.  Adding extra parsing of actions may have significant
performance impact for upcalls.

However, ofproto-dpif module should not know any datapath details and must
not include dpif-netdev.h, the logic should be moved to dpif.c.  For example,
see what we did for the explicit drop action in
dpif_may_support_explicit_drop_action().



Thanks for this pointer. My mind kept telling me I'd seen this before related to 
the drop action, but couldn't find it.


--
Adrián

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-05-13 Thread Ilya Maximets
On 5/13/24 13:10, Adrian Moreno wrote:
> 
> 
> On 5/13/24 12:44 PM, Eelco Chaudron wrote:
>>
>>
>> On 13 May 2024, at 9:01, Adrian Moreno wrote:
>>
>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
 On 24 Apr 2024, at 21:53, Adrian Moreno wrote:

> This simple program reads from psample and prints the packets to stdout.
> It's useful for quickly collecting sampled packets.

 See some comments below, did not review the actual sample application in 
 detail.

 //Eelco

> Signed-off-by: Adrian Moreno 
> ---
>Documentation/automake.mk   |   1 +
>Documentation/conf.py   |   2 +
>Documentation/ref/index.rst |   1 +
>Documentation/ref/ovs-psample.8.rst |  47 
>include/linux/automake.mk   |   1 +
>include/linux/psample.h |  64 ++
>rhel/openvswitch-fedora.spec.in |   2 +
>rhel/openvswitch.spec.in|   1 +
>utilities/automake.mk   |   9 +
>utilities/ovs-psample.c | 330 
>10 files changed, 458 insertions(+)
>create mode 100644 Documentation/ref/ovs-psample.8.rst
>create mode 100644 include/linux/psample.h
>create mode 100644 utilities/ovs-psample.c
>
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index 47d2e336a..c22facfd6 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -165,6 +165,7 @@ RST_MANPAGES = \
>   ovs-l3ping.8.rst \
>   ovs-parse-backtrace.8.rst \
>   ovs-pki.8.rst \
> + ovs-psample.8.rst \
>   ovs-tcpdump.8.rst \
>   ovs-tcpundump.1.rst \
>   ovs-test.8.rst \
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 15785605a..75efed2fc 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -134,6 +134,8 @@ _man_pages = [
> u'convert "tcpdump -xx" output to hex strings'),
>('ovs-test.8',
> u'Check Linux drivers for performance, vlan and L3 tunneling 
> problems'),
> +('ovs-psample.8',
> + u'Print packets sampled by psample'),
>('ovs-vlan-test.8',
> u'Check Linux drivers for problems with vlan traffic'),
>('ovsdb-server.7',
> diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
> index 03ada932f..d1076435a 100644
> --- a/Documentation/ref/index.rst
> +++ b/Documentation/ref/index.rst
> @@ -46,6 +46,7 @@ time:
>   ovs-pki.8
>   ovs-sim.1
>   ovs-parse-backtrace.8
> +   ovs-psample.8
>   ovs-tcpdump.8
>   ovs-tcpundump.1
>   ovs-test.8
> diff --git a/Documentation/ref/ovs-psample.8.rst 
> b/Documentation/ref/ovs-psample.8.rst
> new file mode 100644
> index 0..c849c83d8
> --- /dev/null
> +++ b/Documentation/ref/ovs-psample.8.rst
> @@ -0,0 +1,47 @@
> +==
> +ovs-sample

 Interesting, here you call it all ovs-sample here, which is like ;)
>>>
>>> Yes, at the begining I called it ovs-sample as I thought to make some 
>>> generic tool, but since it ended up being very psample-specific I added the 
>>> "p" (and missed this one).
>>>

 You could add options like —local-kernel --local-userspace (--ipfix, 
 --sflow) and it will eventually work on each datapath.
>>
>>>
>>> You mean implementing an IPFIX and sFlow collector?
>>>
>> ?
 If you want to keep this a separate psample utility, I would not have ovs 
 in the name, as it's rather general and not OVS specific. Maybe just 
 psample/psample_mon, as we also have nlmon.

>>>
>>> Well, the way the cookie is decoded into observation_domain_id and 
>>> observation_point_id is ovs-specific.
>>>
>>> In fact, for the next iteration of the series I will remove the filtering 
>>> API since its getting removed from the kernel series as well and add some 
>>> kind of BPF code that does the filtering. Also I was  considering looking 
>>> into the OVSDB to ensure that we filter on groups configured in it or else 
>>> we could wrongly interpet a sampled packet that comes from some other 
>>> subsystem.
>>
>> I would prefer to have this as an ova-sample application, which can be 
>> extended with other sample methods as we see fit. So when we added userspace 
>> support we can add it here. If we find someone crazy enough to do a simple 
>> IPFIX and/or sFlow collector it can be added too.
>>
>> So my request would be to remove the (p) from ovs-psample, and have a switch 
>> to select --psample (the only supported (MANDATORY) option for now).
>>
> 
> Oh, ok. Now I undestand, thank you. We want to leave room for the userspace 
> implementation. Will do.

FYI, we do have sflow and netflow basic collectors in tests/test-sflow.c and
tests/test-netfl

Re: [ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-05-13 Thread Adrian Moreno



On 5/13/24 2:48 PM, Ilya Maximets wrote:

On 5/13/24 13:10, Adrian Moreno wrote:



On 5/13/24 12:44 PM, Eelco Chaudron wrote:



On 13 May 2024, at 9:01, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


This simple program reads from psample and prints the packets to stdout.
It's useful for quickly collecting sampled packets.


See some comments below, did not review the actual sample application in detail.

//Eelco


Signed-off-by: Adrian Moreno 
---
Documentation/automake.mk   |   1 +
Documentation/conf.py   |   2 +
Documentation/ref/index.rst |   1 +
Documentation/ref/ovs-psample.8.rst |  47 
include/linux/automake.mk   |   1 +
include/linux/psample.h |  64 ++
rhel/openvswitch-fedora.spec.in |   2 +
rhel/openvswitch.spec.in|   1 +
utilities/automake.mk   |   9 +
utilities/ovs-psample.c | 330 
10 files changed, 458 insertions(+)
create mode 100644 Documentation/ref/ovs-psample.8.rst
create mode 100644 include/linux/psample.h
create mode 100644 utilities/ovs-psample.c

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 47d2e336a..c22facfd6 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -165,6 +165,7 @@ RST_MANPAGES = \
ovs-l3ping.8.rst \
ovs-parse-backtrace.8.rst \
ovs-pki.8.rst \
+   ovs-psample.8.rst \
ovs-tcpdump.8.rst \
ovs-tcpundump.1.rst \
ovs-test.8.rst \
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 15785605a..75efed2fc 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -134,6 +134,8 @@ _man_pages = [
 u'convert "tcpdump -xx" output to hex strings'),
('ovs-test.8',
 u'Check Linux drivers for performance, vlan and L3 tunneling 
problems'),
+('ovs-psample.8',
+ u'Print packets sampled by psample'),
('ovs-vlan-test.8',
 u'Check Linux drivers for problems with vlan traffic'),
('ovsdb-server.7',
diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
index 03ada932f..d1076435a 100644
--- a/Documentation/ref/index.rst
+++ b/Documentation/ref/index.rst
@@ -46,6 +46,7 @@ time:
   ovs-pki.8
   ovs-sim.1
   ovs-parse-backtrace.8
+   ovs-psample.8
   ovs-tcpdump.8
   ovs-tcpundump.1
   ovs-test.8
diff --git a/Documentation/ref/ovs-psample.8.rst 
b/Documentation/ref/ovs-psample.8.rst
new file mode 100644
index 0..c849c83d8
--- /dev/null
+++ b/Documentation/ref/ovs-psample.8.rst
@@ -0,0 +1,47 @@
+==
+ovs-sample


Interesting, here you call it all ovs-sample here, which is like ;)


Yes, at the begining I called it ovs-sample as I thought to make some generic tool, but 
since it ended up being very psample-specific I added the "p" (and missed this 
one).



You could add options like —local-kernel --local-userspace (--ipfix, --sflow) 
and it will eventually work on each datapath.




You mean implementing an IPFIX and sFlow collector?


?

If you want to keep this a separate psample utility, I would not have ovs in 
the name, as it's rather general and not OVS specific. Maybe just 
psample/psample_mon, as we also have nlmon.



Well, the way the cookie is decoded into observation_domain_id and 
observation_point_id is ovs-specific.

In fact, for the next iteration of the series I will remove the filtering API 
since its getting removed from the kernel series as well and add some kind of 
BPF code that does the filtering. Also I was  considering looking into the 
OVSDB to ensure that we filter on groups configured in it or else we could 
wrongly interpet a sampled packet that comes from some other subsystem.


I would prefer to have this as an ova-sample application, which can be extended 
with other sample methods as we see fit. So when we added userspace support we 
can add it here. If we find someone crazy enough to do a simple IPFIX and/or 
sFlow collector it can be added too.

So my request would be to remove the (p) from ovs-psample, and have a switch to 
select --psample (the only supported (MANDATORY) option for now).



Oh, ok. Now I undestand, thank you. We want to leave room for the userspace
implementation. Will do.


FYI, we do have sflow and netflow basic collectors in tests/test-sflow.c and
tests/test-netflow.c.

And I'm not convinced we should maintain an actual collector for any of these.
It's not OVS'es job as a project.  Like we're not shipping nlmon, we probably
shouldn't ship psample or any other collector.  But we can and should have basic
implementations for testing purposes in tests/test-psample.c for example.




Fair enough, I could move it to tests directory. After all, the original reason 
why I wrote this was testing the series. Is that OK with you as well Eelco?


___

Re: [ovs-dev] [RFC 05/11] ofproto_dpif_xlate: Use psample for OFP samples.

2024-05-13 Thread Adrian Moreno



On 5/13/24 12:32 PM, Eelco Chaudron wrote:



On 10 May 2024, at 13:15, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


When a OFP_SAMPLE action is xlated and a dpif_psample object has been
configured (via Flow_Sample_Collector_Set table) with the same
collector_set_id, add psample information to the odp sample action.


See comments below.

//Eelco





-compose_sample_action(ctx, probability, &cookie, tunnel_out_port, false);
+static void
+xlate_sample_action(struct xlate_ctx *ctx,
+const struct ofpact_sample *os)
+{
+struct dpif_psample *psample = ctx->xbridge->psample;
+struct dpif_ipfix *ipfix = ctx->xbridge->ipfix;
+struct user_action_cookie *upcall_cookie = NULL;
+struct psample_args *psample_args = NULL;
+odp_port_t tunnel_out_port = ODPP_NONE;
+uint32_t group_id;
+
+if (!ipfix && !psample) {
+return;
+}
+
+/* Scale the probability from 16-bit to 32-bit while representing
+ * the same percentage. */
+uint32_t probability =
+((uint32_t) os->probability << 16) | os->probability;
+
+if (ipfix) {
+upcall_cookie = xzalloc(sizeof *upcall_cookie);
+xlate_fill_ipfix_sample(ctx, os, ipfix, upcall_cookie,
+&tunnel_out_port);
+}
+if (psample) {
+if (dpif_psample_get_group_id(psample,


See earlier lock contention concern on taking the mutex() for parallel upcalls.


+  os->collector_set_id,
+  &group_id)) {
+psample_args = xzalloc(sizeof *psample_args);
+ofpbuf_init(&psample_args->cookie, OVS_PSAMPLE_COOKIE_MAX_SIZE);
+
+psample_args->group_id = group_id;
+ofpbuf_put(&psample_args->cookie, &os->obs_domain_id,
+   sizeof(os->obs_domain_id));
+ofpbuf_put(&psample_args->cookie, &os->obs_point_id,
+   sizeof(os->obs_point_id));
+}
+}
+
+compose_sample_action(ctx, probability, upcall_cookie, psample_args,
+  tunnel_out_port, false);
+
+if (upcall_cookie) {
+free(upcall_cookie);
+}
+if (psample_args) {
+ofpbuf_uninit(&psample_args->cookie);
+free(psample_args);
+}


Can we avoid these small memory allocations?


Probably. I'll try to rework it. Two of the arguments to compose_sample_action 
are now optional so having a pointer to the stack will not work unless we add 
flags to indicate whether they are valid or not. But maybe we can rework it by 
having something like the follwoing to avoid clogging the argument list;

struct sample_args {
struct user_action_cookie *upcall;
struct psample_args *psample;
bool has_psample;
bool has_upcall;
};


Don’t think we need a structure, we can use your existing bools. Something like 
the below seems simple enough:



Ok, yes that's a good alternative. I'll add it. Thanks



diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index a9856e358..42100f15b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5958,8 +5958,8 @@ xlate_sample_action(struct xlate_ctx *ctx,
  {
  struct dpif_psample *psample = ctx->xbridge->psample;
  struct dpif_ipfix *ipfix = ctx->xbridge->ipfix;
-struct user_action_cookie *upcall_cookie = NULL;
-struct psample_args *psample_args = NULL;
+struct user_action_cookie upcall_cookie;
+struct psample_args psample_args;
  odp_port_t tunnel_out_port = ODPP_NONE;
  uint32_t group_id;

@@ -5973,21 +5973,20 @@ xlate_sample_action(struct xlate_ctx *ctx,
  ((uint32_t) os->probability << 16) | os->probability;

  if (ipfix) {
-upcall_cookie = xzalloc(sizeof *upcall_cookie);
-xlate_fill_ipfix_sample(ctx, os, ipfix, upcall_cookie,
+xlate_fill_ipfix_sample(ctx, os, ipfix, &upcall_cookie,
  &tunnel_out_port);
  }
  if (psample) {
  if (dpif_psample_get_group_id(psample,
os->collector_set_id,
&group_id)) {
-psample_args = xzalloc(sizeof *psample_args);
-ofpbuf_init(&psample_args->cookie, OVS_PSAMPLE_COOKIE_MAX_SIZE);

-psample_args->group_id = group_id;
-ofpbuf_put(&psample_args->cookie, &os->obs_domain_id,
+ofpbuf_init(&psample_args.cookie, OVS_PSAMPLE_COOKIE_MAX_SIZE);
+
+psample_args.group_id = group_id;
+ofpbuf_put(&psample_args.cookie, &os->obs_domain_id,
 sizeof(os->obs_domain_id));
-ofpbuf_put(&psample_args->cookie, &os->obs_point_id,
+ofpbuf_put(&psample_args.cookie, &os->obs_point_id,
 sizeof(os->obs_point_id));

  if (ctx->xin->xcache) {
@@ -5997,19 +5996,15 

Re: [ovs-dev] [PATCH ovn] Do not reply on unicast arps for targets.

2024-05-13 Thread Dumitru Ceara
On 5/4/24 11:38, Vasyl Saienko wrote:
> Hello Numan
> 
> Thanks for review and feedback.
> 
> On Fri, May 3, 2024 at 7:14 PM Numan Siddique  wrote:
> 
>> On Mon, Apr 22, 2024 at 2:54 AM Vasyl Saienko 
>> wrote:
>>>
>>> Reply only if target ethernet address is broadcast, if
>>> address is specified explicitly do noting to let target
>>> reply by itself. This technique allows to monitor target
>>> aliveness with arping.
>>
>> Thanks for the patch.
>>
>> This patch does change the behavior of OVN.  Having ARP responder
>> flows avoids tunnelling the request packet if the destination is on a
>> different compute node.
>> But I don't think its a big deal.
>>
>> You are totally correct that the ARP responder allows limiting ARP
> broadcast traffic between nodes. Normally ARP requests are sent to
> broadcast ff:ff:ff:ff:ff:ff ethernet, as the protocol is designed to
> identify ethernet addresses for known IP addresses. In this case since
> traffic is broadcast it will be flooded among all nodes if ARP responder is
> not applied. At the same time the client may specify the target
> ethernet address as unicast (in this case a broadcast storm will not
> happen, as the destination ethernet address is unicast).
> 
> In OpenStack we use unicast ARP requests as a way to monitor VM aliveness
> from remote locations to make sure there are no issues with
> flows/underlying networking infra between nodes. We use ARPs due to several
> reasons:
> 1. This protocol is mandatory and can't be disabled on the target machine
> (which guarantees that the target VM will always reply to ARPs if it is
> alive)
> 2. This protocol is not filtered by firewalls/security groups as it is
> mandatory for network workability
> 3. We can't use upper layers protocols such as ICMP as they will require
> specific firewall rules inside VMs, and we do not control VMs in cloud
> cases, but still need to monitor infrastructure aliveness.
> 
> If ARP responder replies to requests with broadcast and unicast target
> ethernet address, we can't use this technique for monitoring, as even if
> target VM is down (not necessary that ovn port is down, the VM may stuck or
> be in kernel panic for example, or datapath between monitoring tool and VM
> is broken) the ARP responder will do reply to our request so we can't
> identify if VM is really accessible or not as we will always got replies
> from local ARP responder. At the same time there is no need to set up ARP
> responder for requests with a unicast ethernet address, as they will not
> generate broadcast storms by design (they are unicasts).
> 

Right, this sounds acceptable to me too.

> 
>> 1.  Accept your patch
>> 2.  Use the NB Global option - "ignore_lsp_down" and set it to false,
>> so that ovn-northd will install the ARP responder flows only if the
>> logical port is up.
>>
> This option will not allow catching a situation when there is an issue in
> datapath between monitoring software and VM located on another node, for
> example  issues with flows or some underlying networking problems.
> 
>  Have you considered this approach for your use case ?
>> 3.  Add another global option - "disable_arp_responder" which when set
>> to true, ovn-northd will not install ARP responder flows.
>>
>> Disabling ARP responder for ports completely will allow broadcast storms
> for ARP requests. Which we would like to avoid, but enable the possibility
> to monitor infrastructure aliveness.
> 
> 
>> Given that OpenStack neutron ml2ovs uses ARP responder flows only for
>> broad cast my preference would be (1).
>>
>> I'd like to know other's opinion on this.
>>

I'd accept the patch too so I'd go for (1).

Regards,
Dumitru

>> Thanks
>> Numan
>>
>>>
>>> Closes  #239
>>>
>>> Signed-off-by: Vasyl Saienko 
>>> ---
>>>  northd/northd.c | 11 +--
>>>  tests/ovn-northd.at |  4 ++--
>>>  2 files changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index 37f443e70..e80e1885d 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -8832,8 +8832,15 @@ build_lswitch_arp_nd_responder_known_ips(struct
>> ovn_port *op,
>>>  for (size_t i = 0; i < op->n_lsp_addrs; i++) {
>>>  for (size_t j = 0; j < op->lsp_addrs[i].n_ipv4_addrs; j++) {
>>>  ds_clear(match);
>>> -ds_put_format(match, "arp.tpa == %s && arp.op == 1",
>>> -op->lsp_addrs[i].ipv4_addrs[j].addr_s);
>>> +/* NOTE(vsaienko): Do not reply on unicast ARPs, forward
>>> + * them to the target to have ability to monitor target
>>> + * aliveness via ARPs.
>>> +*/
>>> +ds_put_format(match,
>>> +"arp.tpa == %s && "
>>> +"arp.op == 1 && "
>>> +"eth.dst == ff:ff:ff:ff:ff:ff",
>>> +op->lsp_addrs[i].ipv4_addrs[j].addr_s);
>>>  ds_clear(actions);
>>> 

Re: [ovs-dev] [PATCH v2 2/2] conntrack: Key connections by zone.

2024-05-13 Thread Paolo Valerio
Hi Peng,

Peng He  writes:

> To seperate into N cmaps, why not use hash value divided by N?
>

FWIW, I think it makes sense to discuss the potential benefits of other
approaches as well.
They may even end up not being as performant as this one, but also some
points to consider here are:

- the number of zones used in the common case (or even for the specific
  use case as the expectation is that the fewer the zones involved, the
  smaller the benefit)
- given flush per zone is where most of the gain is, the flush per zone
  for the use case

As a last remark, partitioning per zone also implies a substantial
design change that may potentially result in contrast with other
approaches targeting the overall performance (e.g., [0] is a quick
example that comes to mind with good scalability improvements in cps,
and probably, but this is just a guess, measurable improvements in the
same ct execute test).

[0] 
https://patchwork.ozlabs.org/project/openvswitch/patch/165668250987.1967719.7371616138630033269.st...@fed.void/

> Simon Horman  于2024年5月1日周三 19:06写道:
>
>> On Wed, Apr 24, 2024 at 02:44:54PM +0200, Felix Huettner via dev wrote:
>> > Currently conntrack uses a single large cmap for all connections stored.
>> > This cmap contains all connections for all conntrack zones which are
>> > completely separate from each other. By separating each zone to its own
>> > cmap we can significantly optimize the performance when using multiple
>> > zones.
>> >
>> > The change fixes a similar issue as [1] where slow conntrack zone flush
>> > operations significantly slow down OVN router failover. The difference is
>> > just that this fix is used whith dpdk, while [1] was when using the ovs
>> > kernel module.
>> >
>> > As we now need to store more cmap's the memory usage of struct conntrack
>> > increases by 524280 bytes. Additionally we need 65535 cmaps with 128
>> > bytes each. This leads to a total memory increase of around 10MB.
>> >
>> > Running "./ovstest test-conntrack benchmark 4 33554432 32 1" shows no
>> > real difference in the multithreading behaviour against a single zone.
>> >
>> > Running the new "./ovstest test-conntrack benchmark-zones" show
>> > significant speedups as shown below. The values for "ct execute" are for
>> > acting on the complete zone with all its entries in total (so in the
>> > first case adding 10,000 new conntrack entries). All tests are run 1000
>> > times.
>> >
>> > When running with 1,000 zones with 10,000 entries each we see the
>> > following results (all in microseconds):
>> > "./ovstest test-conntrack benchmark-zones 1 1000 1000"
>> >
>> >  +--++-+-+
>> >  |  Min |   Max  |  95%ile |   Avg   |
>> > ++--++-+-+
>> > | ct execute (commit)|  || | |
>> > |with commit | 2266 |   3505 | 2707.06 | 2592.06 |
>> > | without commit | 2411 |  12730 | 4432.50 | 2736.78 |
>> > ++--++-+-+
>> > | ct execute (no commit) |  || | |
>> > |with commit |  699 |   1238 |  886.15 |  722.67 |
>> > | without commit |  700 |   3377 | 1934.42 |  803.53 |
>> > ++--++-+-+
>> > | flush full zone|  || | |
>> > |with commit |  619 |   1122 |  901.36 |  679.15 |
>> > | without commit |  618 | 105078 |   64591 | 2886.46 |
>> > ++--++-+-+
>> > | flush empty zone   |  || | |
>> > |with commit |0 |  5 |1.00 |0.64 |
>> > | without commit |   54 |  87469 |   64520 | 2172.25 |
>> > ++--++-+-+
>> >
>> > When running with 10,000 zones with 1,000 entries each we see the
>> > following results (all in microseconds):
>> > "./ovstest test-conntrack benchmark-zones 1000 1 1000"
>> >
>> >  +--++-+-+
>> >  |  Min |   Max  |  95%ile |   Avg   |
>> > ++--++-+-+
>> > | ct execute (commit)|  || | |
>> > |with commit |  215 |287 |  231.88 |  222.30 |
>> > | without commit |  214 |   1692 |  569.18 |  285.83 |
>> > ++--++-+-+
>> > | ct execute (no commit) |  || | |
>> > |with commit |   68 | 97 |   74.69 |   70.09 |
>> > | without commit |   68 |300 |  158.40 |   82.06 |
>> > ++--++-+-+
>> > | flush full zone|  || | |
>> > |with commit |   47 |211 |   56.34 |   50.34 |
>> > | withou

Re: [ovs-dev] [RFC 06/11] utilities: Add ovs-psample.

2024-05-13 Thread Eelco Chaudron


On 13 May 2024, at 14:50, Adrian Moreno wrote:

> On 5/13/24 2:48 PM, Ilya Maximets wrote:
>> On 5/13/24 13:10, Adrian Moreno wrote:
>>>
>>>
>>> On 5/13/24 12:44 PM, Eelco Chaudron wrote:


 On 13 May 2024, at 9:01, Adrian Moreno wrote:

> On 5/10/24 12:06 PM, Eelco Chaudron wrote:
>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote:
>>
>>> This simple program reads from psample and prints the packets to stdout.
>>> It's useful for quickly collecting sampled packets.
>>
>> See some comments below, did not review the actual sample application in 
>> detail.
>>
>> //Eelco
>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>> Documentation/automake.mk   |   1 +
>>> Documentation/conf.py   |   2 +
>>> Documentation/ref/index.rst |   1 +
>>> Documentation/ref/ovs-psample.8.rst |  47 
>>> include/linux/automake.mk   |   1 +
>>> include/linux/psample.h |  64 ++
>>> rhel/openvswitch-fedora.spec.in |   2 +
>>> rhel/openvswitch.spec.in|   1 +
>>> utilities/automake.mk   |   9 +
>>> utilities/ovs-psample.c | 330 
>>> 
>>> 10 files changed, 458 insertions(+)
>>> create mode 100644 Documentation/ref/ovs-psample.8.rst
>>> create mode 100644 include/linux/psample.h
>>> create mode 100644 utilities/ovs-psample.c
>>>
>>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>>> index 47d2e336a..c22facfd6 100644
>>> --- a/Documentation/automake.mk
>>> +++ b/Documentation/automake.mk
>>> @@ -165,6 +165,7 @@ RST_MANPAGES = \
>>> ovs-l3ping.8.rst \
>>> ovs-parse-backtrace.8.rst \
>>> ovs-pki.8.rst \
>>> +   ovs-psample.8.rst \
>>> ovs-tcpdump.8.rst \
>>> ovs-tcpundump.1.rst \
>>> ovs-test.8.rst \
>>> diff --git a/Documentation/conf.py b/Documentation/conf.py
>>> index 15785605a..75efed2fc 100644
>>> --- a/Documentation/conf.py
>>> +++ b/Documentation/conf.py
>>> @@ -134,6 +134,8 @@ _man_pages = [
>>>  u'convert "tcpdump -xx" output to hex strings'),
>>> ('ovs-test.8',
>>>  u'Check Linux drivers for performance, vlan and L3 tunneling 
>>> problems'),
>>> +('ovs-psample.8',
>>> + u'Print packets sampled by psample'),
>>> ('ovs-vlan-test.8',
>>>  u'Check Linux drivers for problems with vlan traffic'),
>>> ('ovsdb-server.7',
>>> diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst
>>> index 03ada932f..d1076435a 100644
>>> --- a/Documentation/ref/index.rst
>>> +++ b/Documentation/ref/index.rst
>>> @@ -46,6 +46,7 @@ time:
>>>ovs-pki.8
>>>ovs-sim.1
>>>ovs-parse-backtrace.8
>>> +   ovs-psample.8
>>>ovs-tcpdump.8
>>>ovs-tcpundump.1
>>>ovs-test.8
>>> diff --git a/Documentation/ref/ovs-psample.8.rst 
>>> b/Documentation/ref/ovs-psample.8.rst
>>> new file mode 100644
>>> index 0..c849c83d8
>>> --- /dev/null
>>> +++ b/Documentation/ref/ovs-psample.8.rst
>>> @@ -0,0 +1,47 @@
>>> +==
>>> +ovs-sample
>>
>> Interesting, here you call it all ovs-sample here, which is like ;)
>
> Yes, at the begining I called it ovs-sample as I thought to make some 
> generic tool, but since it ended up being very psample-specific I added 
> the "p" (and missed this one).
>
>>
>> You could add options like —local-kernel --local-userspace (--ipfix, 
>> --sflow) and it will eventually work on each datapath.

>
> You mean implementing an IPFIX and sFlow collector?
>
 ?
>> If you want to keep this a separate psample utility, I would not have 
>> ovs in the name, as it's rather general and not OVS specific. Maybe just 
>> psample/psample_mon, as we also have nlmon.
>>
>
> Well, the way the cookie is decoded into observation_domain_id and 
> observation_point_id is ovs-specific.
>
> In fact, for the next iteration of the series I will remove the filtering 
> API since its getting removed from the kernel series as well and add some 
> kind of BPF code that does the filtering. Also I was  considering looking 
> into the OVSDB to ensure that we filter on groups configured in it or 
> else we could wrongly interpet a sampled packet that comes from some 
> other subsystem.

 I would prefer to have this as an ova-sample application, which can be 
 extended with other sample methods as we see fit. So when we added 
 userspace support we can add it here. If we find someone crazy enough to 
 do a simple IPFIX and/or sFlow collector it can be added too.

>>

Re: [ovs-dev] [PATCH v2] utilities: Correct deletion reason in flow_reval_monitor.py.

2024-05-13 Thread Ilya Maximets
On 5/8/24 11:19, Eelco Chaudron wrote:
> The flow_reval_monitor.py script incorrectly reported the reasons for
> FDR_PURGE and FDR_TOO_EXPENSIVE, as their descriptions were swapped.
> This patch rectifies the order using a dictionary to avoid similar
> problems in the future.
> 
> In addition this patch also syncs the delete reason output of the
> script, with the comments in the code.
> 
> Fixes: 86b9e653ef22 ("revalidator: Add a USDT probe during flow deletion with 
> purge reason.")
> Signed-off-by: Eelco Chaudron 
> 
> ---
> v2: - Converted the list of strings to dictionary.
> - Added comment to code to keep code and script in sync.
> - Unified flow_delete reason comments and script output.
> ---
>  ofproto/ofproto-dpif-upcall.c| 25 ---
>  utilities/usdt-scripts/flow_reval_monitor.py | 32 ++--
>  2 files changed, 30 insertions(+), 27 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 73901b651..e4d348985 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -1,3 +1,4 @@
> +
>  /* Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
>   *
>   * Licensed under the Apache License, Version 2.0 (the "License");
> @@ -270,18 +271,20 @@ enum ukey_state {
>  };
>  #define N_UKEY_STATES (UKEY_DELETED + 1)
>  
> +/* Ukey delete reasons used by USDT probes.  Please keep in sync with the
> + * definition in utilities/usdt-scripts/flow_reval_monitor.py.  */
>  enum flow_del_reason {
> -FDR_NONE = 0,   /* No deletion reason for the flow. */
> -FDR_AVOID_CACHING,  /* Flow deleted to avoid caching. */
> -FDR_BAD_ODP_FIT,/* The flow had a bad ODP flow fit. */
> -FDR_FLOW_IDLE,  /* The flow went unused and was deleted. */
> -FDR_FLOW_LIMIT, /* All flows being killed. */
> -FDR_FLOW_WILDCARDED,/* The flow needed a narrower wildcard mask. */
> -FDR_NO_OFPROTO, /* The flow didn't have an associated ofproto. */
> -FDR_PURGE,  /* User action caused flows to be killed. */
> -FDR_TOO_EXPENSIVE,  /* The flow was too expensive to revalidate. */
> -FDR_UPDATE_FAIL,/* Flow state transition was unexpected. */
> -FDR_XLATION_ERROR,  /* There was an error translating the flow. */
> +FDR_NONE = 0,   /* No delete reason specified. */
> +FDR_AVOID_CACHING,  /* Cache avoidance flag set. */
> +FDR_BAD_ODP_FIT,/* Bad ODP flow fit. */
> +FDR_FLOW_IDLE,  /* Flow idle timeout. */
> +FDR_FLOW_LIMIT, /* Kill all flows condition reached. */
> +FDR_FLOW_WILDCARDED,/* Flow needs a narrower wildcard mask. */
> +FDR_NO_OFPROTO, /* Flow lacks ofproto dpif association. */
> +FDR_PURGE,  /* User requested flow deletion. */
> +FDR_TOO_EXPENSIVE,  /* Too expensive to revalidate. */
> +FDR_UPDATE_FAIL,/* Datapath update failed. */
> +FDR_XLATION_ERROR,  /* Flow translation error. */
>  };
>  
>  /* 'udpif_key's are responsible for tracking the little bit of state udpif
> diff --git a/utilities/usdt-scripts/flow_reval_monitor.py 
> b/utilities/usdt-scripts/flow_reval_monitor.py
> index 534ba8fa2..bc3a28443 100755
> --- a/utilities/usdt-scripts/flow_reval_monitor.py
> +++ b/utilities/usdt-scripts/flow_reval_monitor.py
> @@ -254,19 +254,19 @@ FdrReasons = IntEnum(
>  start=0,
>  )
>  
> -FdrReasonStrings = [
> -"No deletion reason",
> -"Cache avoidance flag set",
> -"Bad ODP flow fit",
> -"Idle flow timed out",
> -"Kill all flows condition detected",
> -"Mask too wide - need narrower match",
> -"No matching ofproto rules",
> -"Too expensive to revalidate",
> -"Purged with user action",
> -"Flow state inconsistent after updates",
> -"Flow translation error",
> -]

Should we also have a comment here describing from where these are
coming from?

> +FdrReasonStrings = {
> +FdrReasons.FDR_NONE: "No delete reason specified",
> +FdrReasons.FDR_AVOID_CACHING: "Cache avoidance flag set",
> +FdrReasons.FDR_BAD_ODP_FIT: "Bad ODP flow fit",
> +FdrReasons.FDR_FLOW_IDLE: "Flow idle timeout",
> +FdrReasons.FDR_FLOW_LIMIT: "Kill all flows condition reached",
> +FdrReasons.FDR_FLOW_WILDCARDED: "Flow needs a narrower wildcard mask",
> +FdrReasons.FDR_NO_OFPROTO: "Flow lacks ofproto dpif association",

Can we rename this one into "Bridge not found" maybe?

'ofproto' and 'dpif' are not user-friendly words.  'ofproto' is an entirely
internal concept.

> +FdrReasons.FDR_PURGE: "User requested flow deletion",
> +FdrReasons.FDR_TOO_EXPENSIVE: "Too expensive to revalidate",
> +FdrReasons.FDR_UPDATE_FAIL: "Datapath update failed",
> +FdrReasons.FDR_XLATION_ERROR: "Flow translation error"
> +}
>  
>  
>  def err(msg, code=-1):
> @@ -572,10 +572,10 @@ def print_expiration(event):
>  """Prints a UFID ev

Re: [ovs-dev] [RFC 01/11] odp-util: Add support to new psample uAPI.

2024-05-13 Thread Adrian Moreno



On 5/10/24 3:06 PM, Ilya Maximets wrote:

On 5/10/24 14:01, Eelco Chaudron wrote:



On 10 May 2024, at 12:45, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


The new odp sample attributes allow userspace to specify a group_id and
user-defined cookie to be passed down to psample.

Add support for parsing and formatting such action.

Signed-off-by: Adrian Moreno 


Hi Adrian,

See my comments below inline.

//Eelco


---
   include/linux/openvswitch.h  |  49 +---
   lib/odp-execute.c|   3 +
   lib/odp-util.c   | 150 ++-
   ofproto/ofproto-dpif-ipfix.c |   2 +
   python/ovs/flow/odp.py   |   2 +
   tests/odp.at |   5 ++
   6 files changed, 163 insertions(+), 48 deletions(-)

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index d9fb991ef..3e748405c 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -696,6 +696,7 @@ enum ovs_flow_attr {
   #define OVS_UFID_F_OMIT_MASK (1 << 1)
   #define OVS_UFID_F_OMIT_ACTIONS  (1 << 2)

+#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16
   /**
* enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
* @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
@@ -703,15 +704,27 @@ enum ovs_flow_attr {
* %UINT32_MAX samples all packets and intermediate values sample 
intermediate
* fractions of packets.
* @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
- * Actions are passed as nested attributes.
+ * Actions are passed as nested attributes. Optional if
+ * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as psample group.
+ * Optional if OVS_SAMPLE_ATTR_ACTIONS is set.
+ * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie that, if
+ * provided, will be copied to the psample cookie.


Should we mention any limit on the actual length of the cookie?

Any reason we explicitly call this psample? From an OVS perspective, this could 
just be 
SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE and 
_GROUP. Other datapaths, do not have PSAMPLE.



See my response to your comment on the cover-letter for possible name 
alternatives.


ACK, we can continue the discussion there.


Thinking about it more, from an OVS perspective we should probably not even 
send down a COOKIE, but we should send down an obs_domain_id and obs_point_id, 
and then the kernel can move this into a cookie.



I did consider this. However, the opaque cookie fits well with the tc API which does not 
have two 32-bit values but an opaque 128-bit cookie. So if the action is offloaded OVS 
needs to "encode" the two 32-bit values into the cookie anyways.


Which is fine to me, the OVS API should be clear that we have two u32 entries. 
The cookie is implementation-specific, and we use the netlink format as our API 
for all DPs.



I'm not sure about this.  It is a kernel interface and we can't deliver
two separate values from the psample.  So, passing two separate values
to the kernel doesn't make a lot of sense.  They are going to be mangled
into a single cookie anyway and we'll have to document that somehow.
We may as well always mangle the data from the beginning, document once
at the top level and save ourselves from documenting a million differences
between different implementations.  While USDT could report them separately,
I'm not sure there is a ton of value in that.




The collector itself could be called system/local collector, or something like 
that. This way it can use for example psample for kernel and UDST probes for 
usespace. Both can pass the group and cookie (obs_domain_id and obs_point_id).

Also, the presence of any of this should not dictate the psample action, we 
probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag.



It clearer and it might simplify option verification a bit, but isn't a bit 
redundant? There is no flag for action execution for instance, the presence of 
the attribute is enough.

An alternative would be to have nested attributes, e.g:

enum ovs_sample_attr {
OVS_SAMPLE_ATTR_UNSPEC,
OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* attributes. */
OVS_SAMPLE_ATTR_SYSTEM, /* Nested OVS_SAMPLE_SYSTEM_ATTR_* 
attributes. */

__OVS_SAMPLE_ATTR_MAX,
};

enum ovs_sample_system_attr {
OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN,  /* u32 number */
OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */
OVS_SAMPLE_SYSTEM_ATTR_COOKIE, /* Nested OVS_ACTION_ATTR_*

__OVS_SSAMPLE_SYSTEM_ATTR_MAX,
};

WDYT?


So I would envision your delta to look something like this:

   enum ovs_sample_attr {
OVS_SAMPLE_ATTR_UNSPEC,
-   OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
-   OVS_SAM

Re: [ovs-dev] [RFC 08/11] netdev-offload-tc: Add sample support.

2024-05-13 Thread Adrian Moreno



On 5/13/24 2:38 PM, Adrian Moreno wrote:



On 5/13/24 1:32 PM, Eelco Chaudron wrote:



On 13 May 2024, at 10:44, Adrian Moreno wrote:


On 5/10/24 12:06 PM, Eelco Chaudron wrote:

On 24 Apr 2024, at 21:53, Adrian Moreno wrote:


Offload the sample action if it contains psample information by creating
a tc "sample" action with the user cookie inside the action's cookie.

Avoid using the "sample" action's cookie to store the ufid.


I have some concerns about the sample action-only solution. What happened to 
the idea of adding an additional cookie to the TC match? Can we add a test 
case for the explicit drop?





I guess we can ask the kernel community. However, I'm not 100% convinced it 
makes a lot of sense form a kernel perspective. There is already a cookie in 
every action so there is plenty of space to store user data. Kernel does not 
enforce any semantics on the cookies, it's up to userspace to interpret them 
at will. Also, you cannot have a flow without an action IIUC.
So having a new cookie added just because it makes OVS code sligthly cleaner 
doesn't seem to be a super strong argument.


ACK, but I thought this was part of the earlier discussion, so it should be 
explored. I guess we can make the opposite argument, why a per-action cookie 
if we could have a per-flow one? Having a match cookie will free the action 
cookie for action related metadata.




I did explore, theoretically, the idea. Adding TCA_COOKIE seemed to me like a 
significant effort. It would probably mean adding it as a top-level filter 
attribute (common to all filters, not only flower), handling concurrent access 
and modification, adding support to iproute tooling, making sure it's carried 
over through hw-offloading, etc.


My bulk estimation is that this, itself, is way bigger than the psample feature 
I'm trying to add.


So, with all, I just considered the benefits would not compensate the effort of 
creating a new filter attribute. considering:

1) AFAIKS, we have a way to identify flows without the need for cookies at all
2) We could very well use some other action (e.g: what this patch does)
3) I felt chances of it even being accepted are not supper high



Regarding tc <-> ufid correlation. I'm a bit confused by the fact that flow 
replacement seems to work on a tcf_id basis, meaning that each tc flow has 
it's own unique tcf_id (or we would replace some other flow). Also, there is 
a hmap mapping tcf_id to ufids already. Am I missing something?


You are right, I guess this exists to support kernels where we did not have an 
actions cookie, so we use find_ufid() in the flow dump. So in theory if there 
is no action with a flow cookie it will all still work.




So, could we just rely on this mechanism which is also backwards compatible and 
just use the action cookie for action information?



In general, I think we should apply the sflow patch before this series.



Agree. Are we all OK with the compatibility issues that will introduce?


//Eelco


Signed-off-by: Adrian Moreno 
---
   include/linux/automake.mk    |  5 +-
   include/linux/pkt_cls.h  |  2 +
   include/linux/tc_act/tc_sample.h | 26 ++
   lib/netdev-offload-tc.c  | 67 +
   lib/tc.c | 84 ++--
   lib/tc.h |  7 +++
   utilities/ovs-psample.c  |  8 +--
   7 files changed, 190 insertions(+), 9 deletions(-)
   create mode 100644 include/linux/tc_act/tc_sample.h

diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index ac306b53c..c2a270152 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -5,9 +5,10 @@ noinst_HEADERS += \
   include/linux/pkt_cls.h \
   include/linux/psample.h \
   include/linux/gen_stats.h \
+    include/linux/tc_act/tc_ct.h \
   include/linux/tc_act/tc_mpls.h \
   include/linux/tc_act/tc_pedit.h \
+    include/linux/tc_act/tc_sample.h \
   include/linux/tc_act/tc_skbedit.h \
   include/linux/tc_act/tc_tunnel_key.h \
-    include/linux/tc_act/tc_vlan.h \
-    include/linux/tc_act/tc_ct.h
+    include/linux/tc_act/tc_vlan.h
diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index fb4a7ecea..c566ac1b5 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -8,6 +8,8 @@
   #include 
   #include 

+#define TC_COOKIE_MAX_SIZE 16
+
   /* Action attributes */
   enum {
   TCA_ACT_UNSPEC,
diff --git a/include/linux/tc_act/tc_sample.h 
b/include/linux/tc_act/tc_sample.h

new file mode 100644
index 0..398f32761
--- /dev/null
+++ b/include/linux/tc_act/tc_sample.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct tc_sample {
+    tc_gen;
+};
+
+enum {
+    TCA_SAMPLE_UNSPEC,
+    TCA_SAMPLE_TM,
+    TCA_SAMPLE_PARMS,
+    TCA_SAMPLE_RATE,
+    TCA_SAMPLE_TRUNC_SIZE,
+    TCA

Re: [ovs-dev] [PATCH] fix ct tp policy when create zone.

2024-05-13 Thread Ilya Maximets
On 5/6/24 06:05, Chandler Wu wrote:
> From 5b4d479a9083272e56c51ef9521e6ef665dd534d Mon Sep 17 00:00:00 2001
> From: chandlerwu 
> Date: Mon, 6 May 2024 11:58:21 +0800
> Subject: [PATCH] Subject:[PATCH] fix ct tp policy when create zone.
> 
> Signed-off-by: chandlerwu 
> ---
>  vswitchd/bridge.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 95a65fcdc..e60359b59 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -766,6 +766,8 @@ ct_zones_reconfigure(struct datapath *dp, struct 
> ovsrec_datapath *dp_cfg)
>  if (!ct_zone) {
>  ct_zone = ct_zone_alloc(zone_id, tp_cfg);
>  hmap_insert(&dp->ct_zones, &ct_zone->node, hash_int(zone_id, 0));
> +ofproto_ct_set_zone_timeout_policy(dp->type, ct_zone->zone_id,
> +   &ct_zone->tp);
>  }
>  
>  struct simap new_tp = SIMAP_INITIALIZER(&new_tp);

Hi, Chandler Wu.  Thanks for the patch!

But could you, please, explain what is the issue you're trying to fix?

Reading the code, it seems that update_timeout_policy() call later in
the function should correctly set the policy.  Is that not happening?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] route-table: Add support for v4 via v6 route.

2024-05-13 Thread William Tu via dev
Add route-table support for ipv4 dst via ipv6. One use case is BGP
unnumbered, a mechanism that establishes peering sessions without the
need to explicitly configure IPv4 addresses on the interfaces involved
in the peering. Without using IPv4 address assignments, it uses
link-local IPv6 addresses of the directly connected neighbors for
peering purposes. For example, BGP might install the following route:
$ ip route get 100.87.18.3
100.87.18.3 via inet6 fe80::920a:84ff:fe9e:9570 \
dev br-phy src 100.87.18.6

Note that the v6 addr fe80::920a:84ff:fe9e:9570 is not being used in
the packet header, but only used for lookup the out dev br-phy.
Currently OVS can only support either all-ipv4 or all-ipv6, the patch
adds support for such use case.

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2024-January/052908.html
Acked-by: Simon Horman 
Signed-off-by: William Tu 
---
v3: add vxlan test, remove rfc
v2: fix CI error
---
 lib/ovs-router.c | 34 ++--
 lib/route-table.c| 21 ++
 tests/ovs-router.at  | 33 +++
 tests/system-route.at| 24 
 tests/tunnel-push-pop.at | 48 
 5 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/lib/ovs-router.c b/lib/ovs-router.c
index 3d84c9a30a8f..3ee927e6f7de 100644
--- a/lib/ovs-router.c
+++ b/lib/ovs-router.c
@@ -415,7 +415,6 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
 unsigned int plen;
 ovs_be32 src = 0;
 ovs_be32 gw = 0;
-bool is_ipv6;
 ovs_be32 ip;
 int err;
 int i;
@@ -423,9 +422,8 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
 if (scan_ipv4_route(argv[1], &ip, &plen)) {
 in6_addr_set_mapped_ipv4(&ip6, ip);
 plen += 96;
-is_ipv6 = false;
 } else if (scan_ipv6_route(argv[1], &ip6, &plen)) {
-is_ipv6 = true;
+;
 } else {
 unixctl_command_reply_error(conn,
 "Invalid 'ip/plen' parameter");
@@ -438,21 +436,21 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
 continue;
 }
 
-if (is_ipv6) {
-if (ovs_scan(argv[i], "src="IPV6_SCAN_FMT, src6_s) &&
-ipv6_parse(src6_s, &src6)) {
-continue;
-}
-if (ipv6_parse(argv[i], &gw6)) {
-continue;
-}
-} else {
-if (ovs_scan(argv[i], "src="IP_SCAN_FMT, IP_SCAN_ARGS(&src))) {
-continue;
-}
-if (ip_parse(argv[i], &gw)) {
-continue;
-}
+if (ovs_scan(argv[i], "src="IPV6_SCAN_FMT, src6_s) &&
+ipv6_parse(src6_s, &src6)) {
+continue;
+}
+
+if (ipv6_parse(argv[i], &gw6)) {
+continue;
+}
+
+if (ovs_scan(argv[i], "src="IP_SCAN_FMT, IP_SCAN_ARGS(&src))) {
+continue;
+}
+
+if (ip_parse(argv[i], &gw)) {
+continue;
 }
 
 unixctl_command_reply_error(conn,
diff --git a/lib/route-table.c b/lib/route-table.c
index f1fe32714e8d..58412711888f 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -232,6 +232,7 @@ route_table_parse(struct ofpbuf *buf, struct 
route_table_msg *change)
 [RTA_OIF] = { .type = NL_A_U32, .optional = true },
 [RTA_GATEWAY] = { .type = NL_A_U32, .optional = true },
 [RTA_MARK] = { .type = NL_A_U32, .optional = true },
+[RTA_VIA] = { .type = NL_A_UNSPEC, .optional = true },
 [RTA_PREFSRC] = { .type = NL_A_U32, .optional = true },
 [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
 };
@@ -241,6 +242,7 @@ route_table_parse(struct ofpbuf *buf, struct 
route_table_msg *change)
 [RTA_OIF] = { .type = NL_A_U32, .optional = true },
 [RTA_MARK] = { .type = NL_A_U32, .optional = true },
 [RTA_GATEWAY] = { .type = NL_A_IPV6, .optional = true },
+[RTA_VIA] = { .type = NL_A_UNSPEC, .optional = true },
 [RTA_PREFSRC] = { .type = NL_A_IPV6, .optional = true },
 [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
 };
@@ -333,6 +335,25 @@ route_table_parse(struct ofpbuf *buf, struct 
route_table_msg *change)
 nl_attr_get_in6_addr(attrs[RTA_PREFSRC]);
 }
 }
+if (attrs[RTA_VIA]) {
+const struct rtvia *via;
+ovs_be32 gw;
+
+via = nl_attr_get(attrs[RTA_VIA]);
+switch (via->rtvia_family) {
+case AF_INET:
+gw = *(ALIGNED_CAST(ovs_be32 *, via->rtvia_addr));
+in6_addr_set_mapped_ipv4(&change->rd.rta_gw, gw);
+break;
+case AF_INET6:
+change->rd.rta_gw = *(ALIGNED_CAST(struct in6_addr *,
+via->rtvia_addr));
+break;
+default:
+   

Re: [ovs-dev] [PATCH v3] route-table: Add support for v4 via v6 route.

2024-05-13 Thread Ilya Maximets
On 5/14/24 00:11, William Tu wrote:
> Add route-table support for ipv4 dst via ipv6. One use case is BGP
> unnumbered, a mechanism that establishes peering sessions without the
> need to explicitly configure IPv4 addresses on the interfaces involved
> in the peering. Without using IPv4 address assignments, it uses
> link-local IPv6 addresses of the directly connected neighbors for
> peering purposes. For example, BGP might install the following route:
> $ ip route get 100.87.18.3
> 100.87.18.3 via inet6 fe80::920a:84ff:fe9e:9570 \
> dev br-phy src 100.87.18.6
> 
> Note that the v6 addr fe80::920a:84ff:fe9e:9570 is not being used in
> the packet header, but only used for lookup the out dev br-phy.
> Currently OVS can only support either all-ipv4 or all-ipv6, the patch
> adds support for such use case.
> 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-January/052908.html
> Acked-by: Simon Horman 
> Signed-off-by: William Tu 
> ---
> v3: add vxlan test, remove rfc
> v2: fix CI error
> ---
>  lib/ovs-router.c | 34 ++--
>  lib/route-table.c| 21 ++
>  tests/ovs-router.at  | 33 +++
>  tests/system-route.at| 24 
>  tests/tunnel-push-pop.at | 48 
>  5 files changed, 142 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/ovs-router.c b/lib/ovs-router.c
> index 3d84c9a30a8f..3ee927e6f7de 100644
> --- a/lib/ovs-router.c
> +++ b/lib/ovs-router.c
> @@ -415,7 +415,6 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
>  unsigned int plen;
>  ovs_be32 src = 0;
>  ovs_be32 gw = 0;
> -bool is_ipv6;
>  ovs_be32 ip;
>  int err;
>  int i;
> @@ -423,9 +422,8 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
>  if (scan_ipv4_route(argv[1], &ip, &plen)) {
>  in6_addr_set_mapped_ipv4(&ip6, ip);
>  plen += 96;
> -is_ipv6 = false;
>  } else if (scan_ipv6_route(argv[1], &ip6, &plen)) {
> -is_ipv6 = true;
> +;
>  } else {
>  unixctl_command_reply_error(conn,
>  "Invalid 'ip/plen' parameter");
> @@ -438,21 +436,21 @@ ovs_router_add(struct unixctl_conn *conn, int argc,
>  continue;
>  }
>  
> -if (is_ipv6) {
> -if (ovs_scan(argv[i], "src="IPV6_SCAN_FMT, src6_s) &&
> -ipv6_parse(src6_s, &src6)) {
> -continue;
> -}
> -if (ipv6_parse(argv[i], &gw6)) {
> -continue;
> -}
> -} else {
> -if (ovs_scan(argv[i], "src="IP_SCAN_FMT, IP_SCAN_ARGS(&src))) {
> -continue;
> -}
> -if (ip_parse(argv[i], &gw)) {
> -continue;
> -}
> +if (ovs_scan(argv[i], "src="IPV6_SCAN_FMT, src6_s) &&
> +ipv6_parse(src6_s, &src6)) {
> +continue;
> +}
> +
> +if (ipv6_parse(argv[i], &gw6)) {
> +continue;
> +}
> +
> +if (ovs_scan(argv[i], "src="IP_SCAN_FMT, IP_SCAN_ARGS(&src))) {
> +continue;
> +}
> +
> +if (ip_parse(argv[i], &gw)) {
> +continue;
>  }
>  
>  unixctl_command_reply_error(conn,
> diff --git a/lib/route-table.c b/lib/route-table.c
> index f1fe32714e8d..58412711888f 100644
> --- a/lib/route-table.c
> +++ b/lib/route-table.c
> @@ -232,6 +232,7 @@ route_table_parse(struct ofpbuf *buf, struct 
> route_table_msg *change)
>  [RTA_OIF] = { .type = NL_A_U32, .optional = true },
>  [RTA_GATEWAY] = { .type = NL_A_U32, .optional = true },
>  [RTA_MARK] = { .type = NL_A_U32, .optional = true },
> +[RTA_VIA] = { .type = NL_A_UNSPEC, .optional = true },
>  [RTA_PREFSRC] = { .type = NL_A_U32, .optional = true },
>  [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>  };
> @@ -241,6 +242,7 @@ route_table_parse(struct ofpbuf *buf, struct 
> route_table_msg *change)
>  [RTA_OIF] = { .type = NL_A_U32, .optional = true },
>  [RTA_MARK] = { .type = NL_A_U32, .optional = true },
>  [RTA_GATEWAY] = { .type = NL_A_IPV6, .optional = true },
> +[RTA_VIA] = { .type = NL_A_UNSPEC, .optional = true },
>  [RTA_PREFSRC] = { .type = NL_A_IPV6, .optional = true },
>  [RTA_TABLE] = { .type = NL_A_U32, .optional = true },
>  };
> @@ -333,6 +335,25 @@ route_table_parse(struct ofpbuf *buf, struct 
> route_table_msg *change)
>  nl_attr_get_in6_addr(attrs[RTA_PREFSRC]);
>  }
>  }
> +if (attrs[RTA_VIA]) {
> +const struct rtvia *via;
> +ovs_be32 gw;
> +
> +via = nl_attr_get(attrs[RTA_VIA]);
> +switch (via->rtvia_family) {
> +case AF_INET:
> +gw = *(ALIGNED_CAST(ovs_be32 *, via->rtvia_addr));
> +in6_addr_set_ma

[ovs-dev] [PATCH] table: Fix freeing global variable.

2024-05-13 Thread Yunjian Wang via dev
From: Pengfei Sun 

In function shash_replace_nocopy, argument to free() is the address
of a global variable (argument passed by function table_print_json__),
which is not memory allocated by malloc().

ovsdb-client -f json monitor Open_vSwitch --timestamp

ASan reports:
=
==1443083==ERROR: AddressSanitizer: attempting free on address
which was not malloc()-ed: 0x00535980 in thread T0
#0 0xaf1c9eac in __interceptor_free (/usr/lib64/libasan.so.6+
0xa4eac)
#1 0x4826e4 in json_destroy_object lib/json.c:445
#2 0x4826e4 in json_destroy__ lib/json.c:403
#3 0x4cc4e4 in table_print lib/table.c:633
#4 0x410650 in monitor_print_table ovsdb/ovsdb-client.c:1019
#5 0x410650 in monitor_print ovsdb/ovsdb-client.c:1040
#6 0x4110cc in monitor_print ovsdb/ovsdb-client.c:1030
#7 0x4110cc in do_monitor__ ovsdb/ovsdb-client.c:1503
#8 0x40743c in main ovsdb/ovsdb-client.c:283
#9 0xaeb50038  (/usr/lib64/libc.so.6+0x2b038)
#10 0xaeb50110 in __libc_start_main (/usr/lib64/libc.so.6+
0x2b110)
#11 0x40906c in _start (/usr/local/bin/ovsdb-client+0x40906c)

Use xstrdup to allocate memory for argument "time".

Fixes: cb139fa8b3a1 ("table: New function table_format() for formatting a table 
as a string.")
Signed-off-by: Pengfei Sun 
---
 lib/table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/table.c b/lib/table.c
index 48d18b6..bcc6fb0 100644
--- a/lib/table.c
+++ b/lib/table.c
@@ -523,7 +523,7 @@ table_print_json__(const struct table *table, const struct 
table_style *style,
 }
 if (table->timestamp) {
 json_object_put_nocopy(
-json, "time",
+json, xstrdup("time"),
 json_string_create_nocopy(table_format_timestamp__()));
 }
 
-- 
2.33.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev