[ovs-dev] [PATCH v3 1/1] datapath-windows : Correct the failure handling in OvsInitConntrack.

2024-06-18 Thread Wilson Peng via dev
From: Wilson Peng 

v2-v3 change: Remove the unneeded sanity check and just correct 3 failure
for function ObReferenceObjectByHandle, zoneInfo allocated failed and OvsNatInit
is failed on OvsInitConntrack.

While deploying Tanzu Kubernetes(Antrea based solution) in Broadcom customer,
Sometimes it is found that the kernel thread OvsConntrackEntryCleaner is not
Started after the Windows node is rebooted on unexpected condition.  It could
Be also observed a similar issue in local Antrea setup via 
Clean-AntreaNetwork.ps1
Which will Remove-VMSwitch and re-create it on Windows node.

After checking the local conntrack dump, OVS doesn’t remove the connection
Entries even though the time is overdue, we could find the connection entries
Created several hours ago in the dump, within a state (TIME_WAIT) that was
Supposed to be deleted earlier. At that time, the count of the existing entries
In the OVS conntrack zone is far from the up limit, the actual number is 19k.
Then we tried to flush the conntrack with CMD "ovs-dpctl.exe flush-conntrack"
And all the conntrack entries could be removed.

In this patch, it is correcting 3 possible wrong failure processing.The 1st is 
for
Failed call ObReferenceObjectByHandle and the 2nd one is when zoneInfo is 
allocated
Failed it should also stop the cleanup thread firstly. The 3rd one failure is 
the
Possible wrong return value when OvsNatInit is failed to call on 
OvsInitConntrack.

Antrea team does help do the regression test with build including the patch
And it could PASS the testing. And it is not find the Connectract not timeout
Essue with same reproducing condition.

It is good to backport the fix to main and backported until 2.17.

Signed-off-by: Wilson Peng 
---
 datapath-windows/ovsext/Conntrack-nat.c |  2 +-
 datapath-windows/ovsext/Conntrack.c | 24 
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/datapath-windows/ovsext/Conntrack-nat.c 
b/datapath-windows/ovsext/Conntrack-nat.c
index 497354ec8..a222f6aea 100644
--- a/datapath-windows/ovsext/Conntrack-nat.c
+++ b/datapath-windows/ovsext/Conntrack-nat.c
@@ -108,6 +108,7 @@ NTSTATUS OvsNatInit()
 OVS_CT_POOL_TAG);
 if (ovsUnNatTable == NULL) {
 OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG);
+ovsNatTable = NULL;
 return STATUS_INSUFFICIENT_RESOURCES;
 }
 
@@ -157,7 +158,6 @@ VOID OvsNatFlush(UINT16 zone)
 VOID OvsNatCleanup()
 {
 if (ovsNatTable == NULL) {
-   NdisFreeSpinLock(&ovsCtNatLock);
return;
 }
 
diff --git a/datapath-windows/ovsext/Conntrack.c 
b/datapath-windows/ovsext/Conntrack.c
index 39ba5cc10..156861d6c 100644
--- a/datapath-windows/ovsext/Conntrack.c
+++ b/datapath-windows/ovsext/Conntrack.c
@@ -94,16 +94,28 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 if (status != STATUS_SUCCESS) {
 goto freeBucketLock;
 }
-
-ObReferenceObjectByHandle(threadHandle, SYNCHRONIZE, NULL, KernelMode,
-  &ctThreadCtx.threadObject, NULL);
+ctThreadCtx.exit = 0;
+status = ObReferenceObjectByHandle(threadHandle, SYNCHRONIZE, NULL, 
KernelMode,
+   &ctThreadCtx.threadObject, NULL);
 ZwClose(threadHandle);
 threadHandle = NULL;
 
+if (!NT_SUCCESS(status)) {
+ctThreadCtx.exit = 1;
+KeSetEvent(&ctThreadCtx.event, 0, FALSE);
+KeWaitForSingleObject(ctThreadCtx.threadObject, Executive,
+   KernelMode, FALSE, NULL);
+goto freeBucketLock;
+}
 zoneInfo = OvsAllocateMemoryWithTag(sizeof(OVS_CT_ZONE_INFO) *
 CT_MAX_ZONE, OVS_CT_POOL_TAG);
 if (zoneInfo == NULL) {
 status = STATUS_INSUFFICIENT_RESOURCES;
+ctThreadCtx.exit = 1;
+KeSetEvent(&ctThreadCtx.event, 0, FALSE);
+KeWaitForSingleObject(ctThreadCtx.threadObject, Executive,
+   KernelMode, FALSE, NULL);
+ObDereferenceObject(ctThreadCtx.threadObject);
 goto freeBucketLock;
 }
 
@@ -119,7 +131,7 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context)
 if (status != STATUS_SUCCESS) {
 OvsCleanupConntrack();
 }
-return STATUS_SUCCESS;
+return status;
 
 freeBucketLock:
 for (UINT32 i = 0; i < numBucketLocks; i++) {
@@ -172,6 +184,7 @@ OvsCleanupConntrack(VOID)
 NdisFreeSpinLock(&ovsCtZoneLock);
 if (zoneInfo) {
 OvsFreeMemoryWithTag(zoneInfo, OVS_CT_POOL_TAG);
+zoneInfo = NULL;
 }
 }
 
@@ -1520,6 +1533,8 @@ OvsConntrackEntryCleaner(PVOID data)
 LOCK_STATE_EX lockState;
 BOOLEAN success = TRUE;
 
+OVS_LOG_INFO("Start the OVS ConntrackEntry Cleaner system thread,"
+ " context: %p", context);
 while (success) {
 if (context->exit) {
 break;
@@ -1541,6 +1556,7 @@ OvsConntrackEntryCleaner(PVOID data)
 KeWaitForSingleObject(&context->event, Executive, KernelMode,
   

Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-18 Thread Adrián Moreno
On Tue, Jun 18, 2024 at 05:44:05PM GMT, Ilya Maximets wrote:
> On 6/18/24 12:50, Adrián Moreno wrote:
> > On Tue, Jun 18, 2024 at 12:22:23PM GMT, Ilya Maximets wrote:
> >> On 6/18/24 09:00, Adrián Moreno wrote:
> >>> On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
>  On 6/17/24 13:55, Ilya Maximets wrote:
> > On 6/3/24 20:56, Adrian Moreno wrote:
> >> The OVS_ACTION_ATTR_SAMPLE action is, in essence,
> >> observability-oriented.
> >>
> >> Apart from some corner case in which it's used a replacement of clone()
> >> for old kernels, it's really only used for sFlow, IPFIX and now,
> >> local emit_sample.
> >>
> >> With this in mind, it doesn't make much sense to report
> >> OVS_DROP_LAST_ACTION inside sample actions.
> >>
> >> For instance, if the flow:
> >>
> >>   actions:sample(..,emit_sample(..)),2
> >>
> >> triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
> >> confusing for users since the packet did reach its destination.
> >>
> >> This patch makes internal action execution silently consume the skb
> >> instead of notifying a drop for this case.
> >>
> >> Unfortunately, this patch does not remove all potential sources of
> >> confusion since, if the sample action itself is the last action, e.g:
> >>
> >> actions:sample(..,emit_sample(..))
> >>
> >> we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we 
> >> aren't.
> >>
> >> Sadly, this case is difficult to solve without breaking the
> >> optimization by which the skb is not cloned on last sample actions.
> >> But, given explicit drop actions are now supported, OVS can just add 
> >> one
> >> after the last sample() and rewrite the flow as:
> >>
> >> actions:sample(..,emit_sample(..)),drop
> >>
> >> Signed-off-by: Adrian Moreno 
> >> ---
> >>  net/openvswitch/actions.c | 13 +++--
> >>  1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >> index 33f6d93ba5e4..54fc1abcff95 100644
> >> --- a/net/openvswitch/actions.c
> >> +++ b/net/openvswitch/actions.c
> >> @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
> >>  static struct action_flow_keys __percpu *flow_keys;
> >>  static DEFINE_PER_CPU(int, exec_actions_level);
> >>
> >> +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
> >> +{
> >> +  /* Do not emit packet drops inside sample(). */
> >> +  if (OVS_CB(skb)->probability)
> >> +  consume_skb(skb);
> >> +  else
> >> +  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> >> +}
> >> +
> >>  /* Make a clone of the 'key', using the pre-allocated percpu 
> >> 'flow_keys'
> >>   * space. Return NULL if out of key spaces.
> >>   */
> >> @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
> >> sk_buff *skb,
> >>if ((arg->probability != U32_MAX) &&
> >>(!arg->probability || get_random_u32() > arg->probability)) 
> >> {
> >>if (last)
> >> -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> >> +  ovs_drop_skb_last_action(skb);
> 
>  Always consuming the skb at this point makes sense, since having smaple()
>  as a last action is a reasonable thing to have.  But this looks more like
>  a fix for the original drop reason patch set.
> 
> >>>
> >>> I don't think consuming the skb at this point makes sense. It was very
> >>> intentionally changed to a drop since a very common use-case for
> >>> sampling is drop-sampling, i.e: replacing an empty action list (that
> >>> triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
> >>> that replacement should not have any effect on the number of
> >>> OVS_DROP_LAST_ACTION being reported as the packets are being treated in
> >>> the same way (only observed in one case).
> >>>
> >>>
> >>return 0;
> >>}
> >>
> >> @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath 
> >> *dp, struct sk_buff *skb,
> >>}
> >>}
> >>
> >> -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> >> +  ovs_drop_skb_last_action(skb);
> >
> > I don't think I agree with this one.  If we have a sample() action with
> > a lot of different actions inside and we reached the end while the last
> > action didn't consume the skb, then we should report that.  E.g.
> > "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
> > cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
> >
> >>>
> >>> What is the use case for such action list? Having an action branch
> >>> executed rand

Re: [ovs-dev] [PATCH ovn branch-23.09] northd: Remove unused ovn_datapath_is_transit_switch().

2024-06-18 Thread Numan Siddique
On Tue, Jun 18, 2024 at 3:01 PM Dumitru Ceara  wrote:
>
> Fixes: 5f0809be5657 ("Revert "northd: Don't skip transit switch LSP when 
> creating mcast groups."")
> Signed-off-by: Dumitru Ceara 

Acked-by: Numan Siddique 

I think it's better to update the commit message that it is for
branch-23.09 only. wdyt ?

Numan

> ---
>  northd/northd.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 60f692aff3..2873a9bc48 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -806,12 +806,6 @@ ovn_datapath_is_stale(const struct ovn_datapath *od)
>  return !od->nbr && !od->nbs;
>  }
>
> -static inline bool
> -ovn_datapath_is_transit_switch(const struct ovn_datapath *od)
> -{
> -return od->tunnel_key >= OVN_MIN_DP_KEY_GLOBAL;
> -}
> -
>  static struct ovn_datapath *
>  ovn_datapath_from_sbrec(const struct hmap *ls_datapaths,
>  const struct hmap *lr_datapaths,
> --
> 2.44.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] selftests: openvswitch: Use bash as interpreter

2024-06-18 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski :

On Mon, 17 Jun 2024 09:28:33 +0100 you wrote:
> openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to
> obtain the first character of $ns. Empirically, this is works with bash
> but not dash. When run with dash these evaluate to an empty string and
> printing an error to stdout.
> 
>  # dash -c 'ns=client; echo "${ns:0:1}"' 2>error
>  # cat error
>  dash: 1: Bad substitution
>  # bash -c 'ns=client; echo "${ns:0:1}"' 2>error
>  c
>  # cat error
> 
> [...]

Here is the summary with links:
  - [net] selftests: openvswitch: Use bash as interpreter
https://git.kernel.org/netdev/net/c/e2b447c9a1bb

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn branch-23.09 v2] northd: ic: Remove unused function and fix unit test.

2024-06-18 Thread Mark Michelson

Thanks Dumitru,

Acked-by: Mark Michelson 

I merged this to branch-23.09.

On 6/18/24 15:36, Dumitru Ceara wrote:

There are differences between branches 23.09 and 24.03 (e.g., IC tests
have been consolidated in ovn-ic.at) which made the original backport
non-trivial and some changes were missed.

This commit fixes that.

Fixes: 5f0809be5657 ("Revert "northd: Don't skip transit switch LSP when creating mcast 
groups."")
Signed-off-by: Dumitru Ceara 
---
V2:
- fix unit test that was missed while backporting the fix.
---
  northd/northd.c |  6 --
  tests/ovn.at| 10 --
  2 files changed, 16 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 60f692aff3..2873a9bc48 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -806,12 +806,6 @@ ovn_datapath_is_stale(const struct ovn_datapath *od)
  return !od->nbr && !od->nbs;
  }
  
-static inline bool

-ovn_datapath_is_transit_switch(const struct ovn_datapath *od)
-{
-return od->tunnel_key >= OVN_MIN_DP_KEY_GLOBAL;
-}
-
  static struct ovn_datapath *
  ovn_datapath_from_sbrec(const struct hmap *ls_datapaths,
  const struct hmap *lr_datapaths,
diff --git a/tests/ovn.at b/tests/ovn.at
index 3004e4d8d4..6d72ef6a13 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26677,20 +26677,10 @@ wait_row_count IGMP_Group 2 address=239.0.1.68
  wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
  check ovn-nbctl --wait=hv sync
  
-#Validate that Multicast Group contains all registered ports for

-# specific igmp group.
-ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
-ports=$(fetch_column multicast_group ports name="239.0.1.68" datapath=$ts_dp)
-check test X2 = X$(echo $ports | wc -w)
-
-
  ovn_as az2
  wait_row_count IGMP_Group 2 address=239.0.1.68
  wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
  check ovn-nbctl --wait=hv sync
-ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
-ports=$(fetch_column multicast_group ports name="239.0.1.68" datapath=$ts_dp)
-check test X2 = X$(echo $ports | wc -w)
  
  # Send an IP multicast packet from LSP2, it should be forwarded

  # to lsp1 and lsp3.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.

2024-06-18 Thread Dumitru Ceara
It improves the debugging experience if we can easily get a list of
OpenFlow rules and groups that contribute to the creation of a datapath
flow.

The suggested workflow is:
a. dump datapath flows (along with UUIDs), this also prints the core IDs
(PMD IDs) when applicable.

  $ ovs-appctl dpctl/dump-flows -m
  flow-dump from pmd on cpu core: 7
  ufid:7460db8f..., recirc_id(0), 

b. dump related OpenFlow rules and groups:
  $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7
  cookie=0x12345678, table=0 
priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1)
  cookie=0x0, table=1 priority=200,actions=group:1
  group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2))
  cookie=0x0, table=2 actions=output:1

The new command only shows rules and groups attached to ukeys that are
in states UKEY_VISIBLE or UKEY_OPERATIONAL.  That should be fine as all
other ukeys should not be relevant for the use case presented above.

For ukeys that don't have an xcache populated yet, the command goes
ahead and populates one.  In theory this is creates a slight overhead as
those ukeys might not need an xcache until they see traffic (and get
revalidated) but in practice the overhead should be minimal.

This commit tries to mimic the output format of the ovs-ofctl
dump-flows/dump-groups commands.  For groups it actually uses
ofputil_group/_bucket functions for formatting.  For rules it uses
flow_stats_ds() (the original function was exported and renamed to
ofproto_rule_stats_ds()).

Signed-off-by: Dumitru Ceara 
---
V4:
- Addressed Mike's comment:
  - use predictable port IDs.
V3:
- Addressed Mike's comments:
  - use ds_put_char()
  - make the test less flaky
V2:
- Addressed Adrian's comments:
  - check return value of populate_xcache()
  - use flow_stats_ds() (renamed to ofproto_rule_stats_ds()) instead of
custom printing
  - move ukey->state check to caller
  - handle case when group bucket is not available
  - update test case to cover the point above
---
 NEWS|   3 +
 include/openvswitch/ofp-group.h |   7 ++
 lib/ofp-group.c | 110 +++-
 ofproto/ofproto-dpif-upcall.c   |  85 
 ofproto/ofproto-dpif.c  |  24 +++
 ofproto/ofproto-dpif.h  |   2 +
 ofproto/ofproto-provider.h  |   1 +
 ofproto/ofproto.c   |  85 
 tests/ofproto-dpif.at   |  47 ++
 tests/ofproto-macros.at |   9 ++-
 10 files changed, 285 insertions(+), 88 deletions(-)

diff --git a/NEWS b/NEWS
index 5ae0108d552b..1bc085f97045 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@ Post-v3.3.0
  https://github.com/openvswitch/ovs.git
- DPDK:
  * OVS validated with DPDK 23.11.1.
+   - ovs-appctl:
+ * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules and
+   groups that contributed to the creation of a specific datapath flow.
 
 
 v3.3.0 - 16 Feb 2024
diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
index cd7af0ebff9c..79fcb3a4c0d1 100644
--- a/include/openvswitch/ofp-group.h
+++ b/include/openvswitch/ofp-group.h
@@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct 
ovs_list *,
 bool ofputil_bucket_check_duplicate_id(const struct ovs_list *);
 struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *);
 struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *);
+void ofputil_bucket_format(const struct ofputil_bucket *,
+   enum ofp11_group_type, enum ofp_version,
+   const struct ofputil_port_map *,
+   const struct ofputil_table_map *,
+   struct ds *);
 
 static inline bool
 ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket)
@@ -88,6 +93,8 @@ struct ofputil_group_props {
 void ofputil_group_properties_destroy(struct ofputil_group_props *);
 void ofputil_group_properties_copy(struct ofputil_group_props *to,
const struct ofputil_group_props *from);
+void ofputil_group_properties_format(const struct ofputil_group_props *,
+ struct ds *);
 /* Protocol-independent group_mod. */
 struct ofputil_group_mod {
 uint16_t command; /* One of OFPGC15_*. */
diff --git a/lib/ofp-group.c b/lib/ofp-group.c
index 737f48047b10..0be453d15203 100644
--- a/lib/ofp-group.c
+++ b/lib/ofp-group.c
@@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t 
*group_idp)
 return true;
 }
 
-/* Appends to 's' a string representation of the OpenFlow group ID 'group_id'.
- * Most groups' string representation is just the number, but for special
- * groups, e.g. OFPG_ALL, it is the name, e.g. "ALL". */
+/* Appends to 's' a string representation of the OpenFlow group 'group_id'.
+ * Most groups' string representation is just 'group_id=' followed by the ID,
+ * but for

[ovs-dev] [PATCH ovn branch-23.09 v2] northd: ic: Remove unused function and fix unit test.

2024-06-18 Thread Dumitru Ceara
There are differences between branches 23.09 and 24.03 (e.g., IC tests
have been consolidated in ovn-ic.at) which made the original backport
non-trivial and some changes were missed.

This commit fixes that.

Fixes: 5f0809be5657 ("Revert "northd: Don't skip transit switch LSP when 
creating mcast groups."")
Signed-off-by: Dumitru Ceara 
---
V2:
- fix unit test that was missed while backporting the fix.
---
 northd/northd.c |  6 --
 tests/ovn.at| 10 --
 2 files changed, 16 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 60f692aff3..2873a9bc48 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -806,12 +806,6 @@ ovn_datapath_is_stale(const struct ovn_datapath *od)
 return !od->nbr && !od->nbs;
 }
 
-static inline bool
-ovn_datapath_is_transit_switch(const struct ovn_datapath *od)
-{
-return od->tunnel_key >= OVN_MIN_DP_KEY_GLOBAL;
-}
-
 static struct ovn_datapath *
 ovn_datapath_from_sbrec(const struct hmap *ls_datapaths,
 const struct hmap *lr_datapaths,
diff --git a/tests/ovn.at b/tests/ovn.at
index 3004e4d8d4..6d72ef6a13 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -26677,20 +26677,10 @@ wait_row_count IGMP_Group 2 address=239.0.1.68
 wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
 check ovn-nbctl --wait=hv sync
 
-#Validate that Multicast Group contains all registered ports for
-# specific igmp group.
-ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
-ports=$(fetch_column multicast_group ports name="239.0.1.68" datapath=$ts_dp)
-check test X2 = X$(echo $ports | wc -w)
-
-
 ovn_as az2
 wait_row_count IGMP_Group 2 address=239.0.1.68
 wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
 check ovn-nbctl --wait=hv sync
-ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
-ports=$(fetch_column multicast_group ports name="239.0.1.68" datapath=$ts_dp)
-check test X2 = X$(echo $ports | wc -w)
 
 # Send an IP multicast packet from LSP2, it should be forwarded
 # to lsp1 and lsp3.
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.

2024-06-18 Thread Dumitru Ceara
On 6/18/24 18:50, Mike Pattrick wrote:
> On Fri, Jun 14, 2024 at 6:45 AM Dumitru Ceara  wrote:
>>
>> It improves the debugging experience if we can easily get a list of
>> OpenFlow rules and groups that contribute to the creation of a datapath
>> flow.
>>
>> The suggested workflow is:
>> a. dump datapath flows (along with UUIDs), this also prints the core IDs
>> (PMD IDs) when applicable.
>>
>>   $ ovs-appctl dpctl/dump-flows -m
>>   flow-dump from pmd on cpu core: 7
>>   ufid:7460db8f..., recirc_id(0), 
>>
>> b. dump related OpenFlow rules and groups:
>>   $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7
>>   cookie=0x12345678, table=0 
>> priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1)
>>   cookie=0x0, table=1 priority=200,actions=group:1
>>   group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2))
>>   cookie=0x0, table=2 actions=output:1
>>
>> The new command only shows rules and groups attached to ukeys that are
>> in states UKEY_VISIBLE or UKEY_OPERATIONAL.  That should be fine as all
>> other ukeys should not be relevant for the use case presented above.
>>
>> For ukeys that don't have an xcache populated yet, the command goes
>> ahead and populates one.  In theory this is creates a slight overhead as
>> those ukeys might not need an xcache until they see traffic (and get
>> revalidated) but in practice the overhead should be minimal.
>>
>> This commit tries to mimic the output format of the ovs-ofctl
>> dump-flows/dump-groups commands.  For groups it actually uses
>> ofputil_group/_bucket functions for formatting.  For rules it uses
>> flow_stats_ds() (the original function was exported and renamed to
>> ofproto_rule_stats_ds()).
>>
>> Signed-off-by: Dumitru Ceara 
>> ---
>> V3:
>> - Addressed Mike's comments:
>>   - use ds_put_char()
>>   - make the test less flaky
>> V2:
>> - Addressed Adrian's comments:
>>   - check return value of populate_xcache()
>>   - use flow_stats_ds() (renamed to ofproto_rule_stats_ds()) instead of
>> custom printing
>>   - move ukey->state check to caller
>>   - handle case when group bucket is not available
>>   - update test case to cover the point above
>> ---
>>  NEWS|   3 +
>>  include/openvswitch/ofp-group.h |   7 ++
>>  lib/ofp-group.c | 110 +++-
>>  ofproto/ofproto-dpif-upcall.c   |  85 
>>  ofproto/ofproto-dpif.c  |  24 +++
>>  ofproto/ofproto-dpif.h  |   2 +
>>  ofproto/ofproto-provider.h  |   1 +
>>  ofproto/ofproto.c   |  85 
>>  tests/ofproto-dpif.at   |  47 ++
>>  tests/ofproto-macros.at |   9 ++-
>>  10 files changed, 285 insertions(+), 88 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 5ae0108d552b..1bc085f97045 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -9,6 +9,9 @@ Post-v3.3.0
>>   https://github.com/openvswitch/ovs.git
>> - DPDK:
>>   * OVS validated with DPDK 23.11.1.
>> +   - ovs-appctl:
>> + * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules 
>> and
>> +   groups that contributed to the creation of a specific datapath flow.
>>
>>
>>  v3.3.0 - 16 Feb 2024
>> diff --git a/include/openvswitch/ofp-group.h 
>> b/include/openvswitch/ofp-group.h
>> index cd7af0ebff9c..79fcb3a4c0d1 100644
>> --- a/include/openvswitch/ofp-group.h
>> +++ b/include/openvswitch/ofp-group.h
>> @@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct 
>> ovs_list *,
>>  bool ofputil_bucket_check_duplicate_id(const struct ovs_list *);
>>  struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *);
>>  struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *);
>> +void ofputil_bucket_format(const struct ofputil_bucket *,
>> +   enum ofp11_group_type, enum ofp_version,
>> +   const struct ofputil_port_map *,
>> +   const struct ofputil_table_map *,
>> +   struct ds *);
>>
>>  static inline bool
>>  ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket)
>> @@ -88,6 +93,8 @@ struct ofputil_group_props {
>>  void ofputil_group_properties_destroy(struct ofputil_group_props *);
>>  void ofputil_group_properties_copy(struct ofputil_group_props *to,
>> const struct ofputil_group_props *from);
>> +void ofputil_group_properties_format(const struct ofputil_group_props *,
>> + struct ds *);
>>  /* Protocol-independent group_mod. */
>>  struct ofputil_group_mod {
>>  uint16_t command; /* One of OFPGC15_*. */
>> diff --git a/lib/ofp-group.c b/lib/ofp-group.c
>> index 737f48047b10..0be453d15203 100644
>> --- a/lib/ofp-group.c
>> +++ b/lib/ofp-group.c
>> @@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t 
>> *group_idp)
>>  return true;
>>  }
>>
>> -/* Appends to '

Re: [ovs-dev] [PATCH v2 2/2] ipf: Handle common case of ipf defragmentation.

2024-06-18 Thread Ilya Maximets
On 6/11/24 16:02, Aaron Conole wrote:
> Ilya Maximets  writes:
> 
>> On 6/5/24 16:54, Aaron Conole wrote:
>>> Mike Pattrick  writes:
>>>
 When conntrack is reassembling packet fragments, the same reassembly
 context can be shared across multiple threads handling different packets
 simultaneously. Once a full packet is assembled, it is added to a packet
 batch for processing, in the case where there are multiple different pmd
 threads accessing conntrack simultaneously, there is a race condition
 where the reassembled packet may be added to an arbitrary batch even if
 the current batch is available.

 When this happens, the packet may be handled incorrectly as it is
 inserted into a random openflow execution pipeline, instead of the
 pipeline for that packets flow.

 This change makes a best effort attempt to try to add the defragmented
 packet to the current batch. directly. This should succeed most of the
 time.

 Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
 Reported-at: https://issues.redhat.com/browse/FDP-560
 Signed-off-by: Mike Pattrick 
 ---
>>>
>>> The patch overall looks good to me.  I'm considering applying with the
>>> following addition:
>>>
>>>   diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
>>>   index 6b293770dd..d9b9e0c23f 100755
>>>   --- a/utilities/checkpatch.py
>>>   +++ b/utilities/checkpatch.py
>>>   @@ -63,7 +63,8 @@ def open_spell_check_dict():
>>>  'dhcpv6', 'opts', 'metadata', 'geneve', 
>>> 'mutex',
>>>  'netdev', 'netdevs', 'subtable', 'virtio', 
>>> 'qos',
>>>  'policer', 'datapath', 'tunctl', 'attr', 
>>> 'ethernet',
>>>   -  'ether', 'defrag', 'defragment', 'loopback', 
>>> 'sflow',
>>>   +  'ether', 'defrag', 'defragment', 
>>> 'defragmented',
>>>   +  'loopback', 'sflow',
>>>  'acl', 'initializer', 'recirc', 'xlated', 
>>> 'unclosed',
>>>  'netlink', 'msec', 'usec', 'nsec', 'ms', 'us', 
>>> 'ns',
>>>  'kilobits', 'kbps', 'kilobytes', 'megabytes', 
>>> 'mbps',
>>>
>>>
>>> unless anyone objects.  This is to squelch:
>>>
>>> == Checking 16f6885353c2 ("ipf: Handle common case of ipf 
>>> defragmentation.") ==
>>> WARNING: Possible misspelled word: "defragmented"
>>> Did you mean:  ['defragment ed', 'defragment-ed', 'defragment']
>>> Lines checked: 129, Warnings: 1, Errors: 0
>>
>> It doesn't affect CI today, so can be a separate patch, I think.  We
>> have a few more
>> words like this in relatively recent commits, like 'poller' or
>> 'autovalidator', these
>> can be bundled in that separate commit as well.
>>
>> Though updating the dictionary along with the patch that is using the
>> word sounds OK
>> to me as well.
> 
> That makes sense to me.
> 
> I've been thinking of adding a spell-check test to the robot.  Rather
> than the existing apply check doing the spell checking.  The spell
> checker would only ever generate a warning.  WDYT?

Sounds fine to me, but we really need to make the checking more robust.
Currently it feeds into the checker too many things that it shouldn't.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofp-prop: Fix unaligned 128 bit access.

2024-06-18 Thread Mike Pattrick
On Fri, Jun 14, 2024 at 8:55 AM Ilya Maximets  wrote:
>
> On 6/12/24 20:12, Mike Pattrick wrote:
> > On Wed, Jun 12, 2024 at 9:50 AM Ales Musil  wrote:
> >
> >>
> >>
> >> On Wed, Jun 12, 2024 at 3:32 PM Mike Pattrick  wrote:
> >>
> >>> When compiling with '-fsanitize=address,undefined', the "ovs-ofctl
> >>> ct-flush" test will yield the following undefined behavior flagged
> >>> by UBSan. This patch uses memcpy to move the 128bit value into the
> >>> stack before reading it.
> >>>
> >>> lib/ofp-prop.c:277:14: runtime error: load of misaligned address
> >>> for type 'union ovs_be128', which requires 8 byte alignment
> >>>   ^
> >>> #0 0x7735d4 in ofpprop_parse_u128 lib/ofp-prop.c:277
> >>> #1 0x6c6c83 in ofp_ct_match_decode lib/ofp-ct.c:529
> >>> #2 0x76f3b5 in ofp_print_nxt_ct_flush lib/ofp-print.c:959
> >>> #3 0x76f3b5 in ofp_to_string__ lib/ofp-print.c:1206
> >>> #4 0x76f3b5 in ofp_to_string lib/ofp-print.c:1264
> >>> #5 0x770c0d in ofp_print lib/ofp-print.c:1308
> >>> #6 0x484a9d in ofctl_ofp_print utilities/ovs-ofctl.c:4899
> >>> #7 0x4ddb77 in ovs_cmdl_run_command__ lib/command-line.c:247
> >>> #8 0x47f6b3 in main utilities/ovs-ofctl.c:186
>
> Thanks for cleaning up the trace.  Please, also remove the '#' tags.
> GitHub trats them as references to issues/PRs and that is annoying.
>
> Also, while most of the addresses in the trace are not important and it's
> good to strip or shorten them, the actual address where the memory access
> happened is important here, so we can see what the actual alignment was.
> Was it part of the original error message?  Clang usually provides them,
> not sure about gcc.
>
> >>>
> >>> Signed-off-by: Mike Pattrick 
> >>> ---
> >>>
> >>
> >> Hi Mike,
> >>
> >> this is interesting, do you have an idea why it didn't fail in CI by now?
> >> Also AFAIR the ofprops is supposed to be aligned to 8 bytes so unless the
> >> buffer itself isn't allocated at an address that is not itself 8 bytes
> >> aligned it shouldn't happen. In that case we might actually have a problem
> >> with other sizes.
> >>
> >
> > Report is seen with gcc + ubsan, but not clang + ubsan. It is possible that
> > this is only seen due the test, this warning wasn't seen live.
>
> I agree with Ales on this one.  Properties supposed to be aligned.
> We need to find why they are not.  i.e. is it a property itself or
> something before it.

I took a closer look at this issue. I first ran into it with
"ovs-ofctl ofp-print", but the same error pops up in ovs-vswitchd with
a command like "ovs-ofctl ct-flush br0 labels=1".

The problem is that the put/parse_u128 functions don't add padding
between the 4 byte header and the data. This is something already
taken care of in the 64bit functions.

I'll resubmit with a more correct solution.


Thanks,
Mike

>
> We may need to take similar approach as in commit:
>   a5cc859a4228 ("ofp-actions: Use aligned structures when decoding ofp 
> actions.")
>
> >
> > Cheers,
> > M
> >
> >
> >>
> >>
> >>>  lib/ofp-prop.c | 4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c
> >>> index 0a685750c..ed6365414 100644
> >>> --- a/lib/ofp-prop.c
> >>> +++ b/lib/ofp-prop.c
> >>> @@ -271,10 +271,12 @@ enum ofperr
> >>>  ofpprop_parse_u128(const struct ofpbuf *property, ovs_u128 *value)
> >>>  {
> >>>  ovs_be128 *p = property->msg;
> >>> +ovs_be128 aligned;
> >>>  if (ofpbuf_msgsize(property) != sizeof *p) {
> >>>  return OFPERR_OFPBPC_BAD_LEN;
> >>>  }
> >>> -*value = ntoh128(*p);
> >>> +memcpy(&aligned, p, sizeof aligned);
>
> FWIW, this doesn't actually fix the issue.  At least not in all the
> cases.  Compiler is free to make alignment assumptions based on the
> pointer type, so we can still have unaligned access inside the memcpy.
>
> Best regards, Ilya Maximets.
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn branch-23.09] northd: Remove unused ovn_datapath_is_transit_switch().

2024-06-18 Thread Dumitru Ceara
Fixes: 5f0809be5657 ("Revert "northd: Don't skip transit switch LSP when 
creating mcast groups."")
Signed-off-by: Dumitru Ceara 
---
 northd/northd.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 60f692aff3..2873a9bc48 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -806,12 +806,6 @@ ovn_datapath_is_stale(const struct ovn_datapath *od)
 return !od->nbr && !od->nbs;
 }
 
-static inline bool
-ovn_datapath_is_transit_switch(const struct ovn_datapath *od)
-{
-return od->tunnel_key >= OVN_MIN_DP_KEY_GLOBAL;
-}
-
 static struct ovn_datapath *
 ovn_datapath_from_sbrec(const struct hmap *ls_datapaths,
 const struct hmap *lr_datapaths,
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 0/4] Fix inter-AZ IP-multicast traffic in ovn-kubernetes-like deployments.

2024-06-18 Thread Mark Michelson
Thanks everyone, I pushed the change to main, branch-24.03, and 
branch-23.09.


On 6/18/24 11:16, Numan Siddique wrote:

On Tue, Jun 18, 2024 at 9:34 AM Dumitru Ceara  wrote:


Out of the patches in the series, the one that introduced the issue is
PATCH 3/4.  However we need to also revert two other patches that depend
on it.  PATCH 4/4 is a test to avoid future regressions.

Dumitru Ceara (4):
   Revert "IC: Tansit switch don't flood mcast traffic to router ports if
 matches igmp group."
   Revert "ovn-ic: Avoid igmp/mld traffic flooding."
   Revert "northd: Don't skip transit switch LSP when creating mcast
 groups."
   tests: ic: Add IP multicast test that simulates the ovn-k8s use case.


Thanks for adding the test patch.  It would be great at some point we
add mutlinode tests (mutlinode.at) for various multicast traffic
scenarios.

For the entire series:

Acked-by: Numan Siddique 

Numan





  controller/pinctrl.c |   2 -
  include/ovn/logical-fields.h |   3 -
  lib/logical-fields.c |   4 -
  northd/northd.c  |  36 ++---
  northd/northd.h  |   6 -
  northd/ovn-northd.8.xml  |  14 +-
  tests/ovn-ic.at  | 252 +--
  7 files changed, 261 insertions(+), 56 deletions(-)

--
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ofproto: Add upcall/dump-ufid-rules command to map UFIDs to OpenFlow.

2024-06-18 Thread Mike Pattrick
On Fri, Jun 14, 2024 at 6:45 AM Dumitru Ceara  wrote:
>
> It improves the debugging experience if we can easily get a list of
> OpenFlow rules and groups that contribute to the creation of a datapath
> flow.
>
> The suggested workflow is:
> a. dump datapath flows (along with UUIDs), this also prints the core IDs
> (PMD IDs) when applicable.
>
>   $ ovs-appctl dpctl/dump-flows -m
>   flow-dump from pmd on cpu core: 7
>   ufid:7460db8f..., recirc_id(0), 
>
> b. dump related OpenFlow rules and groups:
>   $ ovs-appctl upcall/dump-ufid-rules ufid:7460db8f... pmd=7
>   cookie=0x12345678, table=0 
> priority=100,ip,in_port=2,nw_dst=10.0.0.2,actions=resubmit(,1)
>   cookie=0x0, table=1 priority=200,actions=group:1
>   group_id=1,bucket=bucket_id:0,actions=ct(commit,table=2,nat(dst=20.0.0.2))
>   cookie=0x0, table=2 actions=output:1
>
> The new command only shows rules and groups attached to ukeys that are
> in states UKEY_VISIBLE or UKEY_OPERATIONAL.  That should be fine as all
> other ukeys should not be relevant for the use case presented above.
>
> For ukeys that don't have an xcache populated yet, the command goes
> ahead and populates one.  In theory this is creates a slight overhead as
> those ukeys might not need an xcache until they see traffic (and get
> revalidated) but in practice the overhead should be minimal.
>
> This commit tries to mimic the output format of the ovs-ofctl
> dump-flows/dump-groups commands.  For groups it actually uses
> ofputil_group/_bucket functions for formatting.  For rules it uses
> flow_stats_ds() (the original function was exported and renamed to
> ofproto_rule_stats_ds()).
>
> Signed-off-by: Dumitru Ceara 
> ---
> V3:
> - Addressed Mike's comments:
>   - use ds_put_char()
>   - make the test less flaky
> V2:
> - Addressed Adrian's comments:
>   - check return value of populate_xcache()
>   - use flow_stats_ds() (renamed to ofproto_rule_stats_ds()) instead of
> custom printing
>   - move ukey->state check to caller
>   - handle case when group bucket is not available
>   - update test case to cover the point above
> ---
>  NEWS|   3 +
>  include/openvswitch/ofp-group.h |   7 ++
>  lib/ofp-group.c | 110 +++-
>  ofproto/ofproto-dpif-upcall.c   |  85 
>  ofproto/ofproto-dpif.c  |  24 +++
>  ofproto/ofproto-dpif.h  |   2 +
>  ofproto/ofproto-provider.h  |   1 +
>  ofproto/ofproto.c   |  85 
>  tests/ofproto-dpif.at   |  47 ++
>  tests/ofproto-macros.at |   9 ++-
>  10 files changed, 285 insertions(+), 88 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 5ae0108d552b..1bc085f97045 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,9 @@ Post-v3.3.0
>   https://github.com/openvswitch/ovs.git
> - DPDK:
>   * OVS validated with DPDK 23.11.1.
> +   - ovs-appctl:
> + * Added 'upcall/dump-ufid-rules' to output the set of OpenFlow rules and
> +   groups that contributed to the creation of a specific datapath flow.
>
>
>  v3.3.0 - 16 Feb 2024
> diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
> index cd7af0ebff9c..79fcb3a4c0d1 100644
> --- a/include/openvswitch/ofp-group.h
> +++ b/include/openvswitch/ofp-group.h
> @@ -70,6 +70,11 @@ struct ofputil_bucket *ofputil_bucket_find(const struct 
> ovs_list *,
>  bool ofputil_bucket_check_duplicate_id(const struct ovs_list *);
>  struct ofputil_bucket *ofputil_bucket_list_front(const struct ovs_list *);
>  struct ofputil_bucket *ofputil_bucket_list_back(const struct ovs_list *);
> +void ofputil_bucket_format(const struct ofputil_bucket *,
> +   enum ofp11_group_type, enum ofp_version,
> +   const struct ofputil_port_map *,
> +   const struct ofputil_table_map *,
> +   struct ds *);
>
>  static inline bool
>  ofputil_bucket_has_liveness(const struct ofputil_bucket *bucket)
> @@ -88,6 +93,8 @@ struct ofputil_group_props {
>  void ofputil_group_properties_destroy(struct ofputil_group_props *);
>  void ofputil_group_properties_copy(struct ofputil_group_props *to,
> const struct ofputil_group_props *from);
> +void ofputil_group_properties_format(const struct ofputil_group_props *,
> + struct ds *);
>  /* Protocol-independent group_mod. */
>  struct ofputil_group_mod {
>  uint16_t command; /* One of OFPGC15_*. */
> diff --git a/lib/ofp-group.c b/lib/ofp-group.c
> index 737f48047b10..0be453d15203 100644
> --- a/lib/ofp-group.c
> +++ b/lib/ofp-group.c
> @@ -58,14 +58,16 @@ ofputil_group_from_string(const char *s, uint32_t 
> *group_idp)
>  return true;
>  }
>
> -/* Appends to 's' a string representation of the OpenFlow group ID 
> 'group_id'.
> - * Most groups' string representation is just the number, but for special
> - * groups, e.g.

Re: [ovs-dev] [PATCH net-next 5/7] selftests: openvswitch: Support implicit ipv6 arguments.

2024-06-18 Thread Aaron Conole
Aaron Conole  writes:

> The current iteration of IPv6 support requires explicit fields to be set
> in addition to not properly support the actual IPv6 addresses properly.
> With this change, make it so that the ipv6() bare option is usable to
> create wildcarded flows to match broad swaths of ipv6 traffic.
>
> Reviewed-by: Simon Horman 
> Tested-by: Simon Horman 
> Signed-off-by: Aaron Conole 
> ---
>  .../selftests/net/openvswitch/ovs-dpctl.py| 42 ---
>  1 file changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
> b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> index 2f16df2fb16b..2062e7e6e99e 100644
> --- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> +++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
> @@ -200,6 +200,18 @@ def convert_ipv4(data):
>  
>  return int(ipaddress.IPv4Address(ip)), int(ipaddress.IPv4Address(mask))
>  
> +def convert_ipv6(data):
> +ip, _, mask = data.partition('/')
> +
> +if not ip:
> +ip = mask = 0
> +elif not mask:
> +mask = ':::::::'
> +elif mask.isdigit():
> +mask = ipaddress.IPv6Network("::/" + mask).hostmask
> +
> +return ipaddress.IPv6Address(ip).packed, 
> ipaddress.IPv6Address(mask).packed
> +
>  def convert_int(size):
>  def convert_int_sized(data):
>  value, _, mask = data.partition('/')
> @@ -941,21 +953,21 @@ class ovskey(nla):
>  "src",
>  "src",
>  lambda x: str(ipaddress.IPv6Address(x)),
> -lambda x: int.from_bytes(x, "big"),
> -lambda x: ipaddress.IPv6Address(x),
> +lambda x: ipaddress.IPv6Address(x).packed if x else 0,
> +convert_ipv6,
>  ),
>  (
>  "dst",
>  "dst",
>  lambda x: str(ipaddress.IPv6Address(x)),
> -lambda x: int.from_bytes(x, "big"),
> -lambda x: ipaddress.IPv6Address(x),
> +lambda x: ipaddress.IPv6Address(x).packed if x else 0,
> +convert_ipv6,
>  ),
> -("label", "label", "%d", int),
> -("proto", "proto", "%d", int),
> -("tclass", "tclass", "%d", int),
> -("hlimit", "hlimit", "%d", int),
> -("frag", "frag", "%d", int),
> +("label", "label", "%d", lambda x: int(x) if x else 0),
> +("proto", "proto", "%d", lambda x: int(x) if x else 0),
> +("tclass", "tclass", "%d", lambda x: int(x) if x else 0),
> +("hlimit", "hlimit", "%d", lambda x: int(x) if x else 0),
> +("frag", "frag", "%d", lambda x: int(x) if x else 0),
>  )
>  
>  def __init__(
> @@ -1152,8 +1164,8 @@ class ovskey(nla):
>  (
>  "target",
>  "target",
> -lambda x: str(ipaddress.IPv6Address(x)),
> -lambda x: int.from_bytes(x, "big"),
> +lambda x: ipaddress.IPv6Address(x).packed,

This (and the following str() calls) shouldn't have been changed.  I'll
send a v2.  Sorry about the noise.  It isn't visible in this test, but
when doing some additional ipv6 test development for a future series, I
caught it.

> +convert_ipv6,
>  ),
>  ("sll", "sll", macstr, lambda x: int.from_bytes(x, "big")),
>  ("tll", "tll", macstr, lambda x: int.from_bytes(x, "big")),
> @@ -1237,14 +1249,14 @@ class ovskey(nla):
>  (
>  "src",
>  "src",
> -lambda x: str(ipaddress.IPv6Address(x)),
> -lambda x: int.from_bytes(x, "big", convertmac),
> +lambda x: ipaddress.IPv6Address(x).packed,
> +convert_ipv6,
>  ),
>  (
>  "dst",
>  "dst",
> -lambda x: str(ipaddress.IPv6Address(x)),
> -lambda x: int.from_bytes(x, "big"),
> +lambda x: ipaddress.IPv6Address(x).packed,
> +convert_ipv6,
>  ),
>  ("tp_src", "tp_src", "%d", int),
>  ("tp_dst", "tp_dst", "%d", int),

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 6/7] selftests: net: Use the provided dpctl rather than the vswitchd for tests.

2024-06-18 Thread Aaron Conole
Aaron Conole  writes:

> The current pmtu test infrastucture requires an installed copy of the
> ovs-vswitchd userspace.  This means that any automated or constrained
> environments may not have the requisite tools to run the tests.  However,
> the pmtu tests don't require any special classifier processing.  Indeed
> they are only using the vswitchd in the most basic mode - as a NORMAL
> switch.
>
> However, the ovs-dpctl kernel utility can now program all the needed basic
> flows to allow traffic to traverse the tunnels and provide support for at
> least testing some basic pmtu scenarios.  More complicated flow pipelines
> can be added to the internal ovs test infrastructure, but that is work for
> the future.  For now, enable the most common cases - wide mega flows with
> no other prerequisites.
>
> Enhance the pmtu testing to try testing using the internal utility, first.
> As a fallback, if the internal utility isn't running, then try with the
> ovs-vswitchd userspace tools.
>
> Signed-off-by: Aaron Conole 
> ---
>  tools/testing/selftests/net/pmtu.sh | 145 +++-
>  1 file changed, 123 insertions(+), 22 deletions(-)
>
> diff --git a/tools/testing/selftests/net/pmtu.sh 
> b/tools/testing/selftests/net/pmtu.sh
> index cfc84958025a..51ccb9bed069 100755
> --- a/tools/testing/selftests/net/pmtu.sh
> +++ b/tools/testing/selftests/net/pmtu.sh
> @@ -842,25 +842,97 @@ setup_bridge() {
>   run_cmd ${ns_a} ip link set veth_A-C master br0
>  }
>  
> +setup_ovs_via_internal_utility() {
> + type="${1}"
> + a_addr="${2}"
> + b_addr="${3}"
> + dport="${4}"
> +
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-if ovs_br0 ${type}_a -t 
> ${type} || return 1
> +
> + ports=$(python3 ./openvswitch/ovs-dpctl.py show)
> + br0_port=$(echo "$ports" | grep -E "\sovs_br0" | sed -e 's@port @@' | 
> cut -d: -f1 | xargs)
> + type_a_port=$(echo "$ports" | grep ${type}_a | sed -e 's@port @@' | cut 
> -d: -f1 | xargs)
> + veth_a_port=$(echo "$ports" | grep veth_A | sed -e 's@port @@' | cut 
> -d: -f1 | xargs)
> +
> + v4_a_tun="${prefix4}.${a_r1}.1"
> + v4_b_tun="${prefix4}.${b_r1}.1"
> +
> + v6_a_tun="${prefix6}:${a_r1}::1"
> + v6_b_tun="${prefix6}:${b_r1}::1"
> +
> + if [ "${v4_a_tun}" = "${a_addr}" ]; then
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x0800),ipv4()" \
> + 
> "set(tunnel(tun_id=1,dst=${v4_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x86dd),ipv6()" \
> + 
> "set(tunnel(tun_id=1,dst=${v4_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),tunnel(tun_id=1,src=${v4_b_tun},dst=${v4_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x0800),ipv4()"
>  \
> + "${veth_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),tunnel(tun_id=1,src=${v4_b_tun},dst=${v4_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x86dd),ipv6()"
>  \
> + "${veth_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),tunnel(tun_id=1,src=${v4_b_tun},dst=${v4_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x0806),arp()"
>  \
> + "${veth_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x0806),arp(sip=${veth4_c_addr},tip=${tunnel4_b_addr})"
>  \
> + 
> "set(tunnel(tun_id=1,dst=${v4_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
> + else
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x0800),ipv4()" \
> + 
> "set(tunnel(tun_id=1,ipv6_dst=${v6_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),in_port(${veth_a_port}),eth(),eth_type(0x86dd),ipv6()" \
> + 
> "set(tunnel(tun_id=1,ipv6_dst=${v6_b_tun},ttl=64,tp_dst=${dport},flags(df|csum))),${type_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),tunnel(tun_id=1,ipv6_src=${v6_b_tun},ipv6_dst=${v6_a_tun}),in_port(${type_a_port}),eth(),eth_type(0x0800),ipv4()"
>  \
> + "${veth_a_port}"
> + run_cmd python3 ./openvswitch/ovs-dpctl.py add-flow ovs_br0 \
> + 
> "recirc_id(0),tunnel(tun_id=1,ipv6_s

Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-18 Thread Ilya Maximets
On 6/18/24 16:58, Xin Long wrote:
> On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets  wrote:
>>
>> On 6/17/24 22:10, Ilya Maximets wrote:
>>> On 7/16/23 23:09, Xin Long wrote:
 By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
 from the hashtable when lookup, we can simplify the exp processing code a
 lot in openvswitch conntrack.

 Signed-off-by: Xin Long 
 ---
  net/openvswitch/conntrack.c | 78 +
  1 file changed, 10 insertions(+), 68 deletions(-)

 diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
 index 331730fd3580..fa955e892210 100644
 --- a/net/openvswitch/conntrack.c
 +++ b/net/openvswitch/conntrack.c
 @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, 
 struct sw_flow_key *key,
  return 0;
  }

 -static struct nf_conntrack_expect *
 -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
 -   u16 proto, const struct sk_buff *skb)
 -{
 -struct nf_conntrack_tuple tuple;
 -struct nf_conntrack_expect *exp;
 -
 -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
 &tuple))
 -return NULL;
 -
 -exp = __nf_ct_expect_find(net, zone, &tuple);
 -if (exp) {
 -struct nf_conntrack_tuple_hash *h;
 -
 -/* Delete existing conntrack entry, if it clashes with the
 - * expectation.  This can happen since conntrack ALGs do not
 - * check for clashes between (new) expectations and existing
 - * conntrack entries.  nf_conntrack_in() will check the
 - * expectations only if a conntrack entry can not be found,
 - * which can lead to OVS finding the expectation (here) in the
 - * init direction, but which will not be removed by the
 - * nf_conntrack_in() call, if a matching conntrack entry is
 - * found instead.  In this case all init direction packets
 - * would be reported as new related packets, while reply
 - * direction packets would be reported as un-related
 - * established packets.
 - */
 -h = nf_conntrack_find_get(net, zone, &tuple);
 -if (h) {
 -struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 -
 -nf_ct_delete(ct, 0, 0);
 -nf_ct_put(ct);
 -}
 -}
 -
 -return exp;
 -}
 -
  /* This replicates logic from nf_conntrack_core.c that is not exported. */
  static enum ip_conntrack_info
  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
 @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
 sw_flow_key *key,
   const struct ovs_conntrack_info *info,
   struct sk_buff *skb)
  {
 -struct nf_conntrack_expect *exp;
 -
 -/* If we pass an expected packet through nf_conntrack_in() the
 - * expectation is typically removed, but the packet could still be
 - * lost in upcall processing.  To prevent this from happening we
 - * perform an explicit expectation lookup.  Expected connections are
 - * always new, and will be passed through conntrack only when they are
 - * committed, as it is OK to remove the expectation at that time.
 - */
 -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
 -if (exp) {
 -u8 state;
 -
 -/* NOTE: New connections are NATted and Helped only when
 - * committed, so we are not calling into NAT here.
 - */
 -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
 -__ovs_ct_update_key(key, state, &info->zone, exp->master);
>>>
>>> Hi, Xin, others.
>>>
>>> Unfortunately, it seems like removal of this code broke the expected 
>>> behavior.
>>> OVS in userspace expects that SYN packet of a new related FTP connection 
>>> will
>>> get +new+rel+trk flags, but after this patch we're only getting +rel+trk 
>>> and not
>>> new.  This is a problem because we need to commit this connection with the 
>>> label
>>> and we do that for +new packets.  If we can't get +new packet we'll have to 
>>> commit
>>> every single +rel+trk packet, which doesn't make a lot of sense.  And it's a
>>> significant behavior change regardless.
>>
>> Interestingly enough I see +new+rel+trk packets in cases without SNAT,
>> but we can only get +rel+trk in cases with SNAT.  So, this may be just
>> a generic conntrack bug somewhere.  At least the behavior seems fairly
>> inconsistent.
>>
> In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same
> time. With this 

Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-18 Thread Ilya Maximets
On 6/18/24 12:50, Adrián Moreno wrote:
> On Tue, Jun 18, 2024 at 12:22:23PM GMT, Ilya Maximets wrote:
>> On 6/18/24 09:00, Adrián Moreno wrote:
>>> On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
 On 6/17/24 13:55, Ilya Maximets wrote:
> On 6/3/24 20:56, Adrian Moreno wrote:
>> The OVS_ACTION_ATTR_SAMPLE action is, in essence,
>> observability-oriented.
>>
>> Apart from some corner case in which it's used a replacement of clone()
>> for old kernels, it's really only used for sFlow, IPFIX and now,
>> local emit_sample.
>>
>> With this in mind, it doesn't make much sense to report
>> OVS_DROP_LAST_ACTION inside sample actions.
>>
>> For instance, if the flow:
>>
>>   actions:sample(..,emit_sample(..)),2
>>
>> triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
>> confusing for users since the packet did reach its destination.
>>
>> This patch makes internal action execution silently consume the skb
>> instead of notifying a drop for this case.
>>
>> Unfortunately, this patch does not remove all potential sources of
>> confusion since, if the sample action itself is the last action, e.g:
>>
>> actions:sample(..,emit_sample(..))
>>
>> we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we 
>> aren't.
>>
>> Sadly, this case is difficult to solve without breaking the
>> optimization by which the skb is not cloned on last sample actions.
>> But, given explicit drop actions are now supported, OVS can just add one
>> after the last sample() and rewrite the flow as:
>>
>> actions:sample(..,emit_sample(..)),drop
>>
>> Signed-off-by: Adrian Moreno 
>> ---
>>  net/openvswitch/actions.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 33f6d93ba5e4..54fc1abcff95 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
>>  static struct action_flow_keys __percpu *flow_keys;
>>  static DEFINE_PER_CPU(int, exec_actions_level);
>>
>> +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
>> +{
>> +/* Do not emit packet drops inside sample(). */
>> +if (OVS_CB(skb)->probability)
>> +consume_skb(skb);
>> +else
>> +ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +}
>> +
>>  /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
>>   * space. Return NULL if out of key spaces.
>>   */
>> @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
>> sk_buff *skb,
>>  if ((arg->probability != U32_MAX) &&
>>  (!arg->probability || get_random_u32() > arg->probability)) 
>> {
>>  if (last)
>> -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +ovs_drop_skb_last_action(skb);

 Always consuming the skb at this point makes sense, since having smaple()
 as a last action is a reasonable thing to have.  But this looks more like
 a fix for the original drop reason patch set.

>>>
>>> I don't think consuming the skb at this point makes sense. It was very
>>> intentionally changed to a drop since a very common use-case for
>>> sampling is drop-sampling, i.e: replacing an empty action list (that
>>> triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
>>> that replacement should not have any effect on the number of
>>> OVS_DROP_LAST_ACTION being reported as the packets are being treated in
>>> the same way (only observed in one case).
>>>
>>>
>>  return 0;
>>  }
>>
>> @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, 
>> struct sk_buff *skb,
>>  }
>>  }
>>
>> -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>> +ovs_drop_skb_last_action(skb);
>
> I don't think I agree with this one.  If we have a sample() action with
> a lot of different actions inside and we reached the end while the last
> action didn't consume the skb, then we should report that.  E.g.
> "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
> cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
>
>>>
>>> What is the use case for such action list? Having an action branch
>>> executed randomly doesn't make sense to me if it's not some
>>> observability thing (which IMHO should not trigger drops).
>>
>> It is exactly my point.  A list of actions that doesn't end is some sort
>> of a terminal action (output, drop, etc) does not make a lot of sense and
>

Re: [ovs-dev] [PATCH ovn 0/4] Fix inter-AZ IP-multicast traffic in ovn-kubernetes-like deployments.

2024-06-18 Thread Numan Siddique
On Tue, Jun 18, 2024 at 9:34 AM Dumitru Ceara  wrote:
>
> Out of the patches in the series, the one that introduced the issue is
> PATCH 3/4.  However we need to also revert two other patches that depend
> on it.  PATCH 4/4 is a test to avoid future regressions.
>
> Dumitru Ceara (4):
>   Revert "IC: Tansit switch don't flood mcast traffic to router ports if
> matches igmp group."
>   Revert "ovn-ic: Avoid igmp/mld traffic flooding."
>   Revert "northd: Don't skip transit switch LSP when creating mcast
> groups."
>   tests: ic: Add IP multicast test that simulates the ovn-k8s use case.

Thanks for adding the test patch.  It would be great at some point we
add mutlinode tests (mutlinode.at) for various multicast traffic
scenarios.

For the entire series:

Acked-by: Numan Siddique 

Numan



>
>  controller/pinctrl.c |   2 -
>  include/ovn/logical-fields.h |   3 -
>  lib/logical-fields.c |   4 -
>  northd/northd.c  |  36 ++---
>  northd/northd.h  |   6 -
>  northd/ovn-northd.8.xml  |  14 +-
>  tests/ovn-ic.at  | 252 +--
>  7 files changed, 261 insertions(+), 56 deletions(-)
>
> --
> 2.44.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 3/4] Revert "northd: Don't skip transit switch LSP when creating mcast groups."

2024-06-18 Thread Mohammad Heib
On Tue, Jun 18, 2024 at 4:32 PM Dumitru Ceara  wrote:

> This reverts commit 85ca2b75369c9a73f4750d5914666a54ebb3f2e0.
>
> The commit above breaks inter-AZ IP multicast for the case when one of
> the multicast receivers is co-located in the same zone as the sender.
> In those cases traffic is not correctly forwarded to other receivers
> that joined the group in other AZs.
>
> This is often the case in ovn-kubernetes deployments (with IC enabled).
> The current "interconnection - IGMP/MLD multicast" unit test failed to
> cover such topologies.
>
> CC: Mohammad Heib 
> Fixes: 85ca2b75369c ("northd: Don't skip transit switch LSP when creating
> mcast groups.")
> Reported-at: https://issues.redhat.com/browse/FDP-656
> Signed-off-by: Dumitru Ceara 
> ---
>  northd/northd.c |  8 +++-
>  northd/northd.h |  6 --
>  tests/ovn-ic.at | 10 --
>  3 files changed, 3 insertions(+), 21 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 3c1affb02f..7e474a7b89 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -5434,13 +5434,11 @@ ovn_igmp_group_get_ports(const struct
> sbrec_igmp_group *sb_igmp_group,
>  continue;
>  }
>
> -/* If this is already a port of a router on which relay is enabled
> - * and it's not a transit switch to router port, skip it for the
> - * group.  Traffic is flooded there anyway.
> +/* If this is already a port of a router on which relay is
> enabled,
> + * skip it for the group. Traffic is flooded there anyway.
>   */
>  if (port->peer && port->peer->od &&
> -port->peer->od->mcast_info.rtr.relay &&
> -!ovn_datapath_is_transit_switch(port->od)) {
> +port->peer->od->mcast_info.rtr.relay) {
>  continue;
>  }
>
> diff --git a/northd/northd.h b/northd/northd.h
> index 146139bebc..fd884c851e 100644
> --- a/northd/northd.h
> +++ b/northd/northd.h
> @@ -362,12 +362,6 @@ ovn_datapath_is_stale(const struct ovn_datapath *od)
>  return !od->nbr && !od->nbs;
>  };
>
> -static inline bool
> -ovn_datapath_is_transit_switch(const struct ovn_datapath *od)
> -{
> -return od->tunnel_key >= OVN_MIN_DP_KEY_GLOBAL;
> -}
> -
>  /* Pipeline stages. */
>
>  /* The two purposes for which ovn-northd uses OVN logical datapaths. */
> diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
> index 1666e77295..20409f70ac 100644
> --- a/tests/ovn-ic.at
> +++ b/tests/ovn-ic.at
> @@ -2042,20 +2042,10 @@ wait_row_count IGMP_Group 2 address=239.0.1.68
>  wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
>  check ovn-nbctl --wait=hv sync
>
> -#Validate that Multicast Group contains all registered ports for
> -# specific igmp group.
> -ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
> -ports=$(fetch_column multicast_group ports name="239.0.1.68"
> datapath=$ts_dp)
> -check test X2 = X$(echo $ports | wc -w)
> -
> -
>  ovn_as az2
>  wait_row_count IGMP_Group 2 address=239.0.1.68
>  wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
>  check ovn-nbctl --wait=hv sync
> -ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
> -ports=$(fetch_column multicast_group ports name="239.0.1.68"
> datapath=$ts_dp)
> -check test X2 = X$(echo $ports | wc -w)
>
>  # Send an IP multicast packet from LSP2, it should be forwarded
>  # to lsp1 and lsp3.
> --
> 2.44.0
>
>
looks good to me, thanks Dumitru.

Acked-by: Mohammad Heib 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/4] Revert "IC: Tansit switch don't flood mcast traffic to router ports if matches igmp group."

2024-06-18 Thread Mohammad Heib
Acked-by: Mohammad Heib 

Thanks

On Tue, Jun 18, 2024 at 4:42 PM 0-day Robot  wrote:

> Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out
> your patch.
> Thanks for your contribution.
>
> I encountered some error that I wasn't expecting.  See the details below.
>
>
> checkpatch:
> WARNING: The subject, ': ', is over 70 characters, i.e., 91.
> WARNING: The subject summary should end with a dot.
> Subject: Revert "IC: Tansit switch don't flood mcast traffic to router
> ports if matches igmp group."
> Lines checked: 90, Warnings: 2, Errors: 0
>
>
> Please check this out.  If you feel there has been an error, please email
> acon...@redhat.com
>
> Thanks,
> 0-day Robot
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-18 Thread Xin Long
On Tue, Jun 18, 2024 at 7:34 AM Ilya Maximets  wrote:
>
> On 6/17/24 22:10, Ilya Maximets wrote:
> > On 7/16/23 23:09, Xin Long wrote:
> >> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
> >> from the hashtable when lookup, we can simplify the exp processing code a
> >> lot in openvswitch conntrack.
> >>
> >> Signed-off-by: Xin Long 
> >> ---
> >>  net/openvswitch/conntrack.c | 78 +
> >>  1 file changed, 10 insertions(+), 68 deletions(-)
> >>
> >> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> >> index 331730fd3580..fa955e892210 100644
> >> --- a/net/openvswitch/conntrack.c
> >> +++ b/net/openvswitch/conntrack.c
> >> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, 
> >> struct sw_flow_key *key,
> >>  return 0;
> >>  }
> >>
> >> -static struct nf_conntrack_expect *
> >> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
> >> -   u16 proto, const struct sk_buff *skb)
> >> -{
> >> -struct nf_conntrack_tuple tuple;
> >> -struct nf_conntrack_expect *exp;
> >> -
> >> -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
> >> &tuple))
> >> -return NULL;
> >> -
> >> -exp = __nf_ct_expect_find(net, zone, &tuple);
> >> -if (exp) {
> >> -struct nf_conntrack_tuple_hash *h;
> >> -
> >> -/* Delete existing conntrack entry, if it clashes with the
> >> - * expectation.  This can happen since conntrack ALGs do not
> >> - * check for clashes between (new) expectations and existing
> >> - * conntrack entries.  nf_conntrack_in() will check the
> >> - * expectations only if a conntrack entry can not be found,
> >> - * which can lead to OVS finding the expectation (here) in the
> >> - * init direction, but which will not be removed by the
> >> - * nf_conntrack_in() call, if a matching conntrack entry is
> >> - * found instead.  In this case all init direction packets
> >> - * would be reported as new related packets, while reply
> >> - * direction packets would be reported as un-related
> >> - * established packets.
> >> - */
> >> -h = nf_conntrack_find_get(net, zone, &tuple);
> >> -if (h) {
> >> -struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
> >> -
> >> -nf_ct_delete(ct, 0, 0);
> >> -nf_ct_put(ct);
> >> -}
> >> -}
> >> -
> >> -return exp;
> >> -}
> >> -
> >>  /* This replicates logic from nf_conntrack_core.c that is not exported. */
> >>  static enum ip_conntrack_info
> >>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
> >> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
> >> sw_flow_key *key,
> >>   const struct ovs_conntrack_info *info,
> >>   struct sk_buff *skb)
> >>  {
> >> -struct nf_conntrack_expect *exp;
> >> -
> >> -/* If we pass an expected packet through nf_conntrack_in() the
> >> - * expectation is typically removed, but the packet could still be
> >> - * lost in upcall processing.  To prevent this from happening we
> >> - * perform an explicit expectation lookup.  Expected connections are
> >> - * always new, and will be passed through conntrack only when they are
> >> - * committed, as it is OK to remove the expectation at that time.
> >> - */
> >> -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
> >> -if (exp) {
> >> -u8 state;
> >> -
> >> -/* NOTE: New connections are NATted and Helped only when
> >> - * committed, so we are not calling into NAT here.
> >> - */
> >> -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
> >> -__ovs_ct_update_key(key, state, &info->zone, exp->master);
> >
> > Hi, Xin, others.
> >
> > Unfortunately, it seems like removal of this code broke the expected 
> > behavior.
> > OVS in userspace expects that SYN packet of a new related FTP connection 
> > will
> > get +new+rel+trk flags, but after this patch we're only getting +rel+trk 
> > and not
> > new.  This is a problem because we need to commit this connection with the 
> > label
> > and we do that for +new packets.  If we can't get +new packet we'll have to 
> > commit
> > every single +rel+trk packet, which doesn't make a lot of sense.  And it's a
> > significant behavior change regardless.
>
> Interestingly enough I see +new+rel+trk packets in cases without SNAT,
> but we can only get +rel+trk in cases with SNAT.  So, this may be just
> a generic conntrack bug somewhere.  At least the behavior seems fairly
> inconsistent.
>
In nf_conntrack, IP_CT_RELATED and IP_CT_NEW do not exist at the same
time. With this patch, we expect OVS_CS_F_RELATED and OVS_CS_F_N

Re: [ovs-dev] [PATCH ovn 3/4] Revert "northd: Don't skip transit switch LSP when creating mcast groups."

2024-06-18 Thread 0-day Robot
Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject, ': ', is over 70 characters, i.e., 74.
WARNING: The subject summary should end with a dot.
Subject: Revert "northd: Don't skip transit switch LSP when creating mcast 
groups."
Lines checked: 93, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/4] Revert "ovn-ic: Avoid igmp/mld traffic flooding."

2024-06-18 Thread 0-day Robot
Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject summary should end with a dot.
Subject: Revert "ovn-ic: Avoid igmp/mld traffic flooding."
Lines checked: 116, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/4] Revert "IC: Tansit switch don't flood mcast traffic to router ports if matches igmp group."

2024-06-18 Thread 0-day Robot
Bleep bloop.  Greetings Dumitru Ceara, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: The subject, ': ', is over 70 characters, i.e., 91.
WARNING: The subject summary should end with a dot.
Subject: Revert "IC: Tansit switch don't flood mcast traffic to router ports if 
matches igmp group."
Lines checked: 90, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 4/4] tests: ic: Add IP multicast test that simulates the ovn-k8s use case.

2024-06-18 Thread Dumitru Ceara
The already existing "interconnection - IGMP/MLD multicast" test
simulated a slightly different topology (still valid though).

Signed-off-by: Dumitru Ceara 
---
 tests/ovn-ic.at | 246 
 1 file changed, 246 insertions(+)

diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index 20409f70ac..8497cb1941 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -2096,3 +2096,249 @@ OVN_CLEANUP_SBOX([hv2], ["/IGMP Querier enabled without 
a valid IPv4/d
 OVN_CLEANUP_IC([az1],[az2])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([interconnection - IGMP/MLD multicast - TS flood])
+AT_KEYWORDS([IP-multicast])
+
+# Logical network:
+#
+#   AZ1 | AZ2
+#   -
+#   |
+#   | +-- LR2 --- LS2 --- LSP2 (sender)
+#   |/
+# LSP1  --- LS1 --- LR1 --- TS ---
+#   (receiver)  |\
+#   | +-- LR3 --- LS3 --- LSP3 (receiver)
+#
+# LS1, LS2, LS3, TS configured to snoop IP multicast.
+# LR1, LR2, LR3 configured to relay IP multicast.
+# LR1-TS configured to flood IP multicast traffic unconditionally.
+# LR2-TS configured to flood IP multicast traffic unconditionally.
+# LR3-TS configured to flood IP multicast traffic unconditionally.
+
+AT_CAPTURE_FILE([exp])
+AT_CAPTURE_FILE([rcv])
+check_packets() {
+> exp
+> rcv
+if test "$1" = --uniq; then
+sort="sort -u"; shift
+else
+sort=sort
+fi
+for tuple in "$@"; do
+set $tuple; pcap=$1 type=$2
+echo "--- $pcap" | tee -a exp >> rcv
+$sort "$type" >> exp
+$PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" $pcap | $sort >> rcv
+echo | tee -a exp >> rcv
+done
+
+$at_diff exp rcv >/dev/null
+}
+
+ovn_init_ic_db
+ovn_start az1
+ovn_start az2
+
+net_add n1
+
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_az_attach az1 n1 br-phys 192.168.1.1 16
+check ovs-vsctl -- add-port br-int hv1-vif1 \
+-- set interface hv1-vif1 external-ids:iface-id=lsp1 \
+   options:tx_pcap=hv1/vif1-tx.pcap \
+   options:rxq_pcap=hv1/vif1-rx.pcap
+check ovs-vsctl set open . external-ids:ovn-is-interconn=true
+
+sim_add hv2
+as hv2
+check ovs-vsctl add-br br-phys
+ovn_az_attach az2 n1 br-phys 192.168.2.1 16
+check ovs-vsctl -- add-port br-int hv2-vif1 \
+-- set interface hv2-vif1 external-ids:iface-id=lsp2 \
+   options:tx_pcap=hv2/vif1-tx.pcap \
+   options:rxq_pcap=hv2/vif1-rx.pcap
+check ovs-vsctl -- add-port br-int hv2-vif2 \
+-- set interface hv2-vif2 external-ids:iface-id=lsp3 \
+   options:tx_pcap=hv2/vif2-tx.pcap \
+   options:rxq_pcap=hv2/vif2-rx.pcap
+check ovs-vsctl set open . external-ids:ovn-is-interconn=true
+
+AT_CHECK([ovn-ic-nbctl --wait=sb create Transit_Switch name=ts], [0], [ignore])
+check ovn_as az1 ovn-nbctl wait-until logical_switch ts
+check ovn_as az2 ovn-nbctl wait-until logical_switch ts
+
+ovn_as az1
+check ovn-nbctl lr-add lr1 \
+-- lrp-add lr1 lr1-ts 00:00:00:01:00:01 42.42.42.1/24 4242::1/64 \
+-- lrp-add lr1 lr1-ls1 00:00:00:01:01:00 43.43.43.1/24 4343::1/64\
+-- lrp-set-gateway-chassis lr1-ts hv1
+check ovn-nbctl ls-add ls1 \
+-- lsp-add ls1 ls1-lr1 \
+-- lsp-set-addresses ls1-lr1 router \
+-- lsp-set-type ls1-lr1 router \
+-- lsp-set-options ls1-lr1 router-port=lr1-ls1 \
+-- lsp-add ls1 lsp1
+check ovn-nbctl lsp-add ts ts-lr1 \
+-- lsp-set-addresses ts-lr1 router \
+-- lsp-set-type ts-lr1 router \
+-- lsp-set-options ts-lr1 router-port=lr1-ts
+wait_for_ports_up
+
+ovn_as az2
+check ovn-nbctl lr-add lr2 \
+-- lrp-add lr2 lr2-ts 00:00:00:02:00:01 42.42.42.2/24 4242::2/64 \
+-- lrp-add lr2 lr2-ls2 00:00:00:02:01:00 44.44.44.1/24 ::1/64 \
+-- lrp-set-gateway-chassis lr2-ts hv2
+check ovn-nbctl ls-add ls2 \
+-- lsp-add ls2 ls2-lr2 \
+-- lsp-set-addresses ls2-lr2 router \
+-- lsp-set-type ls2-lr2 router \
+-- lsp-set-options ls2-lr2 router-port=lr2-ls2 \
+-- lsp-add ls2 lsp2
+check ovn-nbctl lsp-add ts ts-lr2 \
+-- lsp-set-addresses ts-lr2 router \
+-- lsp-set-type ts-lr2 router \
+-- lsp-set-options ts-lr2 router-port=lr2-ts
+
+check ovn-nbctl lr-add lr3 \
+-- lrp-add lr3 lr3-ts 00:00:00:02:00:02 42.42.42.3/24 4242::3/64 \
+-- lrp-add lr3 lr3-ls3 00:00:00:02:02:00 44.44.45.1/24 4445::1/64 \
+-- lrp-set-gateway-chassis lr3-ts hv2
+check ovn-nbctl ls-add ls3 \
+-- lsp-add ls3 ls3-lr3 \
+-- lsp-set-addresses ls3-lr3 router \
+-- lsp-set-type ls3-lr3 router \
+-- lsp-set-options ls3-lr3 router-port=lr3-ls3 \
+-- lsp-add ls3 lsp3
+check ovn-nbctl lsp-add ts ts-lr3 \
+-- lsp-set-addresses ts-lr3 router \
+-- lsp-set-type ts-lr3 router \
+-- lsp-set-options ts-lr3 router-port=lr3-ts
+
+wait_for_ports_up
+check ovn-ic-nbctl --wait=sb sync
+ovn_a

[ovs-dev] [PATCH ovn 3/4] Revert "northd: Don't skip transit switch LSP when creating mcast groups."

2024-06-18 Thread Dumitru Ceara
This reverts commit 85ca2b75369c9a73f4750d5914666a54ebb3f2e0.

The commit above breaks inter-AZ IP multicast for the case when one of
the multicast receivers is co-located in the same zone as the sender.
In those cases traffic is not correctly forwarded to other receivers
that joined the group in other AZs.

This is often the case in ovn-kubernetes deployments (with IC enabled).
The current "interconnection - IGMP/MLD multicast" unit test failed to
cover such topologies.

CC: Mohammad Heib 
Fixes: 85ca2b75369c ("northd: Don't skip transit switch LSP when creating mcast 
groups.")
Reported-at: https://issues.redhat.com/browse/FDP-656
Signed-off-by: Dumitru Ceara 
---
 northd/northd.c |  8 +++-
 northd/northd.h |  6 --
 tests/ovn-ic.at | 10 --
 3 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 3c1affb02f..7e474a7b89 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -5434,13 +5434,11 @@ ovn_igmp_group_get_ports(const struct sbrec_igmp_group 
*sb_igmp_group,
 continue;
 }
 
-/* If this is already a port of a router on which relay is enabled
- * and it's not a transit switch to router port, skip it for the
- * group.  Traffic is flooded there anyway.
+/* If this is already a port of a router on which relay is enabled,
+ * skip it for the group. Traffic is flooded there anyway.
  */
 if (port->peer && port->peer->od &&
-port->peer->od->mcast_info.rtr.relay &&
-!ovn_datapath_is_transit_switch(port->od)) {
+port->peer->od->mcast_info.rtr.relay) {
 continue;
 }
 
diff --git a/northd/northd.h b/northd/northd.h
index 146139bebc..fd884c851e 100644
--- a/northd/northd.h
+++ b/northd/northd.h
@@ -362,12 +362,6 @@ ovn_datapath_is_stale(const struct ovn_datapath *od)
 return !od->nbr && !od->nbs;
 };
 
-static inline bool
-ovn_datapath_is_transit_switch(const struct ovn_datapath *od)
-{
-return od->tunnel_key >= OVN_MIN_DP_KEY_GLOBAL;
-}
-
 /* Pipeline stages. */
 
 /* The two purposes for which ovn-northd uses OVN logical datapaths. */
diff --git a/tests/ovn-ic.at b/tests/ovn-ic.at
index 1666e77295..20409f70ac 100644
--- a/tests/ovn-ic.at
+++ b/tests/ovn-ic.at
@@ -2042,20 +2042,10 @@ wait_row_count IGMP_Group 2 address=239.0.1.68
 wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
 check ovn-nbctl --wait=hv sync
 
-#Validate that Multicast Group contains all registered ports for
-# specific igmp group.
-ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
-ports=$(fetch_column multicast_group ports name="239.0.1.68" datapath=$ts_dp)
-check test X2 = X$(echo $ports | wc -w)
-
-
 ovn_as az2
 wait_row_count IGMP_Group 2 address=239.0.1.68
 wait_row_count IGMP_Group 2 address='"ff0a:dead:beef::1"'
 check ovn-nbctl --wait=hv sync
-ts_dp=$(fetch_column datapath_binding _uuid external_ids:name=ts)
-ports=$(fetch_column multicast_group ports name="239.0.1.68" datapath=$ts_dp)
-check test X2 = X$(echo $ports | wc -w)
 
 # Send an IP multicast packet from LSP2, it should be forwarded
 # to lsp1 and lsp3.
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/4] Revert "IC: Tansit switch don't flood mcast traffic to router ports if matches igmp group."

2024-06-18 Thread Dumitru Ceara
This reverts commit c9163c3046c5650c8b91f60728449483d4282165.

This depends on 85ca2b75369c ("northd: Don't skip transit switch LSP when
creating mcast groups.") which breaks inter-AZ use cases in
ovn-kubernetes deployments and will be reverted.

CC: Mohammad Heib 
Fixes: c9163c3046c5 ("IC: Tansit switch don't flood mcast traffic to router 
ports if matches igmp group.")
Signed-off-by: Dumitru Ceara 
---
 northd/northd.c | 17 +++--
 northd/ovn-northd.8.xml | 14 +++---
 2 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index fd0dd83ba8..2629b0e21b 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -9505,16 +9505,8 @@ build_lswitch_destination_lookup_bmcast(struct 
ovn_datapath *od,
 }
 
 
-/* Ingress table 27: Add IP multicast flows learnt from IGMP/MLD
- * (priority 90).
- *
- * OR, for transit switches:
- *
- * Add IP multicast flows learnt from IGMP/MLD to forward traffic
- * explicitly to the ports that are part of the IGMP/MLD group,
- * and ignore MROUTER Ports.
- * (priority 90).
- */
+/* Ingress table 25: Add IP multicast flows learnt from IGMP/MLD
+ * (priority 90). */
 static void
 build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
 struct lflow_table *lflows,
@@ -9528,9 +9520,6 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group 
*igmp_group,
 ds_clear(match);
 ds_clear(actions);
 
-bool transit_switch =
-ovn_datapath_is_transit_switch(igmp_group->datapath);
-
 struct mcast_switch_info *mcast_sw_info =
 &igmp_group->datapath->mcast_info.sw;
 uint64_t table_size = mcast_sw_info->table_size;
@@ -9576,7 +9565,7 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group 
*igmp_group,
 }
 
 /* Also flood traffic to all multicast routers with relay enabled. */
-if (mcast_sw_info->flood_relay && !transit_switch) {
+if (mcast_sw_info->flood_relay) {
 ds_put_cstr(actions,
 "clone { "
 "outport = \""MC_MROUTER_FLOOD "\"; "
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 3deaaa142d..91afe22b64 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -1956,13 +1956,13 @@ output;
   
 
   
-Priority-90 flows for non-transit switches that forward registered
-IP multicast traffic to their corresponding multicast group, which
-ovn-northd creates based on learnt
- entries.  The flows
-also forward packets to the MC_MROUTER_FLOOD multicast
-group, which ovn-nortdh populates with all the logical
-ports that are connected to logical routers with
+Priority-90 flows that forward registered IP multicast traffic to
+their corresponding multicast group, which ovn-northd
+creates based on learnt 
+entries.  The flows also forward packets to the
+MC_MROUTER_FLOOD multicast group, which
+ovn-nortdh populates with all the logical ports that
+are connected to logical routers with
 :mcast_relay='true'.
   
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 2/4] Revert "ovn-ic: Avoid igmp/mld traffic flooding."

2024-06-18 Thread Dumitru Ceara
This reverts commit 4494e0215e3aded99426dfe613bd090d16aced1b.

This depends on 85ca2b75369c ("northd: Don't skip transit switch LSP when
creating mcast groups.") which breaks inter-AZ use cases in
ovn-kubernetes deployments and will be reverted.

CC: Lorenzo Bianconi 
Fixes: 4494e0215e3a ("ovn-ic: Avoid igmp/mld traffic flooding.")
Signed-off-by: Dumitru Ceara 
---
 controller/pinctrl.c |  2 --
 include/ovn/logical-fields.h |  3 ---
 lib/logical-fields.c |  4 
 northd/northd.c  | 11 ---
 4 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index f79ac21552..8adec6179b 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -719,8 +719,6 @@ pinctrl_forward_pkt(struct rconn *swconn, int64_t dp_key,
 put_load(dp_key, MFF_LOG_DATAPATH, 0, 64, &ofpacts);
 put_load(in_port_key, MFF_LOG_INPORT, 0, 32, &ofpacts);
 put_load(out_port_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts);
-/* Avoid re-injecting packet already consumed. */
-put_load(1, MFF_LOG_FLAGS, MLF_IGMP_IGMP_SNOOP_INJECT_BIT, 1, &ofpacts);
 
 struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts);
 resubmit->in_port = OFPP_CONTROLLER;
diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index 8854dae7ac..ce79b501cf 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -82,7 +82,6 @@ enum mff_log_flags_bits {
 MLF_LOCALNET_BIT = 15,
 MLF_RX_FROM_TUNNEL_BIT = 16,
 MLF_ICMP_SNAT_BIT = 17,
-MLF_IGMP_IGMP_SNOOP_INJECT_BIT = 18,
 };
 
 /* MFF_LOG_FLAGS_REG flag assignments */
@@ -138,8 +137,6 @@ enum mff_log_flags {
 MLF_RX_FROM_TUNNEL = (1 << MLF_RX_FROM_TUNNEL_BIT),
 
 MLF_ICMP_SNAT = (1 << MLF_ICMP_SNAT_BIT),
-
-MLF_IGMP_IGMP_SNOOP = (1 << MLF_IGMP_IGMP_SNOOP_INJECT_BIT),
 };
 
 /* OVN logical fields
diff --git a/lib/logical-fields.c b/lib/logical-fields.c
index 68892dba50..20219a67ad 100644
--- a/lib/logical-fields.c
+++ b/lib/logical-fields.c
@@ -139,10 +139,6 @@ ovn_init_symtab(struct shash *symtab)
  flags_str);
 snprintf(flags_str, sizeof flags_str, "flags[%d]", MLF_RX_FROM_TUNNEL_BIT);
 expr_symtab_add_subfield(symtab, "flags.tunnel_rx", NULL, flags_str);
-snprintf(flags_str, sizeof flags_str, "flags[%d]",
- MLF_IGMP_IGMP_SNOOP_INJECT_BIT);
-expr_symtab_add_subfield(symtab, "flags.igmp_loopback", NULL,
- flags_str);
 
 /* Connection tracking state. */
 expr_symtab_add_field_scoped(symtab, "ct_mark", MFF_CT_MARK, NULL, false,
diff --git a/northd/northd.c b/northd/northd.c
index 2629b0e21b..3c1affb02f 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -6155,8 +6155,7 @@ build_interconn_mcast_snoop_flows(struct ovn_datapath *od,
 continue;
 }
 /* Punt IGMP traffic to controller. */
-char *match = xasprintf("inport == %s && igmp && "
-"flags.igmp_loopback == 0", op->json_key);
+char *match = xasprintf("inport == %s && igmp", op->json_key);
 ovn_lflow_metered(lflows, od, S_SWITCH_OUT_PRE_LB, 120, match,
   "clone { igmp; }; next;",
   copp_meter_get(COPP_IGMP, od->nbs->copp,
@@ -6165,8 +6164,7 @@ build_interconn_mcast_snoop_flows(struct ovn_datapath *od,
 free(match);
 
 /* Punt MLD traffic to controller. */
-match = xasprintf("inport == %s && (mldv1 || mldv2) && "
-  "flags.igmp_loopback == 0", op->json_key);
+match = xasprintf("inport == %s && (mldv1 || mldv2)", op->json_key);
 ovn_lflow_metered(lflows, od, S_SWITCH_OUT_PRE_LB, 120, match,
   "clone { igmp; }; next;",
   copp_meter_get(COPP_IGMP, od->nbs->copp,
@@ -9429,15 +9427,14 @@ build_lswitch_destination_lookup_bmcast(struct 
ovn_datapath *od,
 ds_put_cstr(actions, "igmp;");
 /* Punt IGMP traffic to controller. */
 ovn_lflow_metered(lflows, od, S_SWITCH_IN_L2_LKUP, 100,
-  "flags.igmp_loopback == 0 && igmp", ds_cstr(actions),
+  "igmp", ds_cstr(actions),
   copp_meter_get(COPP_IGMP, od->nbs->copp,
  meter_groups),
   lflow_ref);
 
 /* Punt MLD traffic to controller. */
 ovn_lflow_metered(lflows, od, S_SWITCH_IN_L2_LKUP, 100,
-  "flags.igmp_loopback == 0 && (mldv1 || mldv2)",
-  ds_cstr(actions),
+  "mldv1 || mldv2", ds_cstr(actions),
   copp_meter_get(COPP_IGMP, od->nbs->copp,
  meter_groups),
   lflow_ref);
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.or

[ovs-dev] [PATCH ovn 0/4] Fix inter-AZ IP-multicast traffic in ovn-kubernetes-like deployments.

2024-06-18 Thread Dumitru Ceara
Out of the patches in the series, the one that introduced the issue is
PATCH 3/4.  However we need to also revert two other patches that depend
on it.  PATCH 4/4 is a test to avoid future regressions.

Dumitru Ceara (4):
  Revert "IC: Tansit switch don't flood mcast traffic to router ports if
matches igmp group."
  Revert "ovn-ic: Avoid igmp/mld traffic flooding."
  Revert "northd: Don't skip transit switch LSP when creating mcast
groups."
  tests: ic: Add IP multicast test that simulates the ovn-k8s use case.

 controller/pinctrl.c |   2 -
 include/ovn/logical-fields.h |   3 -
 lib/logical-fields.c |   4 -
 northd/northd.c  |  36 ++---
 northd/northd.h  |   6 -
 northd/ovn-northd.8.xml  |  14 +-
 tests/ovn-ic.at  | 252 +--
 7 files changed, 261 insertions(+), 56 deletions(-)

-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 9/9] selftests: openvswitch: add emit_sample test

2024-06-18 Thread Aaron Conole
Adrián Moreno  writes:

> On Mon, Jun 17, 2024 at 07:18:05AM GMT, Adrián Moreno wrote:
>> On Fri, Jun 14, 2024 at 01:07:33PM GMT, Aaron Conole wrote:
>> > Adrian Moreno  writes:
>> >
>> > > Add a test to verify sampling packets via psample works.
>> > >
>> > > In order to do that, create a subcommand in ovs-dpctl.py to listen to
>> > > on the psample multicast group and print samples.
>> > >
>> > > In order to also test simultaneous sFlow and psample actions and
>> > > packet truncation, add missing parsing support for "userspace" and
>> > > "trunc" actions.
>> >
>> > Maybe split that into a separate patch.  This has a bugfix and 3
>> > features being pushed in.  I know it's already getting long as a series,
>> > so maybe it's okay to fold the userspace attribute bugfix with the parse
>> > support (since it wasn't really usable before).
>> >
>>
>> OK. Sounds reasonable.
>>
>> > > Signed-off-by: Adrian Moreno 
>> > > ---
>> > >  .../selftests/net/openvswitch/openvswitch.sh  |  99 +++-
>> > >  .../selftests/net/openvswitch/ovs-dpctl.py| 112 +-
>> > >  2 files changed, 204 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
>> > > b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> > > index 5cae53543849..f6e0ae3f6424 100755
>> > > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> > > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
>> > > @@ -20,7 +20,8 @@ tests="
>> > >  nat_related_v4  ip4-nat-related: ICMP 
>> > > related matches work with SNAT
>> > >  netlink_checks  ovsnl: validate netlink 
>> > > attrs and settings
>> > >  upcall_interfaces   ovs: test the upcall 
>> > > interfaces
>> > > -drop_reason drop: test drop reasons 
>> > > are emitted"
>> > > +drop_reason drop: test drop reasons 
>> > > are emitted
>> > > +emit_sample emit_sample: Sampling 
>> > > packets with psample"
>> > >
>> > >  info() {
>> > >  [ $VERBOSE = 0 ] || echo $*
>> > > @@ -170,6 +171,19 @@ ovs_drop_reason_count()
>> > >  return `echo "$perf_output" | grep "$pattern" | wc -l`
>> > >  }
>> > >
>> > > +ovs_test_flow_fails () {
>> > > +ERR_MSG="Flow actions may not be safe on all matching packets"
>> > > +
>> > > +PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
>> > > +ovs_add_flow $@ &> /dev/null $@ && return 1
>> > > +POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
>> > > +
>> > > +if [ "$PRE_TEST" == "$POST_TEST" ]; then
>> > > +return 1
>> > > +fi
>> > > +return 0
>> > > +}
>> > > +
>> > >  usage() {
>> > >  echo
>> > >  echo "$0 [OPTIONS] [TEST]..."
>> > > @@ -184,6 +198,89 @@ usage() {
>> > >  exit 1
>> > >  }
>> > >
>> > > +
>> > > +# emit_sample test
>> > > +# - use emit_sample to observe packets
>> > > +test_emit_sample() {
>> > > +sbx_add "test_emit_sample" || return $?
>> > > +
>> > > +# Add a datapath with per-vport dispatching.
>> > > +ovs_add_dp "test_emit_sample" emit_sample -V 2:1 || return 1
>> > > +
>> > > +info "create namespaces"
>> > > +ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
>> > > +client c0 c1 172.31.110.10/24 -u || return 1
>> > > +ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
>> > > +server s0 s1 172.31.110.20/24 -u || return 1
>> > > +
>> > > +# Check if emit_sample actions can be configured.
>> > > +ovs_add_flow "test_emit_sample" emit_sample \
>> > > +'in_port(1),eth(),eth_type(0x0806),arp()' 'emit_sample(group=1)'
>> > > +if [ $? == 1 ]; then
>> > > +info "no support for emit_sample - skipping"
>> > > +ovs_exit_sig
>> > > +return $ksft_skip
>> > > +fi
>> > > +
>> > > +ovs_del_flows "test_emit_sample" emit_sample
>> > > +
>> > > +# Allow ARP
>> > > +ovs_add_flow "test_emit_sample" emit_sample \
>> > > +'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 
>> > > 1
>> > > +ovs_add_flow "test_emit_sample" emit_sample \
>> > > +'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 
>> > > 1
>> > > +
>> > > +# Test action verification.
>> > > +OLDIFS=$IFS
>> > > +IFS='*'
>> > > +min_key='in_port(1),eth(),eth_type(0x0800),ipv4()'
>> > > +for testcase in \
>> > > +"cookie to 
>> > > large"*"emit_sample(group=1,cookie=1615141312111009080706050403020100)" \
>> > > +"no group with cookie"*"emit_sample(cookie=abcd)" \
>> > > +"no group"*"sample()";
>> > > +do
>> > > +set -- $testcase;
>> > > +ovs_test_flow_fa

Re: [ovs-dev] [PATCH v2] selftests: openvswitch: Set value to nla flags.

2024-06-18 Thread Aaron Conole
Adrian Moreno  writes:

> Netlink flags, although they don't have payload at the netlink level,
> are represented as having "True" as value in pyroute2.
>
> Without it, trying to add a flow with a flag-type action (e.g: pop_vlan)
> fails with the following traceback:
>
> Traceback (most recent call last):
>   File "[...]/ovs-dpctl.py", line 2498, in 
> sys.exit(main(sys.argv))
>  ^^
>   File "[...]/ovs-dpctl.py", line 2487, in main
> ovsflow.add_flow(rep["dpifindex"], flow)
>   File "[...]/ovs-dpctl.py", line 2136, in add_flow
> reply = self.nlm_request(
> ^
>   File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request
> return tuple(self._genlm_request(*argv, **kwarg))
>  ^^^
>   File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in
> nlm_request
> return tuple(super().nlm_request(*argv, **kwarg))
>^^
>   File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request
> self.put(msg, msg_type, msg_flags, msg_seq=msg_seq)
>   File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put
> self.sendto_gate(msg, addr)
>   File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate
> msg.encode()
>   File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode
> offset = self.encode_nlas(offset)
>  
>   File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas
> nla_instance.setvalue(cell[1])
>   File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue
> nlv.setvalue(nla_tuple[1])
>  ~^^^
> IndexError: list index out of range
>
> Signed-off-by: Adrian Moreno 
> ---

Acked-by: Aaron Conole 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next 3/3] openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack

2024-06-18 Thread Ilya Maximets
On 6/17/24 22:10, Ilya Maximets wrote:
> On 7/16/23 23:09, Xin Long wrote:
>> By not setting IPS_CONFIRMED in tmpl that allows the exp not to be removed
>> from the hashtable when lookup, we can simplify the exp processing code a
>> lot in openvswitch conntrack.
>>
>> Signed-off-by: Xin Long 
>> ---
>>  net/openvswitch/conntrack.c | 78 +
>>  1 file changed, 10 insertions(+), 68 deletions(-)
>>
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 331730fd3580..fa955e892210 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -455,45 +455,6 @@ static int ovs_ct_handle_fragments(struct net *net, 
>> struct sw_flow_key *key,
>>  return 0;
>>  }
>>  
>> -static struct nf_conntrack_expect *
>> -ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
>> -   u16 proto, const struct sk_buff *skb)
>> -{
>> -struct nf_conntrack_tuple tuple;
>> -struct nf_conntrack_expect *exp;
>> -
>> -if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, 
>> &tuple))
>> -return NULL;
>> -
>> -exp = __nf_ct_expect_find(net, zone, &tuple);
>> -if (exp) {
>> -struct nf_conntrack_tuple_hash *h;
>> -
>> -/* Delete existing conntrack entry, if it clashes with the
>> - * expectation.  This can happen since conntrack ALGs do not
>> - * check for clashes between (new) expectations and existing
>> - * conntrack entries.  nf_conntrack_in() will check the
>> - * expectations only if a conntrack entry can not be found,
>> - * which can lead to OVS finding the expectation (here) in the
>> - * init direction, but which will not be removed by the
>> - * nf_conntrack_in() call, if a matching conntrack entry is
>> - * found instead.  In this case all init direction packets
>> - * would be reported as new related packets, while reply
>> - * direction packets would be reported as un-related
>> - * established packets.
>> - */
>> -h = nf_conntrack_find_get(net, zone, &tuple);
>> -if (h) {
>> -struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>> -
>> -nf_ct_delete(ct, 0, 0);
>> -nf_ct_put(ct);
>> -}
>> -}
>> -
>> -return exp;
>> -}
>> -
>>  /* This replicates logic from nf_conntrack_core.c that is not exported. */
>>  static enum ip_conntrack_info
>>  ovs_ct_get_info(const struct nf_conntrack_tuple_hash *h)
>> @@ -852,36 +813,16 @@ static int ovs_ct_lookup(struct net *net, struct 
>> sw_flow_key *key,
>>   const struct ovs_conntrack_info *info,
>>   struct sk_buff *skb)
>>  {
>> -struct nf_conntrack_expect *exp;
>> -
>> -/* If we pass an expected packet through nf_conntrack_in() the
>> - * expectation is typically removed, but the packet could still be
>> - * lost in upcall processing.  To prevent this from happening we
>> - * perform an explicit expectation lookup.  Expected connections are
>> - * always new, and will be passed through conntrack only when they are
>> - * committed, as it is OK to remove the expectation at that time.
>> - */
>> -exp = ovs_ct_expect_find(net, &info->zone, info->family, skb);
>> -if (exp) {
>> -u8 state;
>> -
>> -/* NOTE: New connections are NATted and Helped only when
>> - * committed, so we are not calling into NAT here.
>> - */
>> -state = OVS_CS_F_TRACKED | OVS_CS_F_NEW | OVS_CS_F_RELATED;
>> -__ovs_ct_update_key(key, state, &info->zone, exp->master);
> 
> Hi, Xin, others.
> 
> Unfortunately, it seems like removal of this code broke the expected behavior.
> OVS in userspace expects that SYN packet of a new related FTP connection will
> get +new+rel+trk flags, but after this patch we're only getting +rel+trk and 
> not
> new.  This is a problem because we need to commit this connection with the 
> label
> and we do that for +new packets.  If we can't get +new packet we'll have to 
> commit
> every single +rel+trk packet, which doesn't make a lot of sense.  And it's a
> significant behavior change regardless.

Interestingly enough I see +new+rel+trk packets in cases without SNAT,
but we can only get +rel+trk in cases with SNAT.  So, this may be just
a generic conntrack bug somewhere.  At least the behavior seems fairly
inconsistent.

> 
> Could you, please, take a look?
> 
> The issue can be reproduced by running check-kernel tests in OVS repo.
> 'FTP SNAT orig tuple' tests fail 100% of the time.
> 
> Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-18 Thread Adrián Moreno
On Tue, Jun 18, 2024 at 12:22:23PM GMT, Ilya Maximets wrote:
> On 6/18/24 09:00, Adrián Moreno wrote:
> > On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
> >> On 6/17/24 13:55, Ilya Maximets wrote:
> >>> On 6/3/24 20:56, Adrian Moreno wrote:
>  The OVS_ACTION_ATTR_SAMPLE action is, in essence,
>  observability-oriented.
> 
>  Apart from some corner case in which it's used a replacement of clone()
>  for old kernels, it's really only used for sFlow, IPFIX and now,
>  local emit_sample.
> 
>  With this in mind, it doesn't make much sense to report
>  OVS_DROP_LAST_ACTION inside sample actions.
> 
>  For instance, if the flow:
> 
>    actions:sample(..,emit_sample(..)),2
> 
>  triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
>  confusing for users since the packet did reach its destination.
> 
>  This patch makes internal action execution silently consume the skb
>  instead of notifying a drop for this case.
> 
>  Unfortunately, this patch does not remove all potential sources of
>  confusion since, if the sample action itself is the last action, e.g:
> 
>  actions:sample(..,emit_sample(..))
> 
>  we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we 
>  aren't.
> 
>  Sadly, this case is difficult to solve without breaking the
>  optimization by which the skb is not cloned on last sample actions.
>  But, given explicit drop actions are now supported, OVS can just add one
>  after the last sample() and rewrite the flow as:
> 
>  actions:sample(..,emit_sample(..)),drop
> 
>  Signed-off-by: Adrian Moreno 
>  ---
>   net/openvswitch/actions.c | 13 +++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
> 
>  diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>  index 33f6d93ba5e4..54fc1abcff95 100644
>  --- a/net/openvswitch/actions.c
>  +++ b/net/openvswitch/actions.c
>  @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
>   static struct action_flow_keys __percpu *flow_keys;
>   static DEFINE_PER_CPU(int, exec_actions_level);
> 
>  +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
>  +{
>  +/* Do not emit packet drops inside sample(). */
>  +if (OVS_CB(skb)->probability)
>  +consume_skb(skb);
>  +else
>  +ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>  +}
>  +
>   /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
>    * space. Return NULL if out of key spaces.
>    */
>  @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
>  sk_buff *skb,
>   if ((arg->probability != U32_MAX) &&
>   (!arg->probability || get_random_u32() > arg->probability)) 
>  {
>   if (last)
>  -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>  +ovs_drop_skb_last_action(skb);
> >>
> >> Always consuming the skb at this point makes sense, since having smaple()
> >> as a last action is a reasonable thing to have.  But this looks more like
> >> a fix for the original drop reason patch set.
> >>
> >
> > I don't think consuming the skb at this point makes sense. It was very
> > intentionally changed to a drop since a very common use-case for
> > sampling is drop-sampling, i.e: replacing an empty action list (that
> > triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
> > that replacement should not have any effect on the number of
> > OVS_DROP_LAST_ACTION being reported as the packets are being treated in
> > the same way (only observed in one case).
> >
> >
>   return 0;
>   }
> 
>  @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, 
>  struct sk_buff *skb,
>   }
>   }
> 
>  -ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
>  +ovs_drop_skb_last_action(skb);
> >>>
> >>> I don't think I agree with this one.  If we have a sample() action with
> >>> a lot of different actions inside and we reached the end while the last
> >>> action didn't consume the skb, then we should report that.  E.g.
> >>> "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
> >>> cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
> >>>
> >
> > What is the use case for such action list? Having an action branch
> > executed randomly doesn't make sense to me if it's not some
> > observability thing (which IMHO should not trigger drops).
>
> It is exactly my point.  A list of actions that doesn't end is some sort
> of a terminal action (output, drop, etc) does not make a lot of sense and
> hence should be signaled as an unexpected dr

Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-18 Thread Ilya Maximets
On 6/18/24 09:00, Adrián Moreno wrote:
> On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
>> On 6/17/24 13:55, Ilya Maximets wrote:
>>> On 6/3/24 20:56, Adrian Moreno wrote:
 The OVS_ACTION_ATTR_SAMPLE action is, in essence,
 observability-oriented.

 Apart from some corner case in which it's used a replacement of clone()
 for old kernels, it's really only used for sFlow, IPFIX and now,
 local emit_sample.

 With this in mind, it doesn't make much sense to report
 OVS_DROP_LAST_ACTION inside sample actions.

 For instance, if the flow:

   actions:sample(..,emit_sample(..)),2

 triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
 confusing for users since the packet did reach its destination.

 This patch makes internal action execution silently consume the skb
 instead of notifying a drop for this case.

 Unfortunately, this patch does not remove all potential sources of
 confusion since, if the sample action itself is the last action, e.g:

 actions:sample(..,emit_sample(..))

 we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we aren't.

 Sadly, this case is difficult to solve without breaking the
 optimization by which the skb is not cloned on last sample actions.
 But, given explicit drop actions are now supported, OVS can just add one
 after the last sample() and rewrite the flow as:

 actions:sample(..,emit_sample(..)),drop

 Signed-off-by: Adrian Moreno 
 ---
  net/openvswitch/actions.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)

 diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
 index 33f6d93ba5e4..54fc1abcff95 100644
 --- a/net/openvswitch/actions.c
 +++ b/net/openvswitch/actions.c
 @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
  static struct action_flow_keys __percpu *flow_keys;
  static DEFINE_PER_CPU(int, exec_actions_level);

 +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
 +{
 +  /* Do not emit packet drops inside sample(). */
 +  if (OVS_CB(skb)->probability)
 +  consume_skb(skb);
 +  else
 +  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
 +}
 +
  /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
   * space. Return NULL if out of key spaces.
   */
 @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
 sk_buff *skb,
if ((arg->probability != U32_MAX) &&
(!arg->probability || get_random_u32() > arg->probability)) {
if (last)
 -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
 +  ovs_drop_skb_last_action(skb);
>>
>> Always consuming the skb at this point makes sense, since having smaple()
>> as a last action is a reasonable thing to have.  But this looks more like
>> a fix for the original drop reason patch set.
>>
> 
> I don't think consuming the skb at this point makes sense. It was very
> intentionally changed to a drop since a very common use-case for
> sampling is drop-sampling, i.e: replacing an empty action list (that
> triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
> that replacement should not have any effect on the number of
> OVS_DROP_LAST_ACTION being reported as the packets are being treated in
> the same way (only observed in one case).
> 
> 
return 0;
}

 @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, 
 struct sk_buff *skb,
}
}

 -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
 +  ovs_drop_skb_last_action(skb);
>>>
>>> I don't think I agree with this one.  If we have a sample() action with
>>> a lot of different actions inside and we reached the end while the last
>>> action didn't consume the skb, then we should report that.  E.g.
>>> "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
>>> cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
>>>
> 
> What is the use case for such action list? Having an action branch
> executed randomly doesn't make sense to me if it's not some
> observability thing (which IMHO should not trigger drops).

It is exactly my point.  A list of actions that doesn't end is some sort
of a terminal action (output, drop, etc) does not make a lot of sense and
hence should be signaled as an unexpected drop, so users can re-check the
pipeline in case they missed the terminal action somehow.

> 
>>> The only actions that are actually consuming the skb are "output",
>>> "userspace", "recirc" and now "emit_sample".  "output" and "recirc" are
>>> consuming the skb "naturally" by stealing it when it is the last action.
>>> "userspace" has an explicit check to consume the skb if it is the la

Re: [ovs-dev] [PATCH net-next v2 5/9] net: openvswitch: add emit_sample action

2024-06-18 Thread Ilya Maximets
On 6/18/24 11:47, Ilya Maximets wrote:
> On 6/18/24 09:33, Adrián Moreno wrote:
>> On Mon, Jun 17, 2024 at 12:44:45PM GMT, Ilya Maximets wrote:
>>> On 6/3/24 20:56, Adrian Moreno wrote:
 Add support for a new action: emit_sample.

 This action accepts a u32 group id and a variable-length cookie and uses
 the psample multicast group to make the packet available for
 observability.

 The maximum length of the user-defined cookie is set to 16, same as
 tc_cookie, to discourage using cookies that will not be offloadable.

 Signed-off-by: Adrian Moreno 
 ---
  Documentation/netlink/specs/ovs_flow.yaml | 17 
  include/uapi/linux/openvswitch.h  | 25 
  net/openvswitch/actions.c | 50 +++
  net/openvswitch/flow_netlink.c| 33 ++-
  4 files changed, 124 insertions(+), 1 deletion(-)
>>>
>>> Some nits below, beside ones already mentioned.
>>>
>>
>> Thanks, Ilya.
>>

 diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
 b/Documentation/netlink/specs/ovs_flow.yaml
 index 4fdfc6b5cae9..a7ab5593a24f 100644
 --- a/Documentation/netlink/specs/ovs_flow.yaml
 +++ b/Documentation/netlink/specs/ovs_flow.yaml
 @@ -727,6 +727,12 @@ attribute-sets:
  name: dec-ttl
  type: nest
  nested-attributes: dec-ttl-attrs
 +  -
 +name: emit-sample
 +type: nest
 +nested-attributes: emit-sample-attrs
 +doc: |
 +  Sends a packet sample to psample for external observation.
-
  name: tunnel-key-attrs
  enum-name: ovs-tunnel-key-attr
 @@ -938,6 +944,17 @@ attribute-sets:
-
  name: gbp
  type: u32
 +  -
 +name: emit-sample-attrs
 +enum-name: ovs-emit-sample-attr
 +name-prefix: ovs-emit-sample-attr-
 +attributes:
 +  -
 +name: group
 +type: u32
 +  -
 +name: cookie
 +type: binary

  operations:
name-prefix: ovs-flow-cmd-
 diff --git a/include/uapi/linux/openvswitch.h 
 b/include/uapi/linux/openvswitch.h
 index efc82c318fa2..a0e9dde0584a 100644
 --- a/include/uapi/linux/openvswitch.h
 +++ b/include/uapi/linux/openvswitch.h
 @@ -914,6 +914,30 @@ struct check_pkt_len_arg {
  };
  #endif

 +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
 +/**
 + * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE
 + * action.
 + *
 + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of 
 the
 + * sample.
 + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that 
 contains
 + * user-defined metadata. The maximum length is 16 bytes.
>>>
>>> s/16/OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE/
>>>
 + *
 + * Sends the packet to the psample multicast group with the specified 
 group and
 + * cookie. It is possible to combine this action with the
 + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being 
 emitted.
 + */
 +enum ovs_emit_sample_attr {
 +  OVS_EMIT_SAMPLE_ATTR_UNPSEC,
 +  OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
 +  OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
 +  __OVS_EMIT_SAMPLE_ATTR_MAX
 +};
 +
 +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
 +
 +
  /**
   * enum ovs_action_attr - Action types.
   *
 @@ -1004,6 +1028,7 @@ enum ovs_action_attr {
OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
OVS_ACTION_ATTR_DROP, /* u32 error code. */
 +  OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */

__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
 diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
 index 964225580824..3b4dba0ded59 100644
 --- a/net/openvswitch/actions.c
 +++ b/net/openvswitch/actions.c
 @@ -24,6 +24,11 @@
  #include 
  #include 
  #include 
 +
 +#if IS_ENABLED(CONFIG_PSAMPLE)
 +#include 
 +#endif
 +
  #include 

  #include "datapath.h"
 @@ -1299,6 +1304,46 @@ static int execute_dec_ttl(struct sk_buff *skb, 
 struct sw_flow_key *key)
return 0;
  }

 +static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
 + const struct sw_flow_key *key,
 + const struct nlattr *attr)
 +{
 +#if IS_ENABLED(CONFIG_PSAMPLE)
 +  struct psample_group psample_group = {};
 +  struct psample_metadata md = {};
 +  struct vport *input_vport;

Re: [ovs-dev] [PATCH net-next v2 5/9] net: openvswitch: add emit_sample action

2024-06-18 Thread Ilya Maximets
On 6/18/24 09:33, Adrián Moreno wrote:
> On Mon, Jun 17, 2024 at 12:44:45PM GMT, Ilya Maximets wrote:
>> On 6/3/24 20:56, Adrian Moreno wrote:
>>> Add support for a new action: emit_sample.
>>>
>>> This action accepts a u32 group id and a variable-length cookie and uses
>>> the psample multicast group to make the packet available for
>>> observability.
>>>
>>> The maximum length of the user-defined cookie is set to 16, same as
>>> tc_cookie, to discourage using cookies that will not be offloadable.
>>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>  Documentation/netlink/specs/ovs_flow.yaml | 17 
>>>  include/uapi/linux/openvswitch.h  | 25 
>>>  net/openvswitch/actions.c | 50 +++
>>>  net/openvswitch/flow_netlink.c| 33 ++-
>>>  4 files changed, 124 insertions(+), 1 deletion(-)
>>
>> Some nits below, beside ones already mentioned.
>>
> 
> Thanks, Ilya.
> 
>>>
>>> diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
>>> b/Documentation/netlink/specs/ovs_flow.yaml
>>> index 4fdfc6b5cae9..a7ab5593a24f 100644
>>> --- a/Documentation/netlink/specs/ovs_flow.yaml
>>> +++ b/Documentation/netlink/specs/ovs_flow.yaml
>>> @@ -727,6 +727,12 @@ attribute-sets:
>>>  name: dec-ttl
>>>  type: nest
>>>  nested-attributes: dec-ttl-attrs
>>> +  -
>>> +name: emit-sample
>>> +type: nest
>>> +nested-attributes: emit-sample-attrs
>>> +doc: |
>>> +  Sends a packet sample to psample for external observation.
>>>-
>>>  name: tunnel-key-attrs
>>>  enum-name: ovs-tunnel-key-attr
>>> @@ -938,6 +944,17 @@ attribute-sets:
>>>-
>>>  name: gbp
>>>  type: u32
>>> +  -
>>> +name: emit-sample-attrs
>>> +enum-name: ovs-emit-sample-attr
>>> +name-prefix: ovs-emit-sample-attr-
>>> +attributes:
>>> +  -
>>> +name: group
>>> +type: u32
>>> +  -
>>> +name: cookie
>>> +type: binary
>>>
>>>  operations:
>>>name-prefix: ovs-flow-cmd-
>>> diff --git a/include/uapi/linux/openvswitch.h 
>>> b/include/uapi/linux/openvswitch.h
>>> index efc82c318fa2..a0e9dde0584a 100644
>>> --- a/include/uapi/linux/openvswitch.h
>>> +++ b/include/uapi/linux/openvswitch.h
>>> @@ -914,6 +914,30 @@ struct check_pkt_len_arg {
>>>  };
>>>  #endif
>>>
>>> +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
>>> +/**
>>> + * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE
>>> + * action.
>>> + *
>>> + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
>>> + * sample.
>>> + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that 
>>> contains
>>> + * user-defined metadata. The maximum length is 16 bytes.
>>
>> s/16/OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE/
>>
>>> + *
>>> + * Sends the packet to the psample multicast group with the specified 
>>> group and
>>> + * cookie. It is possible to combine this action with the
>>> + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being 
>>> emitted.
>>> + */
>>> +enum ovs_emit_sample_attr {
>>> +   OVS_EMIT_SAMPLE_ATTR_UNPSEC,
>>> +   OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
>>> +   OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
>>> +   __OVS_EMIT_SAMPLE_ATTR_MAX
>>> +};
>>> +
>>> +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
>>> +
>>> +
>>>  /**
>>>   * enum ovs_action_attr - Action types.
>>>   *
>>> @@ -1004,6 +1028,7 @@ enum ovs_action_attr {
>>> OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
>>> OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
>>> OVS_ACTION_ATTR_DROP, /* u32 error code. */
>>> +   OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */
>>>
>>> __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
>>>* from userspace. */
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 964225580824..3b4dba0ded59 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -24,6 +24,11 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +
>>> +#if IS_ENABLED(CONFIG_PSAMPLE)
>>> +#include 
>>> +#endif
>>> +
>>>  #include 
>>>
>>>  #include "datapath.h"
>>> @@ -1299,6 +1304,46 @@ static int execute_dec_ttl(struct sk_buff *skb, 
>>> struct sw_flow_key *key)
>>> return 0;
>>>  }
>>>
>>> +static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
>>> +  const struct sw_flow_key *key,
>>> +  const struct nlattr *attr)
>>> +{
>>> +#if IS_ENABLED(CONFIG_PSAMPLE)
>>> +   struct psample_group psample_group = {};
>>> +   struct psample_metadata md = {};
>>> +   struct vport *input_vport;
>>> +   const struct nlattr *a;
>>> +   int rem;
>>> +
>>> +   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>>> +a = nla_next(a, &rem)) {
>>
>> Since 

Re: [ovs-dev] [PATCH net-next v2 2/9] net: sched: act_sample: add action cookie to sample

2024-06-18 Thread Ilya Maximets
On 6/18/24 09:38, Adrián Moreno wrote:
> On Mon, Jun 17, 2024 at 12:00:04PM GMT, Ilya Maximets wrote:
>> On 6/3/24 20:56, Adrian Moreno wrote:
>>> If the action has a user_cookie, pass it along to the sample so it can
>>> be easily identified.
>>>
>>> Signed-off-by: Adrian Moreno 
>>> ---
>>>  net/sched/act_sample.c | 12 
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
>>> index a69b53d54039..5c3f86ec964a 100644
>>> --- a/net/sched/act_sample.c
>>> +++ b/net/sched/act_sample.c
>>> @@ -165,9 +165,11 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff 
>>> *skb,
>>>  const struct tc_action *a,
>>>  struct tcf_result *res)
>>>  {
>>> +   u8 cookie_data[TC_COOKIE_MAX_SIZE] = {};
>>
>> Is it necessary to initialize these 16 bytes on every call?
>> Might be expensive.  We're passing the data length around,
>> so the uninitialized parts should not be accessed.
>>
> 
> They "should" not, indeed. I was just trying to be extra careful.
> Are you worried TC_COOKIE_MAX_SIZE could grow or the cycles needed to
> clear the current 16 bytes?

I'm assuming that any extra cycles spent per packet are undesirable,
so should be avoided, if possible.  Even if we save 1-2 cycles per
packet, it's a lot when we talk about millions of packets per second.

In this particular case, it seems, we do not sacrifice anything, so
it's just a couple of cycles back for free.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 9/9] selftests: openvswitch: add emit_sample test

2024-06-18 Thread Adrián Moreno
On Mon, Jun 17, 2024 at 07:18:05AM GMT, Adrián Moreno wrote:
> On Fri, Jun 14, 2024 at 01:07:33PM GMT, Aaron Conole wrote:
> > Adrian Moreno  writes:
> >
> > > Add a test to verify sampling packets via psample works.
> > >
> > > In order to do that, create a subcommand in ovs-dpctl.py to listen to
> > > on the psample multicast group and print samples.
> > >
> > > In order to also test simultaneous sFlow and psample actions and
> > > packet truncation, add missing parsing support for "userspace" and
> > > "trunc" actions.
> >
> > Maybe split that into a separate patch.  This has a bugfix and 3
> > features being pushed in.  I know it's already getting long as a series,
> > so maybe it's okay to fold the userspace attribute bugfix with the parse
> > support (since it wasn't really usable before).
> >
>
> OK. Sounds reasonable.
>
> > > Signed-off-by: Adrian Moreno 
> > > ---
> > >  .../selftests/net/openvswitch/openvswitch.sh  |  99 +++-
> > >  .../selftests/net/openvswitch/ovs-dpctl.py| 112 +-
> > >  2 files changed, 204 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh 
> > > b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > > index 5cae53543849..f6e0ae3f6424 100755
> > > --- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > > +++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
> > > @@ -20,7 +20,8 @@ tests="
> > >   nat_related_v4  ip4-nat-related: ICMP related 
> > > matches work with SNAT
> > >   netlink_checks  ovsnl: validate netlink attrs 
> > > and settings
> > >   upcall_interfaces   ovs: test the upcall interfaces
> > > - drop_reason drop: test drop reasons are 
> > > emitted"
> > > + drop_reason drop: test drop reasons are 
> > > emitted
> > > + emit_sample emit_sample: Sampling packets 
> > > with psample"
> > >
> > >  info() {
> > >  [ $VERBOSE = 0 ] || echo $*
> > > @@ -170,6 +171,19 @@ ovs_drop_reason_count()
> > >   return `echo "$perf_output" | grep "$pattern" | wc -l`
> > >  }
> > >
> > > +ovs_test_flow_fails () {
> > > + ERR_MSG="Flow actions may not be safe on all matching packets"
> > > +
> > > + PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
> > > + ovs_add_flow $@ &> /dev/null $@ && return 1
> > > + POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
> > > +
> > > + if [ "$PRE_TEST" == "$POST_TEST" ]; then
> > > + return 1
> > > + fi
> > > + return 0
> > > +}
> > > +
> > >  usage() {
> > >   echo
> > >   echo "$0 [OPTIONS] [TEST]..."
> > > @@ -184,6 +198,89 @@ usage() {
> > >   exit 1
> > >  }
> > >
> > > +
> > > +# emit_sample test
> > > +# - use emit_sample to observe packets
> > > +test_emit_sample() {
> > > + sbx_add "test_emit_sample" || return $?
> > > +
> > > + # Add a datapath with per-vport dispatching.
> > > + ovs_add_dp "test_emit_sample" emit_sample -V 2:1 || return 1
> > > +
> > > + info "create namespaces"
> > > + ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
> > > + client c0 c1 172.31.110.10/24 -u || return 1
> > > + ovs_add_netns_and_veths "test_emit_sample" "emit_sample" \
> > > + server s0 s1 172.31.110.20/24 -u || return 1
> > > +
> > > + # Check if emit_sample actions can be configured.
> > > + ovs_add_flow "test_emit_sample" emit_sample \
> > > + 'in_port(1),eth(),eth_type(0x0806),arp()' 'emit_sample(group=1)'
> > > + if [ $? == 1 ]; then
> > > + info "no support for emit_sample - skipping"
> > > + ovs_exit_sig
> > > + return $ksft_skip
> > > + fi
> > > +
> > > + ovs_del_flows "test_emit_sample" emit_sample
> > > +
> > > + # Allow ARP
> > > + ovs_add_flow "test_emit_sample" emit_sample \
> > > + 'in_port(1),eth(),eth_type(0x0806),arp()' '2' || return 1
> > > + ovs_add_flow "test_emit_sample" emit_sample \
> > > + 'in_port(2),eth(),eth_type(0x0806),arp()' '1' || return 1
> > > +
> > > + # Test action verification.
> > > + OLDIFS=$IFS
> > > + IFS='*'
> > > + min_key='in_port(1),eth(),eth_type(0x0800),ipv4()'
> > > + for testcase in \
> > > + "cookie to 
> > > large"*"emit_sample(group=1,cookie=1615141312111009080706050403020100)" \
> > > + "no group with cookie"*"emit_sample(cookie=abcd)" \
> > > + "no group"*"sample()";
> > > + do
> > > + set -- $testcase;
> > > + ovs_test_flow_fails "test_emit_sample" emit_sample $min_key $2
> > > + if [ $? == 1 ]; then
> > > + info "failed - $1"
> > > + return 1
> > > + fi
> > > + done
> > > + IFS=$OLDIFS
> > > +
> > > + # Sample first 14 bytes of all traffic.
> > > + ovs_add_flow "test_emit_sample" emit_sample \
> > > + 
> > > "in_port(1),eth(),eth_type(0x0800),ipv4(src=172.31.110.10,proto=1),icmp()"
> > >  "trunc(14),emit_sample(group=1,cookie=c0ffee),2"
> > > +
> > > + # Sample all traffic. In this case, u

Re: [ovs-dev] [PATCH 1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.

2024-06-18 Thread Eelco Chaudron



On 18 Jun 2024, at 8:05, Roi Dayan wrote:

> On 03/06/2024 16:29, Eelco Chaudron wrote:
>>
>>
>> On 3 Jun 2024, at 10:07, Roi Dayan wrote:
>>
>>> On 03/06/2024 10:18, Roi Dayan wrote:


 On 30/05/2024 18:48, Eelco Chaudron wrote:
>
>
> On 23 May 2024, at 12:46, Roi Dayan via dev wrote:
>
>> It is observed in some environments that there are much more ukeys than
>> actual DP flows. For example:
>>
>> $ ovs-appctl upcall/show
>> system@ovs-system:
>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>> offloaded flows : 525
>> dump duration : 1063ms
>> ufid enabled : true
>>
>> 23: (keys 3612)
>> 24: (keys 3625)
>> 25: (keys 3485)
>>
>> The revalidator threads are busy revalidating the stale ukeys leading to
>> high CPU and long dump duration.
>>
>> There are some possible situations that may result in stale ukeys that
>> have no corresponding DP flows.
>>
>> In revalidator, push_dp_ops() doesn't check error if the op type is not
>> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
>> case, where the old flow is deleted first and then the new one is
>> created. If the creation fails, the ukey will be stale (no corresponding
>> DP flow). This patch adds a warning in such case.
>
> Not sure I understand, this behavior should be captured by the 
> UKEY_INCONSISTENT state.

 Hi Eelco,

 thanks for reviewing.
 We started the patch on older branch as we didn't rebase for some time
 and got a little later on sending it.
 I see the addition now for UKEY_INCOSISTENT in the following patch:

 cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors.

 Looks like it handles the same situation we tried to handle in this patch.

>
>> Another possible scenario is in handle_upcalls() if a PUT operation did
>> not succeed and op->error attribute was not set correctly it can lead to
>> stale ukey in operational state.
>
>
> Guess we need to figure out these cases, as these are the root cause of 
> your problem.

 right. maybe. but this could help keep the system alive/clean while 
 logging the
 bad situation instead of having increase in those stale/inconsistent ukeys.
 I understand if it's not nice to handle it like this.

>>>
>>> Hi Eelco,
>>>
>>> I remember now one of the reproduction scenarios we did is do some traffic
>>> to get rules added using TC and then delete those from tc command line
>>> and it got to stale ukeys.
>>> The revalidator dump thread not seeing the rules so not updating anything
>>> and sweep over the ukeys skipping them as well.
>>> This is why we checked against the timing stats of the ukey.
>>>
>>> I remember I tested on the upstream code and not our development branch
>>> and it reproduces. I didn't notice if the commit adding UKEY_INCONSISTENT
>>> existed but it handles errors from adding flows so I dont think it matters.
>>>
>>> I'll try to check and verify again but I think it's still here.
>>> So while cases getting dop.error handled with UKEY_INCONSISTENT,
>>> this case I think is not.
>>
>> I think you are right this case is not covered by the UKEY_INCONSISTENT test 
>> below. See my suggestion on fixing this in revalidate_ukey().
>>
>> Maybe you could also add a test case for this in the offload test suite.
>>
>> //Eelco
>
> Hi Eelco,
>
> Thanks for the review. I didn't have time to check this but wanted to
> reply that it's in my queue to verify your suggestion and add a test.

Thanks for letting me know, and I understand as we are all busy :)

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 2/9] net: sched: act_sample: add action cookie to sample

2024-06-18 Thread Adrián Moreno
On Mon, Jun 17, 2024 at 12:00:04PM GMT, Ilya Maximets wrote:
> On 6/3/24 20:56, Adrian Moreno wrote:
> > If the action has a user_cookie, pass it along to the sample so it can
> > be easily identified.
> >
> > Signed-off-by: Adrian Moreno 
> > ---
> >  net/sched/act_sample.c | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> > index a69b53d54039..5c3f86ec964a 100644
> > --- a/net/sched/act_sample.c
> > +++ b/net/sched/act_sample.c
> > @@ -165,9 +165,11 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff 
> > *skb,
> >  const struct tc_action *a,
> >  struct tcf_result *res)
> >  {
> > +   u8 cookie_data[TC_COOKIE_MAX_SIZE] = {};
>
> Is it necessary to initialize these 16 bytes on every call?
> Might be expensive.  We're passing the data length around,
> so the uninitialized parts should not be accessed.
>

They "should" not, indeed. I was just trying to be extra careful.
Are you worried TC_COOKIE_MAX_SIZE could grow or the cycles needed to
clear the current 16 bytes?

> Best regards, Ilya Maximets.
>
> > struct tcf_sample *s = to_sample(a);
> > struct psample_group *psample_group;
> > struct psample_metadata md = {};
> > +   struct tc_cookie *user_cookie;
> > int retval;
> >
> > tcf_lastuse_update(&s->tcf_tm);
> > @@ -189,6 +191,16 @@ TC_INDIRECT_SCOPE int tcf_sample_act(struct sk_buff 
> > *skb,
> > if (skb_at_tc_ingress(skb) && tcf_sample_dev_ok_push(skb->dev))
> > skb_push(skb, skb->mac_len);
> >
> > +   rcu_read_lock();
> > +   user_cookie = rcu_dereference(a->user_cookie);
> > +   if (user_cookie) {
> > +   memcpy(cookie_data, user_cookie->data,
> > +  user_cookie->len);
> > +   md.user_cookie = cookie_data;
> > +   md.user_cookie_len = user_cookie->len;
> > +   }
> > +   rcu_read_unlock();
> > +
> > md.trunc_size = s->truncate ? s->trunc_size : skb->len;
> > psample_sample_packet(psample_group, skb, s->rate, &md);
> >
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] selftests: openvswitch: Set value to nla flags.

2024-06-18 Thread 0-day Robot
Bleep bloop.  Greetings Adrián Moreno, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless 
(tools/testing/selftests/net/openvswitch/ovs-dpctl.py).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 selftests: openvswitch: Set value to nla flags.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 6/9] net: openvswitch: store sampling probability in cb.

2024-06-18 Thread Adrián Moreno
On Mon, Jun 17, 2024 at 01:26:39PM GMT, Ilya Maximets wrote:
> On 6/17/24 09:08, Adrián Moreno wrote:
> > On Fri, Jun 14, 2024 at 12:55:59PM GMT, Aaron Conole wrote:
> >> Adrian Moreno  writes:
> >>
> >>> The behavior of actions might not be the exact same if they are being
> >>> executed inside a nested sample action. Store the probability of the
> >>> parent sample action in the skb's cb area.
> >>
> >> What does that mean?
> >>
> >
> > Emit action, for instance, needs the probability so that psample
> > consumers know what was the sampling rate applied. Also, the way we
> > should inform about packet drops (via kfree_skb_reason) changes (see
> > patch 7/9).
> >
> >>> Use the probability in emit_sample to pass it down to psample.
> >>>
> >>> Signed-off-by: Adrian Moreno 
> >>> ---
> >>>  include/uapi/linux/openvswitch.h |  3 ++-
> >>>  net/openvswitch/actions.c| 25 ++---
> >>>  net/openvswitch/datapath.h   |  3 +++
> >>>  net/openvswitch/vport.c  |  1 +
> >>>  4 files changed, 28 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/include/uapi/linux/openvswitch.h 
> >>> b/include/uapi/linux/openvswitch.h
> >>> index a0e9dde0584a..9d675725fa2b 100644
> >>> --- a/include/uapi/linux/openvswitch.h
> >>> +++ b/include/uapi/linux/openvswitch.h
> >>> @@ -649,7 +649,8 @@ enum ovs_flow_attr {
> >>>   * Actions are passed as nested attributes.
> >>>   *
> >>>   * Executes the specified actions with the given probability on a 
> >>> per-packet
> >>> - * basis.
> >>> + * basis. Nested actions will be able to access the probability value of 
> >>> the
> >>> + * parent @OVS_ACTION_ATTR_SAMPLE.
> >>>   */
> >>>  enum ovs_sample_attr {
> >>>   OVS_SAMPLE_ATTR_UNSPEC,
> >>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >>> index 3b4dba0ded59..33f6d93ba5e4 100644
> >>> --- a/net/openvswitch/actions.c
> >>> +++ b/net/openvswitch/actions.c
> >>> @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct 
> >>> sk_buff *skb,
> >>>   struct nlattr *sample_arg;
> >>>   int rem = nla_len(attr);
> >>>   const struct sample_arg *arg;
> >>> + u32 init_probability;
> >>>   bool clone_flow_key;
> >>> + int err;
> >>>
> >>>   /* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
> >>>   sample_arg = nla_data(attr);
> >>>   arg = nla_data(sample_arg);
> >>>   actions = nla_next(sample_arg, &rem);
> >>> + init_probability = OVS_CB(skb)->probability;
> >>>
> >>>   if ((arg->probability != U32_MAX) &&
> >>>   (!arg->probability || get_random_u32() > arg->probability)) {
> >>> @@ -1062,9 +1065,21 @@ static int sample(struct datapath *dp, struct 
> >>> sk_buff *skb,
> >>>   return 0;
> >>>   }
> >>>
> >>> + if (init_probability) {
> >>> + OVS_CB(skb)->probability = ((u64)OVS_CB(skb)->probability *
> >>> + arg->probability / U32_MAX);
> >>> + } else {
> >>> + OVS_CB(skb)->probability = arg->probability;
> >>> + }
> >>> +
> >>
> >> I'm confused by this.  Eventually, integer arithmetic will practically
> >> guarantee that nested sample() calls will go to 0.  So eventually, the
> >> test above will be impossible to meet mathematically.
> >>
> >> OTOH, you could argue that a 1% of 50% is low anyway, but it still would
> >> have a positive probability count, and still be possible for
> >> get_random_u32() call to match.
> >>
> >
> > Using OVS's probability semantics, we can express probabilities as low
> > as (100/U32_MAX)% which is pretty low indeed. However, just because the
> > probability of executing the action is low I don't think we should not
> > report it.
> >
> > Rethinking the integer arithmetics, it's true that we should avoid
> > hitting zero on the division, eg: nesting 6x 1% sampling rates will make
> > the result be zero which will make probability restoration fail on the
> > way back. Threrefore, the new probability should be at least 1.
> >
> >
> >> I'm not sure about this particular change.  Why do we need it?
> >>
> >
> > Why do we need to propagate the probability down to nested "sample"
> > actions? or why do we need to store the probability in the cb area in
> > the first place?
> >
> > The former: Just for correctness as only storing the last one would be
> > incorrect. Although I don't know of any use for nested "sample" actions.
>
> I think, we can drop this for now.  All the user interfaces specify
> the probability per action.  So, it should be fine to report the
> probability of the action that emitted the sample without taking into
> account the whole timeline of that packet.  Besides, packet can leave
> OVS and go back loosing the metadata, so it will not actually be a
> full solution anyway.  Single-action metadata is easier to define.
>

Sure, I guess we can drop it, I don't think there is a use case for nested
samples anyway.

> > The latter: To pass it down to psample so that sample receivers know how
> > the sampling rate applied (and, e.g: do throughput estimations lik

Re: [ovs-dev] [PATCH net-next v2 5/9] net: openvswitch: add emit_sample action

2024-06-18 Thread Adrián Moreno
On Mon, Jun 17, 2024 at 12:44:45PM GMT, Ilya Maximets wrote:
> On 6/3/24 20:56, Adrian Moreno wrote:
> > Add support for a new action: emit_sample.
> >
> > This action accepts a u32 group id and a variable-length cookie and uses
> > the psample multicast group to make the packet available for
> > observability.
> >
> > The maximum length of the user-defined cookie is set to 16, same as
> > tc_cookie, to discourage using cookies that will not be offloadable.
> >
> > Signed-off-by: Adrian Moreno 
> > ---
> >  Documentation/netlink/specs/ovs_flow.yaml | 17 
> >  include/uapi/linux/openvswitch.h  | 25 
> >  net/openvswitch/actions.c | 50 +++
> >  net/openvswitch/flow_netlink.c| 33 ++-
> >  4 files changed, 124 insertions(+), 1 deletion(-)
>
> Some nits below, beside ones already mentioned.
>

Thanks, Ilya.

> >
> > diff --git a/Documentation/netlink/specs/ovs_flow.yaml 
> > b/Documentation/netlink/specs/ovs_flow.yaml
> > index 4fdfc6b5cae9..a7ab5593a24f 100644
> > --- a/Documentation/netlink/specs/ovs_flow.yaml
> > +++ b/Documentation/netlink/specs/ovs_flow.yaml
> > @@ -727,6 +727,12 @@ attribute-sets:
> >  name: dec-ttl
> >  type: nest
> >  nested-attributes: dec-ttl-attrs
> > +  -
> > +name: emit-sample
> > +type: nest
> > +nested-attributes: emit-sample-attrs
> > +doc: |
> > +  Sends a packet sample to psample for external observation.
> >-
> >  name: tunnel-key-attrs
> >  enum-name: ovs-tunnel-key-attr
> > @@ -938,6 +944,17 @@ attribute-sets:
> >-
> >  name: gbp
> >  type: u32
> > +  -
> > +name: emit-sample-attrs
> > +enum-name: ovs-emit-sample-attr
> > +name-prefix: ovs-emit-sample-attr-
> > +attributes:
> > +  -
> > +name: group
> > +type: u32
> > +  -
> > +name: cookie
> > +type: binary
> >
> >  operations:
> >name-prefix: ovs-flow-cmd-
> > diff --git a/include/uapi/linux/openvswitch.h 
> > b/include/uapi/linux/openvswitch.h
> > index efc82c318fa2..a0e9dde0584a 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -914,6 +914,30 @@ struct check_pkt_len_arg {
> >  };
> >  #endif
> >
> > +#define OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE 16
> > +/**
> > + * enum ovs_emit_sample_attr - Attributes for %OVS_ACTION_ATTR_EMIT_SAMPLE
> > + * action.
> > + *
> > + * @OVS_EMIT_SAMPLE_ATTR_GROUP: 32-bit number to identify the source of the
> > + * sample.
> > + * @OVS_EMIT_SAMPLE_ATTR_COOKIE: A variable-length binary cookie that 
> > contains
> > + * user-defined metadata. The maximum length is 16 bytes.
>
> s/16/OVS_EMIT_SAMPLE_COOKIE_MAX_SIZE/
>
> > + *
> > + * Sends the packet to the psample multicast group with the specified 
> > group and
> > + * cookie. It is possible to combine this action with the
> > + * %OVS_ACTION_ATTR_TRUNC action to limit the size of the packet being 
> > emitted.
> > + */
> > +enum ovs_emit_sample_attr {
> > +   OVS_EMIT_SAMPLE_ATTR_UNPSEC,
> > +   OVS_EMIT_SAMPLE_ATTR_GROUP, /* u32 number. */
> > +   OVS_EMIT_SAMPLE_ATTR_COOKIE,/* Optional, user specified cookie. */
> > +   __OVS_EMIT_SAMPLE_ATTR_MAX
> > +};
> > +
> > +#define OVS_EMIT_SAMPLE_ATTR_MAX (__OVS_EMIT_SAMPLE_ATTR_MAX - 1)
> > +
> > +
> >  /**
> >   * enum ovs_action_attr - Action types.
> >   *
> > @@ -1004,6 +1028,7 @@ enum ovs_action_attr {
> > OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
> > OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
> > OVS_ACTION_ATTR_DROP, /* u32 error code. */
> > +   OVS_ACTION_ATTR_EMIT_SAMPLE,  /* Nested OVS_EMIT_SAMPLE_ATTR_*. */
> >
> > __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
> >* from userspace. */
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 964225580824..3b4dba0ded59 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -24,6 +24,11 @@
> >  #include 
> >  #include 
> >  #include 
> > +
> > +#if IS_ENABLED(CONFIG_PSAMPLE)
> > +#include 
> > +#endif
> > +
> >  #include 
> >
> >  #include "datapath.h"
> > @@ -1299,6 +1304,46 @@ static int execute_dec_ttl(struct sk_buff *skb, 
> > struct sw_flow_key *key)
> > return 0;
> >  }
> >
> > +static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
> > +  const struct sw_flow_key *key,
> > +  const struct nlattr *attr)
> > +{
> > +#if IS_ENABLED(CONFIG_PSAMPLE)
> > +   struct psample_group psample_group = {};
> > +   struct psample_metadata md = {};
> > +   struct vport *input_vport;
> > +   const struct nlattr *a;
> > +   int rem;
> > +
> > +   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> > +a = nla_next(a, &rem)) {
>
> Since the action is strictly validated, can use use nla_for_ea

[ovs-dev] [PATCH v2] selftests: openvswitch: Set value to nla flags.

2024-06-18 Thread Adrian Moreno
Netlink flags, although they don't have payload at the netlink level,
are represented as having "True" as value in pyroute2.

Without it, trying to add a flow with a flag-type action (e.g: pop_vlan)
fails with the following traceback:

Traceback (most recent call last):
  File "[...]/ovs-dpctl.py", line 2498, in 
sys.exit(main(sys.argv))
 ^^
  File "[...]/ovs-dpctl.py", line 2487, in main
ovsflow.add_flow(rep["dpifindex"], flow)
  File "[...]/ovs-dpctl.py", line 2136, in add_flow
reply = self.nlm_request(
^
  File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request
return tuple(self._genlm_request(*argv, **kwarg))
 ^^^
  File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in
nlm_request
return tuple(super().nlm_request(*argv, **kwarg))
   ^^
  File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request
self.put(msg, msg_type, msg_flags, msg_seq=msg_seq)
  File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put
self.sendto_gate(msg, addr)
  File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate
msg.encode()
  File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode
offset = self.encode_nlas(offset)
 
  File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas
nla_instance.setvalue(cell[1])
  File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue
nlv.setvalue(nla_tuple[1])
 ~^^^
IndexError: list index out of range

Signed-off-by: Adrian Moreno 
---
 tools/testing/selftests/net/openvswitch/ovs-dpctl.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py 
b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index 1dd057afd3fb..9f8dec2f6539 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -531,7 +531,7 @@ class ovsactions(nla):
 for flat_act in parse_flat_map:
 if parse_starts_block(actstr, flat_act[0], False):
 actstr = actstr[len(flat_act[0]):]
-self["attrs"].append([flat_act[1]])
+self["attrs"].append([flat_act[1], True])
 actstr = actstr[strspn(actstr, ", ") :]
 parsed = True
 
-- 
2.45.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2 7/9] net: openvswitch: do not notify drops inside sample

2024-06-18 Thread Adrián Moreno
On Mon, Jun 17, 2024 at 02:10:37PM GMT, Ilya Maximets wrote:
> On 6/17/24 13:55, Ilya Maximets wrote:
> > On 6/3/24 20:56, Adrian Moreno wrote:
> >> The OVS_ACTION_ATTR_SAMPLE action is, in essence,
> >> observability-oriented.
> >>
> >> Apart from some corner case in which it's used a replacement of clone()
> >> for old kernels, it's really only used for sFlow, IPFIX and now,
> >> local emit_sample.
> >>
> >> With this in mind, it doesn't make much sense to report
> >> OVS_DROP_LAST_ACTION inside sample actions.
> >>
> >> For instance, if the flow:
> >>
> >>   actions:sample(..,emit_sample(..)),2
> >>
> >> triggers a OVS_DROP_LAST_ACTION skb drop event, it would be extremely
> >> confusing for users since the packet did reach its destination.
> >>
> >> This patch makes internal action execution silently consume the skb
> >> instead of notifying a drop for this case.
> >>
> >> Unfortunately, this patch does not remove all potential sources of
> >> confusion since, if the sample action itself is the last action, e.g:
> >>
> >> actions:sample(..,emit_sample(..))
> >>
> >> we actually _should_ generate a OVS_DROP_LAST_ACTION event, but we aren't.
> >>
> >> Sadly, this case is difficult to solve without breaking the
> >> optimization by which the skb is not cloned on last sample actions.
> >> But, given explicit drop actions are now supported, OVS can just add one
> >> after the last sample() and rewrite the flow as:
> >>
> >> actions:sample(..,emit_sample(..)),drop
> >>
> >> Signed-off-by: Adrian Moreno 
> >> ---
> >>  net/openvswitch/actions.c | 13 +++--
> >>  1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >> index 33f6d93ba5e4..54fc1abcff95 100644
> >> --- a/net/openvswitch/actions.c
> >> +++ b/net/openvswitch/actions.c
> >> @@ -82,6 +82,15 @@ static struct action_fifo __percpu *action_fifos;
> >>  static struct action_flow_keys __percpu *flow_keys;
> >>  static DEFINE_PER_CPU(int, exec_actions_level);
> >>
> >> +static inline void ovs_drop_skb_last_action(struct sk_buff *skb)
> >> +{
> >> +  /* Do not emit packet drops inside sample(). */
> >> +  if (OVS_CB(skb)->probability)
> >> +  consume_skb(skb);
> >> +  else
> >> +  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> >> +}
> >> +
> >>  /* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
> >>   * space. Return NULL if out of key spaces.
> >>   */
> >> @@ -1061,7 +1070,7 @@ static int sample(struct datapath *dp, struct 
> >> sk_buff *skb,
> >>if ((arg->probability != U32_MAX) &&
> >>(!arg->probability || get_random_u32() > arg->probability)) {
> >>if (last)
> >> -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> >> +  ovs_drop_skb_last_action(skb);
>
> Always consuming the skb at this point makes sense, since having smaple()
> as a last action is a reasonable thing to have.  But this looks more like
> a fix for the original drop reason patch set.
>

I don't think consuming the skb at this point makes sense. It was very
intentionally changed to a drop since a very common use-case for
sampling is drop-sampling, i.e: replacing an empty action list (that
triggers OVS_DROP_LAST_ACTION) with a sample(emit_sample()). Ideally,
that replacement should not have any effect on the number of
OVS_DROP_LAST_ACTION being reported as the packets are being treated in
the same way (only observed in one case).


> >>return 0;
> >>}
> >>
> >> @@ -1579,7 +1588,7 @@ static int do_execute_actions(struct datapath *dp, 
> >> struct sk_buff *skb,
> >>}
> >>}
> >>
> >> -  ovs_kfree_skb_reason(skb, OVS_DROP_LAST_ACTION);
> >> +  ovs_drop_skb_last_action(skb);
> >
> > I don't think I agree with this one.  If we have a sample() action with
> > a lot of different actions inside and we reached the end while the last
> > action didn't consume the skb, then we should report that.  E.g.
> > "sample(emit_sample(),push_vlan(),set(eth())),2"  should report that the
> > cloned skb was dropped.  "sample(push_vlan(),emit_sample())" should not.
> >

What is the use case for such action list? Having an action branch
executed randomly doesn't make sense to me if it's not some
observability thing (which IMHO should not trigger drops).

> > The only actions that are actually consuming the skb are "output",
> > "userspace", "recirc" and now "emit_sample".  "output" and "recirc" are
> > consuming the skb "naturally" by stealing it when it is the last action.
> > "userspace" has an explicit check to consume the skb if it is the last
> > action.  "emit_sample" should have the similar check.  It should likely
> > be added at the point of action introduction instead of having a separate
> > patch.
> >

Unlinke "output", "recirc", "userspace", etc. with emit_sample the
packet does not continue it's way through the datapath.

It would be very confusing if OVS starts monitoring drops and