Re: [ovs-dev] [PATCH 2/2] tests: Use OVS_CHECK_XT6 for all applicable IPv6 tests.

2024-12-01 Thread Paolo Valerio
Simon Horman  writes:

> Commit d595473ccaae ("tests: Add nft accept support.") uses
> nft, when available, instead of iptables to add an accept rule.
>
> Unfortunately several such cases were missed by that patch.
> This patchset seeks to address the IPv6 cases, all of which were missed.
>
> It does so by:
>
> 1. Generalising NFT_ACCEPT() and IPTABLES_ACCEPT() to also handle IPv6.
> 2. Adding XT6_ACCEPT
> 3. Using XT6_ACCEPT and OVS_CHECK_XT in the relevant tests
>
> Note that the use of OVS_CHECK_XT adds prerequisite checks to the
> relevant tests, which were previously absent.
>
> Reported-by: Paolo Valerio 
> Signed-off-by: Simon Horman 
> ---

Acked-by: Paolo Valerio 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] tests: Use OVS_CHECK_XT for all applicable IPv4 tests.

2024-12-01 Thread Paolo Valerio
Simon Horman  writes:

> Commit d595473ccaae ("tests: Add nft accept support.") uses
> nft, when available, instead of iptables to add an accept rule.
>
> Unfortunately several such cases were missed by that patch.
> This patch seeks to address the IPv4 cases that were missed.
>
> In doing so, it adds a missing prerequisite check to "datapath - ping
> over erspan v2 tunnel by simulated packets", which previously should
> have used IPTABLES_ACCEPT() and now correctly uses XT_ACCEPT().
>
> Reported-by: Paolo Valerio 
> Signed-off-by: Simon Horman 
> ---

Thanks Simon for the follow up

Acked-by: Paolo Valerio 



Re: [ovs-dev] [PATCH v2 0/3] tests: use nft when available

2024-11-12 Thread Paolo Valerio
On Tue, Nov 12, 2024 at 12:07 AM Aaron Conole  wrote:
>
> Paolo Valerio  writes:
>
> > Simon Horman  writes:
> >
> >> Hi,
> >>
> >> This series aims to update the testsuite so that, if available,
> >> nft is used in place of iptables. The motivation being to move
> >> to more modern tooling.
> >>
> >> ---
> >
> > Hi Simon,
>
> Hi Paolo,
>
> > The patches look good, I also performed some tests and things work as
> > expected.
> >
> > I noticed that "datapath - ping over erspan v1 tunnel by simulated
> > packets" still uses IPTABLES_ACCEPT().
> >
> > Also, "datapath - ping over erspan v2 tunnel by simulated packets" does
> > not use the macro, but directly uses iptables with the ACCEPT target
> >
> > The same applies, but for v6 (ip6tables), to these last two:
> > datapath - ping over ip6erspan v1 tunnel by simulated packets
> > datapath - ping over ip6erspan v2 tunnel by simulated packets
> >
> > They went unnoticed while adding $HAVE_IPTABLES.
> >
> > I guess those should be handled in this set as well.
> > WDYT?
>
> Sorry - I didn't see your comment when doing the apply (my tool didn't
> pull the cover letter comments).  I guess this should be done as a follow
> up.

No worries.
Yes, a follow-up will work.

>
> >> Changes in v2:
> >> - Drop dependency in v2
> >> - I have verified that nft is used when the CI runs the testsuite
> >> - Link to v1: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417704.html
> >>
> >> ---
> >> Simon Horman (3):
> >>   tests: add nft accept support.
> >>   tests: Add nft support to ADD_EXTERNAL_CT.
> >>   tests: Handle marks using nft if available.
> >>
> >>  tests/atlocal.in                 |  3 ++
> >>  tests/ovs-macros.at              | 26 -
> >>  tests/system-common-macros.at    |  4 ++
> >>  tests/system-kmod-macros.at      | 80 +---
> >>  tests/system-offloads-traffic.at | 29 ++-
> >>  tests/system-traffic.at          |  4 +-
> >>  6 files changed, 135 insertions(+), 11 deletions(-)
> >>
> >> base-commit: e998d4558c10938082e02372ac42f828d252c3cd
> >
>



Re: [ovs-dev] [PATCH v2 0/3] tests: use nft when available

2024-11-10 Thread Paolo Valerio
Simon Horman  writes:

> Hi,
>
> This series aims to update the testsuite so that, if available,
> nft is used in place of iptables. The motivation being to move
> to more modern tooling.
>
> ---

Hi Simon,

The patches look good, I also performed some tests and things work as
expected.

I noticed that "datapath - ping over erspan v1 tunnel by simulated
packets" still uses IPTABLES_ACCEPT().

Also, "datapath - ping over erspan v2 tunnel by simulated packets" does
not use the macro, but directly uses iptables with the ACCEPT target

The same applies, but for v6 (ip6tables), to these last two:
datapath - ping over ip6erspan v1 tunnel by simulated packets
datapath - ping over ip6erspan v2 tunnel by simulated packets

They went unnoticed while adding $HAVE_IPTABLES.

I guess those should be handled in this set as well.
WDYT?

> Changes in v2:
> - Drop dependency in v2
> - I have verified that nft is used when the CI runs the testsuite
> - Link to v1: https://mail.openvswitch.org/pipermail/ovs-dev/2024-October/417704.html
>
> ---
> Simon Horman (3):
>   tests: add nft accept support.
>   tests: Add nft support to ADD_EXTERNAL_CT.
>   tests: Handle marks using nft if available.
>
>  tests/atlocal.in                 |  3 ++
>  tests/ovs-macros.at              | 26 -
>  tests/system-common-macros.at    |  4 ++
>  tests/system-kmod-macros.at      | 80 +---
>  tests/system-offloads-traffic.at | 29 ++-
>  tests/system-traffic.at          |  4 +-
>  6 files changed, 135 insertions(+), 11 deletions(-)
>
> base-commit: e998d4558c10938082e02372ac42f828d252c3cd



Re: [ovs-dev] [PATCH 0/2] tests: Replace wget with curl.

2024-11-03 Thread Paolo Valerio
Eelco Chaudron  writes:

> Eelco Chaudron (2):
>   system-traffic: Replace wget with curl for negative and ftp tests.
>   system-traffic: Standardize by replacing all wget instances with curl.
>
>  tests/system-tap.at |   3 +-
>  tests/system-traffic.at | 249 +++-
>  2 files changed, 172 insertions(+), 80 deletions(-)
>
> -- 

Acked-by: Paolo Valerio 



[ovs-dev] [PATCH] conntrack: Fix Windows build due to ternary syntax extension.

2024-10-14 Thread Paolo Valerio
From: Aaron Conole 

In the cited commit a ternary using syntax extension slipped in.

The extension allows omitting the second operand and it is not
supported by MSVC resulting in a build failure.

Fix it by simply specifying the second operand.
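
For readers unfamiliar with the extension, here is a minimal sketch of the portable form. The helper name limit_or_unlimited() is hypothetical and is not part of lib/conntrack.c; it only mirrors the expression touched by this patch. GNU C accepts 'limit ? : -1' and evaluates the first operand once, while standard C requires the middle operand to be spelled out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of lib/conntrack.c: returns 'limit' if it
 * is non-zero and -1 otherwise.  The GNU C spelling 'limit ? : -1' is not
 * standard C, so the middle operand is written explicitly.  Repeating a
 * plain local variable is harmless because reading it has no side
 * effects. */
static int64_t
limit_or_unlimited(int64_t limit)
{
    int64_t result = limit ? limit : -1;

    assert(result != 0);    /* Zero is never returned. */
    return result;
}
```

If the first operand had side effects (a function call, for instance), it would need to be stored in a temporary first so that it is still evaluated only once.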

Fixes: b57c1da5c39a ("conntrack: Use a per zone default limit.")
Reported-by: Ilya Maximets 
Signed-off-by: Aaron Conole 
[Paolo: added commit message]
Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 0061a5636..f4b150bee 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -313,7 +313,7 @@ zone_limit_get_limit(struct conntrack *ct, struct conntrack_zone_limit *czl)
 
     if (limit == ZONE_LIMIT_CONN_DEFAULT) {
         atomic_read_relaxed(&ct->default_zone_limit, &limit);
-        limit = limit ? : -1;
+        limit = limit ? limit : -1;
     }
 
     return limit;
-- 
2.46.1



Re: [ovs-dev] [PATCH v3 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-10-07 Thread Paolo Valerio
Simon Horman  writes:

> On Wed, Oct 02, 2024 at 06:01:41PM +0200, Paolo Valerio wrote:
>> As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
>> result in the unexpected failure of the following tests:
>> 
>> conntrack - multiple zones, local
>> conntrack - multi-stage pipeline, local
>> conntrack - can match and clear ct_state from outside OVS
>> 
>> This happens because nf_conncount turns on connection tracking and
>> the above tests rely on this side effect. However, this behavior may
>> be corrected in the kernel, which could, in turn, cause the tests to
>> fail.
>> 
>> The patch removes the assumption by adding iptables rules that attach
>> an nf_conn template to the skb, so that the packet is already tracked
>> once it hits the OvS pipeline.
>> 
>> While at it, introduce $HAVE_IPTABLES and skip tests if iptables
>> binary is not present.
>> 
>> Reported-by: Xin Long 
>> Reported-at: https://issues.redhat.com/browse/FDP-708
>> Signed-off-by: Paolo Valerio 
>> ---
>> v3:
>> - generalized introducing CHECK_EXTERNAL_CT()/ADD_EXTERNAL_CT()
>>   to ease the transition toward a different front-end
>> 
>> v2:
>> - add $HAVE_IPTABLES
>> - reduced subject length (0-day Robot)
>
> ...
>
>> diff --git a/tests/atlocal.in b/tests/atlocal.in
>> index 8565a0bae..d6b87f8ec 100644
>> --- a/tests/atlocal.in
>> +++ b/tests/atlocal.in
>> @@ -185,6 +185,9 @@ find_command lftp
>>  # Set HAVE_ETHTOOL
>>  find_command ethtool
>>  
>> +# Set HAVE_IPTABLES
>> +find_command iptables
>> +
>>  CURL_OPT="-g -v --max-time 1 --retry 2 --retry-delay 1 --connect-timeout 1"
>>  
>>  # Determine whether "diff" supports "normal" diffs.  (busybox diff does not.)
>> diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
>> index 06c978555..df2835747 100644
>> --- a/tests/ovs-macros.at
>> +++ b/tests/ovs-macros.at
>> @@ -366,3 +366,8 @@ dnl Add a rule to always accept the traffic.
>>  m4_define([IPTABLES_ACCEPT],
>>[AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
>> on_exit 'iptables -D INPUT 1 -i $1'])
>> +
>> +dnl Required to let conntrack start tracking the packets outside ovs
>> +m4_define([IPTABLES_CT],
>> +  [AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
>> +   on_exit 'iptables -t raw -D OUTPUT 1'])
>
> Hi Paolo,
>
> I don't think IPTABLES_CT is needed now that we have ADD_EXTERNAL_CT.
>

It's not, indeed. It's a leftover from the old revision.
I sent a new revision. Thanks.

> Otherwise this looks good to me.
>
>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>> index 5203b1df8..135892e91 100644
>> --- a/tests/system-kmod-macros.at
>> +++ b/tests/system-kmod-macros.at
>> @@ -267,3 +267,24 @@ m4_define([OVS_CHECK_BAREUDP],
>>  AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 ethertype mpls_uc 2>&1 >/dev/null])
>>  AT_CHECK([ip link del dev ovs_bareudp0])
>>  ])
>> +
>> +# CHECK_EXTERNAL_CT()
>> +#
>> +# Checks if packets can be tracked outside OvS.
>> +m4_define([CHECK_EXTERNAL_CT],
>> +[
>> +dnl Kernel config (CONFIG_NETFILTER_XT_TARGET_CT)
>> +dnl and user space extensions need to be present.
>> +AT_SKIP_IF([test $HAVE_IPTABLES = no])
>> +AT_SKIP_IF([! iptables -t raw -I OUTPUT 1 -j CT])
>> +AT_CHECK([iptables -t raw -D OUTPUT 1])
>> +])
>> +
>> +# ADD_EXTERNAL_CT()
>> +#
>> +# Let conntrack start tracking the packets outside OvS.
>> +m4_define([ADD_EXTERNAL_CT],
>> +[
>> +AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
>> +on_exit 'iptables -t raw -D OUTPUT 1'
>> +])
>
> ...



[ovs-dev] [PATCH v4 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-10-07 Thread Paolo Valerio
As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
result in the unexpected failure of the following tests:

conntrack - multiple zones, local
conntrack - multi-stage pipeline, local
conntrack - can match and clear ct_state from outside OVS

This happens because nf_conncount turns on connection tracking and
the above tests rely on this side effect. However, this behavior may
be corrected in the kernel, which could, in turn, cause the tests to
fail.

The patch removes the assumption by adding iptables rules that attach
an nf_conn template to the skb, so that the packet is already tracked
once it hits the OvS pipeline.

While at it, introduce $HAVE_IPTABLES and skip tests if iptables
binary is not present.

Reported-by: Xin Long 
Reported-at: https://issues.redhat.com/browse/FDP-708
Signed-off-by: Paolo Valerio 
Acked-by: Eelco Chaudron 
---
v4:
- removed IPTABLES_CT() leftover (Simon)

v3:
- generalized introducing CHECK_EXTERNAL_CT()/ADD_EXTERNAL_CT()
  to ease the transition toward a different front-end

v2:
- add $HAVE_IPTABLES
- reduced subject length (0-day Robot)
---
 tests/atlocal.in |  3 +++
 tests/system-kmod-macros.at  | 21 +
 tests/system-traffic.at  |  8 
 tests/system-userspace-macros.at | 16 
 4 files changed, 48 insertions(+)

diff --git a/tests/atlocal.in b/tests/atlocal.in
index 8565a0bae..d6b87f8ec 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -185,6 +185,9 @@ find_command lftp
 # Set HAVE_ETHTOOL
 find_command ethtool
 
+# Set HAVE_IPTABLES
+find_command iptables
+
 CURL_OPT="-g -v --max-time 1 --retry 2 --retry-delay 1 --connect-timeout 1"
 
 # Determine whether "diff" supports "normal" diffs.  (busybox diff does not.)
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 5203b1df8..135892e91 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -267,3 +267,24 @@ m4_define([OVS_CHECK_BAREUDP],
 AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 ethertype mpls_uc 2>&1 >/dev/null])
 AT_CHECK([ip link del dev ovs_bareudp0])
 ])
+
+# CHECK_EXTERNAL_CT()
+#
+# Checks if packets can be tracked outside OvS.
+m4_define([CHECK_EXTERNAL_CT],
+[
+dnl Kernel config (CONFIG_NETFILTER_XT_TARGET_CT)
+dnl and user space extensions need to be present.
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
+AT_SKIP_IF([! iptables -t raw -I OUTPUT 1 -j CT])
+AT_CHECK([iptables -t raw -D OUTPUT 1])
+])
+
+# ADD_EXTERNAL_CT()
+#
+# Let conntrack start tracking the packets outside OvS.
+m4_define([ADD_EXTERNAL_CT],
+[
+AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
+on_exit 'iptables -t raw -D OUTPUT 1'
+])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..5435a6241 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1094,6 +1094,7 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/Invalid Geneve tunnel metadata on bridge br0 while
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over gre tunnel by simulated packets])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -1140,6 +1141,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over erspan v1 tunnel by simulated packets])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -5456,10 +5458,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multiple zones, local])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -5505,10 +5509,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multi-stage pipeline, local])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -8386,6 +8392,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - can match and clear ct_state from outside OVS])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_CHECK_GENEVE()
 
@@ -8396,6 +8403,7 @@ AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 AT_CHECK([ovs-ofctl add-flow br-underlay "priority=100,ct_state=+trk,actions=ct_clear,resubmit(,0)"])
 AT_CHECK([ovs-ofctl add-flow br-underlay "priority=10,actions=normal"])
 
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 dnl Set up underlay link from host into the namespace using veth pair.
diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index d9b5b7e4c..c1be97347 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -357,3 +357,19 @@ m4_define([OVS_CHECK_BAREUDP],
 [
 AT_SKIP_IF([:])
 ])
+
+# CHE

[ovs-dev] [PATCH v4 2/2] ovs-macros.at: Correctly delete iptables rule on_exit.

2024-10-07 Thread Paolo Valerio
Currently, at every call of IPTABLES_ACCEPT() an iptables rule gets
added. Such rule is supposed to be removed on exit, but the current
syntax for deleting the rule is incorrect, resulting in a leftover
rule after execution.

Fix it by correcting the deletion command.

Fixes: 5e06e7ac99dc ("tests: Refactor the iptables accept rule.")
Signed-off-by: Paolo Valerio 
Reviewed-by: Aaron Conole 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
---
 tests/ovs-macros.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index 06c978555..f1b8041fb 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -365,4 +365,4 @@ dnl to reject input traffic from bridges such as br-underlay.
 dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
-   on_exit 'iptables -D INPUT 1 -i $1'])
+   on_exit 'iptables -D INPUT 1'])
-- 
2.46.1



[ovs-dev] [PATCH v3 2/2] ovs-macros.at: Correctly delete iptables rule on_exit.

2024-10-02 Thread Paolo Valerio
Currently, at every call of IPTABLES_ACCEPT() an iptables rule gets
added. Such rule is supposed to be removed on exit, but the current
syntax for deleting the rule is incorrect, resulting in a leftover
rule after execution.

Fix it by correcting the deletion command.

Fixes: 5e06e7ac99dc ("tests: Refactor the iptables accept rule.")
Signed-off-by: Paolo Valerio 
Reviewed-by: Aaron Conole 
Acked-by: Simon Horman 
Acked-by: Eelco Chaudron 
---
 tests/ovs-macros.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index df2835747..4cc8e7bc8 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -365,7 +365,7 @@ dnl to reject input traffic from bridges such as br-underlay.
 dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
-   on_exit 'iptables -D INPUT 1 -i $1'])
+   on_exit 'iptables -D INPUT 1'])
 
 dnl Required to let conntrack start tracking the packets outside ovs
 m4_define([IPTABLES_CT],
-- 
2.46.1



[ovs-dev] [PATCH v3 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-10-02 Thread Paolo Valerio
As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
result in the unexpected failure of the following tests:

conntrack - multiple zones, local
conntrack - multi-stage pipeline, local
conntrack - can match and clear ct_state from outside OVS

This happens because nf_conncount turns on connection tracking and
the above tests rely on this side effect. However, this behavior may
be corrected in the kernel, which could, in turn, cause the tests to
fail.

The patch removes the assumption by adding iptables rules that attach
an nf_conn template to the skb, so that the packet is already tracked
once it hits the OvS pipeline.

While at it, introduce $HAVE_IPTABLES and skip tests if iptables
binary is not present.

Reported-by: Xin Long 
Reported-at: https://issues.redhat.com/browse/FDP-708
Signed-off-by: Paolo Valerio 
---
v3:
- generalized introducing CHECK_EXTERNAL_CT()/ADD_EXTERNAL_CT()
  to ease the transition toward a different front-end

v2:
- add $HAVE_IPTABLES
- reduced subject length (0-day Robot)
---
 tests/atlocal.in |  3 +++
 tests/ovs-macros.at  |  5 +
 tests/system-kmod-macros.at  | 21 +
 tests/system-traffic.at  |  8 
 tests/system-userspace-macros.at | 16 
 5 files changed, 53 insertions(+)

diff --git a/tests/atlocal.in b/tests/atlocal.in
index 8565a0bae..d6b87f8ec 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -185,6 +185,9 @@ find_command lftp
 # Set HAVE_ETHTOOL
 find_command ethtool
 
+# Set HAVE_IPTABLES
+find_command iptables
+
 CURL_OPT="-g -v --max-time 1 --retry 2 --retry-delay 1 --connect-timeout 1"
 
 # Determine whether "diff" supports "normal" diffs.  (busybox diff does not.)
diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index 06c978555..df2835747 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -366,3 +366,8 @@ dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
on_exit 'iptables -D INPUT 1 -i $1'])
+
+dnl Required to let conntrack start tracking the packets outside ovs
+m4_define([IPTABLES_CT],
+  [AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
+   on_exit 'iptables -t raw -D OUTPUT 1'])
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 5203b1df8..135892e91 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -267,3 +267,24 @@ m4_define([OVS_CHECK_BAREUDP],
 AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 ethertype mpls_uc 2>&1 >/dev/null])
 AT_CHECK([ip link del dev ovs_bareudp0])
 ])
+
+# CHECK_EXTERNAL_CT()
+#
+# Checks if packets can be tracked outside OvS.
+m4_define([CHECK_EXTERNAL_CT],
+[
+dnl Kernel config (CONFIG_NETFILTER_XT_TARGET_CT)
+dnl and user space extensions need to be present.
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
+AT_SKIP_IF([! iptables -t raw -I OUTPUT 1 -j CT])
+AT_CHECK([iptables -t raw -D OUTPUT 1])
+])
+
+# ADD_EXTERNAL_CT()
+#
+# Let conntrack start tracking the packets outside OvS.
+m4_define([ADD_EXTERNAL_CT],
+[
+AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
+on_exit 'iptables -t raw -D OUTPUT 1'
+])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..5435a6241 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1094,6 +1094,7 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/Invalid Geneve tunnel metadata on bridge br0 while
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over gre tunnel by simulated packets])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -1140,6 +1141,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over erspan v1 tunnel by simulated packets])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -5456,10 +5458,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multiple zones, local])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -5505,10 +5509,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multi-stage pipeline, local])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -8386,6 +8392,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - can match and clear ct_state from outside OVS])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_CHECK_GENEVE()
 
@@ -8396,6 +8403,7 @@ AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 AT_CHECK([ovs-ofctl add-flow br-underlay 
"priority=100,ct_state=+trk,actions=ct_cle

[ovs-dev] [PATCH v2 4/6] conntrack: Turn zl local limit into atomic.

2024-09-30 Thread Paolo Valerio
While at it, change the struct zone_limit initialization in
zone_limit_create() to use atomic init operations instead of relying
on memset() which, although it correctly initializes the struct, is
semantically unaware of atomics.
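
As an illustration only, the same idea can be sketched with plain C11 <stdatomic.h> rather than the OVS atomic wrappers. The struct and function names below (zone_entry, zone_entry_create() and friends) are hypothetical stand-ins and only loosely follow the fields touched by this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a zone limit entry. */
struct zone_entry {
    int32_t zone;
    _Atomic uint32_t limit;
    _Atomic unsigned int count;
};

/* Allocates an entry and initializes each atomic member explicitly with
 * atomic_init() instead of zero-filling the whole struct.  Returns NULL if
 * the allocation fails. */
static struct zone_entry *
zone_entry_create(int32_t zone, uint32_t limit)
{
    struct zone_entry *ze = malloc(sizeof *ze);

    if (!ze) {
        return NULL;
    }
    ze->zone = zone;
    atomic_init(&ze->limit, limit);
    atomic_init(&ze->count, 0);
    return ze;
}

/* Relaxed load of the limit. */
static uint32_t
zone_entry_limit(struct zone_entry *ze)
{
    assert(ze != NULL);
    return atomic_load_explicit(&ze->limit, memory_order_relaxed);
}

/* Relaxed store used to update the limit in place. */
static void
zone_entry_set_limit(struct zone_entry *ze, uint32_t limit)
{
    assert(ze != NULL);
    atomic_store_explicit(&ze->limit, limit, memory_order_relaxed);
}
```

Relaxed ordering is used in this sketch on the assumption that the limit is only compared against a counter and no other data is published through it.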

Signed-off-by: Paolo Valerio 

---
v2:
- Fixed typo s/semantially/semantically (Aaron)
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack.c | 19 ---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 2c625d710..2770470d1 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -201,7 +201,7 @@ enum ct_ephemeral_range {
 
 struct conntrack_zone_limit {
     int32_t zone;
-    uint32_t limit;
+    atomic_uint32_t limit;
     atomic_count count;
     uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 112d43216..3d19d37df 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -341,7 +341,7 @@ zone_limit_get(struct conntrack *ct, int32_t zone)
     struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
     if (zl) {
         czl.zone = zl->czl.zone;
-        czl.limit = zl->czl.limit;
+        atomic_read_relaxed(&zl->czl.limit, &czl.limit);
         czl.count = atomic_count_get(&zl->czl.count);
     }
     return czl;
@@ -358,8 +358,9 @@ zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit)
     }
 
     if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) {
-        zl = xzalloc(sizeof *zl);
-        zl->czl.limit = limit;
+        zl = xmalloc(sizeof *zl);
+        atomic_init(&zl->czl.limit, limit);
+        atomic_count_init(&zl->czl.count, 0);
         zl->czl.zone = zone;
         zl->czl.zone_limit_seq = ct->zone_limit_seq++;
         uint32_t hash = zone_key_hash(zone, ct->hash_basis);
@@ -376,7 +377,7 @@ zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit)
     int err = 0;
     struct zone_limit *zl = zone_limit_lookup(ct, zone);
     if (zl) {
-        zl->czl.limit = limit;
+        atomic_store_relaxed(&zl->czl.limit, limit);
         VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone);
     } else {
         ovs_mutex_lock(&ct->ct_lock);
@@ -916,12 +917,16 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
     }
 
     if (commit) {
+        uint32_t czl_limit;
         struct conn_key_node *fwd_key_node, *rev_key_node;
         struct zone_limit *zl = zone_limit_lookup_or_default(ct,
                                                              ctx->key.zone);
-        if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) {
-            COVERAGE_INC(conntrack_zone_full);
-            return nc;
+        if (zl) {
+            atomic_read_relaxed(&zl->czl.limit, &czl_limit);
+            if (atomic_count_get(&zl->czl.count) >= czl_limit) {
+                COVERAGE_INC(conntrack_zone_full);
+                return nc;
+            }
         }
 
         unsigned int n_conn_limit;
-- 
2.46.1



[ovs-dev] [PATCH v2 5/6] conntrack: Use a per zone default limit.

2024-09-30 Thread Paolo Valerio
Before this change the default limit, instead of being considered
per-zone, was considered as a global value that every new entry was
checked against during the creation. This was not the intended
behavior as the default limit should be inherited by each zone instead
of being an aggregate number.

This change corrects that by removing the default limit from the cmap
and making it global (atomic). Now, whenever a new connection needs to
be committed, if default_zone_limit is set and the entry for the zone
doesn't exist, a new entry for that zone is lazily created, marked as
default. All subsequent packets for that zone will undergo the regular
lookup process.
To distinguish between default and user-defined entries, the storage
for the limit member of struct conntrack_zone_limit has been changed
from a 32-bit unsigned integer to a 64-bit signed integer. The
negative value ZONE_LIMIT_CONN_DEFAULT now indicates a default entry.

Operations such as creation/deletion are modified accordingly taking
into account this new behavior.

Worth noting that OVS_REQUIRES(ct->ct_lock) is not a strict
requirement for zone_limit_lookup_or_default(), however since the
function operates under the lock and it can create an entry in the
slow path, the lock requirement is enforced in order to make thread
safety checks work. The function can still be moved outside the
creation lock or any lock, keeping the fastpath lockless (turning
zone_limit_lookup_protected() to its unprotected version) and locking
only in the slow path (replacing zone_limit_create__() with
zone_limit_create()).

The patch also extends `conntrack - limit by zone` test in order to
check the behavior, and while at it, update test's packet-out to use
compose-packet function.
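
To make the lazy creation and the sentinel easier to picture, here is a simplified sketch. It uses a small fixed array instead of the cmap, has no locking or atomics, and all names other than ZONE_LIMIT_CONN_DEFAULT (zone_table, zone_slot, table_lookup_or_default() and so on) are hypothetical. It is not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_LIMIT_CONN_DEFAULT -1  /* Sentinel value taken from the patch. */
#define N_SLOTS 8                   /* Arbitrary size for this sketch. */

struct zone_slot {
    int in_use;
    int32_t zone;
    int64_t limit;  /* >= 0: user-defined; negative sentinel: default. */
};

struct zone_table {
    struct zone_slot slots[N_SLOTS];
    uint32_t default_limit;         /* 0 means "no default set". */
};

static struct zone_slot *
table_lookup(struct zone_table *t, int32_t zone)
{
    for (size_t i = 0; i < N_SLOTS; i++) {
        if (t->slots[i].in_use && t->slots[i].zone == zone) {
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Stores a new entry in the first free slot; returns NULL if the table is
 * full. */
static struct zone_slot *
table_insert(struct zone_table *t, int32_t zone, int64_t limit)
{
    for (size_t i = 0; i < N_SLOTS; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = 1;
            t->slots[i].zone = zone;
            t->slots[i].limit = limit;
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Returns the entry for 'zone'.  If none exists and a default limit is set,
 * an entry marked as default is created on first use. */
static struct zone_slot *
table_lookup_or_default(struct zone_table *t, int32_t zone)
{
    assert(t != NULL);
    struct zone_slot *s = table_lookup(t, zone);

    if (!s && t->default_limit) {
        s = table_insert(t, zone, ZONE_LIMIT_CONN_DEFAULT);
    }
    return s;
}

/* Resolves the limit that applies to 's'; -1 means no limit is set. */
static int64_t
slot_effective_limit(const struct zone_table *t, const struct zone_slot *s)
{
    if (s->limit == ZONE_LIMIT_CONN_DEFAULT) {
        return t->default_limit ? (int64_t) t->default_limit : -1;
    }
    return s->limit;
}
```

Because a default entry stores only the sentinel, a later change of the default limit is picked up on the next read without touching the per-zone entries.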

Fixes: a7f33fdbfb67 ("conntrack: Support zone limits.")
Reported-at: https://issues.redhat.com/browse/FDP-122
Reported-by: Ilya Maximets 
Signed-off-by: Paolo Valerio 
---
v2:
Aaron:
- Added entry in NEWS
- updated commit message mentioning the storage change for limit
---
 NEWS|   5 +
 lib/conntrack-private.h |   7 +-
 lib/conntrack.c | 233 +++-
 tests/system-traffic.at |  64 +++
 4 files changed, 236 insertions(+), 73 deletions(-)

diff --git a/NEWS b/NEWS
index 7a9626bf4..48384ab1d 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,10 @@
 Post-v3.4.0
 
+   - Userspace datapath:
+ * The default zone limit, if set, is now inherited by any zone
+   that does not have a specific value defined, rather than being
+   treated as a global value, aligning the behavior with that of
+   the kernel datapath.
 
 
 v3.4.0 - 15 Aug 2024
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 2770470d1..46b212754 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -198,10 +198,11 @@ enum ct_ephemeral_range {
 #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
+#define ZONE_LIMIT_CONN_DEFAULT -1
 
 struct conntrack_zone_limit {
 int32_t zone;
-atomic_uint32_t limit;
+atomic_int64_t limit;
 atomic_count count;
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 };
@@ -212,6 +213,9 @@ struct conntrack {
 struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
 struct cmap timeout_policies OVS_GUARDED;
+uint32_t zone_limit_seq OVS_GUARDED; /* Used to disambiguate zone limit
+  * counts. */
+atomic_uint32_t default_zone_limit;
 
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
@@ -234,7 +238,6 @@ struct conntrack {
  * control context.  */
 
 struct ipf *ipf; /* Fragmentation handling context. */
-uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
 atomic_uint32_t sweep_ms; /* Next sweep interval. */
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 3d19d37df..0061a5636 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -270,6 +270,7 @@ conntrack_init(void)
 atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
 atomic_init(&ct->tcp_seq_chk, true);
 atomic_init(&ct->sweep_ms, 2);
+atomic_init(&ct->default_zone_limit, 0);
 latch_init(&ct->clean_thread_exit);
 ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 ct->ipf = ipf_init();
@@ -296,6 +297,28 @@ zone_key_hash(int32_t zone, uint32_t basis)
 return hash;
 }
 
+static int64_t
+zone_limit_get_limit__(struct conntrack_zone_limit *czl)
+{
+int64_t limit;
+atomic_read_relaxed(&czl->limit, &limit);
+
+return limit;
+}
+
+static

[ovs-dev] [PATCH v2 6/6] dpctl: Do not allow out of range values in ct-set-limits.

2024-09-30 Thread Paolo Valerio
ovs_scan() does not enforce in-range values, so only the least
significant bits are stored for out-of-range or negative values.

As a result, negative values and values greater than UINT32_MAX for
"default" are all accepted in dpctl_ct_set_limits() and eventually
cast to uint32_t, whereas for zones all such values are considered
invalid.

Align their behaviors and extend the tests for checking values out of
the range.
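
The range check can be sketched with standard C only. The helper parse_u32_limit() below is hypothetical and is not the OVS str_to_ullong() used in this patch; it just shows the idea of parsing into a wider type and then rejecting anything above UINT32_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses 's' as a base-10 number that must fit in a
 * uint32_t.  Returns true and stores the value in '*out' on success,
 * false otherwise ('*out' is left untouched). */
static bool
parse_u32_limit(const char *s, uint32_t *out)
{
    unsigned long long value;
    char *end;

    assert(s != NULL && out != NULL);

    /* strtoull() accepts a leading '-' and negates the result, so "-1"
     * would turn into a huge positive number.  Require a leading digit. */
    if (*s < '0' || *s > '9') {
        return false;
    }

    errno = 0;
    value = strtoull(s, &end, 10);
    if (errno || *end != '\0' || value > UINT32_MAX) {
        return false;
    }

    *out = (uint32_t) value;
    return true;
}
```

With this shape, "4294967295" is accepted while "4294967296" and "-1" are rejected, which matches the cases exercised by the tests added below.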

Signed-off-by: Paolo Valerio 
---
 lib/dpctl.c |  5 +++--
 tests/system-traffic.at | 42 +
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 77bf4bf53..2a700f24a 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -2169,8 +2169,8 @@ dpctl_ct_set_limits(int argc, const char *argv[],
     struct ovs_list zone_limits = OVS_LIST_INITIALIZER(&zone_limits);
     int i =  dp_arg_exists(argc, argv) ? 2 : 1;
     struct ds ds = DS_EMPTY_INITIALIZER;
+    unsigned long long default_limit;
     struct dpif *dpif = NULL;
-    uint32_t default_limit;
     int error;
 
     if (i >= argc) {
@@ -2186,7 +2186,8 @@ dpctl_ct_set_limits(int argc, const char *argv[],
 
     /* Parse default limit */
     if (!strncmp(argv[i], "default=", 8)) {
-        if (ovs_scan(argv[i], "default=%"SCNu32, &default_limit)) {
+        if (str_to_ullong(argv[i] + 8, 10, &default_limit) &&
+            default_limit <= UINT32_MAX) {
             ct_dpif_push_zone_limit(&zone_limits, OVS_ZONE_LIMIT_DEFAULT_ZONE,
                                     default_limit, 0);
             i++;
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index fe115d92b..bcb08b0e8 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -5686,12 +5686,54 @@ priority=100,in_port=2,udp,action=ct(zone=3,commit),1
 
 AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
 
+dnl Test values out of range for the default limit.
+dnl Try to set a negative value.
+AT_CHECK([ovs-appctl dpctl/ct-set-limits default=-1], [2], [ignore], [dnl
+ovs-vswitchd: invalid default limit (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+dnl Try to set UINT32_MAX + 1.
+AT_CHECK([ovs-appctl dpctl/ct-set-limits default=4294967296], [2], [ignore], [dnl
+ovs-vswitchd: invalid default limit (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+dnl Same range checks for zones.
+AT_CHECK([ovs-appctl dpctl/ct-set-limits zone=1,limit=-1], [2], [ignore], [dnl
+ovs-vswitchd: failed to parse field limit (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-limits zone=1,limit=4294967296], [2], [ignore], [dnl
+ovs-vswitchd: failed to parse field limit (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+dnl Double check no limits have been applied.
+AT_CHECK([ovs-appctl dpctl/ct-get-limits], [],[dnl
+default limit=0
+])
+
 m4_define([UDP_PKT], [m4_join([,],
   [eth_src=50:54:00:00:00:0$1,eth_dst=50:54:00:00:00:0$2,dl_type=0x0800],
   [nw_src=10.1.1.$1,nw_dst=10.1.1.$2],
   [nw_proto=17,nw_ttl=64,nw_frag=no],
   [udp_src=1,udp_dst=$3])])
 
+AT_CHECK([ovs-appctl dpctl/ct-set-limits zone=1,limit=0])
+pkt=$(ovs-ofctl compose-packet --bare "UDP_PKT([1], [2], [2])")
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=${pkt} actions=resubmit(,0)"])
+
+dnl Double check the zl entry exists but no connection was added.
+AT_CHECK([ovs-appctl dpctl/ct-get-limits], [],[dnl
+default limit=0
+zone=1,limit=0,count=0
+])
+
+dnl Remove limit for zone=1.
+AT_CHECK([ovs-appctl dpctl/ct-del-limits zone=1])
+
 AT_CHECK([ovs-appctl dpctl/ct-set-limits default=3])
 AT_CHECK([ovs-appctl dpctl/ct-get-limits], [],[dnl
 default limit=3
-- 
2.46.1



[ovs-dev] [PATCH v2 3/6] conntrack: Do not use atomics to report zones info.

2024-09-30 Thread Paolo Valerio
Atomics are not needed when reporting zone limits.
Remove the restriction by defining a non-atomic common structure
to report such data.
The change also reads atomic fields through the related operations,
reporting only the fields required by the requesting level instead of
relying on a struct copy.
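
A minimal sketch of the idea, assuming C11 <stdatomic.h> in place of the
OVS atomic wrappers. The struct and function names are hypothetical and
only mirror the split between the internal, atomically updated state and
the plain structure handed to callers:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Internal state (hypothetical): 'count' may be updated concurrently,
 * so it is an atomic type. */
struct zone_limit_state {
    int32_t zone;
    uint32_t limit;
    atomic_uint count;
};

/* Plain snapshot returned to callers (hypothetical): no atomics, so it
 * can be copied and passed around freely. */
struct zone_limit_snapshot {
    int32_t zone;
    uint32_t limit;
    unsigned int count;
};

/* Builds the snapshot field by field.  The atomic member is read with an
 * explicit atomic load rather than being copied as part of the struct. */
struct zone_limit_snapshot
zone_limit_snapshot_get(struct zone_limit_state *state)
{
    struct zone_limit_snapshot snap = {
        .zone = state->zone,
        .limit = state->limit,
        .count = atomic_load_explicit(&state->count, memory_order_relaxed),
    };

    return snap;
}
```

The snapshot is a point-in-time copy: later updates to the internal
counter do not change a snapshot that was already returned.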

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |  8 
 lib/conntrack.c | 11 ++-
 lib/conntrack.h |  9 -
 lib/dpif-netdev.c   |  6 +++---
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 6c65caa07..2c625d710 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -198,6 +198,14 @@ enum ct_ephemeral_range {
 #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
+
+struct conntrack_zone_limit {
+int32_t zone;
+uint32_t limit;
+atomic_count count;
+uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+};
+
 struct conntrack {
 struct ovs_mutex ct_lock; /* Protects the following fields. */
 struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 8cf200e06..112d43216 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -330,18 +330,19 @@ zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone)
 return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
 }
 
-struct conntrack_zone_limit
+struct conntrack_zone_info
 zone_limit_get(struct conntrack *ct, int32_t zone)
 {
-struct conntrack_zone_limit czl = {
+struct conntrack_zone_info czl = {
 .zone = DEFAULT_ZONE,
 .limit = 0,
-.count = ATOMIC_COUNT_INIT(0),
-.zone_limit_seq = 0,
+.count = 0,
 };
 struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
 if (zl) {
-czl = zl->czl;
+czl.zone = zl->czl.zone;
+czl.limit = zl->czl.limit;
+czl.count = atomic_count_get(&zl->czl.count);
 }
 return czl;
 }
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 13bb02ea9..c3136e955 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -115,11 +115,10 @@ struct conntrack_dump {
 uint16_t current_zone;
 };
 
-struct conntrack_zone_limit {
+struct conntrack_zone_info {
 int32_t zone;
 uint32_t limit;
-atomic_count count;
-uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+unsigned int count;
 };
 
 struct timeout_policy {
@@ -161,8 +160,8 @@ int conntrack_set_sweep_interval(struct conntrack *ct, uint32_t ms);
 uint32_t conntrack_get_sweep_interval(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
-struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
-   int32_t zone);
+struct conntrack_zone_info zone_limit_get(struct conntrack *ct,
+  int32_t zone);
 int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit);
 int zone_limit_delete(struct conntrack *ct, int32_t zone);
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3d262463f..2a529f272 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9732,7 +9732,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
struct ovs_list *zone_limits_reply)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
-struct conntrack_zone_limit czl;
+struct conntrack_zone_info czl;
 
 if (!ovs_list_is_empty(zone_limits_request)) {
 struct ct_dpif_zone_limit *zone_limit;
@@ -9741,7 +9741,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
 if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) {
 ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone,
 czl.limit,
-atomic_count_get(&czl.count));
+czl.count);
 } else {
 return EINVAL;
 }
@@ -9757,7 +9757,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
 czl = zone_limit_get(dp->conntrack, z);
 if (czl.zone == z) {
 ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit,
-atomic_count_get(&czl.count));
+czl.count);
 }
 }
 }
-- 
2.46.1



[ovs-dev] [PATCH v2 2/6] conntrack: Add zone limit coverage counter.

2024-09-30 Thread Paolo Valerio
Similarly to what is done for conntrack_full, add a
conntrack_zone_full coverage counter that is incremented when a new
entry is not added because the zone limit has been reached.
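
As a rough illustration of where such a counter fires, the sketch below
uses a plain integer as a stand-in for the OVS COVERAGE_DEFINE() and
COVERAGE_INC() machinery. zone_admit() and zone_full_hits are
hypothetical names; the comparison mirrors the one in the
conn_not_found() hunk of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the coverage counter (hypothetical, not thread-safe). */
unsigned long long zone_full_hits;

/* Returns true if a new connection may be added to a zone that currently
 * holds 'count' entries and is limited to 'limit' entries.  A refusal is
 * recorded in the counter so that it can be observed later. */
bool
zone_admit(unsigned int count, uint32_t limit)
{
    if (count >= limit) {
        zone_full_hits++;
        return false;
    }

    return true;
}
```

With this comparison a limit of 0 refuses every new entry, so each such
attempt bumps the counter.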

Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e96779e68..8cf200e06 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -50,6 +50,7 @@ COVERAGE_DEFINE(conntrack_full);
 COVERAGE_DEFINE(conntrack_l3csum_err);
 COVERAGE_DEFINE(conntrack_l4csum_err);
 COVERAGE_DEFINE(conntrack_lookup_natted_miss);
+COVERAGE_DEFINE(conntrack_zone_full);
 
 struct conn_lookup_ctx {
 struct conn_key key;
@@ -918,6 +919,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 struct zone_limit *zl = zone_limit_lookup_or_default(ct,
  ctx->key.zone);
 if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) {
+COVERAGE_INC(conntrack_zone_full);
 return nc;
 }
 
-- 
2.46.1



[ovs-dev] [PATCH v2 1/6] conntrack: Correctly annotate conntrack member.

2024-09-30 Thread Paolo Valerio
While at it, update a comment that is no longer valid.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 71367f211..6c65caa07 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -199,11 +199,12 @@ enum ct_ephemeral_range {
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
 struct conntrack {
-struct ovs_mutex ct_lock; /* Protects 2 following fields. */
+struct ovs_mutex ct_lock; /* Protects the following fields. */
 struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
-struct rculist exp_lists[N_EXP_LISTS];
+struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
 struct cmap timeout_policies OVS_GUARDED;
+
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
-- 
2.46.1



Re: [ovs-dev] [PATCH v2 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-09-03 Thread Paolo Valerio
Simon Horman  writes:

> On Mon, Sep 02, 2024 at 01:28:48PM +0200, Paolo Valerio wrote:
>> Paolo Valerio  writes:
>> 
>> > Simon Horman  writes:
>> >
>> >> On Wed, Aug 28, 2024 at 07:14:06PM +0200, Paolo Valerio wrote:
>> >>> As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
>> >>> result in the unexpected failure of the following tests:
>> >>> 
>> >>> conntrack - multiple zones, local
>> >>> conntrack - multi-stage pipeline, local
>> >>> conntrack - can match and clear ct_state from outside OVS
>> >>> 
>> >>> this happens because the nf_conncount turns on connection tracking and
>> >>> the above tests rely on this side effect. However, this behavior may
>> >>> be corrected in the kernel, which could, in turn, cause the tests to
>> >>> fail.
>> >>> 
>> >>> The patch removes the assumption by adding explicit iptables rules to
>> >>> attach an nf_conn template to the skb resulting tracked once hit the
>> >>> OvS pipeline.
>> >>> 
>> >>> While at it, introduce $HAVE_IPTABLES and skip tests if iptables
>> >>> binary is not present.
>> >>> 
>> >>> Reported-by: Xin Long 
>> >>> Reported-at: https://issues.redhat.com/browse/FDP-708
>> >>> Signed-off-by: Paolo Valerio 
>> >>
>> >> Hi Paolo,
>> >>
>> >> I exercised this using vng with net-next compiled using
>> >> tools/testing/selftests/net/config from the upstream kernel tree [1].
>> >>
>> >> [1] 
>> >> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
>> >>
>> >> The resulting config does not have CONFIG_NETFILTER_CONNCOUNT set.
>> >>
>> >
>> > Hi Simon,
>> >
>> > I used vng with net-next as well, but with a different config. In my
>> > case I simply reused a previously used config which is essentially my
>> > local one updated with olddefconfig and manually (to remove/add things).
>> >
>> >> Some observations:
>> >>
>> >> * CONFIG_NETFILTER_XT_TARGET_CT is required for -j CT
>> >>
>> >>   I don't think this is a problem (other than my own problem
>> >>   of it taking me a long time to figure that out). But it seems
>> >>   worth noting (see parentheses in previous sentence:).
>> >>
>> >
>> > It's something I was thinking about right after the patch submission.
>> > Although extensions are fairly common in the main distros, maybe we may
>> > consider checking they are present.
>> >
>> > The only ways that comes to mind to reliably check whether the
>> > extensions (both kernel and user space) are present is simply by
>> > applying a rule.
>> >
>> > I guess we can (added a diff at the bottom), given the nft discussion
>> > and the potential follow-up, change things a bit in order to better
>> > accomodate a potential migration/follow-up.
>> >
>> > WDYT?
>
> Yes, I think this is a good idea.
>
> As an aside: While looking over this I noticed CHECK_CONNTRACK_TIMEOUT.  It
> seems this relies on NF_CONNTRACK_TIMEOUT being a module (not a builtin),
> and /boot/config-$(uname -r) corresponding to the running kernel. I guess
> that is ok most of the time, but it produced a false-skip when running
> tests under vng.
>
> Another aside: I'm unsure how to create an equivalent of the following with
> nft. Do you happen to know? Perhaps we can use notrack as a feature
> test instead?
>
> iptables -t raw -I OUTPUT 1 -o $1 -j CT

This should work:

nft -f - <
>> >> * Of the tests that are updated by this patch,
>> >>   I only observed that the last one,
>> >>   "conntrack - can match and clear ct_state from outside OVS",
>> >>   fails without this patch applied.
>> >>
>> >>   I am unsure if that is something that warrants updating this
>> >>   patch or not. Or if, rather, there is an error in my testing.
>> >
>> > Thanks for testing it.
>> >
>> > Interesting.
>> > I tried the following config against net-next and it seems I can't
>> > reproduce that behaviour:
>> >
>> > vng --build --config tools/testing/selftests/net/config \
>> >   --configitem CONFIG_NF_CONNTRACK_ZONES=y \
>> >   --configitem CONFIG_NETF

Re: [ovs-dev] [PATCH v2 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-09-03 Thread Paolo Valerio
Aaron Conole  writes:

> Paolo Valerio  writes:
>
>> Paolo Valerio  writes:
>>
>>> Simon Horman  writes:
>>>
>>>> On Wed, Aug 28, 2024 at 07:14:06PM +0200, Paolo Valerio wrote:
>>>>> As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
>>>>> result in the unexpected failure of the following tests:
>>>>> 
>>>>> conntrack - multiple zones, local
>>>>> conntrack - multi-stage pipeline, local
>>>>> conntrack - can match and clear ct_state from outside OVS
>>>>> 
>>>>> this happens because the nf_conncount turns on connection tracking and
>>>>> the above tests rely on this side effect. However, this behavior may
>>>>> be corrected in the kernel, which could, in turn, cause the tests to
>>>>> fail.
>>>>> 
>>>>> The patch removes the assumption by adding explicit iptables rules to
>>>>> attach an nf_conn template to the skb resulting tracked once hit the
>>>>> OvS pipeline.
>>>>> 
>>>>> While at it, introduce $HAVE_IPTABLES and skip tests if iptables
>>>>> binary is not present.
>>>>> 
>>>>> Reported-by: Xin Long 
>>>>> Reported-at: https://issues.redhat.com/browse/FDP-708
>>>>> Signed-off-by: Paolo Valerio 
>>>>
>>>> Hi Paolo,
>>>>
>>>> I exercised this using vng with net-next compiled using
>>>> tools/testing/selftests/net/config from the upstream kernel tree [1].
>>>>
>>>> [1] 
>>>> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
>>>>
>>>> The resulting config does not have CONFIG_NETFILTER_CONNCOUNT set.
>>>>
>>>
>>> Hi Simon,
>>>
>>> I used vng with net-next as well, but with a different config. In my
>>> case I simply reused a previously used config which is essentially my
>>> local one updated with olddefconfig and manually (to remove/add things).
>>>
>>>> Some observations:
>>>>
>>>> * CONFIG_NETFILTER_XT_TARGET_CT is required for -j CT
>>>>
>>>>   I don't think this is a problem (other than my own problem
>>>>   of it taking me a long time to figure that out). But it seems
>>>>   worth noting (see parentheses in previous sentence:).
>>>>
>>>
>>> It's something I was thinking about right after the patch submission.
>>> Although extensions are fairly common in the main distros, maybe we may
>>> consider checking they are present.
>>>
>>> The only ways that comes to mind to reliably check whether the
>>> extensions (both kernel and user space) are present is simply by
>>> applying a rule.
>>>
>>> I guess we can (added a diff at the bottom), given the nft discussion
>>> and the potential follow-up, change things a bit in order to better
>>> accomodate a potential migration/follow-up.
>>>
>>> WDYT?
>>>
>>>> * Of the tests that are updated by this patch,
>>>>   I only observed that the last one,
>>>>   "conntrack - can match and clear ct_state from outside OVS",
>>>>   fails without this patch applied.
>>>>
>>>>   I am unsure if that is something that warrants updating this
>>>>   patch or not. Or if, rather, there is an error in my testing.
>>>
>>> Thanks for testing it.
>>>
>>> Interesting.
>>> I tried the following config against net-next and it seems I can't
>>> reproduce that behaviour:
>>>
>>> vng --build --config tools/testing/selftests/net/config \
>>>   --configitem CONFIG_NF_CONNTRACK_ZONES=y \
>>>   --configitem CONFIG_NETFILTER_XT_TARGET_CT=m -v
>>>
>>> ---
>>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>>> index 5203b1df8..6b5eb32fc 100644
>>> --- a/tests/system-kmod-macros.at
>>> +++ b/tests/system-kmod-macros.at
>>> @@ -267,3 +267,22 @@ m4_define([OVS_CHECK_BAREUDP],
>>>  AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 
>>> ethertype mpls_uc 2>&1 >/dev/null])
>>>  AT_CHECK([ip link del dev ovs_bareudp0])
>>>  ])
>>> +
>>> +# CHECK_EXTERNAL_CT()
>>> +#
>>> +# Checks if packets can be tracked outside OvS.
>>> +m4_define([CHECK_EXTERNAL_CT],
>>> +[
>

Re: [ovs-dev] [PATCH v2 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-09-02 Thread Paolo Valerio
Paolo Valerio  writes:

> Simon Horman  writes:
>
>> On Wed, Aug 28, 2024 at 07:14:06PM +0200, Paolo Valerio wrote:
>>> As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
>>> result in the unexpected failure of the following tests:
>>> 
>>> conntrack - multiple zones, local
>>> conntrack - multi-stage pipeline, local
>>> conntrack - can match and clear ct_state from outside OVS
>>> 
>>> this happens because the nf_conncount turns on connection tracking and
>>> the above tests rely on this side effect. However, this behavior may
>>> be corrected in the kernel, which could, in turn, cause the tests to
>>> fail.
>>> 
>>> The patch removes the assumption by adding explicit iptables rules to
>>> attach an nf_conn template to the skb resulting tracked once hit the
>>> OvS pipeline.
>>> 
>>> While at it, introduce $HAVE_IPTABLES and skip tests if iptables
>>> binary is not present.
>>> 
>>> Reported-by: Xin Long 
>>> Reported-at: https://issues.redhat.com/browse/FDP-708
>>> Signed-off-by: Paolo Valerio 
>>
>> Hi Paolo,
>>
>> I exercised this using vng with net-next compiled using
>> tools/testing/selftests/net/config from the upstream kernel tree [1].
>>
>> [1] 
>> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
>>
>> The resulting config does not have CONFIG_NETFILTER_CONNCOUNT set.
>>
>
> Hi Simon,
>
> I used vng with net-next as well, but with a different config. In my
> case I simply reused a previously used config which is essentially my
> local one updated with olddefconfig and manually (to remove/add things).
>
>> Some observations:
>>
>> * CONFIG_NETFILTER_XT_TARGET_CT is required for -j CT
>>
>>   I don't think this is a problem (other than my own problem
>>   of it taking me a long time to figure that out). But it seems
>>   worth noting (see parentheses in previous sentence:).
>>
>
> It's something I was thinking about right after the patch submission.
> Although extensions are fairly common in the main distros, maybe we may
> consider checking they are present.
>
> The only ways that comes to mind to reliably check whether the
> extensions (both kernel and user space) are present is simply by
> applying a rule.
>
> I guess we can (added a diff at the bottom), given the nft discussion
> and the potential follow-up, change things a bit in order to better
> accomodate a potential migration/follow-up.
>
> WDYT?
>
>> * Of the tests that are updated by this patch,
>>   I only observed that the last one,
>>   "conntrack - can match and clear ct_state from outside OVS",
>>   fails without this patch applied.
>>
>>   I am unsure if that is something that warrants updating this
>>   patch or not. Or if, rather, there is an error in my testing.
>
> Thanks for testing it.
>
> Interesting.
> I tried the following config against net-next and it seems I can't
> reproduce that behaviour:
>
> vng --build --config tools/testing/selftests/net/config \
>   --configitem CONFIG_NF_CONNTRACK_ZONES=y \
>   --configitem CONFIG_NETFILTER_XT_TARGET_CT=m -v
>
> ---
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index 5203b1df8..6b5eb32fc 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -267,3 +267,22 @@ m4_define([OVS_CHECK_BAREUDP],
>  AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 
> ethertype mpls_uc 2>&1 >/dev/null])
>  AT_CHECK([ip link del dev ovs_bareudp0])
>  ])
> +
> +# CHECK_EXTERNAL_CT()
> +#
> +# Checks if packets can be tracked outside OvS.
> +m4_define([CHECK_EXTERNAL_CT],
> +[
> +dnl Kernel config (CONFIG_NETFILTER_XT_TARGET_CT)
> +dnl and user space extensions need to be present.
> +AT_SKIP_IF([! iptables -t raw -I OUTPUT 1 -j CT])
> +AT_CHECK([iptables -t raw -D OUTPUT 1])
> +])
> +
> +# ADD_EXTERNAL_CT()
> +#
> +# Let conntrack start tracking the packets outside OvS.
> +m4_define([ADD_EXTERNAL_CT],
> +[
> +AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
> +])

on_exit here got lost for some reason.
Below the corrected diff.

---
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 5203b1df8..ab89ea24b 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -267,3 +267,23 @@ m4_define([OVS_CHECK_BAREUDP],
 AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 ethertype mpls_uc 2>&1 >/dev/null])
   

Re: [ovs-dev] [PATCH v2 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-09-02 Thread Paolo Valerio
Simon Horman  writes:

> On Wed, Aug 28, 2024 at 07:14:06PM +0200, Paolo Valerio wrote:
>> As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
>> result in the unexpected failure of the following tests:
>> 
>> conntrack - multiple zones, local
>> conntrack - multi-stage pipeline, local
>> conntrack - can match and clear ct_state from outside OVS
>> 
>> this happens because the nf_conncount turns on connection tracking and
>> the above tests rely on this side effect. However, this behavior may
>> be corrected in the kernel, which could, in turn, cause the tests to
>> fail.
>> 
>> The patch removes the assumption by adding explicit iptables rules to
>> attach an nf_conn template to the skb resulting tracked once hit the
>> OvS pipeline.
>> 
>> While at it, introduce $HAVE_IPTABLES and skip tests if iptables
>> binary is not present.
>> 
>> Reported-by: Xin Long 
>> Reported-at: https://issues.redhat.com/browse/FDP-708
>> Signed-off-by: Paolo Valerio 
>
> Hi Paolo,
>
> I exercised this using vng with net-next compiled using
> tools/testing/selftests/net/config from the upstream kernel tree [1].
>
> [1] 
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
>
> The resulting config does not have CONFIG_NETFILTER_CONNCOUNT set.
>

Hi Simon,

I used vng with net-next as well, but with a different config. In my
case I simply reused a previously used config which is essentially my
local one updated with olddefconfig and manually (to remove/add things).

> Some observations:
>
> * CONFIG_NETFILTER_XT_TARGET_CT is required for -j CT
>
>   I don't think this is a problem (other than my own problem
>   of it taking me a long time to figure that out). But it seems
>   worth noting (see parentheses in previous sentence:).
>

It's something I was thinking about right after the patch submission.
Although extensions are fairly common in the main distros, maybe we
should consider checking that they are present.

The only way that comes to mind to reliably check whether the
extensions (both kernel and user space) are present is simply by
applying a rule.

I guess we can (added a diff at the bottom), given the nft discussion
and the potential follow-up, change things a bit in order to better
accommodate a potential migration/follow-up.

WDYT?

> * Of the tests that are updated by this patch,
>   I only observed that the last one,
>   "conntrack - can match and clear ct_state from outside OVS",
>   fails without this patch applied.
>
>   I am unsure if that is something that warrants updating this
>   patch or not. Or if, rather, there is an error in my testing.

Thanks for testing it.

Interesting.
I tried the following config against net-next and it seems I can't
reproduce that behaviour:

vng --build --config tools/testing/selftests/net/config \
  --configitem CONFIG_NF_CONNTRACK_ZONES=y \
  --configitem CONFIG_NETFILTER_XT_TARGET_CT=m -v

---
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index 5203b1df8..6b5eb32fc 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -267,3 +267,22 @@ m4_define([OVS_CHECK_BAREUDP],
 AT_SKIP_IF([! ip link add dev ovs_bareudp0 type bareudp dstport 6635 ethertype mpls_uc 2>&1 >/dev/null])
 AT_CHECK([ip link del dev ovs_bareudp0])
 ])
+
+# CHECK_EXTERNAL_CT()
+#
+# Checks if packets can be tracked outside OvS.
+m4_define([CHECK_EXTERNAL_CT],
+[
+dnl Kernel config (CONFIG_NETFILTER_XT_TARGET_CT)
+dnl and user space extensions need to be present.
+AT_SKIP_IF([! iptables -t raw -I OUTPUT 1 -j CT])
+AT_CHECK([iptables -t raw -D OUTPUT 1])
+])
+
+# ADD_EXTERNAL_CT()
+#
+# Let conntrack start tracking the packets outside OvS.
+m4_define([ADD_EXTERNAL_CT],
+[
+AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
+])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 46a9414d4..5435a6241 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -5458,12 +5458,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multiple zones, local])
-AT_SKIP_IF([test $HAVE_IPTABLES = no])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
-IPTABLES_CT([br0])
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -5509,12 +5509,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multi-stage pipeline, local])
-AT_SKIP_IF([test $HAVE_IPTABLES = no])
+CHECK_EXTERNAL_CT()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
-IPTABLES_CT([br0])
+ADD_EXTERNAL_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -8392,7 +8392,7 @@ OV

[ovs-dev] [PATCH v2 2/2] ovs-macros.at: Correctly delete iptables rule on_exit.

2024-08-28 Thread Paolo Valerio
Currently, at every call of IPTABLES_ACCEPT() an iptables rule gets
added. Such a rule is supposed to be removed on exit, but the current
syntax for deleting the rule is incorrect, resulting in a leftover
rule after execution.

Fix it by correcting the deletion command.

Fixes: 5e06e7ac99dc ("tests: Refactor the iptables accept rule.")
Signed-off-by: Paolo Valerio 
---
 tests/ovs-macros.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index df2835747..4cc8e7bc8 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -365,7 +365,7 @@ dnl to reject input traffic from bridges such as br-underlay.
 dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
-   on_exit 'iptables -D INPUT 1 -i $1'])
+   on_exit 'iptables -D INPUT 1'])
 
 dnl Required to let conntrack start tracking the packets outside ovs
 m4_define([IPTABLES_CT],
-- 
2.46.0



[ovs-dev] [PATCH v2 1/2] system-traffic: Do not rely on conncount for already tracked packets.

2024-08-28 Thread Paolo Valerio
As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
result in the unexpected failure of the following tests:

conntrack - multiple zones, local
conntrack - multi-stage pipeline, local
conntrack - can match and clear ct_state from outside OVS

This happens because nf_conncount turns on connection tracking and
the above tests rely on this side effect. However, this behavior may
be corrected in the kernel, which could, in turn, cause the tests to
fail.

The patch removes the assumption by adding explicit iptables rules that
attach an nf_conn template to the skb, so that the packet is already
tracked once it hits the OvS pipeline.

While at it, introduce $HAVE_IPTABLES and skip tests if iptables
binary is not present.

Reported-by: Xin Long 
Reported-at: https://issues.redhat.com/browse/FDP-708
Signed-off-by: Paolo Valerio 
---

V2:
- add $HAVE_IPTABLES
- reduced subject length (0-day Robot)

Signed-off-by: Paolo Valerio 
---
 tests/atlocal.in| 3 +++
 tests/ovs-macros.at | 5 +
 tests/system-traffic.at | 8 
 3 files changed, 16 insertions(+)

diff --git a/tests/atlocal.in b/tests/atlocal.in
index 8565a0bae..d6b87f8ec 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -185,6 +185,9 @@ find_command lftp
 # Set HAVE_ETHTOOL
 find_command ethtool
 
+# Set HAVE_IPTABLES
+find_command iptables
+
 CURL_OPT="-g -v --max-time 1 --retry 2 --retry-delay 1 --connect-timeout 1"
 
 # Determine whether "diff" supports "normal" diffs.  (busybox diff does not.)
diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index 06c978555..df2835747 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -366,3 +366,8 @@ dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
on_exit 'iptables -D INPUT 1 -i $1'])
+
+dnl Required to let conntrack start tracking the packets outside ovs
+m4_define([IPTABLES_CT],
+  [AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
+   on_exit 'iptables -t raw -D OUTPUT 1'])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..46a9414d4 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -1094,6 +1094,7 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/Invalid Geneve tunnel metadata on bridge br0 while
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over gre tunnel by simulated packets])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -1140,6 +1141,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([datapath - ping over erspan v1 tunnel by simulated packets])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 OVS_CHECK_MIN_KERNEL(3, 10)
 
 OVS_TRAFFIC_VSWITCHD_START()
@@ -5456,10 +5458,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multiple zones, local])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+IPTABLES_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -5505,10 +5509,12 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - multi-stage pipeline, local])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+IPTABLES_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -8386,6 +8392,7 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - can match and clear ct_state from outside OVS])
+AT_SKIP_IF([test $HAVE_IPTABLES = no])
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_CHECK_GENEVE()
 
@@ -8396,6 +8403,7 @@ AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 AT_CHECK([ovs-ofctl add-flow br-underlay "priority=100,ct_state=+trk,actions=ct_clear,resubmit(,0)"])
 AT_CHECK([ovs-ofctl add-flow br-underlay "priority=10,actions=normal"])
 
+IPTABLES_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 dnl Set up underlay link from host into the namespace using veth pair.
-- 
2.46.0



[ovs-dev] [PATCH 2/2] ovs-macros.at: Correctly delete iptables rule on_exit.

2024-08-28 Thread Paolo Valerio
Currently, every call of IPTABLES_ACCEPT() adds an iptables rule.
The rule is supposed to be removed on exit, but the current
syntax for deleting the rule is incorrect, resulting in a leftover
rule after execution.

Fix it by correcting the deletion command.

Fixes: 5e06e7ac99dc ("tests: Refactor the iptables accept rule.")
Signed-off-by: Paolo Valerio 
---
 tests/ovs-macros.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index df2835747..4cc8e7bc8 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -365,7 +365,7 @@ dnl to reject input traffic from bridges such as 
br-underlay.
 dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
-   on_exit 'iptables -D INPUT 1 -i $1'])
+   on_exit 'iptables -D INPUT 1'])
 
 dnl Required to let conntrack start tracking the packets outside ovs
 m4_define([IPTABLES_CT],
-- 
2.46.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/2] system-traffic: Do not rely on conn count for externally tracked packets.

2024-08-28 Thread Paolo Valerio
As Long reported, kernels built without CONFIG_NETFILTER_CONNCOUNT
result in the unexpected failure of the following tests:

conntrack - multiple zones, local
conntrack - multi-stage pipeline, local
conntrack - can match and clear ct_state from outside OVS

This happens because nf_conncount turns on connection tracking and
the above tests rely on this side effect. However, this behavior may
be corrected in the kernel, which could, in turn, cause the tests to
fail.

The patch removes the assumption by adding explicit iptables rules that
attach an nf_conn template to the skb, so that the packet is already
tracked once it hits the OvS pipeline.

Reported-by: Xin Long 
Reported-at: https://issues.redhat.com/browse/FDP-708
Signed-off-by: Paolo Valerio 
---
 tests/ovs-macros.at | 5 +
 tests/system-traffic.at | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index 06c978555..df2835747 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -366,3 +366,8 @@ dnl Add a rule to always accept the traffic.
 m4_define([IPTABLES_ACCEPT],
   [AT_CHECK([iptables -I INPUT 1 -i $1 -j ACCEPT])
on_exit 'iptables -D INPUT 1 -i $1'])
+
+dnl Required to let conntrack start tracking the packets outside ovs
+m4_define([IPTABLES_CT],
+  [AT_CHECK([iptables -t raw -I OUTPUT 1 -o $1 -j CT])
+   on_exit 'iptables -t raw -D OUTPUT 1'])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..4da640604 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -5460,6 +5460,7 @@ CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+IPTABLES_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -5509,6 +5510,7 @@ CHECK_CONNTRACK()
 CHECK_CONNTRACK_LOCAL_STACK()
 OVS_TRAFFIC_VSWITCHD_START()
 
+IPTABLES_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 AT_CHECK([ip addr add dev br0 "10.1.1.1/24"])
@@ -8396,6 +8398,7 @@ AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 AT_CHECK([ovs-ofctl add-flow br-underlay 
"priority=100,ct_state=+trk,actions=ct_clear,resubmit(,0)"])
 AT_CHECK([ovs-ofctl add-flow br-underlay "priority=10,actions=normal"])
 
+IPTABLES_CT([br0])
 ADD_NAMESPACES(at_ns0)
 
 dnl Set up underlay link from host into the namespace using veth pair.
-- 
2.46.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/5] conntrack: Use a per zone default limit.

2024-08-24 Thread Paolo Valerio
Aaron Conole  writes:

> Hi Paolo,
>
> Paolo Valerio  writes:
>
>> Before this change the default limit, instead of being considered
>> per-zone, was considered as a global value that every new entry was
>> checked against during the creation. This was not the intended
>> behavior as the default limit should be inherited by each zone instead
>> of being an aggregate number.
>>
>> This change corrects that by removing the default limit from the cmap
>> and making it global (atomic). Now, whenever a new connection needs to
>> be committed, if default_zone_limit is set and the entry for the zone
>> doesn't exist, a new entry for that zone is lazily created, marked as
>> default. All subsequent packets for that zone will undergo the regular
>> lookup process.
>>
>> Operations such as creation/deletion are modified accordingly taking
>> into account this new behavior.
>
> I think there should be some documentation about this that includes the
> expected behavior for end users.  At least something so that users can
> plan how they will set their zone limits.
>

"new behavior" isn't really something new, but more how it was supposed
to be.

dpctl man page describes how it was supposed to work, with things now
aligning with kernel dp.

Did you have anything specific in mind when you refer to the
documentation? i.e. articulating the man page, or something else.

> Some other stuff follows.
>
>> Worth noting that OVS_REQUIRES(ct->ct_lock) is not a strict
>> requirement for zone_limit_lookup_or_default(), however since the
>> function operates under the lock and it can create an entry in the
>> slow path, the lock requirement is enforced in order to make thread
>> safety checks work. The function can still be moved outside the
>> creation lock or any lock, keeping the fastpath lockless (turning
>> zone_limit_lookup_protected() to its unprotected version) and locking
>> only in the slow path (replacing zone_limit_create__() with
>> zone_limit_create__()).
>>
>> The patch also extends `conntrack - limit by zone` test in order to
>> check the behavior, and while at it, update test's packet-out to use
>> compose-packet function.
>>
>> Fixes: a7f33fdbfb67 ("conntrack: Support zone limits.")
>> Reported-at: https://issues.redhat.com/browse/FDP-122
>> Reported-by: Ilya Maximets 
>> Signed-off-by: Paolo Valerio 
>> ---
>>  lib/conntrack-private.h |   7 +-
>>  lib/conntrack.c | 233 +++-
>>  tests/system-traffic.at |  64 +++
>>  3 files changed, 231 insertions(+), 73 deletions(-)
>>
>> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
>> index 2770470d1..46b212754 100644
>> --- a/lib/conntrack-private.h
>> +++ b/lib/conntrack-private.h
>> @@ -198,10 +198,11 @@ enum ct_ephemeral_range {
>>  #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
>>  FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
>>  
>> +#define ZONE_LIMIT_CONN_DEFAULT -1
>>  
>>  struct conntrack_zone_limit {
>>  int32_t zone;
>> -atomic_uint32_t limit;
>> +atomic_int64_t limit;
>
> We change the zone limit max storage here, it seems.  Maybe that should
> be mentioned in the commit message.
>

Mentioning it makes sense, and I added a note in the commit msg, thanks.

Although the storage changed here, this will not change the user-facing
admitted values. It is essentially intended to accommodate
ZONE_LIMIT_CONN_DEFAULT.
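
A minimal sketch of that point, mirroring the zone_limit_get_limit() logic from the patch but without the atomics: a signed 64-bit value can hold every uint32_t limit plus the -1 sentinel. The function name effective_limit() and the plain (non-atomic) parameters are assumptions made for illustration only.

```c
#include <stdint.h>

/* Sentinel as defined by the patch: the zone inherits the default limit. */
#define ZONE_LIMIT_CONN_DEFAULT -1

/* Resolves the limit to enforce for a zone.  'stored' is the per-zone value
 * (any uint32_t limit, or the sentinel).  'default_limit' is the global
 * default, where 0 means that no default is configured. */
static int64_t
effective_limit(int64_t stored, uint32_t default_limit)
{
    if (stored == ZONE_LIMIT_CONN_DEFAULT) {
        /* Inherit the default; -1 is returned when there is none. */
        return default_limit ? (int64_t) default_limit : -1;
    }
    return stored;
}
```

The widened type only matters internally: callers still pass and read back values in the uint32_t range.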

>>  atomic_count count;
>>  uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
>>  };
>> @@ -212,6 +213,9 @@ struct conntrack {
>>  struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
>>  struct cmap zone_limits OVS_GUARDED;
>>  struct cmap timeout_policies OVS_GUARDED;
>> +uint32_t zone_limit_seq OVS_GUARDED; /* Used to disambiguate zone limit
>> +  * counts. */
>> +atomic_uint32_t default_zone_limit;
>>  
>>  uint32_t hash_basis; /* Salt for hashing a connection key. */
>>  pthread_t clean_thread; /* Periodically cleans up connection tracker. */
>> @@ -234,7 +238,6 @@ struct conntrack {
>>   * control context.  */
>>  
>>  struct ipf *ipf; /* Fragmentation handling context. */
>> -uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
>>  atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
>>  atomic_uint32_t sweep_ms; /* Next sweep interval. */
>>

Re: [ovs-dev] [PATCH] tests: Fix transient failure in ping6 header modify.

2024-08-19 Thread Paolo Valerio
Frode Nordahl  writes:

> The "ping6 between two ports with header modify" test currently
> fails 1 in 25 times on an idle system, and apparently 100% of the
> time on a more busy system.
>
> The failure appears to be caused by the very first packet to a
> new destination being dropped.
>
> Log excerpt:
>  ping6 -q -c 3 -i 0.3 -W 2 fc00::3 \
>  | grep "transmitted" | sed 's/time.*ms$/time 0ms/'
>  -3 packets transmitted, 3 received, 0% packet loss, time 0ms
>  +3 packets transmitted, 2 received, 33.% packet loss, time 0ms
>
> The test already primes the neighbour table by waiting for the
> first ping to two of the destinations used in the test.
>
> This patch adds the same wait condition for the third destination
> used in the test.
>
> Having run the test in an endless successful loop for a while
> makes it highly probable to have fixed the issue.
>
> Reported-at: https://launchpad.net/bugs/2077157
> Signed-off-by: Frode Nordahl 
> ---

Hi Frode,

Flakiness in IPv6 tests may in some cases depend on DAD ([0]
summarizes what I found out back then).

A while ago I observed a similar problem [1]. I recall that in my local
tests, where I was able to reproduce the issue, I also dropped the wait on
ping6, confirming that nodad alone fixed the issue.
For this specific case, I was able to reproduce the issue on a busy
system, and it seems that nodad solves the issue (I applied the small
diff below) without requiring the wait-until-ping6.

In general the OVS_WAIT_UNTIL([... ping6 ...]) might not be needed in any
of our tests (whereas nodad is), but of course that would require more
tests to confirm it's enough for older kernels as well.

Given the above, keeping the wait-until logic, it should be enough
to move the first ping6 after adding the flows and turning the dst addr
from fc00::2 to fc00::3, but probably keeping three pings is easier to
read, so, if you decide to go with this, the patch LGTM.

Paolo

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/391115.html
[1] https://github.com/openvswitch/ovs/commit/989895501c53468569064e060f15bba3fc8f0cac

---

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 202ff0492..e69926593 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -234,15 +234,9 @@ AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 
 ADD_NAMESPACES(at_ns0, at_ns1)
 
-ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55)
-ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54)
-NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 dev 
p0])
-
-dnl Linux seems to take a little time to get its IPv6 stack in order. Without
-dnl waiting, we get occasional failures due to the following error:
-dnl "connect: Cannot assign requested address"
-OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
-OVS_WAIT_UNTIL([ip netns exec at_ns1 ping6 -c 1 fc00::1])
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", e4:11:22:33:44:55, [], nodad)
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", e4:11:22:33:44:54, [], nodad)
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::3 lladdr e4:11:22:33:44:54 nud 
perm dev p0])
 
 AT_DATA([flows.txt], [dnl
 
priority=100,in_port=ovs-p0,ipv6,ipv6_src=fc00::1,ipv6_dst=fc00::3,actions=set_field:fc00::2->ipv6_dst,ovs-p1



>  tests/system-traffic.at | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 202ff0492..bb52d2cfb 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -253,6 +253,8 @@ priority=0,actions=NORMAL
>  AT_CHECK([ovs-ofctl del-flows br0])
>  AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
>  
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::3])
> +
>  NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -W 2 fc00::3 | FORMAT_PING], 
> [0], [dnl
>  3 packets transmitted, 3 received, 0% packet loss, time 0ms
>  ])
> -- 
> 2.45.2
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 5/5] conntrack: Use a per zone default limit.

2024-07-01 Thread Paolo Valerio
Before this change the default limit, instead of being considered
per-zone, was considered as a global value that every new entry was
checked against during the creation. This was not the intended
behavior as the default limit should be inherited by each zone instead
of being an aggregate number.

This change corrects that by removing the default limit from the cmap
and making it global (atomic). Now, whenever a new connection needs to
be committed, if default_zone_limit is set and the entry for the zone
doesn't exist, a new entry for that zone is lazily created, marked as
default. All subsequent packets for that zone will undergo the regular
lookup process.
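
The lazy creation described above can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the code from the patch: the fixed-size array stands in for the cmap, and the names struct tracker, struct zone_entry, lookup_or_default() and commit_conn() are hypothetical. The caller is assumed to pass a zone in the range [0, N_ZONES).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_ZONES 8          /* Hypothetical, tiny zone space. */
#define LIMIT_DEFAULT -1   /* Entry inherits the default limit. */

struct zone_entry {
    bool in_use;
    int64_t limit;         /* Explicit limit or LIMIT_DEFAULT. */
    unsigned int count;    /* Connections committed in this zone. */
};

struct tracker {
    struct zone_entry zones[N_ZONES];
    uint32_t default_limit;    /* 0 means no default is configured. */
};

/* Returns the entry for 'zone', lazily creating one marked as default when
 * a default limit is set.  Returns NULL if the zone has no limit at all. */
static struct zone_entry *
lookup_or_default(struct tracker *t, int zone)
{
    struct zone_entry *e = &t->zones[zone];

    if (!e->in_use && t->default_limit) {
        e->in_use = true;
        e->limit = LIMIT_DEFAULT;
        e->count = 0;
    }
    return e->in_use ? e : NULL;
}

/* Tries to commit one connection in 'zone'; returns false if it is full. */
static bool
commit_conn(struct tracker *t, int zone)
{
    struct zone_entry *e = lookup_or_default(t, zone);

    if (e) {
        int64_t limit = e->limit;

        if (limit == LIMIT_DEFAULT) {
            limit = t->default_limit ? (int64_t) t->default_limit : -1;
        }
        if (limit >= 0 && (int64_t) e->count >= limit) {
            return false;
        }
        e->count++;
    }
    return true;
}
```

With a default of 2, each zone gets its own budget of two connections instead of all zones sharing a single aggregate counter.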

Operations such as creation/deletion are modified accordingly taking
into account this new behavior.

Worth noting that OVS_REQUIRES(ct->ct_lock) is not a strict
requirement for zone_limit_lookup_or_default(), however since the
function operates under the lock and it can create an entry in the
slow path, the lock requirement is enforced in order to make thread
safety checks work. The function can still be moved outside the
creation lock or any lock, keeping the fastpath lockless (turning
zone_limit_lookup_protected() to its unprotected version) and locking
only in the slow path (replacing zone_limit_create__() with
zone_limit_create__()).

The patch also extends `conntrack - limit by zone` test in order to
check the behavior, and while at it, update test's packet-out to use
compose-packet function.

Fixes: a7f33fdbfb67 ("conntrack: Support zone limits.")
Reported-at: https://issues.redhat.com/browse/FDP-122
Reported-by: Ilya Maximets 
Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |   7 +-
 lib/conntrack.c | 233 +++-
 tests/system-traffic.at |  64 +++
 3 files changed, 231 insertions(+), 73 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 2770470d1..46b212754 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -198,10 +198,11 @@ enum ct_ephemeral_range {
 #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
+#define ZONE_LIMIT_CONN_DEFAULT -1
 
 struct conntrack_zone_limit {
 int32_t zone;
-atomic_uint32_t limit;
+atomic_int64_t limit;
 atomic_count count;
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 };
@@ -212,6 +213,9 @@ struct conntrack {
 struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
 struct cmap timeout_policies OVS_GUARDED;
+uint32_t zone_limit_seq OVS_GUARDED; /* Used to disambiguate zone limit
+  * counts. */
+atomic_uint32_t default_zone_limit;
 
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
@@ -234,7 +238,6 @@ struct conntrack {
  * control context.  */
 
 struct ipf *ipf; /* Fragmentation handling context. */
-uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
 atomic_uint32_t sweep_ms; /* Next sweep interval. */
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index ac0790e11..0e128a0c6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -270,6 +270,7 @@ conntrack_init(void)
 atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
 atomic_init(&ct->tcp_seq_chk, true);
 atomic_init(&ct->sweep_ms, 2);
+atomic_init(&ct->default_zone_limit, 0);
 latch_init(&ct->clean_thread_exit);
 ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 ct->ipf = ipf_init();
@@ -296,6 +297,28 @@ zone_key_hash(int32_t zone, uint32_t basis)
 return hash;
 }
 
+static int64_t
+zone_limit_get_limit__(struct conntrack_zone_limit *czl)
+{
+int64_t limit;
+atomic_read_relaxed(&czl->limit, &limit);
+
+return limit;
+}
+
+static int64_t
+zone_limit_get_limit(struct conntrack *ct, struct conntrack_zone_limit *czl)
+{
+int64_t limit = zone_limit_get_limit__(czl);
+
+if (limit == ZONE_LIMIT_CONN_DEFAULT) {
+atomic_read_relaxed(&ct->default_zone_limit, &limit);
+limit = limit ? : -1;
+}
+
+return limit;
+}
+
 static struct zone_limit *
 zone_limit_lookup_protected(struct conntrack *ct, int32_t zone)
 OVS_REQUIRES(ct->ct_lock)
@@ -323,11 +346,56 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone)
 return NULL;
 }
 
+static struct zone_limit *
+zone_limit_create__(struct conntrack *ct, int32_t zone, int64_t limit)
+OVS_REQUIRES(ct->ct_lock)
+{
+struct zone_limit *zl = NULL;
+
+if (zone > DEFAULT_ZONE && zone <= MAX_ZONE) {
+zl = xmalloc(sizeof *zl);
+   

[ovs-dev] [PATCH 4/5] conntrack: Turn zl local limit into atomic.

2024-07-01 Thread Paolo Valerio
While at it, change the struct zone_limit initialization in
zone_limit_create() to use atomic init operations instead of
relying on memset() which, although it correctly initializes the struct,
is semantically not aware of atomics.
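
As a rough illustration of the initialization style, here is a sketch that uses C11 <stdatomic.h> rather than the OVS atomic wrappers (atomic_read_relaxed(), atomic_count_init() and friends), which is an assumption made to keep the example self-contained. The names struct limit_entry, limit_entry_create() and limit_entry_get() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct limit_entry {
    int32_t zone;
    _Atomic uint32_t limit;
    atomic_uint count;
};

/* Allocates an entry and initializes every member explicitly, using
 * atomic_init() for the atomic ones instead of zeroing the whole struct.
 * Returns NULL if the allocation fails. */
static struct limit_entry *
limit_entry_create(int32_t zone, uint32_t limit)
{
    struct limit_entry *e = malloc(sizeof *e);

    if (!e) {
        return NULL;
    }
    e->zone = zone;
    atomic_init(&e->limit, limit);
    atomic_init(&e->count, 0u);
    return e;
}

/* Reads the limit with relaxed ordering, as the patch does for czl.limit. */
static uint32_t
limit_entry_get(struct limit_entry *e)
{
    return atomic_load_explicit(&e->limit, memory_order_relaxed);
}
```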

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |  2 +-
 lib/conntrack.c | 19 ---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 2c625d710..2770470d1 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -201,7 +201,7 @@ enum ct_ephemeral_range {
 
 struct conntrack_zone_limit {
 int32_t zone;
-uint32_t limit;
+atomic_uint32_t limit;
 atomic_count count;
 uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 };
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 0481a8c8a..ac0790e11 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -341,7 +341,7 @@ zone_limit_get(struct conntrack *ct, int32_t zone)
 struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
 if (zl) {
 czl.zone = zl->czl.zone;
-czl.limit = zl->czl.limit;
+atomic_read_relaxed(&zl->czl.limit, &czl.limit);
 czl.count = atomic_count_get(&zl->czl.count);
 }
 return czl;
@@ -358,8 +358,9 @@ zone_limit_create(struct conntrack *ct, int32_t zone, 
uint32_t limit)
 }
 
 if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) {
-zl = xzalloc(sizeof *zl);
-zl->czl.limit = limit;
+zl = xmalloc(sizeof *zl);
+atomic_init(&zl->czl.limit, limit);
+atomic_count_init(&zl->czl.count, 0);
 zl->czl.zone = zone;
 zl->czl.zone_limit_seq = ct->zone_limit_seq++;
 uint32_t hash = zone_key_hash(zone, ct->hash_basis);
@@ -376,7 +377,7 @@ zone_limit_update(struct conntrack *ct, int32_t zone, 
uint32_t limit)
 int err = 0;
 struct zone_limit *zl = zone_limit_lookup(ct, zone);
 if (zl) {
-zl->czl.limit = limit;
+atomic_store_relaxed(&zl->czl.limit, limit);
 VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone);
 } else {
 ovs_mutex_lock(&ct->ct_lock);
@@ -916,12 +917,16 @@ conn_not_found(struct conntrack *ct, struct dp_packet 
*pkt,
 }
 
 if (commit) {
+uint32_t czl_limit;
 struct conn_key_node *fwd_key_node, *rev_key_node;
 struct zone_limit *zl = zone_limit_lookup_or_default(ct,
  ctx->key.zone);
-if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) {
-COVERAGE_INC(conntrack_zone_full);
-return nc;
+if (zl) {
+atomic_read_relaxed(&zl->czl.limit, &czl_limit);
+if (atomic_count_get(&zl->czl.count) >= czl_limit) {
+COVERAGE_INC(conntrack_zone_full);
+return nc;
+}
 }
 
 unsigned int n_conn_limit;
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/5] conntrack: Do not use atomics to report zones info.

2024-07-01 Thread Paolo Valerio
Atomics are not needed when reporting zone limits.
Remove the restriction by defining a non-atomic common structure
to report such data.
The change also accesses atomics through the related operations,
reporting only the fields required by the requesting
level instead of relying on a struct copy.
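
The idea can be sketched as below, again assuming C11 <stdatomic.h> in place of the OVS atomic_count helpers. struct zone_state, struct zone_info and zone_state_report() are hypothetical names that loosely follow the internal and reporting structures touched by this patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Internal state: carries an atomic counter and bookkeeping fields. */
struct zone_state {
    int32_t zone;
    uint32_t limit;
    atomic_uint count;
    uint32_t seq;          /* Internal only, never reported. */
};

/* Plain reporting structure: no atomics, only what the caller needs. */
struct zone_info {
    int32_t zone;
    uint32_t limit;
    unsigned int count;
};

/* Builds a snapshot field by field, reading the atomic through its
 * accessor instead of copying the whole internal struct. */
static struct zone_info
zone_state_report(struct zone_state *zs)
{
    struct zone_info info = {
        .zone = zs->zone,
        .limit = zs->limit,
        .count = atomic_load_explicit(&zs->count, memory_order_relaxed),
    };

    return info;
}
```

Callers such as the limits reply path can then use info.count directly, without calling an atomic accessor on a copied value.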

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |  8 
 lib/conntrack.c | 11 ++-
 lib/conntrack.h |  9 -
 lib/dpif-netdev.c   |  6 +++---
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 6c65caa07..2c625d710 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -198,6 +198,14 @@ enum ct_ephemeral_range {
 #define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
+
+struct conntrack_zone_limit {
+int32_t zone;
+uint32_t limit;
+atomic_count count;
+uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+};
+
 struct conntrack {
 struct ovs_mutex ct_lock; /* Protects the following fields. */
 struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e90ade32f..0481a8c8a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -330,18 +330,19 @@ zone_limit_lookup_or_default(struct conntrack *ct, 
int32_t zone)
 return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
 }
 
-struct conntrack_zone_limit
+struct conntrack_zone_info
 zone_limit_get(struct conntrack *ct, int32_t zone)
 {
-struct conntrack_zone_limit czl = {
+struct conntrack_zone_info czl = {
 .zone = DEFAULT_ZONE,
 .limit = 0,
-.count = ATOMIC_COUNT_INIT(0),
-.zone_limit_seq = 0,
+.count = 0,
 };
 struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
 if (zl) {
-czl = zl->czl;
+czl.zone = zl->czl.zone;
+czl.limit = zl->czl.limit;
+czl.count = atomic_count_get(&zl->czl.count);
 }
 return czl;
 }
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 13bb02ea9..c3136e955 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -115,11 +115,10 @@ struct conntrack_dump {
 uint16_t current_zone;
 };
 
-struct conntrack_zone_limit {
+struct conntrack_zone_info {
 int32_t zone;
 uint32_t limit;
-atomic_count count;
-uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+unsigned int count;
 };
 
 struct timeout_policy {
@@ -161,8 +160,8 @@ int conntrack_set_sweep_interval(struct conntrack *ct, 
uint32_t ms);
 uint32_t conntrack_get_sweep_interval(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
-struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
-   int32_t zone);
+struct conntrack_zone_info zone_limit_get(struct conntrack *ct,
+  int32_t zone);
 int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit);
 int zone_limit_delete(struct conntrack *ct, int32_t zone);
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c7f9e1490..3fbfcfa2b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9729,7 +9729,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
struct ovs_list *zone_limits_reply)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
-struct conntrack_zone_limit czl;
+struct conntrack_zone_info czl;
 
 if (!ovs_list_is_empty(zone_limits_request)) {
 struct ct_dpif_zone_limit *zone_limit;
@@ -9738,7 +9738,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
 if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) {
 ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone,
 czl.limit,
-atomic_count_get(&czl.count));
+czl.count);
 } else {
 return EINVAL;
 }
@@ -9754,7 +9754,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif,
 czl = zone_limit_get(dp->conntrack, z);
 if (czl.zone == z) {
 ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit,
-atomic_count_get(&czl.count));
+czl.count);
 }
 }
 }
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/5] conntrack: Add zone limit coverage counter.

2024-07-01 Thread Paolo Valerio
Similarly to what is done for conntrack_full, add
conntrack_zone_full, which is increased when new entries are not added
because the zone limit has been reached.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index db44f8237..e90ade32f 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -50,6 +50,7 @@ COVERAGE_DEFINE(conntrack_full);
 COVERAGE_DEFINE(conntrack_l3csum_err);
 COVERAGE_DEFINE(conntrack_l4csum_err);
 COVERAGE_DEFINE(conntrack_lookup_natted_miss);
+COVERAGE_DEFINE(conntrack_zone_full);
 
 struct conn_lookup_ctx {
 struct conn_key key;
@@ -918,6 +919,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 struct zone_limit *zl = zone_limit_lookup_or_default(ct,
  ctx->key.zone);
 if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) {
+COVERAGE_INC(conntrack_zone_full);
 return nc;
 }
 
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/5] conntrack: Correctly annotate conntrack member.

2024-07-01 Thread Paolo Valerio
While at it, update a comment that is no longer valid.

Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 71367f211..6c65caa07 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -199,11 +199,12 @@ enum ct_ephemeral_range {
 FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
 
 struct conntrack {
-struct ovs_mutex ct_lock; /* Protects 2 following fields. */
+struct ovs_mutex ct_lock; /* Protects the following fields. */
 struct cmap conns[UINT16_MAX + 1] OVS_GUARDED;
-struct rculist exp_lists[N_EXP_LISTS];
+struct rculist exp_lists[N_EXP_LISTS] OVS_GUARDED;
 struct cmap zone_limits OVS_GUARDED;
 struct cmap timeout_policies OVS_GUARDED;
+
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/5] Fix default zone limit

2024-07-01 Thread Paolo Valerio
Ilya reported a problem with zone limits in particular when a default is
set but a zone specific limit is not.
 
As per subject, the series (patch 5/5) addresses this issue by creating an
entry in the zone limit map as soon as the first packet for a zone is seen
(and the default limit is in place).
A test was extended to exercise the behavior and make sure the kernel
and userspace datapath behave consistently.

Paolo Valerio (5):
  conntrack: Correctly annotate conntrack member.
  conntrack: Add zone limit coverage counter.
  conntrack: Do not use atomics to report zones info.
  conntrack: Turn zl local limit into atomic.
  conntrack: Use a per zone default limit.

 lib/conntrack-private.h |  18 ++-
 lib/conntrack.c | 241 +++-
 lib/conntrack.h |   9 +-
 lib/dpif-netdev.c   |   6 +-
 tests/system-traffic.at |  64 ---
 5 files changed, 256 insertions(+), 82 deletions(-)

-- 
2.45.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] ipf: Handle common case of ipf defragmentation.

2024-06-02 Thread Paolo Valerio
Mike Pattrick  writes:

> When conntrack is reassembling packet fragments, the same reassembly
> context can be shared across multiple threads handling different packets
> simultaneously. Once a full packet is assembled, it is added to a packet
> batch for processing, in the case where there are multiple different pmd
> threads accessing conntrack simultaneously, there is a race condition
> where the reassembled packet may be added to an arbitrary batch even if
> the current batch is available.
>
> When this happens, the packet may be handled incorrectly as it is
> inserted into a random openflow execution pipeline, instead of the
> pipeline for that packet's flow.
>
> This change makes a best-effort attempt to add the defragmented
> packet to the current batch directly. This should succeed most of the
> time.
>
> Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
> Reported-at: https://issues.redhat.com/browse/FDP-560
> Signed-off-by: Mike Pattrick 
> ---

Acked-by: Paolo Valerio 

>  lib/ipf.c | 27 ---
>  1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/lib/ipf.c b/lib/ipf.c
> index 3c8960be3..2d715f5e9 100644
> --- a/lib/ipf.c
> +++ b/lib/ipf.c
> @@ -506,13 +506,15 @@ ipf_reassemble_v6_frags(struct ipf_list *ipf_list)
>  }
>  
>  /* Called when a frag list state transitions to another state. This is
> - * triggered by new fragment for the list being received.*/
> -static void
> +* triggered by new fragment for the list being received. Returns a 
> reassembled
> +* packet if this fragment has completed one. */
> +static struct reassembled_pkt *
>  ipf_list_state_transition(struct ipf *ipf, struct ipf_list *ipf_list,
>bool ff, bool lf, bool v6)
>  OVS_REQUIRES(ipf->ipf_lock)
>  {
>  enum ipf_list_state curr_state = ipf_list->state;
> +struct reassembled_pkt *ret = NULL;
>  enum ipf_list_state next_state;
>  switch (curr_state) {
>  case IPF_LIST_STATE_UNUSED:
> @@ -562,12 +564,15 @@ ipf_list_state_transition(struct ipf *ipf, struct 
> ipf_list *ipf_list,
>  ipf_reassembled_list_add(&ipf->reassembled_pkt_list, rp);
>  ipf_expiry_list_remove(ipf_list);
>  next_state = IPF_LIST_STATE_COMPLETED;
> +ret = rp;
>  } else {
>  next_state = IPF_LIST_STATE_REASS_FAIL;
>  }
>  }
>  }
>  ipf_list->state = next_state;
> +
> +return ret;
>  }
>  
>  /* Some sanity checks are redundant, but prudent, in case code paths for
> @@ -799,7 +804,8 @@ ipf_is_frag_duped(const struct ipf_frag *frag_list, int 
> last_inuse_idx,
>  static bool
>  ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
>   struct dp_packet *pkt, uint16_t start_data_byte,
> - uint16_t end_data_byte, bool ff, bool lf, bool v6)
> + uint16_t end_data_byte, bool ff, bool lf, bool v6,
> + struct reassembled_pkt **rp)
>  OVS_REQUIRES(ipf->ipf_lock)
>  {
>  bool duped_frag = ipf_is_frag_duped(ipf_list->frag_list,
> @@ -820,7 +826,7 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list 
> *ipf_list,
>  ipf_list->last_inuse_idx++;
>  atomic_count_inc(&ipf->nfrag);
>  ipf_count(ipf, v6, IPF_NFRAGS_ACCEPTED);
> -ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
> +*rp = ipf_list_state_transition(ipf, ipf_list, ff, lf, v6);
>  } else {
>  OVS_NOT_REACHED();
>  }
> @@ -853,7 +859,8 @@ ipf_list_init(struct ipf_list *ipf_list, struct 
> ipf_list_key *key,
>   * to a list of fragemnts. */
>  static bool
>  ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, ovs_be16 dl_type,
> -uint16_t zone, long long now, uint32_t hash_basis)
> +uint16_t zone, long long now, uint32_t hash_basis,
> +struct reassembled_pkt **rp)
>  OVS_REQUIRES(ipf->ipf_lock)
>  {
>  struct ipf_list_key key;
> @@ -922,7 +929,7 @@ ipf_handle_frag(struct ipf *ipf, struct dp_packet *pkt, 
> ovs_be16 dl_type,
>  }
>  
>  return ipf_process_frag(ipf, ipf_list, pkt, start_data_byte,
> -end_data_byte, ff, lf, v6);
> +end_data_byte, ff, lf, v6, rp);
>  }
>  
>  /* Filters out fragments from a batch of fragments and adjust the batch. */
> @@ -941,11 +948,17 @@ ipf_extract_frags_from_batch(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>||
>   

Re: [ovs-dev] [PATCH v2 1/2] ipf: Only add fragments to batch of same dl_type.

2024-06-02 Thread Paolo Valerio
Mike Pattrick  writes:

> When conntrack is reassembling packet fragments, the same reassembly
> context can be shared across multiple threads handling different packets
> simultaneously. Once a full packet is assembled, it is added to a packet
> batch for processing, this is most likely the batch that added it in the
> first place, but that isn't a guarantee.
>
> The packets in these batches should be segregated by network protocol
> version (ipv4 vs ipv6) for conntrack defragmentation to function
> appropriately. However, there are conditions where we would add a
> reassembled packet of one type to a batch of another.
>
> This change introduces checks to make sure that reassembled or expired
> fragments are only added to packet batches of the same type.
>
> Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")
> Reported-at: https://issues.redhat.com/browse/FDP-560
> Signed-off-by: Mike Pattrick 
> ---

Acked-by: Paolo Valerio 

>  lib/ipf.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/ipf.c b/lib/ipf.c
> index 7d74e2c13..3c8960be3 100644
> --- a/lib/ipf.c
> +++ b/lib/ipf.c
> @@ -1063,6 +1063,9 @@ ipf_send_completed_frags(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>  struct ipf_list *ipf_list;
>  
>  LIST_FOR_EACH_SAFE (ipf_list, list_node, &ipf->frag_complete_list) {
> +if ((ipf_list->key.dl_type == htons(ETH_TYPE_IPV6)) != v6) {
> +continue;
> +}
>  if (ipf_send_frags_in_list(ipf, ipf_list, pb, 
> IPF_FRAG_COMPLETED_LIST,
> v6, now)) {
>  ipf_completed_list_clean(&ipf->frag_lists, ipf_list);
> @@ -1096,6 +1099,9 @@ ipf_send_expired_frags(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>  size_t lists_removed = 0;
>  
>  LIST_FOR_EACH_SAFE (ipf_list, list_node, &ipf->frag_exp_list) {
> +if ((ipf_list->key.dl_type == htons(ETH_TYPE_IPV6)) != v6) {
> +continue;
> +}
>  if (now <= ipf_list->expiration ||
>  lists_removed >= IPF_FRAG_LIST_MAX_EXPIRED) {
>  break;
> @@ -1116,7 +1122,8 @@ ipf_send_expired_frags(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>  /* Adds a reassmebled packet to a packet batch to be processed by the caller.
>   */
>  static void
> -ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb)
> +ipf_execute_reass_pkts(struct ipf *ipf, struct dp_packet_batch *pb,
> +   ovs_be16 dl_type)
>  {
>  if (ovs_list_is_empty(&ipf->reassembled_pkt_list)) {
>  return;
> @@ -1127,6 +1134,7 @@ ipf_execute_reass_pkts(struct ipf *ipf, struct 
> dp_packet_batch *pb)
>  
>  LIST_FOR_EACH_SAFE (rp, rp_list_node, &ipf->reassembled_pkt_list) {
>  if (!rp->list->reass_execute_ctx &&
> +rp->list->key.dl_type == dl_type &&
>  ipf_dp_packet_batch_add(pb, rp->pkt, false)) {
>  rp->list->reass_execute_ctx = rp->pkt;
>  }
> @@ -1237,7 +1245,7 @@ ipf_preprocess_conntrack(struct ipf *ipf, struct 
> dp_packet_batch *pb,
>  }
>  
>  if (ipf_get_enabled(ipf) || atomic_count_get(&ipf->nfrag)) {
> -ipf_execute_reass_pkts(ipf, pb);
> +ipf_execute_reass_pkts(ipf, pb, dl_type);
>  }
>  }
>  
> -- 
> 2.39.3
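
For readers following the thread, the per-list filter added by this patch can be sketched in isolation. This is only a minimal sketch under assumptions: `struct frag_list`, `frag_list_matches()` and `count_eligible()` are hypothetical stand-ins, not the lib/ipf.c code, and the EtherType constants are redefined locally.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_IP   0x0800
#define ETH_TYPE_IPV6 0x86dd

/* Hypothetical stand-in for a fragment list; only the key's dl_type
 * (kept in network byte order) matters for this sketch. */
struct frag_list {
    uint16_t dl_type;
};

/* True if the list belongs in a batch of the given IP version.
 * Mirrors the "(dl_type == htons(ETH_TYPE_IPV6)) != v6" skip test. */
static bool
frag_list_matches(const struct frag_list *list, bool v6)
{
    return (list->dl_type == htons(ETH_TYPE_IPV6)) == v6;
}

/* Counts the lists that would be considered for a batch of type 'v6';
 * lists of the other type are skipped, as in the patched loops. */
static size_t
count_eligible(const struct frag_list *lists, size_t n, bool v6)
{
    size_t eligible = 0;

    for (size_t i = 0; i < n; i++) {
        if (!frag_list_matches(&lists[i], v6)) {
            continue;
        }
        eligible++;
    }
    return eligible;
}
```

With two IPv4 lists and one IPv6 list, an IPv4 batch would only ever see the two IPv4 lists, and the IPv6 list stays queued until an IPv6 batch comes along.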


Re: [ovs-dev] [PATCH v3 1/8] netdev-linux: Fix possible int overflow in tc_add_matchall_policer().

2024-05-31 Thread Paolo Valerio
Eelco Chaudron  writes:

> Fix unintentional integer overflow reported by Coverity by adding
> the ULL suffix to the numerical literals used in the multiplications.
>
> Fixes: ed2300cca0d3 ("netdev-linux: Refactor put police action netlink 
> message")
> Acked-by: Mike Pattrick 
> Signed-off-by: Eelco Chaudron 
> ---

Acked-by: Paolo Valerio 

>  lib/netdev-linux.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 25349c605..eb0c5c624 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -2915,8 +2915,8 @@ tc_add_matchall_policer(struct netdev *netdev, uint64_t 
> kbits_rate,
>  basic_offset = nl_msg_start_nested(&request, TCA_OPTIONS);
>  action_offset = nl_msg_start_nested(&request, TCA_MATCHALL_ACT);
>  nl_msg_put_act_police(&request, 0, kbits_rate, kbits_burst,
> -  kpkts_rate * 1000, kpkts_burst * 1000, 
> TC_ACT_UNSPEC,
> -  false);
> +  kpkts_rate * 1000ULL, kpkts_burst * 1000ULL,
> +  TC_ACT_UNSPEC, false);
>  nl_msg_end_nested(&request, action_offset);
>  nl_msg_end_nested(&request, basic_offset);
>  
> -- 
> 2.44.0
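
As a side note, the effect of the ULL suffix can be reproduced outside of OVS. The following is a minimal sketch, assuming a 32-bit `unsigned int` and a `uint32_t` rate argument; `scale_narrow()` and `scale_wide()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* The multiplication is carried out in 32-bit unsigned arithmetic, so
 * the product wraps around before it is widened to the 64-bit return
 * type. */
static uint64_t
scale_narrow(uint32_t kpkts)
{
    return kpkts * 1000;
}

/* The ULL literal forces the multiplication to be carried out in
 * unsigned long long, so the full product is preserved. */
static uint64_t
scale_wide(uint32_t kpkts)
{
    return kpkts * 1000ULL;
}
```

Both functions agree for small inputs; they only diverge once the product no longer fits in 32 bits, which is the case Coverity warns about.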



Re: [ovs-dev] [PATCH v3 7/8] db-ctl-base: Initialize the output variable in the ctx structure.

2024-05-31 Thread Paolo Valerio
Eelco Chaudron  writes:

> Coverity flagged that the uninitialized output variable was used
> in the ctl_context_init_command() function. This patch initializes
> the variable.
>
> In addition it also destroys the ds string in ctl_context_done()
> in case it's not cleared properly.
>
> Fixes: 07ff77ccb82a ("db-ctl-base: Make common database command code into 
> library.")
> Signed-off-by: Eelco Chaudron 
> ---

Acked-by: Paolo Valerio 

>  lib/db-ctl-base.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c
> index 3a8068b12..b3e9b92d1 100644
> --- a/lib/db-ctl-base.c
> +++ b/lib/db-ctl-base.c
> @@ -2656,6 +2656,7 @@ ctl_context_init(struct ctl_context *ctx, struct 
> ctl_command *command,
>   struct ovsdb_symbol_table *symtab,
>   void (*invalidate_cache_cb)(struct ctl_context *))
>  {
> +ds_init(&ctx->output);
>  if (command) {
>  ctl_context_init_command(ctx, command, false);
>  }
> @@ -2688,6 +2689,7 @@ ctl_context_done(struct ctl_context *ctx,
>  ctl_context_done_command(ctx, command);
>  }
>  invalidate_cache(ctx);
> +ds_destroy(&ctx->output);
>  }
>  
>  char * OVS_WARN_UNUSED_RESULT
> -- 
> 2.44.0



Re: [ovs-dev] [PATCH v3 2/8] cfm: Fix possible integer overflow in tc_add_matchall_policer().

2024-05-31 Thread Paolo Valerio
Eelco Chaudron  writes:

> Fix unintentional integer overflow reported by Coverity by adding
> the LL suffix to the numerical literals used in the multiplication.
>
> Fixes: 5767a79a4059 ("cfm: Require ccm received in demand mode.")
> Acked-by: Mike Pattrick 
> Signed-off-by: Eelco Chaudron 
> ---

Acked-by: Paolo Valerio 

>  lib/cfm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/cfm.c b/lib/cfm.c
> index c3742f3de..7eb080157 100644
> --- a/lib/cfm.c
> +++ b/lib/cfm.c
> @@ -863,7 +863,7 @@ cfm_process_heartbeat(struct cfm *cfm, const struct 
> dp_packet *p)
>  rmp->num_health_ccm++;
>  if (cfm->demand) {
>  timer_set_duration(&cfm->demand_rx_ccm_t,
> -   100 * cfm->ccm_interval_ms);
> +   100LL * cfm->ccm_interval_ms);
>  }
>  }
>  rmp->recv = true;
> -- 
> 2.44.0



Re: [ovs-dev] [PATCH v2 6/8] ofproto-dpif: Define age as time_t in ofproto_unixctl_fdb_add().

2024-05-28 Thread Paolo Valerio
Eelco Chaudron  writes:

> Fix the warning from Coverity about potential truncation of the
> time_t value when copying to a local variable by changing the
> local variable's type to time_t.
>
> ccc24fc88d59 ("ofproto-dpif: APIs and CLI option to add/delete static fdb 
> entry.")

It seems "Fixes:" slipped out here.
I guess this could be fixed while applying.
That aside,

Acked-by: Paolo Valerio 

> Signed-off-by: Eelco Chaudron 
> ---
>  ofproto/ofproto-dpif.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 32d037be6..fcd7cd753 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -6097,7 +6097,7 @@ ofproto_unixctl_fdb_add(struct unixctl_conn *conn, int 
> argc OVS_UNUSED,
>  const char *port_name = argv[2];
>  uint16_t vlan = atoi(argv[3]);
>  struct eth_addr mac;
> -int age;
> +time_t age;
>  
>  ofproto = ofproto_dpif_lookup_by_name(br_name);
>  if (!ofproto) {
> -- 
> 2.44.0


Re: [ovs-dev] [PATCH v2 2/2] conntrack: Key connections by zone.

2024-05-13 Thread Paolo Valerio
Hi Peng,

Peng He  writes:

> To separate into N cmaps, why not use hash value divided by N?
>

FWIW, I think it makes sense to discuss the potential benefits of other
approaches as well.
They may even end up being less performant than this one, but some
points to consider here are:

- the number of zones used in the common case (or in the specific use
  case), as the expectation is that the fewer the zones involved, the
  smaller the benefit
- how relevant the flush per zone is for the use case, given that flush
  per zone is where most of the gain is

As a last remark, partitioning per zone also implies a substantial
design change that may conflict with other approaches targeting the
overall performance (e.g., [0] is a quick example that comes to mind,
with good scalability improvements in cps and probably, but this is
just a guess, measurable improvements in the same ct execute test).

[0] 
https://patchwork.ozlabs.org/project/openvswitch/patch/165668250987.1967719.7371616138630033269.st...@fed.void/
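
To make the trade-off above a bit more concrete, here is a rough sketch of the two partitioning schemes being compared. It is not taken from either patch: all names are hypothetical, and it only models which partition a connection lands in and how many partitions a per-zone flush has to visit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One partition per zone: the zone id itself selects the partition. */
static size_t
partition_by_zone(uint16_t zone)
{
    return zone;
}

/* N partitions selected by the connection hash: connections of a
 * single zone end up spread over all partitions. */
static size_t
partition_by_hash(uint32_t hash, size_t n_parts)
{
    return hash % n_parts;
}

/* Number of partitions a "flush zone X" has to walk.  Keyed by zone,
 * only the zone's own partition is visited; keyed by hash, every
 * partition may hold entries of that zone. */
static size_t
partitions_scanned_on_zone_flush(bool keyed_by_zone, size_t n_parts)
{
    return keyed_by_zone ? 1 : n_parts;
}
```

In this simplified model the hash-based split helps spread contention, but it does not by itself shrink the work of a per-zone flush, which is where most of the reported gain comes from.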

> Simon Horman  于2024年5月1日周三 19:06写道:
>
>> On Wed, Apr 24, 2024 at 02:44:54PM +0200, Felix Huettner via dev wrote:
>> > Currently conntrack uses a single large cmap for all connections stored.
>> > This cmap contains all connections for all conntrack zones which are
>> > completely separate from each other. By separating each zone to its own
>> > cmap we can significantly optimize the performance when using multiple
>> > zones.
>> >
>> > The change fixes a similar issue as [1] where slow conntrack zone flush
>> > operations significantly slow down OVN router failover. The difference is
>> > just that this fix is used with dpdk, while [1] was when using the ovs
>> > kernel module.
>> >
>> > As we now need to store more cmaps the memory usage of struct conntrack
>> > increases by 524280 bytes. Additionally we need 65535 cmaps with 128
>> > bytes each. This leads to a total memory increase of around 10MB.
>> >
>> > Running "./ovstest test-conntrack benchmark 4 33554432 32 1" shows no
>> > real difference in the multithreading behaviour against a single zone.
>> >
>> > Running the new "./ovstest test-conntrack benchmark-zones" shows
>> > significant speedups as shown below. The values for "ct execute" are for
>> > acting on the complete zone with all its entries in total (so in the
>> > first case adding 10,000 new conntrack entries). All tests are run 1000
>> > times.
>> >
>> > When running with 1,000 zones with 10,000 entries each we see the
>> > following results (all in microseconds):
>> > "./ovstest test-conntrack benchmark-zones 1 1000 1000"
>> >
>> >                          +------+--------+---------+---------+
>> >                          |  Min |   Max  |  95%ile |   Avg   |
>> > +------------------------+------+--------+---------+---------+
>> > | ct execute (commit)    |      |        |         |         |
>> > |            with commit | 2266 |   3505 | 2707.06 | 2592.06 |
>> > |         without commit | 2411 |  12730 | 4432.50 | 2736.78 |
>> > +------------------------+------+--------+---------+---------+
>> > | ct execute (no commit) |      |        |         |         |
>> > |            with commit |  699 |   1238 |  886.15 |  722.67 |
>> > |         without commit |  700 |   3377 | 1934.42 |  803.53 |
>> > +------------------------+------+--------+---------+---------+
>> > | flush full zone        |      |        |         |         |
>> > |            with commit |  619 |   1122 |  901.36 |  679.15 |
>> > |         without commit |  618 | 105078 |   64591 | 2886.46 |
>> > +------------------------+------+--------+---------+---------+
>> > | flush empty zone       |      |        |         |         |
>> > |            with commit |    0 |      5 |    1.00 |    0.64 |
>> > |         without commit |   54 |  87469 |   64520 | 2172.25 |
>> > +------------------------+------+--------+---------+---------+
>> >
>> > When running with 10,000 zones with 1,000 entries each we see the
>> > following results (all in microseconds):
>> > "./ovstest test-conntrack benchmark-zones 1000 1 1000"
>> >
>> >                          +------+--------+---------+---------+
>> >                          |  Min |   Max  |  95%ile |   Avg   |
>> > +------------------------+------+--------+---------+---------+
>> > | ct execute (commit)    |      |        |         |         |
>> > |            with commit |  215 |    287 |  231.88 |  222.30 |
>> > |         without commit |  214 |   1692 |  569.18 |  285.83 |
>> > +------------------------+------+--------+---------+---------+
>> > | ct execute (no commit) |      |        |         |         |
>> > |            with commit |   68 |     97 |   74.69 |   70.09 |
>> > |         without commit |   68 |    300 |  158.40 |   82.06 |
>> > +------------------------+------+--------+---------+---------+
>> > | flush full zone        |      |        |         |         |
>> > |            with commit |   47 |    211 |   56.34 |   50.34 |
>> > | withou

[ovs-dev] [PATCH v2] conntrack: Fully initialize conn struct before insertion.

2024-05-10 Thread Paolo Valerio
From: Mike Pattrick 

In case packets are concurrently received in both directions, there's
a chance that the ones in the reverse direction get received right
after the connection gets added to the connection tracker but before
some of the connection's fields are fully initialized.
This could cause OVS to access potentially invalid, as the lookup may
end up retrieving the wrong offsets during CONTAINER_OF(), or
uninitialized memory.

This may happen in case of regular NAT or all-zero SNAT.

Fix it by initializing the connection's fields early.

Fixes: 1116459b3ba8 ("conntrack: Remove nat_conn introducing key 
directionality.")
Reported-at: https://issues.redhat.com/browse/FDP-616
Signed-off-by: Mike Pattrick 
Co-authored-by: Paolo Valerio 
Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 16e1c8bb5..5fdfe98de 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -947,6 +947,18 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 nc->parent_key = alg_exp->parent_key;
 }
 
+ovs_mutex_init_adaptive(&nc->lock);
+atomic_flag_clear(&nc->reclaimed);
+fwd_key_node->dir = CT_DIR_FWD;
+rev_key_node->dir = CT_DIR_REV;
+
+if (zl) {
+nc->admit_zone = zl->czl.zone;
+nc->zone_limit_seq = zl->czl.zone_limit_seq;
+} else {
+nc->admit_zone = INVALID_ZONE;
+}
+
 if (nat_action_info) {
 nc->nat_action = nat_action_info->nat_action;
 
@@ -972,22 +984,16 @@ conn_not_found(struct conntrack *ct, struct dp_packet 
*pkt,
 &rev_key_node->cm_node, rev_hash);
 }
 
-ovs_mutex_init_adaptive(&nc->lock);
-atomic_flag_clear(&nc->reclaimed);
-fwd_key_node->dir = CT_DIR_FWD;
-rev_key_node->dir = CT_DIR_REV;
 cmap_insert(&ct->conns[ctx->key.zone],
 &fwd_key_node->cm_node, ctx->hash);
 conn_expire_push_front(ct, nc);
 atomic_count_inc(&ct->n_conn);
-ctx->conn = nc; /* For completeness. */
+
 if (zl) {
-nc->admit_zone = zl->czl.zone;
-nc->zone_limit_seq = zl->czl.zone_limit_seq;
 atomic_count_inc(&zl->czl.count);
-} else {
-nc->admit_zone = INVALID_ZONE;
 }
+
+ctx->conn = nc; /* For completeness. */
 }
 
 return nc;
-- 
2.45.0
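
The ordering this patch enforces (fill in every field first, make the object reachable last) can be illustrated with a small standalone sketch. It uses C11 atomics rather than the OVS cmap, and `struct conn_sketch`, `conn_publish()` and `conn_lookup()` are hypothetical names, so it should be read as an analogy and not as the conntrack implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily reduced connection object. */
struct conn_sketch {
    int admit_zone;
    int dir;
};

/* Slot through which other threads can find the connection. */
static _Atomic(struct conn_sketch *) published;

/* Initializes all fields and only then makes the object visible.  The
 * release store keeps the field writes ordered before the publication,
 * so a reader that observes the pointer also observes the fields. */
static void
conn_publish(struct conn_sketch *c, int zone)
{
    c->admit_zone = zone;
    c->dir = 0;
    atomic_store_explicit(&published, c, memory_order_release);
}

/* Reader side: returns NULL until a connection has been published. */
static struct conn_sketch *
conn_lookup(void)
{
    return atomic_load_explicit(&published, memory_order_acquire);
}
```

If the two field assignments were moved after the store, a concurrent reader could pick up the pointer and read garbage, which is the class of problem described in the commit message.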



Re: [ovs-dev] [PATCH v2] conntrack: Do not use {0} to initialize unions.

2024-05-09 Thread Paolo Valerio
Xavier Simonart  writes:

> In the following case:
> union ct_addr {
> unsigned int ipv4;
> struct in6_addr ipv6;
> };
> union ct_addr zero_ip = {0};
>
> The ipv6 field might not be properly initialized.
> For instance, clang 18.1.1 does not initialize the ipv6 field.
>
> Reported-at: https://issues.redhat.com/browse/FDP-608
> Signed-off-by: Xavier Simonart 
> ---
> v2: updated based on nit from Paolo.
> ---

Thanks Xavier.

Acked-by: Paolo Valerio 



Re: [ovs-dev] [PATCH] conntrack: Do not use {0} to initialize unions.

2024-05-08 Thread Paolo Valerio
Hello Xavier,

Just curious: based on your tests, is clang 18.1.1 the only
compiler/version known so far to lead to the problem?

Anyway, only a small cosmetic nit below. Other than that:

Acked-by: Paolo Valerio 

Xavier Simonart  writes:

> In the following case:
> union ct_addr {
> unsigned int ipv4;
> struct in6_addr ipv6;
> };
> union ct_addr zero_ip = {0};
>
> The ipv6 field might not be properly initialized.
> For instance, clang 18.1.1 does not initialize the ipv6 field.
>
> Reported-at: https://issues.redhat.com/browse/FDP-608
> Signed-off-by: Xavier Simonart 
> ---
>  lib/conntrack.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 16e1c8bb5..ff4a17abc 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2302,7 +2302,8 @@ find_addr(const struct conn_key *key, union ct_addr 
> *min,
>uint32_t hash, bool ipv4,
>const struct nat_action_info_t *nat_info)
>  {
> -const union ct_addr zero_ip = {0};
> +union ct_addr zero_ip;
> +memset(&zero_ip, 0, sizeof zero_ip);
>  
>  /* All-zero case. */
>  if (!memcmp(min, &zero_ip, sizeof *min)) {
> @@ -2394,7 +2395,7 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
> *conn,
>  {
>  struct conn_key *fwd_key = &conn->key_node[CT_DIR_FWD].key;
>  struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key;
> -union ct_addr min_addr = {0}, max_addr = {0}, addr = {0};
> +union ct_addr min_addr, max_addr, addr;

nit: please keep the reverse xmas tree

>  bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP ||
>   fwd_key->nw_proto == IPPROTO_UDP ||
>   fwd_key->nw_proto == IPPROTO_SCTP;
> @@ -2402,6 +2403,10 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
> *conn,
>  uint16_t min_sport, max_sport, curr_sport;
>  uint32_t hash, port_off, basis;
>  
> +memset(&min_addr, 0, sizeof min_addr);
> +memset(&max_addr, 0, sizeof max_addr);
> +memset(&addr, 0, sizeof addr);
> +
>  basis = (nat_info->nat_flags & NAT_PERSISTENT) ? 0 : ct->hash_basis;
>  hash = nat_range_hash(fwd_key, basis, nat_info);
>  
> -- 
> 2.31.1
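
For reference, the memset-based pattern used by the patch can be shown on its own. This is a minimal sketch: `union addr_sketch` is a hypothetical stand-in for union ct_addr (a plain byte array replaces struct in6_addr), and the helper names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for union ct_addr: the first member is smaller
 * than the union as a whole. */
union addr_sketch {
    uint32_t ipv4;
    uint8_t ipv6[16];
};

/* Zeroes every byte of the union, regardless of which member comes
 * first.  With "= {0}" only the first member is guaranteed to be
 * initialized, so the bytes beyond it may hold anything. */
static void
addr_zero(union addr_sketch *a)
{
    memset(a, 0, sizeof *a);
}

/* Byte-wise comparison against an all-zero address, in the spirit of
 * the "All-zero case" check in find_addr(). */
static bool
addr_is_zero(const union addr_sketch *a)
{
    union addr_sketch zero;

    addr_zero(&zero);
    return !memcmp(a, &zero, sizeof *a);
}
```

Since the comparison is done with memcmp() over the whole union, both sides need every byte defined, which is why the explicit memset matters here.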


[ovs-dev] [PATCH] dpctl: fix segfault on ct-{set,del}-limits

2024-04-22 Thread Paolo Valerio
When no parameters other than the datapath are specified a segfault
occurs.

Fix it by checking that the argument access is within bounds.

Signed-off-by: Paolo Valerio 
---
 lib/dpctl.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 34ee7d0e2..3c555a559 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -2168,13 +2168,20 @@ static int
 dpctl_ct_set_limits(int argc, const char *argv[],
 struct dpctl_params *dpctl_p)
 {
-struct dpif *dpif;
-struct ds ds = DS_EMPTY_INITIALIZER;
+struct ovs_list zone_limits = OVS_LIST_INITIALIZER(&zone_limits);
 int i =  dp_arg_exists(argc, argv) ? 2 : 1;
+struct ds ds = DS_EMPTY_INITIALIZER;
+struct dpif *dpif = NULL;
 uint32_t default_limit;
-struct ovs_list zone_limits = OVS_LIST_INITIALIZER(&zone_limits);
+int error;
+
+if (i >= argc) {
+ds_put_cstr(&ds, "too few arguments");
+error = EINVAL;
+goto error;
+}
 
-int error = opt_dpif_open(argc, argv, dpctl_p, INT_MAX, &dpif);
+error = opt_dpif_open(argc, argv, dpctl_p, INT_MAX, &dpif);
 if (error) {
 return error;
 }
@@ -2261,11 +2268,17 @@ static int
 dpctl_ct_del_limits(int argc, const char *argv[],
 struct dpctl_params *dpctl_p)
 {
-struct dpif *dpif;
+struct ovs_list zone_limits = OVS_LIST_INITIALIZER(&zone_limits);
+int i =  dp_arg_exists(argc, argv) ? 2 : 1;
 struct ds ds = DS_EMPTY_INITIALIZER;
+struct dpif *dpif = NULL;
 int error;
-int i =  dp_arg_exists(argc, argv) ? 2 : 1;
-struct ovs_list zone_limits = OVS_LIST_INITIALIZER(&zone_limits);
+
+if (i >= argc) {
+ds_put_cstr(&ds, "too few arguments");
+error = EINVAL;
+goto error;
+}
 
 error = opt_dpif_open(argc, argv, dpctl_p, 4, &dpif);
 if (error) {
-- 
2.44.0
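
The bounds check boils down to validating the index before touching argv. A minimal sketch of that idea follows; `arg_at()` is a hypothetical helper that does not exist in lib/dpctl.c, and the argument strings in the usage note are only illustrative.

```c
#include <errno.h>
#include <stddef.h>

/* Stores argv[i] in '*arg' and returns 0 if index 'i' is valid for an
 * argument vector of 'argc' entries.  Otherwise stores NULL and
 * returns EINVAL instead of reading past the end of argv. */
static int
arg_at(int argc, const char *argv[], int i, const char **arg)
{
    if (i < 0 || i >= argc) {
        *arg = NULL;
        return EINVAL;
    }
    *arg = argv[i];
    return 0;
}
```

For instance, with only a command name and a datapath name present (argc == 2), asking for index 2 yields EINVAL, which corresponds to the "too few arguments" path in the patch.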



Re: [ovs-dev] [PATCH] conntrack: Do not use icmp reverse helper for icmpv6.

2024-03-28 Thread Paolo Valerio
Ilya Maximets  writes:

> On 3/12/24 11:02, Paolo Valerio wrote:
>> In the flush tuple code path, while populating the conn_key,
>> reverse_icmp_type() gets called for both icmp and icmpv6 cases,
>> while, depending on the proto, its respective helper should be
>> called, instead.
>
> Thanks for the fix!
>
> Some minor nits below.
>
>> 
>> The above leads to an abort:
>> 
>> [...]
>> 0x7f3d461888ff in __GI_abort () at abort.c:79
>> 0x0064eeb7 in reverse_icmp_type (type=128 '\200') at 
>> lib/conntrack.c:1795
>> 0x00650a63 in tuple_to_conn_key (tuple=0x7ffe0db5c620, zone=0, 
>> key=0x7ffe0db5c520)
>> at lib/conntrack.c:2590
>> 0x006510f7 in conntrack_flush_tuple (ct=0x25715a0, 
>> tuple=0x7ffe0db5c620, zone=0) at lib/conntrack.c:2787
>> 0x004b5988 in dpif_netdev_ct_flush (dpif=0x25e4640, 
>> zone=0x7ffe0db5c6a4, tuple=0x7ffe0db5c620)
>> at lib/dpif-netdev.c:9618
>> 0x0049938a in ct_dpif_flush_tuple (dpif=0x25e4640, zone=0x0, 
>> match=0x7ffe0db5c7e0) at lib/ct-dpif.c:331
>> 0x0049942a in ct_dpif_flush (dpif=0x25e4640, zone=0x0, 
>> match=0x7ffe0db5c7e0) at lib/ct-dpif.c:361
>> 0x00657b9a in dpctl_flush_conntrack (argc=2, argv=0x254ceb0, 
>> dpctl_p=0x7ffe0db5c8a0) at lib/dpctl.c:1797
>> 0x0065af36 in dpctl_unixctl_handler (conn=0x25c48d0, argc=2, 
>> argv=0x254ceb0,
>> [...]
>
> Could you, please, strip out some unnecessary information from
> the trace?  For example, function addresses in hex are not
> actually needed and most of the function arguments are not
> needed as well.  Only a few of the arguments are actually important.
> Removing those will shorten the lines and make the trace more
> clear for the reader.
>
>> 
>> Fix it by calling reverse_icmp6_type() when needed.
>> Furthermore, self tests have been modified in order to exercise and
>> check this behavior.
>> 
>> Fixes: 271e48a0e244 ("conntrack: Support conntrack flush by ct 5-tuple")
>> Reported-at: https://issues.redhat.com/browse/FDP-447
>> Signed-off-by: Paolo Valerio 
>> ---
>>  lib/conntrack.c |  4 +++-
>>  tests/system-traffic.at | 10 +-
>>  2 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 5786424f6..a62f27d24 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -2586,7 +2586,9 @@ tuple_to_conn_key(const struct ct_dpif_tuple *tuple, 
>> uint16_t zone,
>>  key->src.icmp_type = tuple->icmp_type;
>>  key->src.icmp_code = tuple->icmp_code;
>>  key->dst.icmp_id = tuple->icmp_id;
>> -key->dst.icmp_type = reverse_icmp_type(tuple->icmp_type);
>> +key->dst.icmp_type = (tuple->ip_proto == IPPROTO_ICMP) ?
>> +reverse_icmp_type(tuple->icmp_type) :
>> +reverse_icmp6_type(tuple->icmp_type);
>
> Please, wrap the lines before ?:, not after.  And align the branches
> of the ternary to the beginning of a condition, i.e.:
>
> +key->dst.icmp_type = (tuple->ip_proto == IPPROTO_ICMP)
> + ? reverse_icmp_type(tuple->icmp_type)
> + : reverse_icmp6_type(tuple->icmp_type);
>

Thank you Ilya.
I sent a v2 with your suggestions:

https://patchwork.ozlabs.org/project/openvswitch/patch/20240328165608.273344-1-pvale...@redhat.com/

> Best regards, Ilya Maximets.



[ovs-dev] [PATCH v2] conntrack: Do not use icmp reverse helper for icmpv6.

2024-03-28 Thread Paolo Valerio
In the flush tuple code path, while populating the conn_key,
reverse_icmp_type() gets called for both the icmp and icmpv6 cases,
whereas the helper matching the protocol should be called instead.

The above leads to an abort:

[...]
__GI_abort () at abort.c:79
reverse_icmp_type (type=128 '\200') at lib/conntrack.c:1795
tuple_to_conn_key (...) at lib/conntrack.c:2590
in conntrack_flush_tuple (...) at lib/conntrack.c:2787
in dpif_netdev_ct_flush (...) at lib/dpif-netdev.c:9618
ct_dpif_flush_tuple (...) at lib/ct-dpif.c:331
ct_dpif_flush (...) at lib/ct-dpif.c:361
dpctl_flush_conntrack (...) at lib/dpctl.c:1797
[...]

Fix it by calling reverse_icmp6_type() when needed.
Furthermore, self tests have been modified in order to exercise and
check this behavior.

Fixes: 271e48a0e244 ("conntrack: Support conntrack flush by ct 5-tuple")
Reported-at: https://issues.redhat.com/browse/FDP-447
Signed-off-by: Paolo Valerio 
---
v2 (Ilya):
- stripped down backtrace
- aligned ternary
---
 lib/conntrack.c |  4 +++-
 tests/system-traffic.at | 10 +-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5786424f6..7e3ed0ee0 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2586,7 +2586,9 @@ tuple_to_conn_key(const struct ct_dpif_tuple *tuple, 
uint16_t zone,
 key->src.icmp_type = tuple->icmp_type;
 key->src.icmp_code = tuple->icmp_code;
 key->dst.icmp_id = tuple->icmp_id;
-key->dst.icmp_type = reverse_icmp_type(tuple->icmp_type);
+key->dst.icmp_type = (tuple->ip_proto == IPPROTO_ICMP)
+ ? reverse_icmp_type(tuple->icmp_type)
+ : reverse_icmp6_type(tuple->icmp_type);
 key->dst.icmp_code = tuple->icmp_code;
 } else {
 key->src.port = tuple->src_port;
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2d12d558e..87de0692a 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3103,7 +3103,10 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(10.1.1.2)], [0], [dnl
 
icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0)
 ])
 
-AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 
'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+])
 
 dnl Pings from ns1->ns0 should fail.
 NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], 
[0], [dnl
@@ -3244,6 +3247,11 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(fc00::2)], [0], [dnl
 
icmpv6,orig=(src=fc00::1,dst=fc00::2,id=,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=,type=129,code=0)
 ])
 
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 
'ct_ipv6_src=fc00::1,ct_ipv6_dst=fc00::2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+])
+
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
-- 
2.44.0
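
The protocol-dependent selection can be sketched as a standalone example. The helpers below are hypothetical and deliberately return -1 for unknown types instead of aborting as the conntrack helpers do; the numeric values are the standard ICMP and ICMPv6 message types.

```c
#include <netinet/in.h>
#include <stdint.h>

/* ICMPv4 request/reply pairs: echo (8/0), timestamp (13/14) and
 * information (15/16).  Returns -1 for any other type. */
static int
reverse_type_v4(uint8_t type)
{
    switch (type) {
    case 8:  return 0;
    case 0:  return 8;
    case 13: return 14;
    case 14: return 13;
    case 15: return 16;
    case 16: return 15;
    default: return -1;
    }
}

/* ICMPv6 echo request/reply (128/129).  Returns -1 otherwise. */
static int
reverse_type_v6(uint8_t type)
{
    switch (type) {
    case 128: return 129;
    case 129: return 128;
    default:  return -1;
    }
}

/* Picks the helper that matches the IP protocol, as the fix does. */
static int
reverse_type(uint8_t ip_proto, uint8_t type)
{
    return ip_proto == IPPROTO_ICMP
           ? reverse_type_v4(type)
           : reverse_type_v6(type);
}
```

Feeding the ICMPv6 echo request type (128) to the IPv4 helper has no valid mapping, which is the situation that ended in the abort shown in the backtrace.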



[ovs-dev] [PATCH v2] conntrack: Fix SNAT with exhaustion system test.

2024-03-28 Thread Paolo Valerio
Recent kernels introduced a mechanism that allows evicting colliding
entries in a closing state, whereas they were previously considered
part of a non-recoverable clash.
This new behavior makes "conntrack - SNAT with port range with
exhaustion test" fail, as it relies on the previous assumptions.

Fix it by creating and not advancing the first entry in SYN_SENT to
avoid early eviction.

Suggested-by: Ilya Maximets 
Reported-at: https://issues.redhat.com/browse/FDP-486
Signed-off-by: Paolo Valerio 
---
v2:
- replaced open-coded bytes with
  'ovs-ofctl compose-packet --bare' (Ilya)
---
 tests/system-traffic.at | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2d12d558e..20b011b7e 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -6388,7 +6388,6 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - SNAT with port range with exhaustion])
-OVS_CHECK_GITHUB_ACTION()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 OVS_TRAFFIC_VSWITCHD_START()
@@ -6398,11 +6397,11 @@ ADD_NAMESPACES(at_ns0, at_ns1)
 ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
 NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
 ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+NS_CHECK_EXEC([at_ns1], [ip link set dev p1 address 80:89:89:89:89:89])
 
 dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from 
ns1->ns0.
 AT_DATA([flows.txt], [dnl
-in_port=1,tcp,action=ct(commit,zone=1,nat(src=10.1.1.240:34568,random)),2
-in_port=2,ct_state=-trk,tcp,tp_dst=34567,action=ct(table=0,zone=1,nat)
+in_port=1,tcp,action=ct(commit,zone=1,nat(src=10.1.1.240:34568)),2
 in_port=2,ct_state=-trk,tcp,tp_dst=34568,action=ct(table=0,zone=1,nat)
 in_port=2,ct_state=+trk,ct_zone=1,tcp,action=1
 dnl
@@ -6426,17 +6425,28 @@ AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
 
 dnl HTTP requests from p0->p1 should work fine.
 OVS_START_L7([at_ns1], [http])
-NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 1 -T 1 --retry-connrefused -v -o 
wget0.log])
+
+dnl Send a valid SYN to make conntrack pick it up.
+dnl The source port used is 123 to prevent unwanted reuse in the next HTTP 
request.
+syn_pkt=$(ovs-ofctl compose-packet --bare 
"eth_src=80:88:88:88:88:88,eth_dst=80:89:89:89:89:89,\
+  
dl_type=0x0800,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_proto=6,nw_ttl=64,nw_frag=no,tcp_flags=syn,\
+  tcp_src=123,tcp_dst=80")
+AT_CHECK([ovs-ofctl packet-out br0 "packet=${syn_pkt} 
actions=ct(commit,zone=1,nat(src=10.1.1.240:34568))"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | uniq], [0], 
[dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.240,sport=,dport=),zone=1,protoinfo=(state=)
+])
 
 NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 1 -T 1 --retry-connrefused -v -o 
wget0.log], [4])
 
-AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sed -e 
's/dst=10.1.1.2[[45]][[0-9]]/dst=10.1.1.2XX/' | uniq], [0], [dnl
-tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.2XX,sport=,dport=),zone=1,protoinfo=(state=)
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | uniq], [0], 
[dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.240,sport=,dport=),zone=1,protoinfo=(state=)
 ])
 
 OVS_TRAFFIC_VSWITCHD_STOP(["dnl
 /Unable to NAT due to tuple space exhaustion - if DoS attack, use firewalling 
and\/or zone partitioning./d
-/Dropped .* log messages in last .* seconds \(most recently, .* seconds ago\) 
due to excessive rate/d"])
+/Dropped .* log messages in last .* seconds \(most recently, .* seconds ago\) 
due to excessive rate/d
+/|WARN|.* execute ct.* failed/d"])
 AT_CLEANUP
 
 AT_SETUP([conntrack - more complex SNAT])
-- 
2.44.0



Re: [ovs-dev] [PATCH] conntrack: Fix SNAT with exhaustion system test.

2024-03-28 Thread Paolo Valerio
Ilya Maximets  writes:

> On 3/13/24 12:08, Paolo Valerio wrote:
>> Recent kernels introduced a mechanism that allows evicting colliding
>> entries in a closing state, whereas they were previously considered
>> part of a non-recoverable clash.
>> This new behavior makes the "conntrack - SNAT with port range with
>> exhaustion" test fail, as it relies on the previous behavior.
>> 
>> Fix it by creating the first entry in SYN_SENT and not advancing it,
>> to avoid early eviction.
>> 
>> Suggested-by: Ilya Maximets 
>> Reported-at: https://issues.redhat.com/browse/FDP-486
>> Signed-off-by: Paolo Valerio 
>> ---
>
> Hi, Paolo.  Thanks for the fix!
>

Hi Ilya,

Thanks for the feedback!

> Some small comments inline.
>
>>  tests/system-traffic.at | 21 ++---
>>  1 file changed, 14 insertions(+), 7 deletions(-)
>> 
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 2d12d558e..04559f5e8 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -6388,7 +6388,6 @@ OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>>  AT_SETUP([conntrack - SNAT with port range with exhaustion])
>> -OVS_CHECK_GITHUB_ACTION()
>>  CHECK_CONNTRACK()
>>  CHECK_CONNTRACK_NAT()
>>  OVS_TRAFFIC_VSWITCHD_START()
>> @@ -6398,11 +6397,11 @@ ADD_NAMESPACES(at_ns0, at_ns1)
>>  ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>>  NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
>>  ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
>> +NS_CHECK_EXEC([at_ns1], [ip link set dev p1 address 80:89:89:89:89:89])
>>  
>>  dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from 
>> ns1->ns0.
>>  AT_DATA([flows.txt], [dnl
>> -in_port=1,tcp,action=ct(commit,zone=1,nat(src=10.1.1.240:34568,random)),2
>> -in_port=2,ct_state=-trk,tcp,tp_dst=34567,action=ct(table=0,zone=1,nat)
>
> Do you know why this flow was there in the first place?
>

AFAICT, it was a copy/paste leftover from "conntrack - SNAT with port
range".
While at it, I preferred to clean it up a bit, as this flow (along with a
couple of minor things) was not required.

>> +in_port=1,tcp,action=ct(commit,zone=1,nat(src=10.1.1.240:34568)),2
>>  in_port=2,ct_state=-trk,tcp,tp_dst=34568,action=ct(table=0,zone=1,nat)
>>  in_port=2,ct_state=+trk,ct_zone=1,tcp,action=1
>>  dnl
>> @@ -6426,17 +6425,25 @@ AT_CHECK([ovs-ofctl --bundle add-flows br0 
>> flows.txt])
>>  
>>  dnl HTTP requests from p0->p1 should work fine.
>>  OVS_START_L7([at_ns1], [http])
>> -NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 1 -T 1 --retry-connrefused -v -o 
>> wget0.log])
>> +
>> +dnl Send a valid SYN to make conntrack pick it up.
>> +dnl The source port used is 123 to prevent unwanted reuse in the next HTTP 
>> request.
>> +AT_CHECK([ovs-ofctl packet-out br0 
>> "packet=8089898989898088080045280001400664cb0a0101010a010102007b0050500220007913
>>  actions=ct(commit,zone=1,nat(src=10.1.1.240:34568))"])
>
> Can we use 'ovs-ofctl compose-packet --bare' instead of open-coding bytes?
>

sure, I'll send a v2.

> Best regards, Ilya Maximets.



[ovs-dev] [PATCH] conntrack: Fix SNAT with exhaustion system test.

2024-03-13 Thread Paolo Valerio
Recent kernels introduced a mechanism that allows evicting colliding
entries in a closing state, whereas they were previously considered
part of a non-recoverable clash.
This new behavior makes the "conntrack - SNAT with port range with
exhaustion" test fail, as it relies on the previous behavior.

Fix it by creating the first entry in SYN_SENT and not advancing it,
to avoid early eviction.

Suggested-by: Ilya Maximets 
Reported-at: https://issues.redhat.com/browse/FDP-486
Signed-off-by: Paolo Valerio 
---
 tests/system-traffic.at | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2d12d558e..04559f5e8 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -6388,7 +6388,6 @@ OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
 AT_SETUP([conntrack - SNAT with port range with exhaustion])
-OVS_CHECK_GITHUB_ACTION()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_NAT()
 OVS_TRAFFIC_VSWITCHD_START()
@@ -6398,11 +6397,11 @@ ADD_NAMESPACES(at_ns0, at_ns1)
 ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
 NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
 ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+NS_CHECK_EXEC([at_ns1], [ip link set dev p1 address 80:89:89:89:89:89])
 
 dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
 AT_DATA([flows.txt], [dnl
-in_port=1,tcp,action=ct(commit,zone=1,nat(src=10.1.1.240:34568,random)),2
-in_port=2,ct_state=-trk,tcp,tp_dst=34567,action=ct(table=0,zone=1,nat)
+in_port=1,tcp,action=ct(commit,zone=1,nat(src=10.1.1.240:34568)),2
 in_port=2,ct_state=-trk,tcp,tp_dst=34568,action=ct(table=0,zone=1,nat)
 in_port=2,ct_state=+trk,ct_zone=1,tcp,action=1
 dnl
@@ -6426,17 +6425,25 @@ AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
 
 dnl HTTP requests from p0->p1 should work fine.
 OVS_START_L7([at_ns1], [http])
-NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 1 -T 1 --retry-connrefused -v -o wget0.log])
+
+dnl Send a valid SYN to make conntrack pick it up.
+dnl The source port used is 123 to prevent unwanted reuse in the next HTTP request.
+AT_CHECK([ovs-ofctl packet-out br0 "packet=8089898989898088080045280001400664cb0a0101010a010102007b0050500220007913 actions=ct(commit,zone=1,nat(src=10.1.1.240:34568))"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | uniq], [0], [dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.240,sport=,dport=),zone=1,protoinfo=(state=)
+])
 
 NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 1 -T 1 --retry-connrefused -v -o wget0.log], [4])
 
-AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sed -e 's/dst=10.1.1.2[[45]][[0-9]]/dst=10.1.1.2XX/' | uniq], [0], [dnl
-tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.2XX,sport=,dport=),zone=1,protoinfo=(state=)
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | uniq], [0], [dnl
+tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.240,sport=,dport=),zone=1,protoinfo=(state=)
 ])
 
 OVS_TRAFFIC_VSWITCHD_STOP(["dnl
 /Unable to NAT due to tuple space exhaustion - if DoS attack, use firewalling and\/or zone partitioning./d
-/Dropped .* log messages in last .* seconds \(most recently, .* seconds ago\) due to excessive rate/d"])
+/Dropped .* log messages in last .* seconds \(most recently, .* seconds ago\) due to excessive rate/d
+/|WARN|.* execute ct.* failed/d"])
 AT_CLEANUP
 
 AT_SETUP([conntrack - more complex SNAT])
-- 
2.44.0
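
For readers following the reasoning in the commit message, here is a toy
model of why the test holds its first entry in SYN_SENT. It is only a
sketch, not kernel or OVS code: the single-slot table, the state names and
the evict_closing switch are all made up for illustration, and "closing
state" is reduced to TIME_WAIT, as described in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection states for the single NAT slot. */
enum toy_state { TOY_FREE, TOY_SYN_SENT, TOY_ESTABLISHED, TOY_TIME_WAIT };

/* A NAT pool with exactly one address+port pair, as in the test. */
struct toy_nat {
    enum toy_state slot;
};

/* Try to allocate the only address+port pair for a new connection.
 * evict_closing models the newer kernel behavior: a colliding entry in a
 * closing state (here: TIME_WAIT) may be replaced instead of causing a
 * non-recoverable clash.  Returns true if the new connection gets NATed. */
static bool
toy_nat_alloc(struct toy_nat *nat, bool evict_closing)
{
    if (nat->slot == TOY_FREE) {
        nat->slot = TOY_SYN_SENT;
        return true;
    }
    if (evict_closing && nat->slot == TOY_TIME_WAIT) {
        /* The old entry is evicted and the slot is reused. */
        nat->slot = TOY_SYN_SENT;
        return true;
    }
    return false;   /* Tuple space exhausted. */
}
```

In this model a slot left in TIME_WAIT is reused once eviction is enabled,
which is the unexpected success the old test hit, while a slot held in
SYN_SENT still makes the second allocation fail, which is what the
reworked test expects.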



[ovs-dev] [PATCH] conntrack: Do not use icmp reverse helper for icmpv6.

2024-03-12 Thread Paolo Valerio
In the flush tuple code path, while populating the conn_key,
reverse_icmp_type() gets called for both the icmp and icmpv6 cases,
whereas the helper matching the protocol should be called instead.

The above leads to an abort:

[...]
0x7f3d461888ff in __GI_abort () at abort.c:79
0x0064eeb7 in reverse_icmp_type (type=128 '\200') at lib/conntrack.c:1795
0x00650a63 in tuple_to_conn_key (tuple=0x7ffe0db5c620, zone=0, key=0x7ffe0db5c520) at lib/conntrack.c:2590
0x006510f7 in conntrack_flush_tuple (ct=0x25715a0, tuple=0x7ffe0db5c620, zone=0) at lib/conntrack.c:2787
0x004b5988 in dpif_netdev_ct_flush (dpif=0x25e4640, zone=0x7ffe0db5c6a4, tuple=0x7ffe0db5c620) at lib/dpif-netdev.c:9618
0x0049938a in ct_dpif_flush_tuple (dpif=0x25e4640, zone=0x0, match=0x7ffe0db5c7e0) at lib/ct-dpif.c:331
0x0049942a in ct_dpif_flush (dpif=0x25e4640, zone=0x0, match=0x7ffe0db5c7e0) at lib/ct-dpif.c:361
0x00657b9a in dpctl_flush_conntrack (argc=2, argv=0x254ceb0, dpctl_p=0x7ffe0db5c8a0) at lib/dpctl.c:1797
0x0065af36 in dpctl_unixctl_handler (conn=0x25c48d0, argc=2, argv=0x254ceb0,
[...]

Fix it by calling reverse_icmp6_type() when needed.
Furthermore, self tests have been modified in order to exercise and
check this behavior.

Fixes: 271e48a0e244 ("conntrack: Support conntrack flush by ct 5-tuple")
Reported-at: https://issues.redhat.com/browse/FDP-447
Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c         |  4 +++-
 tests/system-traffic.at | 10 +++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5786424f6..a62f27d24 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2586,7 +2586,9 @@ tuple_to_conn_key(const struct ct_dpif_tuple *tuple, 
uint16_t zone,
 key->src.icmp_type = tuple->icmp_type;
 key->src.icmp_code = tuple->icmp_code;
 key->dst.icmp_id = tuple->icmp_id;
-key->dst.icmp_type = reverse_icmp_type(tuple->icmp_type);
+key->dst.icmp_type = (tuple->ip_proto == IPPROTO_ICMP) ?
+reverse_icmp_type(tuple->icmp_type) :
+reverse_icmp6_type(tuple->icmp_type);
 key->dst.icmp_code = tuple->icmp_code;
 } else {
 key->src.port = tuple->src_port;
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 2d12d558e..87de0692a 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3103,7 +3103,10 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(10.1.1.2)], [0], [dnl
 
icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0)
 ])
 
-AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 
'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+])
 
 dnl Pings from ns1->ns0 should fail.
 NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], 
[0], [dnl
@@ -3244,6 +3247,11 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | 
FORMAT_CT(fc00::2)], [0], [dnl
 
icmpv6,orig=(src=fc00::1,dst=fc00::2,id=,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=,type=129,code=0)
 ])
 
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 
'ct_ipv6_src=fc00::1,ct_ipv6_dst=fc00::2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+])
+
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
-- 
2.44.0
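
As an illustration of the fix described above, the protocol-dependent
choice of the reverse helper can be sketched as follows. This is a
stand-alone approximation, not the code in lib/conntrack.c: the toy_*
names are hypothetical, only echo request/reply are covered, and unknown
types return -1 here, whereas the backtrace shows the real
reverse_icmp_type() aborting when handed the ICMPv6 type 128.

```c
#include <assert.h>
#include <stdint.h>

/* IANA protocol numbers for ICMP and ICMPv6. */
enum { TOY_PROTO_ICMP = 1, TOY_PROTO_ICMPV6 = 58 };

/* Echo types from RFC 792 (ICMPv4) and RFC 4443 (ICMPv6). */
enum { TOY_ICMP4_ECHO_REPLY = 0, TOY_ICMP4_ECHO_REQUEST = 8 };
enum { TOY_ICMP6_ECHO_REQUEST = 128, TOY_ICMP6_ECHO_REPLY = 129 };

/* Hypothetical ICMPv4 helper: map a type to its reply-direction type. */
static int
toy_reverse_icmp4_type(uint8_t type)
{
    switch (type) {
    case TOY_ICMP4_ECHO_REQUEST: return TOY_ICMP4_ECHO_REPLY;
    case TOY_ICMP4_ECHO_REPLY:   return TOY_ICMP4_ECHO_REQUEST;
    default:                     return -1;   /* Not a known v4 type. */
    }
}

/* Hypothetical ICMPv6 helper, same idea with the v6 type numbers. */
static int
toy_reverse_icmp6_type(uint8_t type)
{
    switch (type) {
    case TOY_ICMP6_ECHO_REQUEST: return TOY_ICMP6_ECHO_REPLY;
    case TOY_ICMP6_ECHO_REPLY:   return TOY_ICMP6_ECHO_REQUEST;
    default:                     return -1;   /* Not a known v6 type. */
    }
}

/* Pick the helper based on the IP protocol, mirroring the ternary that
 * the patch adds to tuple_to_conn_key(). */
static int
toy_reverse_type(uint8_t ip_proto, uint8_t type)
{
    return ip_proto == TOY_PROTO_ICMP ? toy_reverse_icmp4_type(type)
                                      : toy_reverse_icmp6_type(type);
}
```

With this dispatch, toy_reverse_type(TOY_PROTO_ICMPV6, 128) yields 129,
matching the type=128/type=129 pair in the icmpv6 conntrack entry checked
by the test, while feeding 128 to the v4 helper is reported as unknown.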



Re: [ovs-dev] [PATCH] github: Temporarily disable SNAT with exhaustion system test.

2024-03-01 Thread Paolo Valerio
Ilya Maximets  writes:

> With a new runner update, GitHub Actions had a kernel update.
> And it seems like something changed between kernels 6.2 and 6.5
> so this test now fails very frequently.
>
> I can reproduce the same issue on RHEL 9, and I can't reproduce
> it on Ubuntu 23.04 (kernel 6.2).
>
> The test is creating a NAT with a single address+port pair in
> an attempt to simulate an address space exhaustion.  It is
> expected that a first connection with wget leaves a conntrack
> entry in a TIME_WAIT state and the second wget should fail
> as long as this entry remains, because the only available
> address+port pair is already taken.
>
> However, for some reason, very frequently (not always!) the
> second connection replaces the first conntrack entry with a
> new one and connection succeeds.  There is still only one
> connection in the conntrack at any single moment in time, so
> there is seemingly no issue with the NAT, but the behavior
> is unexpected and the test fails.
>
> Disable the test in CI until we figure out how to fix the
> kernel (if it is a kernel bug) or the test.
>
> Signed-off-by: Ilya Maximets 
> ---

Acked-by: Paolo Valerio 



Re: [ovs-dev] [PATCH v2 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-16 Thread Paolo Valerio
Simon Horman  writes:

> On Wed, Feb 07, 2024 at 06:38:08PM +0100, Paolo Valerio wrote:
>> The patch, when 'persistent' flag is specified, makes the IP selection
>> in a range persistent across reboots.
>> 
>> Signed-off-by: Paolo Valerio 
>
> Hi Paolo,
>
> I have some minor nits below - which you can feel free to take or leave.
> But overall this looks good to me.
>
> Acked-by: Simon Horman 
>
> ...
>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>
> ...
>
>> @@ -2386,12 +2390,23 @@ nat_get_unique_tuple(struct conntrack *ct, struct 
>> conn *conn,
>>  bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP ||
>>   fwd_key->nw_proto == IPPROTO_UDP ||
>>   fwd_key->nw_proto == IPPROTO_SCTP;
>> +uint32_t hash, port_off, basis = ct->hash_basis;
>>  uint16_t min_dport, max_dport, curr_dport;
>>  uint16_t min_sport, max_sport, curr_sport;
>> -uint32_t hash, port_off;
>>  
>> -hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
>> -port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : 
>> hash;
>> +if (nat_info->nat_flags & NAT_PERSISTENT) {
>> +basis = 0;
>> +}
>
> nit: maybe it is nicer to set basis only once.
>
> basis = (nat_info->nat_flags & NAT_PERSISTENT) ? 0 : ct->hash_basis;
>
>> +
>> +hash = nat_range_hash(fwd_key, basis, nat_info);
>> +
>> +if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
>> +port_off = random_uint32();
>> +} else {
>> +port_off =
>> +basis ? hash : nat_range_hash(fwd_key, ct->hash_basis, 
>> nat_info);
>> +}
>> +
>
> nit: maybe this is a little easier on the eyes (completely untested!)?
>
> if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
> port_off = random_uint32();
> } else if (basis) {
> port_off = hash;
> } else {
> port_off = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
> }
>

thanks Simon for taking a look.
Agreed, looks easier on the eyes. I included your suggestions and your
acks in v3.

I guess the above addresses Aaron's suggestions as well.

>>  min_addr = nat_info->min_addr;
>>  max_addr = nat_info->max_addr;
>>  
>
> ...



[ovs-dev] [PATCH v3 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-16 Thread Paolo Valerio
When the 'persistent' flag is specified, make the IP address selection
within a range persistent across reboots.

Signed-off-by: Paolo Valerio 
Acked-by: Simon Horman 
---
v3:
- rearranged branches in nat_get_unique_tuple() (Simon)
---
 NEWS              |  3 ++-
 lib/conntrack.c   | 25 +++++++++++++++++++------
 lib/conntrack.h   |  1 +
 lib/dpif-netdev.c |  2 ++
 4 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index 93046b963..0c86bba81 100644
--- a/NEWS
+++ b/NEWS
@@ -2,7 +2,8 @@ Post-v3.3.0
 
- Userspace datapath:
  * Conntrack now supports 'random' flag for selecting ports in a range
-   while natting.
+   while natting and 'persistent' flag for selection of the IP address
+   from a range.
 
 
 v3.3.0 - xx xxx 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e09ecdf33..8a7056bac 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2202,17 +2202,21 @@ nat_range_hash(const struct conn_key *key, uint32_t 
basis,
 {
 uint32_t hash = basis;
 
+if (!basis) {
+hash = ct_addr_hash_add(hash, &key->src.addr);
+} else {
+hash = ct_endpoint_hash_add(hash, &key->src);
+hash = ct_endpoint_hash_add(hash, &key->dst);
+}
+
 hash = ct_addr_hash_add(hash, &nat_info->min_addr);
 hash = ct_addr_hash_add(hash, &nat_info->max_addr);
 hash = hash_add(hash,
 ((uint32_t) nat_info->max_port << 16)
 | nat_info->min_port);
-hash = ct_endpoint_hash_add(hash, &key->src);
-hash = ct_endpoint_hash_add(hash, &key->dst);
 hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type);
 hash = hash_add(hash, key->nw_proto);
 hash = hash_add(hash, key->zone);
-
 /* The purpose of the second parameter is to distinguish hashes of data of
  * different length; our data always has the same length so there is no
  * value in counting. */
@@ -2388,10 +2392,19 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
*conn,
  fwd_key->nw_proto == IPPROTO_SCTP;
 uint16_t min_dport, max_dport, curr_dport;
 uint16_t min_sport, max_sport, curr_sport;
-uint32_t hash, port_off;
+uint32_t hash, port_off, basis;
+
+basis = (nat_info->nat_flags & NAT_PERSISTENT) ? 0 : ct->hash_basis;
+hash = nat_range_hash(fwd_key, basis, nat_info);
+
+if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
+port_off = random_uint32();
+} else if (basis) {
+port_off = hash;
+} else {
+port_off = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
+}
 
-hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
-port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : hash;
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 9b0c6aa88..ee7da099e 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -79,6 +79,7 @@ enum nat_action_e {
 
 enum nat_flags_e {
 NAT_RANGE_RANDOM = 1 << 0,
+NAT_PERSISTENT = 1 << 1,
 };
 
 struct nat_action_info_t {
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c3334c667..fbf7ccabd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9413,6 +9413,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 nat_action_info.nat_flags |= NAT_RANGE_RANDOM;
 break;
 case OVS_NAT_ATTR_PERSISTENT:
+nat_action_info.nat_flags |= NAT_PERSISTENT;
+break;
 case OVS_NAT_ATTR_PROTO_HASH:
 break;
 case OVS_NAT_ATTR_UNSPEC:
-- 
2.43.0
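
To make the branch structure of the patch easier to follow, here is a
small stand-alone model of how the address hash and the port offset are
derived from the flags. It is a sketch under assumptions: toy_mix() is a
placeholder mixing function rather than the OVS hash, the key is reduced
to four fields, the random value is passed in as a parameter to keep the
example deterministic, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_NAT_RANDOM = 1 << 0, TOY_NAT_PERSISTENT = 1 << 1 };

/* Reduced connection key: just the two endpoints. */
struct toy_key {
    uint32_t src_addr;
    uint16_t src_port;
    uint32_t dst_addr;
    uint16_t dst_port;
};

struct toy_choice {
    uint32_t addr_hash;   /* Drives the IP address selection. */
    uint32_t port_off;    /* Drives the port selection. */
};

/* Placeholder mixing step (FNV-like); not the hash OVS uses. */
static uint32_t
toy_mix(uint32_t h, uint32_t v)
{
    return (h ^ v) * 16777619u;
}

/* Like the patched nat_range_hash(): with a zero basis only the source
 * address is hashed, otherwise both endpoints are. */
static uint32_t
toy_range_hash(const struct toy_key *k, uint32_t basis)
{
    uint32_t h = basis;

    if (!basis) {
        h = toy_mix(h, k->src_addr);
    } else {
        h = toy_mix(h, k->src_addr);
        h = toy_mix(h, k->src_port);
        h = toy_mix(h, k->dst_addr);
        h = toy_mix(h, k->dst_port);
    }
    return h;
}

/* Same three-way branch as the v3 nat_get_unique_tuple() hunk. */
static struct toy_choice
toy_select(const struct toy_key *k, unsigned int flags,
           uint32_t hash_basis, uint32_t rnd)
{
    struct toy_choice c;
    uint32_t basis = (flags & TOY_NAT_PERSISTENT) ? 0 : hash_basis;

    c.addr_hash = toy_range_hash(k, basis);
    if (flags & TOY_NAT_RANDOM) {
        c.port_off = rnd;
    } else if (basis) {
        c.port_off = c.addr_hash;
    } else {
        c.port_off = toy_range_hash(k, hash_basis);
    }
    return c;
}
```

In this model, a persistent selection yields the same addr_hash for any
hash_basis (standing in for a value that changes across restarts) and for
any source port, while the port offset still depends on the per-instance
basis unless 'random' is requested.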



[ovs-dev] [PATCH v3 1/2] conntrack: Handle random selection for port ranges.

2024-02-16 Thread Paolo Valerio
The userspace conntrack only supported hash for port selection.
With the patch, both userspace and kernel datapath support the random
flag.

The default behavior remains the same, that is, if no flags are
specified, hash is selected.

Signed-off-by: Paolo Valerio 
Acked-by: Simon Horman 
---
 Documentation/ref/ovs-actions.7.rst |  3 +--
 NEWS                                |  3 +++
 lib/conntrack.c                     | 15 ++++++++-------
 lib/conntrack.h                     |  5 +++++
 lib/dpif-netdev.c                   |  4 +++-
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/ref/ovs-actions.7.rst 
b/Documentation/ref/ovs-actions.7.rst
index 36adcc5db..80acd9070 100644
--- a/Documentation/ref/ovs-actions.7.rst
+++ b/Documentation/ref/ovs-actions.7.rst
@@ -1551,8 +1551,7 @@ following arguments:
 should be selected. When a port range is specified, fallback to
 ephemeral ports does not happen, else, it will.  The port number
 selection can be informed by the optional ``random`` and ``hash`` flags
-described below.  The userspace datapath only supports the ``hash``
-behavior.
+described below.
 
 The optional *flags* are:
 
diff --git a/NEWS b/NEWS
index a6617546c..93046b963 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,8 @@
 Post-v3.3.0
 
+   - Userspace datapath:
+ * Conntrack now supports 'random' flag for selecting ports in a range
+   while natting.
 
 
 v3.3.0 - xx xxx 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 013709bd6..e09ecdf33 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -,7 +,7 @@ nat_range_hash(const struct conn_key *key, uint32_t basis,
 /* Ports are stored in host byte order for convenience. */
 static void
 set_sport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
-uint32_t hash, uint16_t *curr, uint16_t *min,
+uint32_t off, uint16_t *curr, uint16_t *min,
 uint16_t *max)
 {
 if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
@@ -2241,19 +2241,19 @@ set_sport_range(const struct nat_action_info_t *ni, 
const struct conn_key *k,
 } else {
 *min = ni->min_port;
 *max = ni->max_port;
-*curr = *min + (hash % ((*max - *min) + 1));
+*curr =  *min + (off % ((*max - *min) + 1));
 }
 }
 
 static void
 set_dport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
-uint32_t hash, uint16_t *curr, uint16_t *min,
+uint32_t off, uint16_t *curr, uint16_t *min,
 uint16_t *max)
 {
 if (ni->nat_action & NAT_ACTION_DST_PORT) {
 *min = ni->min_port;
 *max = ni->max_port;
-*curr = *min + (hash % ((*max - *min) + 1));
+*curr = *min + (off % ((*max - *min) + 1));
 } else {
 *curr = ntohs(k->dst.port);
 *min = *max = *curr;
@@ -2388,18 +2388,19 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
*conn,
  fwd_key->nw_proto == IPPROTO_SCTP;
 uint16_t min_dport, max_dport, curr_dport;
 uint16_t min_sport, max_sport, curr_sport;
-uint32_t hash;
+uint32_t hash, port_off;
 
 hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
+port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : hash;
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
 
 find_addr(fwd_key, &min_addr, &max_addr, &addr, hash,
   (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info);
 
-set_sport_range(nat_info, fwd_key, hash, &curr_sport,
+set_sport_range(nat_info, fwd_key, port_off, &curr_sport,
 &min_sport, &max_sport);
-set_dport_range(nat_info, fwd_key, hash, &curr_dport,
+set_dport_range(nat_info, fwd_key, port_off, &curr_dport,
 &min_dport, &max_dport);
 
 if (pat_proto) {
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 0a888be45..9b0c6aa88 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -77,12 +77,17 @@ enum nat_action_e {
 NAT_ACTION_DST_PORT = 1 << 3,
 };
 
+enum nat_flags_e {
+NAT_RANGE_RANDOM = 1 << 0,
+};
+
 struct nat_action_info_t {
 union ct_addr min_addr;
 union ct_addr max_addr;
 uint16_t min_port;
 uint16_t max_port;
 uint16_t nat_action;
+uint16_t nat_flags;
 };
 
 struct conntrack *conntrack_init(void);
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c1981137f..c3334c667 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9409,9 +9409,11 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 nl_attr_get_u16(b_nest);
 proto_num_max_specified = true;
 break;
+case OVS_NAT_ATTR_PROTO_RANDOM:
+   
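
The port arithmetic used by set_sport_range() and set_dport_range() in
the hunks above can be tried in isolation. The helper below is a sketch
with a hypothetical name (toy_pick_port); it only reproduces the
expression `*min + (off % ((*max - *min) + 1))` and assumes min <= max.

```c
#include <assert.h>
#include <stdint.h>

/* Map an arbitrary 32-bit offset (a hash or a random number) onto the
 * inclusive port range [min, max].  Assumes min <= max. */
static uint16_t
toy_pick_port(uint16_t min, uint16_t max, uint32_t off)
{
    uint32_t span = (uint32_t) (max - min) + 1;   /* Number of ports. */

    return (uint16_t) (min + off % span);
}
```

Whether the offset comes from the hash or from random_uint32(), the result
stays within the range; with a single-port range such as 34568-34568 every
offset maps to that one port.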

[ovs-dev] [PATCH v2 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-07 Thread Paolo Valerio
When the 'persistent' flag is specified, make the IP address selection
within a range persistent across reboots.

Signed-off-by: Paolo Valerio 
---
 NEWS              |  3 ++-
 lib/conntrack.c   | 27 +++++++++++++++++++++------
 lib/conntrack.h   |  1 +
 lib/dpif-netdev.c |  2 ++
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index 93046b963..0c86bba81 100644
--- a/NEWS
+++ b/NEWS
@@ -2,7 +2,8 @@ Post-v3.3.0
 
- Userspace datapath:
  * Conntrack now supports 'random' flag for selecting ports in a range
-   while natting.
+   while natting and 'persistent' flag for selection of the IP address
+   from a range.
 
 
 v3.3.0 - xx xxx 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e09ecdf33..7868a67f7 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2202,17 +2202,21 @@ nat_range_hash(const struct conn_key *key, uint32_t 
basis,
 {
 uint32_t hash = basis;
 
+if (!basis) {
+hash = ct_addr_hash_add(hash, &key->src.addr);
+} else {
+hash = ct_endpoint_hash_add(hash, &key->src);
+hash = ct_endpoint_hash_add(hash, &key->dst);
+}
+
 hash = ct_addr_hash_add(hash, &nat_info->min_addr);
 hash = ct_addr_hash_add(hash, &nat_info->max_addr);
 hash = hash_add(hash,
 ((uint32_t) nat_info->max_port << 16)
 | nat_info->min_port);
-hash = ct_endpoint_hash_add(hash, &key->src);
-hash = ct_endpoint_hash_add(hash, &key->dst);
 hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type);
 hash = hash_add(hash, key->nw_proto);
 hash = hash_add(hash, key->zone);
-
 /* The purpose of the second parameter is to distinguish hashes of data of
  * different length; our data always has the same length so there is no
  * value in counting. */
@@ -2386,12 +2390,23 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
*conn,
 bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP ||
  fwd_key->nw_proto == IPPROTO_UDP ||
  fwd_key->nw_proto == IPPROTO_SCTP;
+uint32_t hash, port_off, basis = ct->hash_basis;
 uint16_t min_dport, max_dport, curr_dport;
 uint16_t min_sport, max_sport, curr_sport;
-uint32_t hash, port_off;
 
-hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
-port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : hash;
+if (nat_info->nat_flags & NAT_PERSISTENT) {
+basis = 0;
+}
+
+hash = nat_range_hash(fwd_key, basis, nat_info);
+
+if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
+port_off = random_uint32();
+} else {
+port_off =
+basis ? hash : nat_range_hash(fwd_key, ct->hash_basis, nat_info);
+}
+
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 9b0c6aa88..ee7da099e 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -79,6 +79,7 @@ enum nat_action_e {
 
 enum nat_flags_e {
 NAT_RANGE_RANDOM = 1 << 0,
+NAT_PERSISTENT = 1 << 1,
 };
 
 struct nat_action_info_t {
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c3334c667..fbf7ccabd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9413,6 +9413,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 nat_action_info.nat_flags |= NAT_RANGE_RANDOM;
 break;
 case OVS_NAT_ATTR_PERSISTENT:
+nat_action_info.nat_flags |= NAT_PERSISTENT;
+break;
 case OVS_NAT_ATTR_PROTO_HASH:
 break;
 case OVS_NAT_ATTR_UNSPEC:
-- 
2.43.0



[ovs-dev] [PATCH v2 1/2] conntrack: Handle random selection for port ranges.

2024-02-07 Thread Paolo Valerio
The userspace conntrack only supported hash for port selection.
With the patch, both userspace and kernel datapath support the random
flag.

The default behavior remains the same, that is, if no flags are
specified, hash is selected.

Signed-off-by: Paolo Valerio 
---
 Documentation/ref/ovs-actions.7.rst |  3 +--
 NEWS                                |  3 +++
 lib/conntrack.c                     | 15 ++++++++-------
 lib/conntrack.h                     |  5 +++++
 lib/dpif-netdev.c                   |  4 +++-
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/ref/ovs-actions.7.rst 
b/Documentation/ref/ovs-actions.7.rst
index 36adcc5db..80acd9070 100644
--- a/Documentation/ref/ovs-actions.7.rst
+++ b/Documentation/ref/ovs-actions.7.rst
@@ -1551,8 +1551,7 @@ following arguments:
 should be selected. When a port range is specified, fallback to
 ephemeral ports does not happen, else, it will.  The port number
 selection can be informed by the optional ``random`` and ``hash`` flags
-described below.  The userspace datapath only supports the ``hash``
-behavior.
+described below.
 
 The optional *flags* are:
 
diff --git a/NEWS b/NEWS
index a6617546c..93046b963 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,8 @@
 Post-v3.3.0
 
+   - Userspace datapath:
+ * Conntrack now supports 'random' flag for selecting ports in a range
+   while natting.
 
 
 v3.3.0 - xx xxx 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 013709bd6..e09ecdf33 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -,7 +,7 @@ nat_range_hash(const struct conn_key *key, uint32_t basis,
 /* Ports are stored in host byte order for convenience. */
 static void
 set_sport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
-uint32_t hash, uint16_t *curr, uint16_t *min,
+uint32_t off, uint16_t *curr, uint16_t *min,
 uint16_t *max)
 {
 if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
@@ -2241,19 +2241,19 @@ set_sport_range(const struct nat_action_info_t *ni, 
const struct conn_key *k,
 } else {
 *min = ni->min_port;
 *max = ni->max_port;
-*curr = *min + (hash % ((*max - *min) + 1));
+*curr =  *min + (off % ((*max - *min) + 1));
 }
 }
 
 static void
 set_dport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
-uint32_t hash, uint16_t *curr, uint16_t *min,
+uint32_t off, uint16_t *curr, uint16_t *min,
 uint16_t *max)
 {
 if (ni->nat_action & NAT_ACTION_DST_PORT) {
 *min = ni->min_port;
 *max = ni->max_port;
-*curr = *min + (hash % ((*max - *min) + 1));
+*curr = *min + (off % ((*max - *min) + 1));
 } else {
 *curr = ntohs(k->dst.port);
 *min = *max = *curr;
@@ -2388,18 +2388,19 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
*conn,
  fwd_key->nw_proto == IPPROTO_SCTP;
 uint16_t min_dport, max_dport, curr_dport;
 uint16_t min_sport, max_sport, curr_sport;
-uint32_t hash;
+uint32_t hash, port_off;
 
 hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
+port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : hash;
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
 
 find_addr(fwd_key, &min_addr, &max_addr, &addr, hash,
   (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info);
 
-set_sport_range(nat_info, fwd_key, hash, &curr_sport,
+set_sport_range(nat_info, fwd_key, port_off, &curr_sport,
 &min_sport, &max_sport);
-set_dport_range(nat_info, fwd_key, hash, &curr_dport,
+set_dport_range(nat_info, fwd_key, port_off, &curr_dport,
 &min_dport, &max_dport);
 
 if (pat_proto) {
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 0a888be45..9b0c6aa88 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -77,12 +77,17 @@ enum nat_action_e {
 NAT_ACTION_DST_PORT = 1 << 3,
 };
 
+enum nat_flags_e {
+NAT_RANGE_RANDOM = 1 << 0,
+};
+
 struct nat_action_info_t {
 union ct_addr min_addr;
 union ct_addr max_addr;
 uint16_t min_port;
 uint16_t max_port;
 uint16_t nat_action;
+uint16_t nat_flags;
 };
 
 struct conntrack *conntrack_init(void);
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c1981137f..c3334c667 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9409,9 +9409,11 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 nl_attr_get_u16(b_nest);
 proto_num_max_specified = true;
 break;
+case OVS_NAT_ATTR_PROTO_RANDOM:
+nat_action_

Re: [ovs-dev] [PATCH 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-07 Thread Paolo Valerio
Paolo Valerio  writes:

> The patch, when 'persistent' flag is specified, makes the IP selection
> in a range persistent across reboots.
>
> Signed-off-by: Paolo Valerio 
> ---
>  NEWS  |  3 ++-
>  lib/conntrack.c   | 26 ++
>  lib/conntrack.h   |  1 +
>  lib/dpif-netdev.c |  2 ++
>  4 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 93046b963..0c86bba81 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,7 +2,8 @@ Post-v3.3.0
>  

The patch needs a respin because of a leftover that slipped during a
rebase.
Will send a v2.



[ovs-dev] [PATCH 2/2] conntrack: Handle persistent selection for IP addresses.

2024-02-07 Thread Paolo Valerio
When the 'persistent' flag is specified, make the IP address selection
within a range persistent across reboots.

Signed-off-by: Paolo Valerio 
---
 NEWS              |  3 ++-
 lib/conntrack.c   | 26 ++++++++++++++++++++++----
 lib/conntrack.h   |  1 +
 lib/dpif-netdev.c |  2 ++
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index 93046b963..0c86bba81 100644
--- a/NEWS
+++ b/NEWS
@@ -2,7 +2,8 @@ Post-v3.3.0
 
- Userspace datapath:
  * Conntrack now supports 'random' flag for selecting ports in a range
-   while natting.
+   while natting and 'persistent' flag for selection of the IP address
+   from a range.
 
 
 v3.3.0 - xx xxx 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e09ecdf33..e085ddee9 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2202,17 +2202,21 @@ nat_range_hash(const struct conn_key *key, uint32_t 
basis,
 {
 uint32_t hash = basis;
 
+if (!basis) {
+hash = ct_addr_hash_add(hash, &key->src.addr);
+} else {
+hash = ct_endpoint_hash_add(hash, &key->src);
+hash = ct_endpoint_hash_add(hash, &key->dst);
+}
+
 hash = ct_addr_hash_add(hash, &nat_info->min_addr);
 hash = ct_addr_hash_add(hash, &nat_info->max_addr);
 hash = hash_add(hash,
 ((uint32_t) nat_info->max_port << 16)
 | nat_info->min_port);
-hash = ct_endpoint_hash_add(hash, &key->src);
-hash = ct_endpoint_hash_add(hash, &key->dst);
 hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type);
 hash = hash_add(hash, key->nw_proto);
 hash = hash_add(hash, key->zone);
-
 /* The purpose of the second parameter is to distinguish hashes of data of
  * different length; our data always has the same length so there is no
  * value in counting. */
@@ -2386,12 +2390,26 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn 
*conn,
 bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP ||
  fwd_key->nw_proto == IPPROTO_UDP ||
  fwd_key->nw_proto == IPPROTO_SCTP;
+uint32_t hash, port_off, basis = ct->hash_basis;
 uint16_t min_dport, max_dport, curr_dport;
 uint16_t min_sport, max_sport, curr_sport;
-uint32_t hash, port_off;
 
 hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
 port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : hash;
+
+if (nat_info->nat_flags & NAT_PERSISTENT) {
+basis = 0;
+}
+
+hash = nat_range_hash(fwd_key, basis, nat_info);
+
+if (nat_info->nat_flags & NAT_RANGE_RANDOM) {
+port_off = random_uint16();
+} else {
+port_off =
+basis ? hash : nat_range_hash(fwd_key, ct->hash_basis, nat_info);
+}
+
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 9b0c6aa88..ee7da099e 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -79,6 +79,7 @@ enum nat_action_e {
 
 enum nat_flags_e {
 NAT_RANGE_RANDOM = 1 << 0,
+NAT_PERSISTENT = 1 << 1,
 };
 
 struct nat_action_info_t {
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c3334c667..fbf7ccabd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9413,6 +9413,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 nat_action_info.nat_flags |= NAT_RANGE_RANDOM;
 break;
 case OVS_NAT_ATTR_PERSISTENT:
+nat_action_info.nat_flags |= NAT_PERSISTENT;
+break;
 case OVS_NAT_ATTR_PROTO_HASH:
 break;
 case OVS_NAT_ATTR_UNSPEC:
-- 
2.43.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/2] conntrack: Handle random selection for port ranges.

2024-02-07 Thread Paolo Valerio
The userspace conntrack only supported hash for port selection.
With the patch, both userspace and kernel datapath support the random
flag.

The default behavior remains the same, that is, if no flags are
specified, hash is selected.
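
A minimal sketch of the selection described above, for illustration
only: the first port to try is the range minimum plus an offset modulo
the range size, and the offset is either random or the hash. The
function names are hypothetical, and rand() stands in for whatever
random source the datapath uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* First port to try in [min, max], derived from a 32-bit offset.
 * Assumes min <= max. */
static uint16_t
first_port(uint16_t min, uint16_t max, uint32_t off)
{
    uint32_t range = (uint32_t) (max - min) + 1;

    return (uint16_t) (min + off % range);
}

/* Offset source: random when the flag is set, the hash otherwise. */
static uint32_t
port_offset(int use_random, uint32_t hash)
{
    return use_random ? (uint32_t) rand() : hash;
}
```

With no flags the offset is the hash, so the same connection key keeps
mapping to the same starting port, which matches the default behavior
described above.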

Signed-off-by: Paolo Valerio 
---
 Documentation/ref/ovs-actions.7.rst |  3 +--
 NEWS|  3 +++
 lib/conntrack.c | 15 ---
 lib/conntrack.h |  5 +
 lib/dpif-netdev.c   |  4 +++-
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/Documentation/ref/ovs-actions.7.rst b/Documentation/ref/ovs-actions.7.rst
index 36adcc5db..80acd9070 100644
--- a/Documentation/ref/ovs-actions.7.rst
+++ b/Documentation/ref/ovs-actions.7.rst
@@ -1551,8 +1551,7 @@ following arguments:
 should be selected. When a port range is specified, fallback to
 ephemeral ports does not happen, else, it will.  The port number
 selection can be informed by the optional ``random`` and ``hash`` flags
-described below.  The userspace datapath only supports the ``hash``
-behavior.
+described below.
 
 The optional *flags* are:
 
diff --git a/NEWS b/NEWS
index a6617546c..93046b963 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,8 @@
 Post-v3.3.0
 
+   - Userspace datapath:
+ * Conntrack now supports 'random' flag for selecting ports in a range
+   while natting.
 
 
 v3.3.0 - xx xxx 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 013709bd6..e09ecdf33 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -,7 +,7 @@ nat_range_hash(const struct conn_key *key, uint32_t basis,
 /* Ports are stored in host byte order for convenience. */
 static void
 set_sport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
-uint32_t hash, uint16_t *curr, uint16_t *min,
+uint32_t off, uint16_t *curr, uint16_t *min,
 uint16_t *max)
 {
 if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
@@ -2241,19 +2241,19 @@ set_sport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
 } else {
 *min = ni->min_port;
 *max = ni->max_port;
-*curr = *min + (hash % ((*max - *min) + 1));
+*curr =  *min + (off % ((*max - *min) + 1));
 }
 }
 
 static void
 set_dport_range(const struct nat_action_info_t *ni, const struct conn_key *k,
-uint32_t hash, uint16_t *curr, uint16_t *min,
+uint32_t off, uint16_t *curr, uint16_t *min,
 uint16_t *max)
 {
 if (ni->nat_action & NAT_ACTION_DST_PORT) {
 *min = ni->min_port;
 *max = ni->max_port;
-*curr = *min + (hash % ((*max - *min) + 1));
+*curr = *min + (off % ((*max - *min) + 1));
 } else {
 *curr = ntohs(k->dst.port);
 *min = *max = *curr;
@@ -2388,18 +2388,19 @@ nat_get_unique_tuple(struct conntrack *ct, struct conn *conn,
  fwd_key->nw_proto == IPPROTO_SCTP;
 uint16_t min_dport, max_dport, curr_dport;
 uint16_t min_sport, max_sport, curr_sport;
-uint32_t hash;
+uint32_t hash, port_off;
 
 hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
+port_off = nat_info->nat_flags & NAT_RANGE_RANDOM ? random_uint32() : hash;
 min_addr = nat_info->min_addr;
 max_addr = nat_info->max_addr;
 
 find_addr(fwd_key, &min_addr, &max_addr, &addr, hash,
   (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info);
 
-set_sport_range(nat_info, fwd_key, hash, &curr_sport,
+set_sport_range(nat_info, fwd_key, port_off, &curr_sport,
 &min_sport, &max_sport);
-set_dport_range(nat_info, fwd_key, hash, &curr_dport,
+set_dport_range(nat_info, fwd_key, port_off, &curr_dport,
 &min_dport, &max_dport);
 
 if (pat_proto) {
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 0a888be45..9b0c6aa88 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -77,12 +77,17 @@ enum nat_action_e {
 NAT_ACTION_DST_PORT = 1 << 3,
 };
 
+enum nat_flags_e {
+NAT_RANGE_RANDOM = 1 << 0,
+};
+
 struct nat_action_info_t {
 union ct_addr min_addr;
 union ct_addr max_addr;
 uint16_t min_port;
 uint16_t max_port;
 uint16_t nat_action;
+uint16_t nat_flags;
 };
 
 struct conntrack *conntrack_init(void);
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c1981137f..c3334c667 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -9409,9 +9409,11 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 nl_attr_get_u16(b_nest);
 proto_num_max_specified = true;
 break;
+case OVS_NAT_ATTR_PROTO_RANDOM:
+nat_action_

Re: [ovs-dev] [PATCH v3 3/3] mcast-snooping: Fix comments format.

2023-11-21 Thread Paolo Valerio
David Marchand  writes:

> Capitalize comments and end them with a . when needed.
>
> Signed-off-by: David Marchand 
> ---
>  tests/mcast-snooping.at | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>

Acked-by: Paolo Valerio 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] mcast-snooping: Flush flood and report ports when deleting interfaces.

2023-11-15 Thread Paolo Valerio
David Marchand  writes:

> When a configuration change triggers an interface destruction/creation
> (like for example, setting ofport_request), a port object may still be
> referenced as a fport or a rport in the mdb.
>
> Before the fix, when flooding multicast traffic:
> bridge("br0")
> -
>  0. priority 32768
> NORMAL
>  -> forwarding to mcast group port
>  >> mcast flood port is unknown, dropping
>  -> mcast flood port is input port, dropping
>  -> forwarding to mcast flood port
>
> Before the fix, when flooding igmp report traffic:
> bridge("br0")
> -
>  0. priority 32768
> NORMAL
>  >> mcast port is unknown, dropping the report
>  -> forwarding report to mcast flagged port
>  -> mcast port is input port, dropping the Report
>  -> forwarding report to mcast flagged port
>
> Add relevant cleanup and update unit tests.
>
> Fixes: 4fbbf8624868 ("mcast-snooping: Flush ports mdb when VLAN configuration changed.")
> Signed-off-by: David Marchand 
> ---
> Changes since v1:
> - updated the test on report flooding,
>

Acked-by: Paolo Valerio 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] mcast-snooping: Test per port explicit flooding.

2023-11-15 Thread Paolo Valerio
David Marchand  writes:

> Various options affect how the mcast snooping module works.
>
> When multicast snooping is enabled and a reporter is known, it is still
> possible to flood associated packets to some other port via the
> mcast-snooping-flood option.
>
> If flooding unregistered traffic is disabled, it is still possible to
> flood multicast traffic too with the mcast-snooping-flood option.
>
> IGMP reports may have to be flooded to some ports explicitly with the
> mcast-snooping-flood-reports option.
>
> Test those parameters.
>
> Signed-off-by: David Marchand 
> ---

Thanks David.
The patch lgtm.

Acked-by: Paolo Valerio 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] mcast-snooping: Test per port explicit flooding.

2023-11-10 Thread Paolo Valerio
David Marchand  writes:

> On Thu, Nov 9, 2023 at 4:33 PM Paolo Valerio  wrote:
>>
>> David Marchand  writes:
>>
>> > When multicast snooping is enabled and a reporter is known, it is still
>> > possible to flood associated packets to some other port via the
>> > mcast-snooping-flood option.
>> >
>> > Test this combination.
>> >
>> > Signed-off-by: David Marchand 
>> > ---
>> >  tests/mcast-snooping.at | 88 +
>> >  1 file changed, 88 insertions(+)
>> >
>> > diff --git a/tests/mcast-snooping.at b/tests/mcast-snooping.at
>> > index d5b7c4774c..21c806ef63 100644
>> > --- a/tests/mcast-snooping.at
>> > +++ b/tests/mcast-snooping.at
>> > @@ -105,6 +105,94 @@ AT_CHECK([ovs-appctl mdb/show br0], [0], [dnl
>> >  OVS_VSWITCHD_STOP
>> >  AT_CLEANUP
>> >
>> > +
>> > +AT_SETUP([mcast - check flooding on ports])
>> > +OVS_VSWITCHD_START([])
>> > +
>> > +AT_CHECK([
>> > +ovs-vsctl set bridge br0 \
>> > +datapath_type=dummy \
>> > +mcast_snooping_enable=true \
>> > +other-config:mcast-snooping-disable-flood-unregistered=false
>> > +], [0])
>> > +
>>
>> in the case flood unregistered is disabled packets are supposed to
>> be sent to flood ports. While at it, it might also be worth testing that
>> like in the quick example at the end I used to test it.
>> WDYT?
>
> It sounds reasonable yes.
>
> I was also considering testing reports flooding.
> WDYT?
>

if you mean testing mcast-snooping-flood-reports, that would be nice.
This way that flag as well will have some coverage.

>
>>
>> > +AT_CHECK([ovs-ofctl add-flow br0 action=normal])
>> > +
>> > +AT_CHECK([
>> > +ovs-vsctl add-port br0 p1 \
>> > +-- set Interface p1 type=dummy other-config:hwaddr=aa:55:aa:55:00:01 
>> > ofport_request=1 \
>> > +-- add-port br0 p2 \
>> > +-- set Interface p2 type=dummy other-config:hwaddr=aa:55:aa:55:00:02 
>> > ofport_request=2 \
>> > +-- add-port br0 p3 \
>> > +-- set Interface p3 type=dummy other-config:hwaddr=aa:55:aa:55:00:03 
>> > ofport_request=3 \
>> > +], [0])
>> > +
>> > +ovs-appctl time/stop
>> > +
>> > +# send report packets
>> > +AT_CHECK([
>> > +ovs-appctl netdev-dummy/receive p1  \
>> > +
>> > '01005E010101000C29A027A10800451C00014002CBAEAC10221EE001010112140CE9E0010101'
>> > +], [0])
>> > +
>> > +AT_CHECK([ovs-appctl mdb/show br0], [0], [dnl
>> > + port  VLAN  GROUPAge
>> > +1 0  224.1.1.1   0
>> > +])
>> > +
>> > +AT_CHECK([ovs-appctl ofproto/trace 
>> > "in_port(3),eth(src=aa:55:aa:55:00:ff,dst=01:00:5e:5e:01:01),eth_type(0x0800),ipv4(src=10.0.0.1,dst=224.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=0,dst=8000)"],
>> >  [0], [dnl
>> > +Flow: 
>> > udp,in_port=3,vlan_tci=0x,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_src=10.0.0.1,nw_dst=224.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=0,tp_dst=8000
>> > +
>>
>> I think the mac for 224.1.1.1 maps to 01:00:5e:01:01:01.
>
> Argh.. indeed, wrong copy/paste.
> Thanks for the review!
>

thank you for working on this!

>>
>> > +bridge("br0")
>> > +-
>> > + 0. priority 32768
>> > +NORMAL
>> > + -> forwarding to mcast group port
>
>
> -- 
> David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] mcast-snooping: Test per port explicit flooding.

2023-11-09 Thread Paolo Valerio
David Marchand  writes:

> When multicast snooping is enabled and a reporter is known, it is still
> possible to flood associated packets to some other port via the
> mcast-snooping-flood option.
>
> Test this combination.
>
> Signed-off-by: David Marchand 
> ---
>  tests/mcast-snooping.at | 88 +
>  1 file changed, 88 insertions(+)
>
> diff --git a/tests/mcast-snooping.at b/tests/mcast-snooping.at
> index d5b7c4774c..21c806ef63 100644
> --- a/tests/mcast-snooping.at
> +++ b/tests/mcast-snooping.at
> @@ -105,6 +105,94 @@ AT_CHECK([ovs-appctl mdb/show br0], [0], [dnl
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
>  
> +
> +AT_SETUP([mcast - check flooding on ports])
> +OVS_VSWITCHD_START([])
> +
> +AT_CHECK([
> +ovs-vsctl set bridge br0 \
> +datapath_type=dummy \
> +mcast_snooping_enable=true \
> +other-config:mcast-snooping-disable-flood-unregistered=false
> +], [0])
> +

When flooding of unregistered traffic is disabled, packets are supposed
to be sent to flood ports. While at it, it might also be worth testing
that, as in the quick example at the end that I used to test it.
WDYT?

> +AT_CHECK([ovs-ofctl add-flow br0 action=normal])
> +
> +AT_CHECK([
> +ovs-vsctl add-port br0 p1 \
> +-- set Interface p1 type=dummy other-config:hwaddr=aa:55:aa:55:00:01 
> ofport_request=1 \
> +-- add-port br0 p2 \
> +-- set Interface p2 type=dummy other-config:hwaddr=aa:55:aa:55:00:02 
> ofport_request=2 \
> +-- add-port br0 p3 \
> +-- set Interface p3 type=dummy other-config:hwaddr=aa:55:aa:55:00:03 
> ofport_request=3 \
> +], [0])
> +
> +ovs-appctl time/stop
> +
> +# send report packets
> +AT_CHECK([
> +ovs-appctl netdev-dummy/receive p1  \
> +
> '01005E010101000C29A027A10800451C00014002CBAEAC10221EE001010112140CE9E0010101'
> +], [0])
> +
> +AT_CHECK([ovs-appctl mdb/show br0], [0], [dnl
> + port  VLAN  GROUPAge
> +1 0  224.1.1.1   0
> +])
> +
> +AT_CHECK([ovs-appctl ofproto/trace 
> "in_port(3),eth(src=aa:55:aa:55:00:ff,dst=01:00:5e:5e:01:01),eth_type(0x0800),ipv4(src=10.0.0.1,dst=224.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=0,dst=8000)"],
>  [0], [dnl
> +Flow: 
> udp,in_port=3,vlan_tci=0x,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_src=10.0.0.1,nw_dst=224.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=0,tp_dst=8000
> +

I think the mac for 224.1.1.1 maps to 01:00:5e:01:01:01. 
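
For reference, a minimal sketch of the usual IPv4 multicast to Ethernet
mapping (RFC 1112): the MAC is 01:00:5e followed by the low 23 bits of
the group address. The helper name is hypothetical, it is not an OVS
function.

```c
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to its Ethernet
 * multicast MAC: 01:00:5e plus the low 23 bits of the address. */
static void
ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f;  /* Top bit of this octet is dropped. */
    mac[4] = (group >> 8) & 0xff;
    mac[5] = group & 0xff;
}
```

For 224.1.1.1 (0xe0010101) the last three octets come out as 01:01:01.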

> +bridge("br0")
> +-
> + 0. priority 32768
> +NORMAL
> + -> forwarding to mcast group port
> +
> +Final flow: unchanged
> +Megaflow: 
> recirc_id=0,eth,udp,in_port=3,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_dst=224.1.1.1,nw_frag=no
> +Datapath actions: 1
> +])
> +
> +AT_CHECK([ovs-vsctl set port p2 other_config:mcast-snooping-flood=true])
> +
> +AT_CHECK([ovs-appctl ofproto/trace 
> "in_port(3),eth(src=aa:55:aa:55:00:ff,dst=01:00:5e:5e:01:01),eth_type(0x0800),ipv4(src=10.0.0.1,dst=224.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=0,dst=8000)"],
>  [0], [dnl
> +Flow: 
> udp,in_port=3,vlan_tci=0x,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_src=10.0.0.1,nw_dst=224.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=0,tp_dst=8000
> +
> +bridge("br0")
> +-
> + 0. priority 32768
> +NORMAL
> + -> forwarding to mcast group port
> + -> forwarding to mcast flood port
> +
> +Final flow: unchanged
> +Megaflow: 
> recirc_id=0,eth,udp,in_port=3,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_dst=224.1.1.1,nw_frag=no
> +Datapath actions: 1,2
> +])
> +
> +AT_CHECK([ovs-vsctl set port p3 other_config:mcast-snooping-flood=true])
> +
> +AT_CHECK([ovs-appctl ofproto/trace 
> "in_port(3),eth(src=aa:55:aa:55:00:ff,dst=01:00:5e:5e:01:01),eth_type(0x0800),ipv4(src=10.0.0.1,dst=224.1.1.1,proto=17,tos=0,ttl=64,frag=no),udp(src=0,dst=8000)"],
>  [0], [dnl
> +Flow: 
> udp,in_port=3,vlan_tci=0x,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_src=10.0.0.1,nw_dst=224.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=0,tp_dst=8000
> +
> +bridge("br0")
> +-
> + 0. priority 32768
> +NORMAL
> + -> forwarding to mcast group port
> + -> forwarding to mcast flood port
> + -> mcast flood port is input port, dropping
> +
> +Final flow: unchanged
> +Megaflow: 
> recirc_id=0,eth,udp,in_port=3,dl_src=aa:55:aa:55:00:ff,dl_dst=01:00:5e:5e:01:01,nw_dst=224.1.1.1,nw_frag=no
> +Datapath actions: 1,2
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +
>  AT_SETUP([mcast - delete the port mdb when vlan configuration changed])
>  OVS_VSWITCHD_START([])
>  
> -- 
> 2.41.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


diff --git a/tests/mcast-snooping.at b/tests/mcast-snooping.at
index 21c806ef6..787b09570 100644
--- a/tests/mcast-snooping.at
+++ b/tests/mcast-snooping.at
@@ -189,6 +189,41 @@ Megaflow: 
recirc_id=0,eth,udp,in_port=3,dl_src=aa:55:aa:55:00:

Re: [ovs-dev] [PATCH v3 branch-2.17 1/2] conntrack: simplify cleanup path

2023-10-12 Thread Paolo Valerio
Frode Nordahl  writes:

> On Tue, Oct 3, 2023 at 9:06 PM Aaron Conole  wrote:
>>
>> The conntrack cleanup and allocation code is spread across multiple
>> list invocations.  This was changed in mainline code when the timeout
>> expiration lists were refactored, but backporting those fixes would
>> be a rather large effort.  Instead, take only the changes we need
>> to backport "conntrack: Remove nat_conn introducing key directionality"
>> into branch-2.17.
>
> Thanks a lot for your help in backporting this patch.
>
> We have a managed customer environment where circumstances make the
> issue trigger with a rate of 70% when performing a certain action. Up
> until now they have been running with a temporary package containing
> the patches from
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=351579&state=*
>
> To test this series, they have first re-confirmed that they see the
> issue with a packaged version of OVS 2.17.7, and then switched to a
> packaged version of OVS 2.17.7 with these patches and confirmed that
> the issue is no longer occurring. The same package has been in
> production use for the past week, being exposed to real world traffic.
> No side effects or incidents to report.
>
> Tested-by: Frode Nordahl 
>

Thanks Frode, Aaron and Simon.

On my side, I don't see any issues with the series, both patches look
good to me.

> -- 
> Frode Nordahl
>
>> Signed-off-by: Aaron Conole 
>> Co-authored-by: Paolo Valerio 
>> Signed-off-by: Paolo Valerio 
>> ---
>>  lib/conntrack.c | 60 +++--
>>  1 file changed, 18 insertions(+), 42 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index fff8e77db1..83a73995d6 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -94,9 +94,8 @@ static bool valid_new(struct dp_packet *pkt, struct 
>> conn_key *);
>>  static struct conn *new_conn(struct conntrack *ct, struct dp_packet *pkt,
>>   struct conn_key *, long long now,
>>   uint32_t tp_id);
>> -static void delete_conn_cmn(struct conn *);
>> +static void delete_conn__(struct conn *);
>>  static void delete_conn(struct conn *);
>> -static void delete_conn_one(struct conn *conn);
>>  static enum ct_update_res conn_update(struct conntrack *ct, struct conn 
>> *conn,
>>struct dp_packet *pkt,
>>struct conn_lookup_ctx *ctx,
>> @@ -444,9 +443,11 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone)
>>  }
>>
>>  static void
>> -conn_clean_cmn(struct conntrack *ct, struct conn *conn)
>> +conn_clean(struct conntrack *ct, struct conn *conn)
>>  OVS_REQUIRES(ct->ct_lock)
>>  {
>> +ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
>> +
>>  if (conn->alg) {
>>  expectation_clean(ct, &conn->key);
>>  }
>> @@ -458,19 +459,9 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn)
>>  if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) {
>>  zl->czl.count--;
>>  }
>> -}
>>
>> -/* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT.  Also
>> - * removes the associated nat 'conn' from the lookup datastructures. */
>> -static void
>> -conn_clean(struct conntrack *ct, struct conn *conn)
>> -OVS_REQUIRES(ct->ct_lock)
>> -{
>> -ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
>> -
>> -conn_clean_cmn(ct, conn);
>>  if (conn->nat_conn) {
>> -uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis);
>> +hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis);
>>  cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash);
>>  }
>>  ovs_list_remove(&conn->exp_node);
>> @@ -479,19 +470,6 @@ conn_clean(struct conntrack *ct, struct conn *conn)
>>  atomic_count_dec(&ct->n_conn);
>>  }
>>
>> -static void
>> -conn_clean_one(struct conntrack *ct, struct conn *conn)
>> -OVS_REQUIRES(ct->ct_lock)
>> -{
>> -conn_clean_cmn(ct, conn);
>> -if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
>> -ovs_list_remove(&conn->exp_node);
>> -conn->cleaned = true;
>> -atomic_count_dec(&ct->n_conn);
>> -}
>> -ovsrcu_postpone(delete_conn_one, conn);
>> -}
>> -
&

[ovs-dev] [PATCH v2] ofproto-dpif-xlate: Fix recirculation with patch port and controller.

2023-09-05 Thread Paolo Valerio
If a packet originating from the controller recirculates after going
through a patch port, it gets dropped with the following message:

ofproto_dpif_upcall(handler8)|INFO|received packet on unassociated
  datapath port 4294967295

This happens because there's no xport_uuid in the recirculation node
and at the same time in_port refers to the patch port.

The patch, in the case of zeroed uuid, checks that in_port belongs to
the bridge and returns the related ofproto.
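
The decision described above can be sketched roughly as follows. This is
a simplified model, not the code in the patch: it ignores lookup
failures, and the enum, the constants and the function name are
hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for OFPP_CONTROLLER and OFPP_NONE. */
#define PORT_CONTROLLER 0xfffd
#define PORT_NONE       0xffff

enum lookup_path {
    LOOKUP_XPORT_UUID,      /* Find the xport through its uuid. */
    LOOKUP_OFPROTO_UUID,    /* Find only the ofproto (bridge). */
    LOOKUP_DATAPATH_PORT,   /* Fall back to the datapath in_port. */
};

static enum lookup_path
choose_lookup(uint16_t in_port, bool xport_uuid_is_zero,
              bool in_port_on_bridge)
{
    bool real_port = in_port != PORT_NONE && in_port != PORT_CONTROLLER;

    if (real_port && !xport_uuid_is_zero) {
        return LOOKUP_XPORT_UUID;
    }
    /* Zeroed uuid with a real in_port: only trust the bridge found via
     * the ofproto uuid if in_port actually belongs to it. */
    if (real_port && !in_port_on_bridge) {
        return LOOKUP_DATAPATH_PORT;
    }
    return LOOKUP_OFPROTO_UUID;
}
```

The case fixed here is the last return: a zeroed xport uuid with an
in_port that is the patch port on the bridge resolves to the ofproto.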

Signed-off-by: Paolo Valerio 
---
 ofproto/ofproto-dpif-xlate.c |   12 +++-
 tests/ofproto-dpif.at|   34 ++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 47ea0f47e..fcd547645 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1615,7 +1615,8 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
 }
 
 ofp_port_t in_port = recirc_id_node->state.metadata.in_port;
-if (in_port != OFPP_NONE && in_port != OFPP_CONTROLLER) {
+if (in_port != OFPP_NONE && in_port != OFPP_CONTROLLER &&
+!uuid_is_zero(&recirc_id_node->state.xport_uuid)) {
 struct uuid xport_uuid = recirc_id_node->state.xport_uuid;
 xport = xport_lookup_by_uuid(xcfg, &xport_uuid);
 if (xport && xport->xbridge && xport->xbridge->ofproto) {
@@ -1626,11 +1627,19 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
  * that the packet originated from the controller via an OpenFlow
  * "packet-out".  The right thing to do is to find just the
  * ofproto.  There is no xport, which is OK.
+ * Also a zeroed xport_uuid with a valid in_port, means that
+ * the packet originated from OFPP_CONTROLLER passed
+ * through a patch port.
  *
  * OFPP_NONE can also indicate that a bond caused recirculation. */
 struct uuid uuid = recirc_id_node->state.ofproto_uuid;
 const struct xbridge *bridge = xbridge_lookup_by_uuid(xcfg, &uuid);
+
 if (bridge && bridge->ofproto) {
+if (in_port != OFPP_CONTROLLER && in_port != OFPP_NONE &&
+!get_ofp_port(bridge, in_port)) {
+goto xport_lookup;
+}
 if (errorp) {
 *errorp = NULL;
 }
@@ -1643,6 +1652,7 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
 }
 }
 
+xport_lookup:
 xport = xport_lookup(xcfg, tnl_port_should_receive(flow)
  ? tnl_port_receive(flow)
  : odp_port_to_ofport(backer, flow->in_port.odp_port));
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index f242f77f3..a0a4aaf5d 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -5854,6 +5854,40 @@ OVS_WAIT_UNTIL([check_flows], [ovs-ofctl dump-flows br0])
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+# Checks for regression against a bug in which OVS dropped packets
+# originating from the controller passing through a patch port
+AT_SETUP([ofproto-dpif - packet-out recirculation OFPP_CONTROLLER and patch port])
+OVS_VSWITCHD_START(
+[add-port br0 patch-br1 -- \
+ set interface patch-br1 type=patch options:peer=patch-br0 -- \
+ add-br br1 -- set bridge br1 datapath-type=dummy fail-mode=secure -- \
+ add-port br1 patch-br0 -- set interface patch-br0 type=patch options:peer=patch-br1
+])
+
+add_of_ports --pcap br1 1
+
+AT_DATA([flows-br0.txt], [dnl
+table=0 icmp actions=output:patch-br1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows-br0.txt])
+
+AT_DATA([flows-br1.txt], [dnl
+table=0, icmp actions=ct(table=1,zone=1)
+table=1, ct_state=+trk, icmp actions=p1
+])
+AT_CHECK([ovs-ofctl add-flows br1 flows-br1.txt])
+
+packet=50540007505400050800455c8001b94dc0a80001c0a80002080013fc000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f
+AT_CHECK([ovs-ofctl packet-out br0 "in_port=CONTROLLER packet=$packet actions=table"])
+
+OVS_WAIT_UNTIL_EQUAL([ovs-ofctl dump-flows -m br1 | grep "ct_state" | ofctl_strip], [dnl
+ table=1, n_packets=1, n_bytes=106, ct_state=+trk,icmp actions=output:2])
+
+OVS_WAIT_UNTIL([ovs-pcap p1-tx.pcap | grep -q "$packet"])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - debug_slow action])
 OVS_VSWITCHD_START
 add_of_ports br0 1 2 3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] conntrack: Remove nat_conn introducing key directionality.

2023-08-30 Thread Paolo Valerio
From: hepeng 

The patch avoids the extra allocation for nat_conn.
Currently, when doing NAT, the userspace conntrack uses an extra
conn for the two directions in a flow. However, each conn already
holds the two keys for both the orig and rev directions. As per Aaron's
suggestion, this patch introduces a key_node[CT_DIRS] member in the
conn, each element of which consists of a key, a direction, and a
cmap_node for hash lookup, thus addressing the feedback received on
the original patch [0].

[0] https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/

Signed-off-by: Peng He 
Co-authored-by: Paolo Valerio 
Signed-off-by: Paolo Valerio 
---
v3:
  - resolved a potential UB with offsetof() and integer constant
expression (Ilya)
  - int to bool assignment (Ilya)
  - check the direction early in conntrack_dump_next() to avoid
unneeded operations (Ilya)
  - unrelated change added that turns the branch:
if (!conn_lookup()) { return true; } else { return false; }
into return !conn_lookup() (Ilya)
  - cosmetic/coding style changes (Ilya)

v2:
  - use enum value instead of bool (Aaron).
  - s/conn_for_expectation/conn_for_exp/ in process_ftp_ctl_v6()
to avoid long line.
  - removed CT_CONN_TYPE_* reference in two comments.
---
 lib/conntrack-private.h |   19 +-
 lib/conntrack-tp.c  |6 +
 lib/conntrack.c |  366 +++
 3 files changed, 164 insertions(+), 227 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bb326868e..3fd5fccd3 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -49,6 +49,12 @@ struct ct_endpoint {
  * hashing in ct_endpoint_hash_add(). */
 BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4);
 
+enum key_dir {
+CT_DIR_FWD = 0,
+CT_DIR_REV,
+CT_DIRS,
+};
+
 /* Changes to this structure need to be reflected in conn_key_hash()
  * and conn_key_cmp(). */
 struct conn_key {
@@ -112,20 +118,18 @@ enum ct_timeout {
 
 #define N_EXP_LISTS 100
 
-enum OVS_PACKED_ENUM ct_conn_type {
-CT_CONN_TYPE_DEFAULT,
-CT_CONN_TYPE_UN_NAT,
+struct conn_key_node {
+enum key_dir dir;
+struct conn_key key;
+struct cmap_node cm_node;
 };
 
 struct conn {
 /* Immutable data. */
-struct conn_key key;
-struct conn_key rev_key;
+struct conn_key_node key_node[CT_DIRS];
 struct conn_key parent_key; /* Only used for orig_tuple support. */
-struct cmap_node cm_node;
 uint16_t nat_action;
 char *alg;
-struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
 atomic_flag reclaimed; /* False during the lifetime of the connection,
 * True as soon as a thread has started freeing
 * its memory. */
@@ -150,7 +154,6 @@ struct conn {
 
 /* Immutable data. */
 bool alg_related; /* True if alg data connection. */
-enum ct_conn_type conn_type;
 
 uint32_t tp_id; /* Timeout policy ID. */
 };
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index 89cb2704a..2149fdc73 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -253,7 +253,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn,
 }
 VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d "
 "val=%u sec.",
-ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+conn->tp_id, val);
 
 atomic_store_relaxed(&conn->expiration, now + val * 1000);
 }
@@ -273,7 +274,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn,
 }
 
 VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.",
-ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+conn->tp_id, val);
 
 conn->expiration = now + val * 1000;
 }
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5f1176d33..47a443fba 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -103,7 +103,7 @@ static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn,
   struct conn_lookup_ctx *ctx,
   long long now);
 static long long int conn_expiration(const struct conn *);
-static bool conn_expired(struct conn *, long long now);
+static bool conn_expired(const struct conn *, long long now);
 static void conn_expire_push_front(struct conntrack *ct, struct conn *conn);
 static void set_mark(struct dp_packet *, struct conn *,
  uint32_t val, uint32_t mask);
@@ -113,8 +113,7 @@ static void set_label(struct dp_packet *, struct conn *,
 static void *clean_thread_main(void *f_);
 
 static bool
-nat_get_unique_tuple(str

Re: [ovs-dev] [PATCH v2] conntrack: Remove nat_conn introducing key directionality.

2023-08-30 Thread Paolo Valerio
Ilya Maximets  writes:

> On 8/23/23 14:53, Paolo Valerio wrote:
>> From: hepeng 
>> 
>> The patch avoids the extra allocation for nat_conn.
>> Currently, when doing NAT, the userspace conntrack will use an extra
>> conn for the two directions in a flow. However, each conn has actually
>> the two keys for both orig and rev directions. This patch introduces a
>> key_node[CT_DIRS] member as per Aaron's suggestion in the conn which
>> consists of a key, direction, and a cmap_node for hash lookup so
>> addressing the feedback received by the original patch [0].
>> 
>> [0] 
>> https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/
>> 
>> Signed-off-by: Peng He 
>> Co-authored-by: Paolo Valerio 
>> Signed-off-by: Paolo Valerio 
>> ---
>> v2:
>>   - use enum value instead of bool (Aaron).
>>   - s/conn_for_expectation/conn_for_exp/ in process_ftp_ctl_v6()
>> to avoid long line.
>>   - removed CT_CONN_TYPE_* reference in two comments.
>> ---
>>  lib/conntrack-private.h |   19 +--
>>  lib/conntrack-tp.c  |6 +
>>  lib/conntrack.c |  350 
>> +++
>>  3 files changed, 155 insertions(+), 220 deletions(-)
>> 
>> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
>> index bb326868e..3fd5fccd3 100644
>> --- a/lib/conntrack-private.h
>> +++ b/lib/conntrack-private.h
>> @@ -49,6 +49,12 @@ struct ct_endpoint {
>>   * hashing in ct_endpoint_hash_add(). */
>>  BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4);
>>  
>> +enum key_dir {
>> +CT_DIR_FWD = 0,
>> +CT_DIR_REV,
>> +CT_DIRS,
>> +};
>> +
>>  /* Changes to this structure need to be reflected in conn_key_hash()
>>   * and conn_key_cmp(). */
>>  struct conn_key {
>> @@ -112,20 +118,18 @@ enum ct_timeout {
>>  
>>  #define N_EXP_LISTS 100
>>  
>> -enum OVS_PACKED_ENUM ct_conn_type {
>> -CT_CONN_TYPE_DEFAULT,
>> -CT_CONN_TYPE_UN_NAT,
>> +struct conn_key_node {
>> +enum key_dir dir;
>> +struct conn_key key;
>> +struct cmap_node cm_node;
>>  };
>
> This structure and the whole business of adding the connection
> to cmap twice with different hashes is bothering me, but I really
> don't have a better solution for this, so let it be...  :)
>

this happens for the nat case. So far two connections were added to
represent one connection, with one being of type CT_CONN_TYPE_UN_NAT
(with the assumption of working with CT_CONN_TYPE_DEFAULT for most
operations).

> Just to refresh my memory, we do that because original and reply
> tuples can be completely different due to NAT, so the hashing
> being symmetric doesn't help in this case, right?
>

Yes, NAT plays a role here, and it is the only case where we have two
references to the same conn in the cmap.

If we consider the reason this has been revived (the bug it solves),
the problem mostly shows up when packets go through NAT without the
packet actually being changed (all-zero with no clash). In that case two
connections get allocated, copied and added to the cmap, and if a lookup
ends up retrieving a conn of type CT_CONN_TYPE_UN_NAT whose respective
DEFAULT conn has expired, the assertion kicks in once the second lookup,
the one attempting to get the default conn, happens.
In general, NAT (with or without packet mangling), an expired conn and a
hash collision should theoretically be enough to hit the issue.

>>  
>>  struct conn {
>>  /* Immutable data. */
>> -struct conn_key key;
>> -struct conn_key rev_key;
>> +struct conn_key_node key_node[CT_DIRS];
>>  struct conn_key parent_key; /* Only used for orig_tuple support. */
>> -struct cmap_node cm_node;
>>  uint16_t nat_action;
>>  char *alg;
>> -struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
>>  atomic_flag reclaimed; /* False during the lifetime of the connection,
>>  * True as soon as a thread has started freeing
>>  * its memory. */
>> @@ -150,7 +154,6 @@ struct conn {
>>  
>>  /* Immutable data. */
>>  bool alg_related; /* True if alg data connection. */
>> -enum ct_conn_type conn_type;
>>  
>>  uint32_t tp_id; /* Timeout policy ID. */
>>  };
>> diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
>> index 89cb2704a..2149fdc73 100644
>> --- a/lib/conntrack-tp.c
>> +++ b/lib/conntrack-tp.c
>> @@ -253,7 +253,8 @@ conn_update_expi

[ovs-dev] [PATCH v2] conntrack: Remove nat_conn introducing key directionality.

2023-08-23 Thread Paolo Valerio
From: hepeng 

The patch avoids the extra allocation for nat_conn.
Currently, when doing NAT, the userspace conntrack uses an extra
conn for the two directions of a flow. However, each conn already
holds the keys for both the orig and rev directions. This patch
introduces, as per Aaron's suggestion, a key_node[CT_DIRS] member in
the conn, each element of which consists of a key, a direction, and a
cmap_node for hash lookup, thereby addressing the feedback received on
the original patch [0].
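
As a rough illustration only (this is not code from the patch), the
sketch below shows how a single conn can be reached from either
direction once both key nodes are embedded in it. All toy_* names are
hypothetical stand-ins for the OVS types, a linear search stands in for
the cmap lookup, and the pointer arithmetic stands in for a
container-of style macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the OVS types; not the real definitions. */
enum key_dir {
    CT_DIR_FWD = 0,
    CT_DIR_REV,
    CT_DIRS,
};

struct toy_key {
    uint32_t src, dst;
    uint16_t sport, dport;
};

struct toy_key_node {
    enum key_dir dir;           /* Which slot of key_node[] this is. */
    struct toy_key key;
};

struct toy_conn {
    struct toy_key_node key_node[CT_DIRS];
    int id;
};

/* Recovers the owning conn from one of its embedded key nodes: step back
 * over the 'dir' preceding array elements and the offset of the array. */
static struct toy_conn *
toy_conn_from_node(struct toy_key_node *node)
{
    char *base = (char *) node
                 - offsetof(struct toy_conn, key_node)
                 - (size_t) node->dir * sizeof *node;

    return (struct toy_conn *) base;
}

static int
toy_key_eq(const struct toy_key *a, const struct toy_key *b)
{
    return a->src == b->src && a->dst == b->dst
           && a->sport == b->sport && a->dport == b->dport;
}

/* Linear search standing in for the hash lookup.  On a match, returns the
 * owning conn and reports the direction the key matched in. */
static struct toy_conn *
toy_lookup(struct toy_key_node **table, size_t n,
           const struct toy_key *key, enum key_dir *dir)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_key_eq(&table[i]->key, key)) {
            *dir = table[i]->dir;
            return toy_conn_from_node(table[i]);
        }
    }
    return NULL;
}
```

With this layout a reply-direction hit yields the conn itself, so no
second lookup for a separate "default" conn is needed.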

[0] 
https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/

Signed-off-by: Peng He 
Co-authored-by: Paolo Valerio 
Signed-off-by: Paolo Valerio 
---
v2:
  - use enum value instead of bool (Aaron).
  - s/conn_for_expectation/conn_for_exp/ in process_ftp_ctl_v6()
to avoid long line.
  - removed CT_CONN_TYPE_* reference in two comments.
---
 lib/conntrack-private.h |   19 +--
 lib/conntrack-tp.c  |6 +
 lib/conntrack.c |  350 +++
 3 files changed, 155 insertions(+), 220 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bb326868e..3fd5fccd3 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -49,6 +49,12 @@ struct ct_endpoint {
  * hashing in ct_endpoint_hash_add(). */
 BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4);
 
+enum key_dir {
+CT_DIR_FWD = 0,
+CT_DIR_REV,
+CT_DIRS,
+};
+
 /* Changes to this structure need to be reflected in conn_key_hash()
  * and conn_key_cmp(). */
 struct conn_key {
@@ -112,20 +118,18 @@ enum ct_timeout {
 
 #define N_EXP_LISTS 100
 
-enum OVS_PACKED_ENUM ct_conn_type {
-CT_CONN_TYPE_DEFAULT,
-CT_CONN_TYPE_UN_NAT,
+struct conn_key_node {
+enum key_dir dir;
+struct conn_key key;
+struct cmap_node cm_node;
 };
 
 struct conn {
 /* Immutable data. */
-struct conn_key key;
-struct conn_key rev_key;
+struct conn_key_node key_node[CT_DIRS];
 struct conn_key parent_key; /* Only used for orig_tuple support. */
-struct cmap_node cm_node;
 uint16_t nat_action;
 char *alg;
-struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
 atomic_flag reclaimed; /* False during the lifetime of the connection,
 * True as soon as a thread has started freeing
 * its memory. */
@@ -150,7 +154,6 @@ struct conn {
 
 /* Immutable data. */
 bool alg_related; /* True if alg data connection. */
-enum ct_conn_type conn_type;
 
 uint32_t tp_id; /* Timeout policy ID. */
 };
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index 89cb2704a..2149fdc73 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -253,7 +253,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn,
 }
 VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d "
 "val=%u sec.",
-ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+conn->tp_id, val);
 
 atomic_store_relaxed(&conn->expiration, now + val * 1000);
 }
@@ -273,7 +274,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn,
 }
 
 VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.",
-ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+conn->tp_id, val);
 
 conn->expiration = now + val * 1000;
 }
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5f1176d33..f75f9a8f1 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -113,8 +113,7 @@ static void set_label(struct dp_packet *, struct conn *,
 static void *clean_thread_main(void *f_);
 
 static bool
-nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
- struct conn *nat_conn,
+nat_get_unique_tuple(struct conntrack *ct, struct conn *conn,
  const struct nat_action_info_t *nat_info);
 
 static uint8_t
@@ -208,7 +207,7 @@ static alg_helper alg_helpers[] = {
 #define ALG_WC_SRC_PORT 0
 
 /* If the total number of connections goes above this value, no new connections
- * are accepted; this is for CT_CONN_TYPE_DEFAULT connections. */
+ * are accepted. */
 #define DEFAULT_N_CONN_LIMIT 300
 
 /* Does a member by member comparison of two conn_keys; this
@@ -234,61 +233,6 @@ conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2)
 return 1;
 }
 
-static void
-ct_print_conn_info(const struct conn *c, const char *log_msg,
-   enum vlog_level vll, bool force, bool rl_on)
-{
-#define CT_VLOG(RL_ON, LEVEL, ...)  \
-do {\
-if (RL_ON

[ovs-dev] [PATCH RFC] conntrack: Remove nat_conn introducing key directionality.

2023-08-14 Thread Paolo Valerio
From: hepeng 

The patch avoids the extra allocation for nat_conn.
Currently, when doing NAT, the userspace conntrack uses an extra
conn for the two directions of a flow. However, each conn already
holds the keys for both the orig and rev directions. This patch
introduces a key_node[CT_DIRS] member in the conn, each element of
which consists of a key, a direction, and a cmap_node for hash lookup,
thereby addressing the feedback received on the original patch [0].

The patch is an alternative approach to [1].
It has the advantage of solving the issue in a clean way but, unlike
[1], it has the disadvantage of requiring some changes to the
connection clean up, and to all the related operations, for older
branches (down to 2.17). To give an idea, [0] contains most of the
changes required.

[0] 
https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/
[1] https://patchwork.ozlabs.org/project/openvswitch/list/?series=351579&state=*

Signed-off-by: Peng He 
Co-authored-by: Paolo Valerio 
Signed-off-by: Paolo Valerio 
---
 lib/conntrack-private.h |   19 ++-
 lib/conntrack-tp.c  |6 +
 lib/conntrack.c |  339 +++
 3 files changed, 149 insertions(+), 215 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bb326868e..3fd5fccd3 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -49,6 +49,12 @@ struct ct_endpoint {
  * hashing in ct_endpoint_hash_add(). */
 BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4);
 
+enum key_dir {
+CT_DIR_FWD = 0,
+CT_DIR_REV,
+CT_DIRS,
+};
+
 /* Changes to this structure need to be reflected in conn_key_hash()
  * and conn_key_cmp(). */
 struct conn_key {
@@ -112,20 +118,18 @@ enum ct_timeout {
 
 #define N_EXP_LISTS 100
 
-enum OVS_PACKED_ENUM ct_conn_type {
-CT_CONN_TYPE_DEFAULT,
-CT_CONN_TYPE_UN_NAT,
+struct conn_key_node {
+enum key_dir dir;
+struct conn_key key;
+struct cmap_node cm_node;
 };
 
 struct conn {
 /* Immutable data. */
-struct conn_key key;
-struct conn_key rev_key;
+struct conn_key_node key_node[CT_DIRS];
 struct conn_key parent_key; /* Only used for orig_tuple support. */
-struct cmap_node cm_node;
 uint16_t nat_action;
 char *alg;
-struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
 atomic_flag reclaimed; /* False during the lifetime of the connection,
 * True as soon as a thread has started freeing
 * its memory. */
@@ -150,7 +154,6 @@ struct conn {
 
 /* Immutable data. */
 bool alg_related; /* True if alg data connection. */
-enum ct_conn_type conn_type;
 
 uint32_t tp_id; /* Timeout policy ID. */
 };
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index 89cb2704a..2149fdc73 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -253,7 +253,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn,
 }
 VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d "
 "val=%u sec.",
-ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+conn->tp_id, val);
 
 atomic_store_relaxed(&conn->expiration, now + val * 1000);
 }
@@ -273,7 +274,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn,
 }
 
 VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.",
-ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+conn->tp_id, val);
 
 conn->expiration = now + val * 1000;
 }
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5f1176d33..6f219eb9e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -113,8 +113,7 @@ static void set_label(struct dp_packet *, struct conn *,
 static void *clean_thread_main(void *f_);
 
 static bool
-nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
- struct conn *nat_conn,
+nat_get_unique_tuple(struct conntrack *ct, struct conn *conn,
  const struct nat_action_info_t *nat_info);
 
 static uint8_t
@@ -234,61 +233,6 @@ conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2)
 return 1;
 }
 
-static void
-ct_print_conn_info(const struct conn *c, const char *log_msg,
-   enum vlog_level vll, bool force, bool rl_on)
-{
-#define CT_VLOG(RL_ON, LEVEL, ...)  \
-do {\
-if (RL_ON) {\
-static struct vlog_rate_limit rl_ = VLOG_RATE_LIMIT_INIT(5, 5); \
-   

[ovs-dev] [PATCH v4] conntrack: Extract l4 information for SCTP.

2023-07-12 Thread Paolo Valerio
Since a27d70a89 ("conntrack: add generic IP protocol support") all
the unrecognized IP protocols get handled using ct_proto_other ops
and are managed as L3 using 3 tuples.

This patch stores L4 information for SCTP in the conn_key so that
multiple conn instances, instead of one with ports zeroed, will be
created when there are multiple SCTP connections between two hosts.
It also performs crc32c check when not offloaded, and adds SCTP to
pat_enabled.
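
As a hedged sketch of the checksum step (not the code from the patch),
the usual pattern is: save the received checksum, zero the field,
compute CRC32c over the whole SCTP packet, restore the field, and
compare. The toy_* names are hypothetical, the CRC is a plain bitwise
implementation rather than the OVS crc32c() helper, and the checksum is
stored in host byte order here, so the on-wire byte order is not
modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SCTP common header: src port (2), dst port (2), vtag (4), checksum (4). */
#define TOY_SCTP_HEADER_LEN 12
#define TOY_SCTP_CSUM_OFF 8

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
toy_crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Sender side of the sketch: the checksum is computed with the field
 * zeroed, then written into the field (host byte order, an assumption). */
static void
toy_sctp_store_csum(uint8_t *pkt, size_t len)
{
    uint32_t csum;

    memset(pkt + TOY_SCTP_CSUM_OFF, 0, sizeof csum);
    csum = toy_crc32c(pkt, len);
    memcpy(pkt + TOY_SCTP_CSUM_OFF, &csum, sizeof csum);
}

/* Receiver side: zero the field, recompute, restore, compare. */
static bool
toy_sctp_csum_valid(uint8_t *pkt, size_t len)
{
    uint32_t rcvd, csum;

    if (len < TOY_SCTP_HEADER_LEN) {
        return false;
    }

    memcpy(&rcvd, pkt + TOY_SCTP_CSUM_OFF, sizeof rcvd);
    memset(pkt + TOY_SCTP_CSUM_OFF, 0, sizeof rcvd);
    csum = toy_crc32c(pkt, len);
    memcpy(pkt + TOY_SCTP_CSUM_OFF, &rcvd, sizeof rcvd);

    return rcvd == csum;
}
```

Restoring the received value before returning leaves the packet
unchanged for the rest of the pipeline.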

With this patch, given two SCTP associations between two hosts,
tracking the connection will result in:

sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1

instead of:

sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
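
To make the difference between the two outputs above concrete, here is
a minimal sketch, under the assumption that the L4 ports are simply
read from the first four bytes of the SCTP common header (source port,
then destination port, both big-endian). The toy_* names are
hypothetical and this is not the extraction code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SCTP common header: src port (2), dst port (2), vtag (4), checksum (4). */
#define TOY_SCTP_HEADER_LEN 12

struct toy_l4_ports {
    uint16_t sport;             /* Host byte order. */
    uint16_t dport;             /* Host byte order. */
};

/* Reads the ports from the start of the L4 payload.  Returns false, and
 * leaves 'ports' untouched, if the buffer cannot hold a common header. */
static bool
toy_sctp_extract_ports(const uint8_t *l4, size_t size,
                       struct toy_l4_ports *ports)
{
    if (size < TOY_SCTP_HEADER_LEN) {
        return false;
    }

    ports->sport = (uint16_t) ((l4[0] << 8) | l4[1]);
    ports->dport = (uint16_t) ((l4[2] << 8) | l4[3]);
    return true;
}
```

Once the ports are part of the key, two associations between the same
pair of hosts no longer collapse into a single entry with sport=0 and
dport=0.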

Signed-off-by: Paolo Valerio 
---
v4:
- rebased on top of current master
- test: turned graceful termination into ABORT.
  The graceful shutdown sequence could lead to failures because of a
  very small default timeout set for SHUTDOWN_SENT state.
  The proto state transition sequence for the kerneldp is now:
protoinfo=(state=CLOSED,vtag_orig=0,vtag_reply=3431784816)
protoinfo=(state=COOKIE_WAIT,vtag_orig=4204641061,vtag_reply=3431784816)
protoinfo=(state=COOKIE_ECHOED,vtag_orig=4204641061,vtag_reply=3431784816)
protoinfo=(state=ESTABLISHED,vtag_orig=4204641061,vtag_reply=3431784816)
protoinfo=(state=ESTABLISHED,vtag_orig=4204641061,vtag_reply=3431784816)
protoinfo=(state=ESTABLISHED,vtag_orig=4204641061,vtag_reply=3431784816)
protoinfo=(state=CLOSED,vtag_orig=4204641061,vtag_reply=3431784816)


v3:
- rebased on top of current master
- minor adjustments: commit message, comments

v2:
- ordered includes
- while at it, slightly modified the commit subject (capital letter
  and period)
---
 lib/conntrack.c  |   86 ++
 lib/packets.h|   11 +
 tests/system-kmod-macros.at  |   11 +
 tests/system-traffic.at  |   73 
 tests/system-userspace-macros.at |7 +++
 5 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 4375c03e2..786531e21 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -27,6 +27,7 @@
 #include "conntrack-private.h"
 #include "conntrack-tp.h"
 #include "coverage.h"
+#include "crc32c.h"
 #include "csum.h"
 #include "ct-dpif.h"
 #include "dp-packet.h"
@@ -41,6 +42,7 @@
 #include "random.h"
 #include "rculist.h"
 #include "timeval.h"
+#include "unaligned.h"
 
 VLOG_DEFINE_THIS_MODULE(conntrack);
 
@@ -771,6 +773,8 @@ pat_packet(struct dp_packet *pkt, const struct conn_key *key)
 packet_set_tcp_port(pkt, key->dst.port, key->src.port);
 } else if (key->nw_proto == IPPROTO_UDP) {
 packet_set_udp_port(pkt, key->dst.port, key->src.port);
+} else if (key->nw_proto == IPPROTO_SCTP) {
+packet_set_sctp_port(pkt, key->dst.port, key->src.port);
 }
 }
 
@@ -1675,6 +1679,26 @@ checksum_valid(const struct conn_key *key, const void *data, size_t size,
 return valid;
 }
 
+static inline bool
+sctp_checksum_valid(const void *data, size_t size)
+{
+struct sctp_header *sctp = (struct sctp_header *) data;
+ovs_be32 rcvd_csum, csum;
+bool ret;
+
+rcvd_csum = get_16aligned_be32(&sctp->sctp_csum);
+put_16aligned_be32(&sctp->sctp_csum, 0);
+csum = crc32c(data, size);
+put_16aligned_be32(&sctp->sctp_csum, rcvd_csum);
+
+ret = (rcvd_csum == csum);
+if (!ret) {
+COVERAGE_INC(conntrack_l4csum_err);
+}
+
+return ret;
+}
+
 static inline bool
 check_l4_tcp(const struct conn_key *key, const void *data, size_t size,
  const void *l3, bool validate_checksum)
@@ -1711,6 +1735,47 @@ check_l4_udp(const struct conn_key *key, const void *data, size_t size,
|| (validate_checksum ? checksum_valid(key, data, size, l3) : true);
 }
 
+static inline bool
+sctp_check_len(const struct sctp_header *sh, size_t size)
+{
+const struct sctp_chunk_header *sch;
+size_t next;
+
+if (size < SCTP_HEADER_LEN) {
+return false;
+}
+
+/* rfc4960: Chunks (including Type, Length, and Value fields) are padded
+ * out by the sender with all zero bytes to be a multiple of 4 bytes long.
+ */
+for (next = sizeof(struct sctp_header),
+ sch = SCTP_NEXT_CHUNK(sh, next);
+ next < size;
+ next += ROUND_UP(ntohs(sch->length), 4),
+ sch = SCTP_NEXT_CHUNK(sh, next)) {
+/* rfc4960: This value represents the size of the chunk in bytes,
+ * including the Chunk Type, Chunk Flags, Chunk Le

[ovs-dev] [PATCH] conntrack: Allow to dump userspace conntrack expectations.

2023-06-23 Thread Paolo Valerio
The patch introduces a new command, ovs-appctl dpctl/dump-conntrack-exp,
that allows dumping the existing expectations for the userspace ct.
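
For readers unfamiliar with the dump interface, the sketch below
mirrors the start/next/done shape with an optional zone filter. It is
a simplified model under stated assumptions: a plain array stands in
for the expectation hmap, no locking is shown, and every toy_* name is
hypothetical rather than part of the OVS API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>              /* EOF */

struct toy_exp {
    uint16_t zone;
    int id;
};

struct toy_exp_dump {
    const struct toy_exp *exps; /* Array standing in for the hmap. */
    size_t n;
    size_t pos;                 /* Next element to visit. */
    bool filter_zone;
    uint16_t zone;
};

/* Initializes the dump state.  A non-NULL 'pzone' restricts the dump to
 * expectations in that zone. */
static int
toy_exp_dump_start(struct toy_exp_dump *dump, const struct toy_exp *exps,
                   size_t n, const uint16_t *pzone)
{
    dump->exps = exps;
    dump->n = n;
    dump->pos = 0;
    dump->filter_zone = false;
    dump->zone = 0;

    if (pzone) {
        dump->zone = *pzone;
        dump->filter_zone = true;
    }
    return 0;
}

/* Copies the next matching expectation into 'entry' and returns 0, or
 * returns EOF once the table is exhausted. */
static int
toy_exp_dump_next(struct toy_exp_dump *dump, struct toy_exp *entry)
{
    while (dump->pos < dump->n) {
        const struct toy_exp *e = &dump->exps[dump->pos++];

        if (!dump->filter_zone || e->zone == dump->zone) {
            *entry = *e;
            return 0;
        }
    }
    return EOF;
}

/* Nothing to release in this sketch. */
static int
toy_exp_dump_done(struct toy_exp_dump *dump)
{
    (void) dump;
    return 0;
}
```

A caller would invoke toy_exp_dump_start() once, loop on
toy_exp_dump_next() until it returns EOF, then call
toy_exp_dump_done().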

Signed-off-by: Paolo Valerio 
---
 NEWS |2 +
 lib/conntrack.c  |   66 +
 lib/conntrack.h  |   10 
 lib/ct-dpif.c|   87 ++
 lib/ct-dpif.h|   15 +++
 lib/dpctl.c  |   49 +
 lib/dpctl.man|6 +++
 lib/dpif-netdev.c|   50 ++
 lib/dpif-netlink.c   |3 +
 lib/dpif-provider.h  |   11 +
 tests/system-kmod-macros.at  |9 
 tests/system-traffic.at  |   44 +++
 tests/system-userspace-macros.at |6 +++
 13 files changed, 357 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 66d5a4ea3..16cdb6933 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,8 @@ Post-v3.1.0
  * New commands "dpctl/{ct-get-sweep-interval,ct-set-sweep-interval}" that
allow to get and set, for the userspace datapath, the sweep interval
for the conntrack garbage collector.
+ * New commands "dpctl/dump-conntrack-exp" that allows to dump
+   conntrack's expectations for the userspace datapath.
- ovs-ctl:
  * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
diff --git a/lib/conntrack.c b/lib/conntrack.c
index f5ebfa05b..4375c03e2 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2670,6 +2670,72 @@ conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED)
 return 0;
 }
 
+static void
+exp_node_to_ct_dpif_exp(const struct alg_exp_node *exp,
+struct ct_dpif_exp *entry)
+{
+memset(entry, 0, sizeof *entry);
+
+conn_key_to_tuple(&exp->key, &entry->tuple_orig);
+conn_key_to_tuple(&exp->parent_key, &entry->tuple_parent);
+entry->zone = exp->key.zone;
+entry->mark = exp->parent_mark;
+memcpy(&entry->labels, &exp->parent_label, sizeof entry->labels);
+entry->protoinfo.proto = exp->key.nw_proto;
+}
+
+int
+conntrack_exp_dump_start(struct conntrack *ct, struct conntrack_dump *dump,
+ const uint16_t *pzone)
+{
+memset(dump, 0, sizeof(*dump));
+
+if (pzone) {
+dump->zone = *pzone;
+dump->filter_zone = true;
+}
+
+dump->ct = ct;
+
+return 0;
+}
+
+int
+conntrack_exp_dump_next(struct conntrack_dump *dump, struct ct_dpif_exp *entry)
+{
+struct conntrack *ct = dump->ct;
+struct alg_exp_node *enode;
+int ret = EOF;
+
+ovs_rwlock_rdlock(&ct->resources_lock);
+
+for (;;) {
+struct hmap_node *node = hmap_at_position(&ct->alg_expectations,
+  &dump->hmap_pos);
+if (!node) {
+break;
+}
+
+enode = CONTAINER_OF(node, struct alg_exp_node, node);
+
+if (!dump->filter_zone || enode->key.zone == dump->zone) {
+ret = 0;
+exp_node_to_ct_dpif_exp(enode, entry);
+break;
+}
+}
+
+ovs_rwlock_unlock(&ct->resources_lock);
+
+return ret;
+}
+
+int
+conntrack_exp_dump_done(struct conntrack_dump *dump OVS_UNUSED)
+{
+return 0;
+}
+
 int
 conntrack_flush(struct conntrack *ct, const uint16_t *zone)
 {
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 524ec0acb..57d5159b6 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -100,7 +100,10 @@ void conntrack_clear(struct dp_packet *packet);
 struct conntrack_dump {
 struct conntrack *ct;
 unsigned bucket;
-struct cmap_position cm_pos;
+union {
+struct cmap_position cm_pos;
+struct hmap_position hmap_pos;
+};
 bool filter_zone;
 uint16_t zone;
 };
@@ -132,6 +135,11 @@ int conntrack_dump_start(struct conntrack *, struct conntrack_dump *,
 int conntrack_dump_next(struct conntrack_dump *, struct ct_dpif_entry *);
 int conntrack_dump_done(struct conntrack_dump *);
 
+int conntrack_exp_dump_start(struct conntrack *, struct conntrack_dump *,
+ const uint16_t *);
+int conntrack_exp_dump_next(struct conntrack_dump *, struct ct_dpif_exp *);
+int conntrack_exp_dump_done(struct conntrack_dump *);
+
 int conntrack_flush(struct conntrack *, const uint16_t *zone);
 int conntrack_flush_tuple(struct conntrack *, const struct ct_dpif_tuple *,
   uint16_t zone);
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 0c4b2964f..f59c6e560 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -101,6 +101,65 @@ ct_dpif_dump_done(struct ct_dpif_dump_state *dump)
 ? dpif->dpif_class->ct_dump_done(dpif, dump)
 

Re: [ovs-dev] [PATCH v3] conntrack: Extract l4 information for SCTP.

2023-06-16 Thread Paolo Valerio
Ilya Maximets  writes:

> On 6/16/23 14:56, Aaron Conole wrote:
>> Ilya Maximets  writes:
>> 
>>> On 6/15/23 19:49, Paolo Valerio wrote:
>>>> Ilya Maximets  writes:
>>>>
>>>>> On 6/14/23 21:08, Ilya Maximets wrote:
>>>>>> On 6/14/23 20:11, Paolo Valerio wrote:
>>>>>>> Ilya Maximets  writes:
>>>>>>>
>>>>>>>> On 6/12/23 16:57, Aaron Conole wrote:
>>>>>>>>> Paolo Valerio  writes:
>>>>>>>>>
>>>>>>>>>> since a27d70a89 ("conntrack: add generic IP protocol support") all
>>>>>>>>>> the unrecognized IP protocols get handled using ct_proto_other ops
>>>>>>>>>> and are managed as L3 using 3 tuples.
>>>>>>>>>>
>>>>>>>>>> This patch stores L4 information for SCTP in the conn_key so that
>>>>>>>>>> multiple conn instances, instead of one with ports zeroed, will be
>>>>>>>>>> created when there are multiple SCTP connections between two hosts.
>>>>>>>>>> It also performs crc32c check when not offloaded, and adds SCTP to
>>>>>>>>>> pat_enabled.
>>>>>>>>>>
>>>>>>>>>> With this patch, given two SCTP association between two hosts,
>>>>>>>>>> tracking the connection will result in:
>>>>>>>>>>
>>>>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
>>>>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
>>>>>>>>>>
>>>>>>>>>> instead of:
>>>>>>>>>>
>>>>>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Paolo Valerio 
>>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> Thanks for this work - I think it looks good.
>>>>>>>>>
>>>>>>>>> Perhaps it should have a NEWS item mentioned that the userspace
>>>>>>>>> conntrack now supports matching SCTP l4 data.
>>>>>>>>>
>>>>>>>>> If you do spin a v4 with that change, you can keep my:
>>>>>>>>>
>>>>>>>>> Acked-by: Aaron Conole 
>>>>>>>>
>>>>>>>> Hi, Paolo and Aaron.
>>>>>>>>
>>>>>>>> I'm getting a consistent test failure while running check-kernel
>>>>>>>> on Ubuntu 22.10 with 5.19 kernel:
>>>>>>>>
>>>>>>>>
>>>>>>>> ./system-traffic.at:4754: cat ofctl_monitor.log
>>>>>>>> --- -   2023-06-14 11:26:41.958591125 +
>>>>>>>> +++
>>>>>>>> /root/ovs/tests/system-kmod-testsuite.dir/at-groups/105/stdout
>>>>>>>> 2023-06-14 11:26:41.95200 +
>>>>>>>> @@ -12,8 +12,6 @@
>>>>>>>>  
>>>>>>>> sctp,vlan_tci=0x,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>>>>> sctp_csum:9b67e853
>>>>>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=54 in_port=1
>>>>>>>> (via action) data_len=54 (unbuffered)
>>>>>>>>  
>>>>>>>> sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>>>>> sctp_csum:bc0e5463
>>>>>>>> -NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=50
>>>>>>>> ct_state=est|rpl|trk|dnat,ct_zone=1,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=132,ct_tp_src=54969,ct_tp_dst=12345,ip,in_port=2
>>>>>>>> (via action) data_len=50 (unbuffered)
>>>>>>>> -sctp,vlan_tci=0x,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1

Re: [ovs-dev] [PATCH v3] conntrack: Extract l4 information for SCTP.

2023-06-15 Thread Paolo Valerio
Ilya Maximets  writes:

> On 6/14/23 21:08, Ilya Maximets wrote:
>> On 6/14/23 20:11, Paolo Valerio wrote:
>>> Ilya Maximets  writes:
>>>
>>>> On 6/12/23 16:57, Aaron Conole wrote:
>>>>> Paolo Valerio  writes:
>>>>>
>>>>>> since a27d70a89 ("conntrack: add generic IP protocol support") all
>>>>>> the unrecognized IP protocols get handled using ct_proto_other ops
>>>>>> and are managed as L3 using 3 tuples.
>>>>>>
>>>>>> This patch stores L4 information for SCTP in the conn_key so that
>>>>>> multiple conn instances, instead of one with ports zeroed, will be
>>>>>> created when there are multiple SCTP connections between two hosts.
>>>>>> It also performs crc32c check when not offloaded, and adds SCTP to
>>>>>> pat_enabled.
>>>>>>
>>>>>> With this patch, given two SCTP association between two hosts,
>>>>>> tracking the connection will result in:
>>>>>>
>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
>>>>>>
>>>>>> instead of:
>>>>>>
>>>>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
>>>>>>
>>>>>> Signed-off-by: Paolo Valerio 
>>>>>> ---
>>>>>
>>>>> Thanks for this work - I think it looks good.
>>>>>
>>>>> Perhaps it should have a NEWS item mentioned that the userspace
>>>>> conntrack now supports matching SCTP l4 data.
>>>>>
>>>>> If you do spin a v4 with that change, you can keep my:
>>>>>
>>>>> Acked-by: Aaron Conole 
>>>>
>>>> Hi, Paolo and Aaron.
>>>>
>>>> I'm getting a consistent test failure while running check-kernel
>>>> on Ubuntu 22.10 with 5.19 kernel:
>>>>
>>>>
>>>> ./system-traffic.at:4754: cat ofctl_monitor.log
>>>> --- -   2023-06-14 11:26:41.958591125 +
>>>> +++ /root/ovs/tests/system-kmod-testsuite.dir/at-groups/105/stdout  
>>>> 2023-06-14 11:26:41.95200 +
>>>> @@ -12,8 +12,6 @@
>>>>  
>>>> sctp,vlan_tci=0x,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>  sctp_csum:9b67e853
>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=54 in_port=1 (via action) 
>>>> data_len=54 (unbuffered)
>>>>  
>>>> sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>  sctp_csum:bc0e5463
>>>> -NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=50 
>>>> ct_state=est|rpl|trk|dnat,ct_zone=1,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=132,ct_tp_src=54969,ct_tp_dst=12345,ip,in_port=2
>>>>  (via action) data_len=50 (unbuffered)
>>>> -sctp,vlan_tci=0x,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>>>>  sctp_csum:d6ce6b9e
>>>>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=50 in_port=1 (via action) 
>>>> data_len=50 (unbuffered)
>>>> -sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>>>>  sctp_csum:add7db93
>>>> +sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=54969,tp_dst=12345
>>>>  sctp_csum:5db68ce
>>>>
>>>>
>>>> Do you know what can be a problem here?
>>>>
>>>> Test is passing on Fedora 38 with 6.3 kernel and on rhel 9.2.
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Uhm, it seems there's a problem with the shutdown sequence.
>>> I just ran the on a VM:
>>>
>>> vagrant@ubuntu2210:~/ovs$ grep CONFIG_NF_CT_PROTO_SCTP 
>>> /boot/config-5.19.0-38-generic 
>>>

Re: [ovs-dev] [PATCH v3] conntrack: Extract l4 information for SCTP.

2023-06-14 Thread Paolo Valerio
Ilya Maximets  writes:

> On 6/12/23 16:57, Aaron Conole wrote:
>> Paolo Valerio  writes:
>> 
>>> since a27d70a89 ("conntrack: add generic IP protocol support") all
>>> the unrecognized IP protocols get handled using ct_proto_other ops
>>> and are managed as L3 using 3 tuples.
>>>
>>> This patch stores L4 information for SCTP in the conn_key so that
>>> multiple conn instances, instead of one with ports zeroed, will be
>>> created when there are multiple SCTP connections between two hosts.
>>> It also performs crc32c check when not offloaded, and adds SCTP to
>>> pat_enabled.
>>>
>>> With this patch, given two SCTP association between two hosts,
>>> tracking the connection will result in:
>>>
>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1
>>>
>>> instead of:
>>>
>>> sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
>>>
>>> Signed-off-by: Paolo Valerio 
>>> ---
>> 
>> Thanks for this work - I think it looks good.
>> 
>> Perhaps it should have a NEWS item mentioned that the userspace
>> conntrack now supports matching SCTP l4 data.
>> 
>> If you do spin a v4 with that change, you can keep my:
>> 
>> Acked-by: Aaron Conole 
>
> Hi, Paolo and Aaron.
>
> I'm getting a consistent test failure while running check-kernel
> on Ubuntu 22.10 with 5.19 kernel:
>
>
> ./system-traffic.at:4754: cat ofctl_monitor.log
> --- -   2023-06-14 11:26:41.958591125 +
> +++ /root/ovs/tests/system-kmod-testsuite.dir/at-groups/105/stdout  
> 2023-06-14 11:26:41.95200 +
> @@ -12,8 +12,6 @@
>  
> sctp,vlan_tci=0x,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>  sctp_csum:9b67e853
>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=54 in_port=1 (via action) 
> data_len=54 (unbuffered)
>  
> sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>  sctp_csum:bc0e5463
> -NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=50 
> ct_state=est|rpl|trk|dnat,ct_zone=1,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=132,ct_tp_src=54969,ct_tp_dst=12345,ip,in_port=2
>  (via action) data_len=50 (unbuffered)
> -sctp,vlan_tci=0x,dl_src=e6:66:c1:22:22:22,dl_dst=e6:66:c1:11:11:11,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=12345,tp_dst=54969
>  sctp_csum:d6ce6b9e
>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=50 in_port=1 (via action) 
> data_len=50 (unbuffered)
> -sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.240,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=34567,tp_dst=12345
>  sctp_csum:add7db93
> +sctp,vlan_tci=0x,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=2,nw_ttl=64,nw_frag=no,tp_src=54969,tp_dst=12345
>  sctp_csum:5db68ce
>
>
> Do you know what can be a problem here?
>
> Test is passing on Fedora 38 with 6.3 kernel and on rhel 9.2.
>

Hi Ilya,

Uhm, it seems there's a problem with the shutdown sequence.
I just ran the test on a VM:

vagrant@ubuntu2210:~/ovs$ grep CONFIG_NF_CT_PROTO_SCTP 
/boot/config-5.19.0-38-generic 
CONFIG_NF_CT_PROTO_SCTP=y

vagrant@ubuntu2210:~/ovs$ grep VERSION /etc/os-release 
VERSION_ID="22.10"
VERSION="22.10 (Kinetic Kudu)"
VERSION_CODENAME=kinetic

vagrant@ubuntu2210:~/ovs$ uname -r
5.19.0-38-generic

but I can't see the failure.
Any chance you could check whether the packets are, for some reason,
marked as invalid?

> Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] conntrack: Release nat_conn in case both keys have the same hash.

2023-06-08 Thread Paolo Valerio
Brian Haley  writes:

> Hi Paolo,
>
> On 4/19/23 2:40 PM, Paolo Valerio wrote:
>> During the creation of a new connection, there's a chance both key and
>> rev_key end up having the same hash. This is more common in the case
>> of all-zero snat with no collisions. In that case, once the
>> connection is expired, but not cleaned up, if a new packet with the
>> same 5-tuple is received, an assertion failure gets triggered in
>> conn_update_state() because of a previous failure of retrieving a
>> CT_CONN_TYPE_DEFAULT connection.
>> 
>> Fix it by releasing the nat_conn during the connection creation in the
>> case of same hash for both key and rev_key.
>
> Sorry for reviving a two month-old thread, but we recently started 
> seeing this issue which seemed to also be related to [0], but I can't 
> find it in patchworks or the tree. Was there a plan to update it?
>

Hi Brian,

It transitioned to "Changes Requested" [0].

At the moment the idea is to upstream a patch initially proposed by
Peng. I'm pretty busy at the moment, and I can't look at it right away,
but yes, the plan is to update it.

[0] https://patchwork.ozlabs.org/project/openvswitch/list/?series=351579&state=*

> Thanks,
>
> -Brian
>
> [0] https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg08945.html
>
>> 
>> Reported-by: Michael Plato 
>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.")
>> Signed-off-by: Paolo Valerio 
>> ---
>> There are some more details in this thread [0]. A similar
>> approach here could be to avoid adding the nat_conn to the cmap and
>> to let the sweeper release the memory for nat_conn once the whole
>> connection gets freed.
>> That approach could still be ok, but the drawback is that it could
>> require a different patch for older branches that don't include
>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with
>> rculists."). It is still worth considering.
>> 
>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html
>> ---
>>   lib/conntrack.c |   21 +
>>   1 file changed, 13 insertions(+), 8 deletions(-)
>> 
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 7e1fc4b1f..d2ee127d9 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
>>   }
>>   
>>   nat_packet(pkt, nc, false, ctx->icmp_related);
>> -memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key);
>> -memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key);
>> -nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
>> -nat_conn->nat_action = 0;
>> -nat_conn->alg = NULL;
>> -nat_conn->nat_conn = NULL;
>> -uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis);
>> -cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
>> +uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis);
>> +if (nat_hash != ctx->hash) {
>> +memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key);
>> +memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key);
>> +nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
>> +nat_conn->nat_action = 0;
>> +nat_conn->alg = NULL;
>> +nat_conn->nat_conn = NULL;
>> +cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
>> +} else {
>> +free(nat_conn);
>> +nat_conn = NULL;
>> +}
>>   }
>>   
>>   nc->nat_conn = nat_conn;
>> 



[ovs-dev] [PATCH v3] conntrack: Extract l4 information for SCTP.

2023-06-01 Thread Paolo Valerio
Since a27d70a89 ("conntrack: add generic IP protocol support"), all
unrecognized IP protocols are handled using the ct_proto_other ops
and are managed as L3 using 3-tuples.

This patch stores L4 information for SCTP in the conn_key so that
multiple conn instances, instead of one with ports zeroed, are
created when there are multiple SCTP connections between two hosts.
It also performs a crc32c check when the checksum is not offloaded,
and adds SCTP to pat_enabled.

With this patch, given two SCTP associations between two hosts,
tracking the connections will result in:

sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=55884,dport=5201),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5201,dport=12345),zone=1
sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=59874,dport=5202),reply=(src=10.1.1.1,dst=10.1.1.2,sport=5202,dport=12346),zone=1

instead of:

sctp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=0,dport=0),reply=(src=10.1.1.1,dst=10.1.1.2,sport=0,dport=0),zone=1
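
To illustrate why keeping the ports in the key yields one entry per
association, here is a minimal, self-contained sketch. It is not the
code from this patch: the struct and function names are hypothetical,
and rejecting zero ports simply mirrors the UDP helper visible in the
diff (whether the SCTP helper does the same is an assumption). The only
protocol fact relied upon is that the SCTP common header starts with
the 16-bit source and destination ports in network byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 4960 common header: ports (4 bytes), verification tag (4 bytes),
 * checksum (4 bytes). */
#define SKETCH_SCTP_HEADER_LEN 12

/* Hypothetical, simplified stand-in for the L4 part of a conntrack key. */
struct sketch_l4_key {
    uint16_t sport;   /* Source port, host byte order. */
    uint16_t dport;   /* Destination port, host byte order. */
};

/* Reads the ports from the SCTP common header at 'data'.  Returns false
 * if the buffer is too short or if either port is zero. */
static bool
sketch_extract_sctp_ports(const uint8_t *data, size_t size,
                          struct sketch_l4_key *key)
{
    if (size < SKETCH_SCTP_HEADER_LEN) {
        return false;
    }

    key->sport = (uint16_t) ((data[0] << 8) | data[1]);
    key->dport = (uint16_t) ((data[2] << 8) | data[3]);

    return key->sport && key->dport;
}

/* Two keys refer to the same tracked entry only if both ports match.
 * With the ports left at zero, every association between the same two
 * hosts would compare equal. */
static bool
sketch_l4_key_equal(const struct sketch_l4_key *a,
                    const struct sketch_l4_key *b)
{
    return a->sport == b->sport && a->dport == b->dport;
}
```

Feeding it the first bytes of two headers carrying the port pairs from
the example above (55884/5201 and 59874/5202) gives two keys that do
not compare equal, hence two separate entries.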

Signed-off-by: Paolo Valerio 
---
v3:
- rebased on top of current master
- minor adjustments: commit message, comments

v2:
- ordered includes
- while at it, slightly modified the commit subject (capital letter
  and period)
---
 lib/conntrack.c  |   86 ++
 lib/packets.h|   11 +
 tests/system-kmod-macros.at  |   11 +
 tests/system-traffic.at  |   80 +++
 tests/system-userspace-macros.at |7 +++
 5 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index ce8a63de5..6f2e6ef74 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -27,6 +27,7 @@
 #include "conntrack-private.h"
 #include "conntrack-tp.h"
 #include "coverage.h"
+#include "crc32c.h"
 #include "csum.h"
 #include "ct-dpif.h"
 #include "dp-packet.h"
@@ -41,6 +42,7 @@
 #include "random.h"
 #include "rculist.h"
 #include "timeval.h"
+#include "unaligned.h"
 
 VLOG_DEFINE_THIS_MODULE(conntrack);
 
@@ -771,6 +773,8 @@ pat_packet(struct dp_packet *pkt, const struct conn_key *key)
 packet_set_tcp_port(pkt, key->dst.port, key->src.port);
 } else if (key->nw_proto == IPPROTO_UDP) {
 packet_set_udp_port(pkt, key->dst.port, key->src.port);
+} else if (key->nw_proto == IPPROTO_SCTP) {
+packet_set_sctp_port(pkt, key->dst.port, key->src.port);
 }
 }
 
@@ -1675,6 +1679,26 @@ checksum_valid(const struct conn_key *key, const void *data, size_t size,
 return valid;
 }
 
+static inline bool
+sctp_checksum_valid(const void *data, size_t size)
+{
+struct sctp_header *sctp = (struct sctp_header *) data;
+ovs_be32 rcvd_csum, csum;
+bool ret;
+
+rcvd_csum = get_16aligned_be32(&sctp->sctp_csum);
+put_16aligned_be32(&sctp->sctp_csum, 0);
+csum = crc32c(data, size);
+put_16aligned_be32(&sctp->sctp_csum, rcvd_csum);
+
+ret = (rcvd_csum == csum);
+if (!ret) {
+COVERAGE_INC(conntrack_l4csum_err);
+}
+
+return ret;
+}
+
 static inline bool
 check_l4_tcp(const struct conn_key *key, const void *data, size_t size,
  const void *l3, bool validate_checksum)
@@ -1711,6 +1735,47 @@ check_l4_udp(const struct conn_key *key, const void *data, size_t size,
|| (validate_checksum ? checksum_valid(key, data, size, l3) : true);
 }
 
+static inline bool
+sctp_check_len(const struct sctp_header *sh, size_t size)
+{
+const struct sctp_chunk_header *sch;
+size_t next;
+
+if (size < SCTP_HEADER_LEN) {
+return false;
+}
+
+/* rfc4960: Chunks (including Type, Length, and Value fields) are padded
+ * out by the sender with all zero bytes to be a multiple of 4 bytes long.
+ */
+for (next = sizeof(struct sctp_header),
+ sch = SCTP_NEXT_CHUNK(sh, next);
+ next < size;
+ next += ROUND_UP(ntohs(sch->length), 4),
+ sch = SCTP_NEXT_CHUNK(sh, next)) {
+/* rfc4960: This value represents the size of the chunk in bytes,
+ * including the Chunk Type, Chunk Flags, Chunk Length, and Chunk Value
+ * fields.
+ * Therefore, if the Chunk Value field is zero-length, the Length
+ * field will be set to 4. */
+if (ntohs(sch->length) < sizeof(*sch)) {
+return false;
+}
+}
+
+return (next == size);
+}
+
+static inline bool
+check_l4_sctp(const void *data, size_t size, bool validate_checksum)
+{
+if (OVS_UNLIKELY(!sctp_check_len(data, size))) {
+return false;
+}
+
+return validate_checksum ? sctp_checksum_valid(data, size) : true;
+}
+
 static inline bool
 check_l4_icmp(const void *data, size_t size, bool validate_checksum)
 {
@@ -1761,6 +1826,21 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size,
 return key->src.port && key->dst.port;
 }
 

Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix recirculation with patch port and controller.

2023-05-22 Thread Paolo Valerio
Ilya Maximets  writes:

> On 5/15/23 17:22, Paolo Valerio wrote:
>> If a packet originating from the controller recirculates after going
>> through a patch port, it gets dropped with the following message:
>> 
>> ofproto_dpif_upcall(handler8)|INFO|received packet on unassociated
>>   datapath port 4294967295
>> 
>> This happens because there's no xport_uuid in the recirculation node
>> and at the same time in_port refers to the patch port.
>> 
>> The patch, in the case of zeroed uuid, retrieves the xport starting
>> from the ofproto_uuid stored in the recirc node.
>> 
>> Signed-off-by: Paolo Valerio 
>> ---
>>  ofproto/ofproto-dpif-xlate.c |   11 +--
>>  tests/ofproto-dpif.at|   34 ++
>>  2 files changed, 43 insertions(+), 2 deletions(-)
>> 
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index c01177718..3509cc73c 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -1533,8 +1533,15 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
>>  
>>  ofp_port_t in_port = recirc_id_node->state.metadata.in_port;
>>  if (in_port != OFPP_NONE && in_port != OFPP_CONTROLLER) {
>> -struct uuid xport_uuid = recirc_id_node->state.xport_uuid;
>> -xport = xport_lookup_by_uuid(xcfg, &xport_uuid);
>> +if (uuid_is_zero(&recirc_id_node->state.xport_uuid)) {
>> +const struct xbridge *bridge =
>> +xbridge_lookup_by_uuid(xcfg, &recirc_id_node->state.ofproto_uuid);
>> +xport = bridge ? get_ofp_port(bridge, in_port) : NULL;
>
> IIUC, xport_uuid is designed to not be uuid of the patch port.
> But the in_port here is a patch port, right?  So, we will find
> a different xport, right?
>
> Shouldn't we just fall into the else condition that handles
> NONE and CONTROLLER and not look for xport?
>

I guess it's ok to fall into the else branch in this case.
The only problem is that we'd return the ofproto even if the in_port is
invalid.
This would in turn break "conntrack - fragment reassembly with L3 L4
protocol information". This test was fixed in the past, after it had
already broken once when 323ae1e808e6 ("ofproto-dpif-xlate: Fix
recirculation when in_port is OFPP_CONTROLLER.") fixed the use case
involving packet-out and recirculation.

One possibility is to retrieve the xport in that case only to verify
that the in_port belongs to the bridge, without returning it (thus
honoring the xport_uuid logic). Maybe this could be done in the else
branch, so as to make clear we're handling the special case related to
OFPP_{NONE,CONTROLLER}.

WDYT?

> Best regards, Ilya Maximets.



[ovs-dev] [PATCH] ofproto-dpif-xlate: Fix recirculation with patch port and controller.

2023-05-15 Thread Paolo Valerio
If a packet originating from the controller recirculates after going
through a patch port, it gets dropped with the following message:

ofproto_dpif_upcall(handler8)|INFO|received packet on unassociated
  datapath port 4294967295

This happens because there's no xport_uuid in the recirculation node
and at the same time in_port refers to the patch port.

The patch, in the case of zeroed uuid, retrieves the xport starting
from the ofproto_uuid stored in the recirc node.
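
The fallback described above can be pictured with the following
self-contained sketch. It is only a toy model under stated assumptions:
every type and function name is hypothetical (none of them is the OVS
API), and bridges and ports are plain arrays rather than the real
xlate configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real uuid, port and bridge types. */
struct sketch_uuid { uint32_t parts[4]; };

struct sketch_port {
    struct sketch_uuid uuid;
    uint32_t ofp_port;          /* OpenFlow port number in its bridge. */
};

struct sketch_bridge {
    struct sketch_uuid uuid;
    const struct sketch_port *ports;
    size_t n_ports;
};

static bool
sketch_uuid_is_zero(const struct sketch_uuid *u)
{
    return !(u->parts[0] | u->parts[1] | u->parts[2] | u->parts[3]);
}

static bool
sketch_uuid_equals(const struct sketch_uuid *a, const struct sketch_uuid *b)
{
    return !memcmp(a, b, sizeof *a);
}

/* Resolves the input port of a recirculated packet.  If 'xport_uuid' is
 * all zeros, falls back to looking up 'in_port' inside the bridge
 * identified by 'bridge_uuid'; otherwise looks the port up by uuid. */
static const struct sketch_port *
sketch_lookup_in_port(const struct sketch_bridge *bridges, size_t n_bridges,
                      const struct sketch_uuid *xport_uuid,
                      const struct sketch_uuid *bridge_uuid,
                      uint32_t in_port)
{
    bool by_number = sketch_uuid_is_zero(xport_uuid);

    for (size_t i = 0; i < n_bridges; i++) {
        const struct sketch_bridge *br = &bridges[i];

        if (by_number && !sketch_uuid_equals(&br->uuid, bridge_uuid)) {
            continue;   /* The fallback only searches the recorded bridge. */
        }
        for (size_t j = 0; j < br->n_ports; j++) {
            const struct sketch_port *p = &br->ports[j];

            if (by_number ? p->ofp_port == in_port
                          : sketch_uuid_equals(&p->uuid, xport_uuid)) {
                return p;
            }
        }
    }
    return NULL;
}
```

Note that in the fallback the port number alone is ambiguous, since two
bridges may both have a port 1; that is why the sketch needs the bridge
uuid to disambiguate.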

Signed-off-by: Paolo Valerio 
---
 ofproto/ofproto-dpif-xlate.c |   11 +--
 tests/ofproto-dpif.at|   34 ++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index c01177718..3509cc73c 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1533,8 +1533,15 @@ xlate_lookup_ofproto_(const struct dpif_backer *backer,
 
 ofp_port_t in_port = recirc_id_node->state.metadata.in_port;
 if (in_port != OFPP_NONE && in_port != OFPP_CONTROLLER) {
-struct uuid xport_uuid = recirc_id_node->state.xport_uuid;
-xport = xport_lookup_by_uuid(xcfg, &xport_uuid);
+if (uuid_is_zero(&recirc_id_node->state.xport_uuid)) {
+const struct xbridge *bridge =
+xbridge_lookup_by_uuid(xcfg, &recirc_id_node->state.ofproto_uuid);
+xport = bridge ? get_ofp_port(bridge, in_port) : NULL;
+} else {
+struct uuid xport_uuid = recirc_id_node->state.xport_uuid;
+xport = xport_lookup_by_uuid(xcfg, &xport_uuid);
+}
+
 if (xport && xport->xbridge && xport->xbridge->ofproto) {
 goto out;
 }
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 6824ce0bb..8b9447c74 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -5854,6 +5854,40 @@ OVS_WAIT_UNTIL([check_flows], [ovs-ofctl dump-flows br0])
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+# Checks for regression against a bug in which OVS dropped packets
+# originating from the controller passing through a patch port
+AT_SETUP([ofproto-dpif - packet-out recirculation OFPP_CONTROLLER and patch port])
+OVS_VSWITCHD_START(
+[add-port br0 patch-br1 -- \
+ set interface patch-br1 type=patch options:peer=patch-br0 -- \
+ add-br br1 -- set bridge br1 datapath-type=dummy fail-mode=secure -- \
+ add-port br1 patch-br0 -- set interface patch-br0 type=patch options:peer=patch-br1
+])
+
+add_of_ports --pcap br1 1
+
+AT_DATA([flows-br0.txt], [dnl
+table=0 icmp actions=output:patch-br1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows-br0.txt])
+
+AT_DATA([flows-br1.txt], [dnl
+table=0, icmp actions=ct(table=1,zone=1)
+table=1, ct_state=+trk, icmp actions=p1
+])
+AT_CHECK([ovs-ofctl add-flows br1 flows-br1.txt])
+
+packet=50540007505400050800455c8001b94dc0a80001c0a80002080013fc000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f
+AT_CHECK([ovs-ofctl packet-out br0 "in_port=controller packet=$packet actions=table"])
+
+OVS_WAIT_UNTIL_EQUAL([ovs-ofctl dump-flows -m br1 | grep "ct_state" | ofctl_strip], [dnl
+ table=1, n_packets=1, n_bytes=106, ct_state=+trk,icmp actions=output:2])
+
+OVS_WAIT_UNTIL([ovs-pcap p1-tx.pcap | grep -q "$packet"])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - debug_slow action])
 OVS_VSWITCHD_START
 add_of_ports br0 1 2 3



Re: [ovs-dev] [PATCH 2/2] conntrack: Release nat_conn in case both keys have the same hash.

2023-05-15 Thread Paolo Valerio
Ilya Maximets  writes:

> On 5/4/23 19:21, Paolo Valerio wrote:
>> Ilya Maximets  writes:
>> 
>>> On 4/19/23 20:40, Paolo Valerio wrote:
>>>> During the creation of a new connection, there's a chance both key and
>>>> rev_key end up having the same hash. This is more common in the case
>>>> of all-zero snat with no collisions. In that case, once the
>>>> connection is expired, but not cleaned up, if a new packet with the
>>>> same 5-tuple is received, an assertion failure gets triggered in
>>>> conn_update_state() because of a previous failure of retrieving a
>>>> CT_CONN_TYPE_DEFAULT connection.
>>>>
>>>> Fix it by releasing the nat_conn during the connection creation in the
>>>> case of same hash for both key and rev_key.
>>>
>>> This sounds a bit odd.  Shouldn't we treat hash collision as a normal case?
>>>
>>> Looking at the code, I'm assuming that the issue comes from the following
>>> part in process_one():
>>>
>>> if (OVS_LIKELY(conn)) {
>>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) {
>>> ...
>>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply);
>>>
>>> And here we get the same connection again, because the default one is already
>>> expired.  Is that correct?
>>>
>>> If so, maybe we should add an extra condition to conn_key_lookup() to
>>> only look for DEFAULT connections instead, just for this case?  Since
>>> we really don't want to get the UN_NAT one here.
>>>
>> 
>> Hello Ilya,
>> 
>> It's a fair point.
>> I initially thought about the approach you're suggesting, but I had some
>> concerns about it that I'll try to summarize below.
>> 
>> For sure it would fix the issue (it could require the first patch to be
>> applied as well for the branches with rcu exp lists).
>> 
>> Based on the current logic, new packets matching that expired but not
>> yet evicted connection will be marked as +inv, and further packets
>> will be marked so for the whole sweep interval, unless an exception
>> like this gets added:
>> 
>> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis);
>> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */
>> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true);
>> /* special case where there's hash collision */
>> if (!conn && ctx->hash != hash) {
>> pkt->md.ct_state |= CS_INVALID;
>> write_ct_md(pkt, zone, NULL, NULL, NULL);
>> ...
>> return;
>> }
>> 
>> This would further require that subsequent lookups in the
>> create_new_conn path be restricted to CT_CONN_TYPE_DEFAULT, e.g.:
>> 
>> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis);
>> /* Only check for CT_CONN_TYPE_DEFAULT */
>> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) {
>> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
>>   helper, alg_exp, ct_alg_ctl, tp_id);
>> }
>> 
>> otherwise we could incur a false positive that prevents the creation
>> of a new connection.
>
> I'm not really sure if what is described above is a more correct way of
> doing things or not...  Aaron, do you have an opinion on this?
>
> Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the
> moment the DEFAULT counterpart of it expires?  Or will that be against
> some logic / not possible to do?
>

As far as I can tell, this might not be straightforward, as simply
marking it as expired would not be reliable (e.g. doing it from the
sweeper), and I guess that managing the expiration time field for the
nat_conn as well would require updating the nat_conn every time the
default one gets updated, probably making it a bit impractical.

Another approach would be removing the nat_conn [1] altogether.
The problem in this case is backporting. Some adjustments on top of
the patch might be needed for older branches.

[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0...@bytedance.com/

>
>> 
>>> Best regards, Ilya Maximets.
>>>
>>>>
>>>> Reported-by: Michael Plato 
>>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.")
>>>> Signed-off-by: Paolo Valerio 
>>>> ---
>>>> In this thread [0] there are some more details. A simil

Re: [ovs-dev] [PATCH 2/2] conntrack: Release nat_conn in case both keys have the same hash.

2023-05-04 Thread Paolo Valerio
Ilya Maximets  writes:

> On 4/19/23 20:40, Paolo Valerio wrote:
>> During the creation of a new connection, there's a chance both key and
>> rev_key end up having the same hash. This is more common in the case
>> of all-zero snat with no collisions. In that case, once the
>> connection is expired, but not cleaned up, if a new packet with the
>> same 5-tuple is received, an assertion failure gets triggered in
>> conn_update_state() because of a previous failure of retrieving a
>> CT_CONN_TYPE_DEFAULT connection.
>> 
>> Fix it by releasing the nat_conn during the connection creation in the
>> case of same hash for both key and rev_key.
>
> This sounds a bit odd.  Shouldn't we treat hash collision as a normal case?
>
> Looking at the code, I'm assuming that the issue comes from the following
> part in process_one():
>
> if (OVS_LIKELY(conn)) {
> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) {
> ...
> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply);
>
> And here we get the same connection again, because the default one is already
> expired.  Is that correct?
>
> If so, maybe we should add an extra condition to conn_key_lookup() to
> only look for DEFAULT connections instead, just for this case?  Since
> we really don't want to get the UN_NAT one here.
>

Hello Ilya,

It's a fair point.
I initially thought about the approach you're suggesting, but I had some
concerns about it that I'll try to summarize below.

For sure it would fix the issue (it could require the first patch to be
applied as well for the branches with rcu exp lists).

Based on the current logic, new packets matching that expired but not
yet evicted connection will be marked as +inv, and further packets
will be marked so for the whole sweep interval, unless an exception
like this gets added:

uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis);
/* the last flag indicates CT_CONN_TYPE_DEFAULT only */
conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true);
/* special case where there's hash collision */
if (!conn && ctx->hash != hash) {
pkt->md.ct_state |= CS_INVALID;
write_ct_md(pkt, zone, NULL, NULL, NULL);
...
return;
}

This would further require that subsequent lookups in the
create_new_conn path be restricted to CT_CONN_TYPE_DEFAULT, e.g.:

uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis);
/* Only check for CT_CONN_TYPE_DEFAULT */
if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) {
conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
  helper, alg_exp, ct_alg_ctl, tp_id);
}

otherwise we could incur a false positive that prevents the creation
of a new connection.
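
To make the "DEFAULT only" lookup idea concrete, here is a toy,
self-contained sketch. It is not the OVS implementation: the table is a
plain array, keys are reduced to integers, and all names are
hypothetical. It only shows how an extra flag changes which entry a
lookup can return once the default connection has expired but has not
been swept yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_conn_type { SKETCH_CONN_DEFAULT, SKETCH_CONN_UN_NAT };

/* Hypothetical, heavily simplified connection entry. */
struct sketch_conn {
    uint32_t key;               /* Stand-in for the forward key. */
    uint32_t rev_key;           /* Stand-in for the reverse key. */
    enum sketch_conn_type type;
    long long expiration;       /* Absolute expiration time. */
};

/* Returns the first non-expired entry matching 'key' in either
 * direction, or NULL.  With 'default_only' set, UN_NAT entries are
 * skipped, so the lookup cannot hand back the nat_conn of an expired
 * default connection. */
static const struct sketch_conn *
sketch_conn_lookup(const struct sketch_conn *conns, size_t n, uint32_t key,
                   long long now, bool default_only)
{
    for (size_t i = 0; i < n; i++) {
        const struct sketch_conn *c = &conns[i];

        if (c->expiration <= now) {
            continue;           /* Expired, possibly not yet swept. */
        }
        if (default_only && c->type != SKETCH_CONN_DEFAULT) {
            continue;
        }
        if (c->key == key || c->rev_key == key) {
            return c;
        }
    }
    return NULL;
}
```

With a default entry that has already expired and its UN_NAT
counterpart still alive, the unrestricted lookup returns the UN_NAT
entry, while the restricted one returns NULL and lets the caller go
down the path that creates a new connection.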

> Best regards, Ilya Maximets.
>
>> 
>> Reported-by: Michael Plato 
>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.")
>> Signed-off-by: Paolo Valerio 
>> ---
>> There are some more details in this thread [0]. A similar
>> approach here could be to avoid adding the nat_conn to the cmap and
>> to let the sweeper release the memory for nat_conn once the whole
>> connection gets freed.
>> That approach could still be ok, but the drawback is that it could
>> require a different patch for older branches that don't include
>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with
>> rculists."). It is still worth considering.
>> 
>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html
>> ---
>>  lib/conntrack.c |   21 +
>>  1 file changed, 13 insertions(+), 8 deletions(-)
>> 
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 7e1fc4b1f..d2ee127d9 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
>>  }
>>  
>>  nat_packet(pkt, nc, false, ctx->icmp_related);
>> -memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key);
>> -memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key);
>> -nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
>> -nat_conn->nat_action = 0;
>> -nat_conn->alg = NULL;
>> -nat_conn->nat_conn = NULL;
>> -uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis);
>> -cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
>> +uint

Re: [ovs-dev] [PATCH 1/2] conntrack: Do not defer connection clean up.

2023-04-20 Thread Paolo Valerio
Aaron Conole  writes:

> Paolo Valerio  writes:
>
>> Connections that need to be removed, e.g. while forcing a direction,
>> were invalidated by forcing them to expire.
>> This is not actually needed, as it's typically a one-time
>> operation.
>> The patch replaces the calls to conn_force_expire() with calls to
>> conn_clean().
>>
>> Signed-off-by: Paolo Valerio 
>> ---
>
> Is there a possible contention issue now where the conn update can also
> take the ct lock?  IE: before, we would rely on the expiration timer
> processing, but now we directly release which requires the ct lock.
>
> Maybe since it is a rare enough event, this isn't as big a deal?
>

That's a fair point, and mostly the reason I opted to split this one from
the next. Yes, if we assume the scenario is common where, e.g., many
connections are in TIME_WAIT and new connections with the same 5-tuple
are initiated while the sweeper is actually deleting. The advantage
of this patch is that nconns is lowered earlier instead of waiting for
the next sweep interval and, assuming it is an actual upside, the load
on the sweeper thread is reduced for those deletions.

The reason I included it is that forcing the expiration makes the
reported issue theoretically possible for those use cases, although this
patch doesn't solve it for all cases, as the second one should.

I guess it's fine to drop this, at least for the time being.
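
The trade-off between deferring and cleaning up right away can be
sketched with a toy model. Everything here is hypothetical (names,
fixed-size table, no locking), so it says nothing about the contention
question raised above; it only shows when the connection counter drops
in each case.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CONNS 4

/* Hypothetical, simplified connection and tracker. */
struct sketch_conn {
    long long expiration;       /* Absolute expiration time. */
    bool in_table;
};

struct sketch_ct {
    struct sketch_conn conns[SKETCH_MAX_CONNS];
    unsigned int n_conn;        /* Number of entries still in the table. */
};

/* Deferred removal: only marks the entry; the counter is untouched
 * until the next sweep. */
static void
sketch_force_expire(struct sketch_conn *conn)
{
    conn->expiration = 0;
}

/* Immediate removal: the counter drops right away. */
static void
sketch_clean(struct sketch_ct *ct, struct sketch_conn *conn)
{
    if (conn->in_table) {
        conn->in_table = false;
        ct->n_conn--;
    }
}

/* Removes every expired entry and returns how many were removed. */
static unsigned int
sketch_sweep(struct sketch_ct *ct, long long now)
{
    unsigned int removed = 0;

    for (size_t i = 0; i < SKETCH_MAX_CONNS; i++) {
        struct sketch_conn *c = &ct->conns[i];

        if (c->in_table && c->expiration <= now) {
            sketch_clean(ct, c);
            removed++;
        }
    }
    return removed;
}
```

After sketch_force_expire() the counter stays the same until
sketch_sweep() runs, whereas sketch_clean() lowers it at once and
leaves one entry less for the sweeper to process.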

>>  lib/conntrack.c |   10 ++
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index ce8a63de5..7e1fc4b1f 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -514,12 +514,6 @@ conn_clean(struct conntrack *ct, struct conn *conn)
>>  atomic_count_dec(&ct->n_conn);
>>  }
>>  
>> -static void
>> -conn_force_expire(struct conn *conn)
>> -{
>> -atomic_store_relaxed(&conn->expiration, 0);
>> -}
>> -
>>  /* Destroys the connection tracker 'ct' and frees all the allocated memory.
>>   * The caller of this function must already have shut down packet input
>>   * and PMD threads (which would have been quiesced).  */
>> @@ -1089,7 +1083,7 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
>>  break;
>>  case CT_UPDATE_NEW:
>>  if (conn_lookup(ct, &conn->key, now, NULL, NULL)) {
>> -conn_force_expire(conn);
>> +conn_clean(ct, conn);
>>  }
>>  create_new_conn = true;
>>  break;
>> @@ -1299,7 +1293,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
>>  /* Delete found entry if in wrong direction. 'force' implies commit. */
>>  if (OVS_UNLIKELY(force && ctx->reply && conn)) {
>>  if (conn_lookup(ct, &conn->key, now, NULL, NULL)) {
>> -conn_force_expire(conn);
>> +conn_clean(ct, conn);
>>  }
>>  conn = NULL;
>>  }



[ovs-dev] [PATCH 2/2] conntrack: Release nat_conn in case both keys have the same hash.

2023-04-19 Thread Paolo Valerio
During the creation of a new connection, there's a chance both key and
rev_key end up having the same hash. This is more common in the case
of all-zero snat with no collisions. In that case, once the
connection is expired, but not cleaned up, if a new packet with the
same 5-tuple is received, an assertion failure gets triggered in
conn_update_state() because of a previous failure of retrieving a
CT_CONN_TYPE_DEFAULT connection.

Fix it by releasing the nat_conn during the connection creation in the
case of same hash for both key and rev_key.

Reported-by: Michael Plato 
Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.")
Signed-off-by: Paolo Valerio 
---
There are some more details in this thread [0]. A similar
approach here could be to avoid adding the nat_conn to the cmap and
to let the sweeper release the memory for nat_conn once the whole
connection gets freed.
That approach could still be ok, but the drawback is that it could
require a different patch for older branches that don't include
3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with
rculists."). It is still worth considering.

[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html
---
 lib/conntrack.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 7e1fc4b1f..d2ee127d9 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 }
 
 nat_packet(pkt, nc, false, ctx->icmp_related);
-memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key);
-memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key);
-nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
-nat_conn->nat_action = 0;
-nat_conn->alg = NULL;
-nat_conn->nat_conn = NULL;
-uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis);
-cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
+uint32_t nat_hash = conn_key_hash(&nc->rev_key, ct->hash_basis);
+if (nat_hash != ctx->hash) {
+memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key);
+memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key);
+nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
+nat_conn->nat_action = 0;
+nat_conn->alg = NULL;
+nat_conn->nat_conn = NULL;
+cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
+} else {
+free(nat_conn);
+nat_conn = NULL;
+}
 }
 
 nc->nat_conn = nat_conn;
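
As a rough illustration of why key and rev_key can end up with the same
hash, and of the guard the diff above adds, here is a small
self-contained sketch. It assumes a direction-independent hash, i.e.
one that gives the same value when source and destination are swapped;
the toy hash, the two-field key and all names are hypothetical and only
stand in for the real ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified key: one value per endpoint. */
struct sketch_key {
    uint32_t src;
    uint32_t dst;
};

/* Toy direction-independent hash (an assumption for illustration):
 * each endpoint is hashed on its own and the results are XORed, so
 * swapping src and dst yields the same value. */
static uint32_t
sketch_key_hash(const struct sketch_key *key, uint32_t basis)
{
    uint32_t hsrc = (key->src ^ basis) * 2654435761u;
    uint32_t hdst = (key->dst ^ basis) * 2654435761u;

    return hsrc ^ hdst;
}

static struct sketch_key
sketch_key_reverse(const struct sketch_key *key)
{
    struct sketch_key rev = { key->dst, key->src };

    return rev;
}

/* Mirrors the shape of the guard in the diff: a separate reverse entry
 * is only worth inserting when it hashes differently from the forward
 * one. */
static bool
sketch_needs_reverse_entry(const struct sketch_key *key,
                           const struct sketch_key *rev_key, uint32_t basis)
{
    return sketch_key_hash(rev_key, basis) != sketch_key_hash(key, basis);
}
```

When the translation leaves the tuple unchanged, as can happen with
all-zero SNAT and no collisions, rev_key is just the swapped key, so
sketch_needs_reverse_entry() returns false. When the source is actually
rewritten, rev_key carries a different endpoint and the two hashes
differ.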



[ovs-dev] [PATCH 1/2] conntrack: Do not defer connection clean up.

2023-04-19 Thread Paolo Valerio
Connections that need to be removed, e.g. while forcing a direction,
were invalidated by forcing them to expire.
This is not actually needed, as it's typically a one-time
operation.
The patch replaces the calls to conn_force_expire() with calls to
conn_clean().

Signed-off-by: Paolo Valerio 
---
 lib/conntrack.c |   10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index ce8a63de5..7e1fc4b1f 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -514,12 +514,6 @@ conn_clean(struct conntrack *ct, struct conn *conn)
 atomic_count_dec(&ct->n_conn);
 }
 
-static void
-conn_force_expire(struct conn *conn)
-{
-atomic_store_relaxed(&conn->expiration, 0);
-}
-
 /* Destroys the connection tracker 'ct' and frees all the allocated memory.
  * The caller of this function must already have shut down packet input
  * and PMD threads (which would have been quiesced).  */
@@ -1089,7 +1083,7 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
 break;
 case CT_UPDATE_NEW:
 if (conn_lookup(ct, &conn->key, now, NULL, NULL)) {
-conn_force_expire(conn);
+conn_clean(ct, conn);
 }
 create_new_conn = true;
 break;
@@ -1299,7 +1293,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 /* Delete found entry if in wrong direction. 'force' implies commit. */
 if (OVS_UNLIKELY(force && ctx->reply && conn)) {
 if (conn_lookup(ct, &conn->key, now, NULL, NULL)) {
-conn_force_expire(conn);
+conn_clean(ct, conn);
 }
 conn = NULL;
 }
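
As a rough illustration of the difference the commit message describes, the sketch below models a tiny connection table in which an entry can either be marked as expired (and left for a later sweep) or removed right away. All names (toy_conn, toy_table, toy_force_expire, toy_clean, toy_sweep, toy_add) are hypothetical and only stand in for the OVS structures; this is a simplified model under those assumptions, not the conntrack implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not the OVS data structures. */
#define TOY_MAX_CONNS 8

struct toy_conn {
    bool in_use;            /* Slot holds a tracked connection. */
    long long expiration;   /* Absolute expiry time in ms. */
};

struct toy_table {
    struct toy_conn conns[TOY_MAX_CONNS];
    size_t n_conn;          /* Number of slots currently in use. */
};

/* Deferred removal: mark the entry as already expired.  It stays in the
 * table, and in the counter, until the next sweep runs. */
static void
toy_force_expire(struct toy_conn *conn)
{
    conn->expiration = 0;
}

/* Immediate removal: drop the entry and update the counter right away. */
static void
toy_clean(struct toy_table *t, struct toy_conn *conn)
{
    if (conn->in_use) {
        conn->in_use = false;
        t->n_conn--;
    }
}

/* Periodic sweep: removes every entry whose expiration has passed and
 * returns how many entries were removed. */
static size_t
toy_sweep(struct toy_table *t, long long now)
{
    size_t removed = 0;

    for (size_t i = 0; i < TOY_MAX_CONNS; i++) {
        struct toy_conn *c = &t->conns[i];

        if (c->in_use && c->expiration <= now) {
            toy_clean(t, c);
            removed++;
        }
    }
    return removed;
}

/* Adds a connection in the first free slot, or returns NULL if full. */
static struct toy_conn *
toy_add(struct toy_table *t, long long expiration)
{
    for (size_t i = 0; i < TOY_MAX_CONNS; i++) {
        if (!t->conns[i].in_use) {
            t->conns[i].in_use = true;
            t->conns[i].expiration = expiration;
            t->n_conn++;
            return &t->conns[i];
        }
    }
    return NULL;
}
```

In this model, calling toy_force_expire() leaves n_conn untouched until toy_sweep() runs, whereas toy_clean() takes effect immediately, which is the behavioural difference the patch relies on.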



[ovs-dev] [PATCH 0/2] conntrack: Fix failed assertion in conn_update_state().

2023-04-19 Thread Paolo Valerio
The series addresses the issue reported here [0] by Michael Plato and
confirmed by Lazuardi Nasution.

More details in the patch descriptions.

The first patch is mostly a cleanup and is not strictly required,
whereas the second one contains the actual fix.

[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052328.html

Paolo Valerio (2):
  conntrack: Do not defer connection clean up.
  conntrack: Release nat_conn in case both keys have the same hash.


 lib/conntrack.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)



[ovs-dev] [PATCH v3] ovs-dpctl: Add new command dpctl/ct-[sg]et-sweep-interval.

2023-04-06 Thread Paolo Valerio
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed, as did the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some situations, such as debugging.
This patch introduces new commands to get and set the sweep interval
in milliseconds.

Signed-off-by: Paolo Valerio 
---
v3:
- rebased on top of the current master
- renamed commands to dpctl/ct-[sg]et-sweep-interval (Ilya)
- added simple get/set test in ofproto-dpif.at (Ilya)

v2:
- resolved conflict in NEWS
- added missing comment
- added missing '\' in dpctl.man
---
 NEWS                    |    3 ++
 lib/conntrack-private.h |    1 +
 lib/conntrack.c         |   18 +-
 lib/conntrack.h         |    2 ++
 lib/ct-dpif.c           |   14 +++
 lib/ct-dpif.h           |    1 +
 lib/dpctl.c             |   61 +++
 lib/dpctl.man           |    9 +++
 lib/dpif-netdev.c       |   17 +
 lib/dpif-netlink.c      |    2 ++
 lib/dpif-provider.h     |    4 +++
 tests/ofproto-dpif.at   |   22 +
 12 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b6418c36e..1155bfbb1 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,9 @@ Post-v3.1.0
    - ovs-appctl:
      * Add support for selecting the source address with the
        'ovs-appctl ovs/route/add' command.
+     * New commands "dpctl/{ct-get-sweep-interval,ct-set-sweep-interval}" that
+       allow to get and set, for the userspace datapath, the sweep interval
+       for the conntrack garbage collector.
    - ovs-ctl:
      * Added new options --[ovsdb-server|ovs-vswitchd]-umask=MODE to set umask
        value when starting OVS daemons.  E.g., use --ovsdb-server-umask=0002
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index fae8b3a9b..bb326868e 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -224,6 +224,7 @@ struct conntrack {
     struct ipf *ipf; /* Fragmentation handling context. */
     uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
     atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
+    atomic_uint32_t sweep_ms; /* Next sweep interval. */
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack.c b/lib/conntrack.c
index f86fa26f4..ce8a63de5 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -320,6 +320,7 @@ conntrack_init(void)
     atomic_count_init(&ct->n_conn, 0);
     atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
     atomic_init(&ct->tcp_seq_chk, true);
+    atomic_init(&ct->sweep_ms, 20000);
     latch_init(&ct->clean_thread_exit);
     ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
     ct->ipf = ipf_init();
@@ -1480,6 +1481,21 @@ set_label(struct dp_packet *pkt, struct conn *conn,
 }
 
 
+int
+conntrack_set_sweep_interval(struct conntrack *ct, uint32_t ms)
+{
+    atomic_store_relaxed(&ct->sweep_ms, ms);
+    return 0;
+}
+
+uint32_t
+conntrack_get_sweep_interval(struct conntrack *ct)
+{
+    uint32_t ms;
+    atomic_read_relaxed(&ct->sweep_ms, &ms);
+    return ms;
+}
+
 static size_t
 ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
     OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -1504,7 +1520,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
 static long long
 conntrack_clean(struct conntrack *ct, long long now)
 {
-    long long next_wakeup = now + 20 * 1000;
+    long long next_wakeup = now + conntrack_get_sweep_interval(ct);
     unsigned int n_conn_limit, i;
     size_t clean_end, count = 0;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index b064abc9f..524ec0acb 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -139,6 +139,8 @@ int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns);
 int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
 int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
 int conntrack_set_tcp_seq_chk(struct conntrack *ct, bool enabled);
+int conntrack_set_sweep_interval(struct conntrack *ct, uint32_t ms);
+uint32_t conntrack_get_sweep_interval(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
 struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index d3b2783ce..0c4b2964f 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -368,6 +368,20 @@ ct_dpif_del_limits(struct dpif *dpif, const struct ovs_list *zone_limits)
             : EOPNOTSUPP);
 }
 
+int
+ct_dpif_sweep(struct dpif *dpif, uint32_t *ms)
+{
+    if (*ms) {
+        return (dpif->dpif_class->ct_set_sweep_interval
+                ? dpif->dpif_class->ct_set_sweep_interval(dpif, *ms)
+                : EOPNOTSUPP);
+    } else {
+  
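
The get/set pair introduced by the patch can be sketched outside of OVS with plain C11 atomics. The following is a minimal, hedged model: toy_ct, toy_ct_init, toy_set_sweep_interval, toy_get_sweep_interval and toy_next_wakeup are hypothetical names, and C11 <stdatomic.h> calls are used in place of the OVS atomic wrappers. The default mirrors the "20 * 1000" expression that the patch removes from conntrack_clean().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the conntrack state; only the field that is
 * relevant to the sweep interval is modelled. */
struct toy_ct {
    _Atomic uint32_t sweep_ms;   /* Sweep interval in milliseconds. */
};

/* Mirrors the previously hard-coded "20 * 1000" ms. */
#define TOY_DEFAULT_SWEEP_MS (20 * 1000)

static void
toy_ct_init(struct toy_ct *ct)
{
    atomic_init(&ct->sweep_ms, TOY_DEFAULT_SWEEP_MS);
}

/* Relaxed ordering is enough here: the value is a standalone tunable and
 * no other memory is published together with it. */
static void
toy_set_sweep_interval(struct toy_ct *ct, uint32_t ms)
{
    atomic_store_explicit(&ct->sweep_ms, ms, memory_order_relaxed);
}

static uint32_t
toy_get_sweep_interval(struct toy_ct *ct)
{
    return atomic_load_explicit(&ct->sweep_ms, memory_order_relaxed);
}

/* Computes when the sweeper should wake up next, given the current time
 * in milliseconds. */
static long long
toy_next_wakeup(struct toy_ct *ct, long long now)
{
    return now + toy_get_sweep_interval(ct);
}
```

A value written through the setter is picked up the next time the wakeup is computed, so the change is permanent rather than a one-shot setting.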

Re: [ovs-dev] [PATCH v2] ovs-dpctl: Add new command dpctl/ct-sweep-next-run.

2023-03-31 Thread Paolo Valerio
Ilya Maximets  writes:

> On 2/27/23 13:30, Paolo Valerio wrote:
>> Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
>> with rculists.") the sweep interval changed as well as the constraints
>> related to the sweeper.
>> Being able to change the default reschedule time may be convenient in
>> some conditions, like debugging.
>> This patch introduces new commands allowing to get and set the sweep
>> next run in ms.
>> 
>> Signed-off-by: Paolo Valerio 
>> ---
>> v2:
>> - resolved conflict in NEWS
>> - added missing comment
>> - added missing '\' in dpctl.man
>> ---
>>  NEWS                    |    4 +++
>>  lib/conntrack-private.h |    1 +
>>  lib/conntrack.c         |   18 +-
>>  lib/conntrack.h         |    2 ++
>>  lib/ct-dpif.c           |   14 +++
>>  lib/ct-dpif.h           |    1 +
>>  lib/dpctl.c             |   61 +++
>>  lib/dpctl.man           |    8 ++
>>  lib/dpif-netdev.c       |   17 +
>>  lib/dpif-netlink.c      |    2 ++
>>  lib/dpif-provider.h     |    4 +++
>>  11 files changed, 131 insertions(+), 1 deletion(-)
>> 
>> diff --git a/NEWS b/NEWS
>> index 85b349621..4c4ef4b2b 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -10,6 +10,10 @@ Post-v3.1.0
>> in order to create OVSDB sockets with access mode of 0770.
>> - QoS:
>>   * Added new configuration option 'jitter' for a linux-netem QoS type.
>> +   - ovs-appctl:
>> + * New commands "dpctl/{ct-get-sweep-next-run,ct-set-sweep-next-run}" that
>> +   allow to get and set, for the userspace datapath, the next run interval
>> +   for the conntrack garbage collector.
>
> Hi, Paolo.  Thanks for the patch!
>
> It looks good to me in general, but the command name seems a bit
> strange.  It sounds like it is a one-shot configuration that only
> applies to the next run and will be dropped to default afterwards.
> But that doesn't seem to be the case in the code.  It's a permanent
> configuration for the sweep interval.  So, maybe we should call it
> ct-[gs]et-sweep-interval, or something like that ?
>
> What do you think?
>

Agreed, ct-[gs]et-sweep-interval seems a better name.

> Also, some small unit test, even a basic set+get check, would be
> nice to have.  We have some similar tests in tests/ofproto-dpif.at.
>

Sure, I'll add it.

Thank you.

> Best regards, Ilya Maximets.



Re: [ovs-dev] [PATCH] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Paolo Valerio
Ilya Maximets  writes:

> On 2/27/23 12:08, Paolo Valerio wrote:
>> The two tests verify, for both icmp and icmpv6, that the correct port
>> translation happen in the inner packet in the case an error is
>> received in the reply direction.
>> 
>> Signed-off-by: Paolo Valerio 
>> ---
>>  tests/system-traffic.at |   72 +++
>>  1 file changed, 72 insertions(+)
>> 
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 3a15b88a2..02fd0ee1b 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -3561,6 +3561,42 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.3)], [0], [dnl
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>> +AT_SETUP([conntrack - ICMP related NAT with single port])
>> +AT_SKIP_IF([test $HAVE_NC = no])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +CHECK_CONNTRACK()
>> +CHECK_CONNTRACK_NAT()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
>> +
>> +NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.240 lladdr f0:00:00:01:01:02 dev p0])
>> +NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr f0:00:00:01:01:01 dev p1])
>> +
>> +AT_DATA([flows.txt], [dnl
>> +table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
>> +table=0,in_port=ovs-p0,udp,ct_state=+trk+new,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
>> +table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
>> +])
>> +
>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> +
>> +rm p0.pcap
>> +NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], [tcpdump0.pid])
>> +NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc $NC_EOF_OPT -p 1234 -u 10.1.1.240 80"])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | sort], [0], [dnl
>> +udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
>> +])
>> +
>> +OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq "f0010101f0010102080045c00045[[[:xdigit:]]]{4}4001[[[:xdigit:]]]{4}0a0101f00a010101030314164529[[[:xdigit:]]]{4}40004011[[[:xdigit:]]]{4}0a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([conntrack - IPv4 fragmentation])
>>  CHECK_CONNTRACK()
>>  OVS_TRAFFIC_VSWITCHD_START()
>> @@ -6555,6 +6591,42 @@ udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>> +AT_SETUP([conntrack - ICMPv6 related NAT with single port])
>
> Looks like this test is failing Intel CI.
> Could you, please, check?
>

Thanks, I sent a v2. It should fix the problem.

> Best regards, Ilya Maximets.
>
>> +AT_SKIP_IF([test $HAVE_NC = no])
>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>> +CHECK_CONNTRACK()
>> +CHECK_CONNTRACK_NAT()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
>> +ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
>> +
>> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::240 lladdr f0:00:00:01:01:02 dev p0])
>> +NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::1 lladdr f0:00:00:01:01:01 dev p1])
>> +
>> +AT_DATA([flows.txt], [dnl
>> +table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
>> +table=0,in_port=ovs-p0,udp6,ct_state=+trk+new,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
>> +table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
>> +])
>> +
>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> +
>> +rm p0.pcap
>> +NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], [tcpdump0.pid])
>> +NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc -6 $NC_EOF_OPT -p 1234 -u fc00::240 80"])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | sort], [0], [dnl
>> +udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
>> +])
>> +
>> +OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq "f0010101f001010286dd60[[[:xdigit:]]]{6}00453a40fc000240fc010104[[[:xdigit:]]]{4}60[[[:xdigit:]]]{6}00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([conntrack - IPv6 FTP with SNAT])
>>  AT_SKIP_IF([test $HAVE_FTP = no])
>>  CHECK_CONNTRACK()
>> 



[ovs-dev] [PATCH v2] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Paolo Valerio
The two tests verify, for both ICMP and ICMPv6, that the correct port
translation happens in the inner packet when an error is received in
the reply direction.

Signed-off-by: Paolo Valerio 
---
v2:
- added missing OVS_WAIT_UNTIL for tcpdump
- removed nc dependency and replaced with packet-out
---
 tests/system-traffic.at |   74 +++
 1 file changed, 74 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 3a15b88a2..380372430 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3561,6 +3561,43 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.3)], [0], [dnl
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMP related NAT with single port])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
+
+AT_DATA([flows.txt], [dnl
+table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,ct_state=+trk+new,udp,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+OVS_DAEMONIZE([tcpdump -l -U -i ovs-p0 -w p0.pcap 2> tcpdump0_err], [tcpdump0.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])
+
+dnl Send UDP packet from 10.1.1.1:1234 to 10.1.1.240:80
+AT_CHECK([ovs-ofctl packet-out br0 "in_port=ovs-p0,packet=f0010102f00101010800452944c140004011df100a0101010a0101f004d2005000156b24646573745f756e72656163680a,actions=resubmit(,0)"])
+dnl Send "destination unreachable" response
+AT_CHECK([ovs-ofctl packet-out br0 "in_port=ovs-p1,packet=f0010101f0010102080045c000456a374001f9bc0a0101020a01010103031328452944c140004011dffe0a0101010a01010204d21f9000154cd2646573745f756e72656163680a,actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -q "f0010101f0010102080045c000456a374001f8ce0a0101f00a01010103031416452944c140004011df100a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv4 fragmentation])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
@@ -6555,6 +6592,43 @@ udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMPv6 related NAT with single port])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
+
+AT_DATA([flows.txt], [dnl
+table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,ct_state=+trk+new,udp6,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+OVS_DAEMONIZE([tcpdump -l -U -i ovs-p0 -w p0.pcap 2> tcpdump0_err], [tcpdump0.pid])
+OVS_WAIT_UNTIL([grep "listening" tcpdump0_err])
+
+dnl Send UDP packet from [[fc00::1]]:1234 to [[fc00::240]]:80
+AT_CHECK([ovs-ofctl packet-out br0 "in_port=ovs-p0,packet=f0010102f001010186dd60066ced00151140fc01fc00024004d20050001587d4646573745f756e72656163680a,actions=resubmit(,0)"])
+dnl Send "destination unreachable" response
+AT_CHECK([ovs-ofctl packet-out br0 "in_port=ovs-p1,packet=f0010101f001010286dd600733ed00453a40fc02fc010104285560066ced00151140fc01fc0204d21f9000156ad2646573745f756e72656163680a,actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | sort], [0], [dnl
+udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -q "f0010101f001010286dd600733ed00453a40fc000240fc010104261760066ced00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
+
+OVS_
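
What the two tests check can be summarised with a small model of the translation applied to an ICMP error that arrives in the reply direction of a DNATted connection. The sketch below is an assumption-laden simplification: toy_tuple, toy_conn, toy_icmp_error, toy_un_dnat_icmp_error and TOY_IP are hypothetical names, only IPv4 is modelled, checksum updates are left out, and the error is assumed to originate from the translated endpoint. It is not the OVS conntrack code.

```c
#include <assert.h>
#include <stdint.h>

/* Builds an IPv4 address in host byte order from four octets. */
#define TOY_IP(a, b, c, d)                                  \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) |      \
     ((uint32_t) (c) << 8) | (uint32_t) (d))

/* Hypothetical, simplified types. */
struct toy_tuple {
    uint32_t src, dst;       /* IPv4 addresses, host byte order. */
    uint16_t sport, dport;   /* L4 ports, host byte order. */
};

struct toy_conn {
    struct toy_tuple orig;   /* As sent by the client, before DNAT. */
    struct toy_tuple reply;  /* As sent by the server, after DNAT. */
};

struct toy_icmp_error {
    uint32_t outer_src, outer_dst;  /* Outer IP header of the error. */
    struct toy_tuple inner;         /* Headers of the embedded packet. */
};

/* Undoes DNAT on an ICMP error travelling in the reply direction.
 *
 * The outer source becomes the address the client originally targeted.
 * The embedded packet is the client's packet as the server saw it, so
 * both its destination address and its destination port are restored
 * to the pre-NAT values. */
static void
toy_un_dnat_icmp_error(const struct toy_conn *conn,
                       struct toy_icmp_error *err)
{
    err->outer_src = conn->orig.dst;
    err->inner.dst = conn->orig.dst;
    err->inner.dport = conn->orig.dport;
}
```

With the addresses used in the IPv4 test (client 10.1.1.1:1234, original destination 10.1.1.240:80, translated destination 10.1.1.2:8080), the client should see an error from 10.1.1.240 whose inner packet is addressed to 10.1.1.240:80 again, which is what the final pcap match looks for.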

[ovs-dev] [PATCH v2] ovs-dpctl: Add new command dpctl/ct-sweep-next-run.

2023-02-27 Thread Paolo Valerio
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed, as did the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some situations, such as debugging.
This patch introduces new commands to get and set the sweep next run
interval in milliseconds.

Signed-off-by: Paolo Valerio 
---
v2:
- resolved conflict in NEWS
- added missing comment
- added missing '\' in dpctl.man
---
 NEWS                    |    4 +++
 lib/conntrack-private.h |    1 +
 lib/conntrack.c         |   18 +-
 lib/conntrack.h         |    2 ++
 lib/ct-dpif.c           |   14 +++
 lib/ct-dpif.h           |    1 +
 lib/dpctl.c             |   61 +++
 lib/dpctl.man           |    8 ++
 lib/dpif-netdev.c       |   17 +
 lib/dpif-netlink.c      |    2 ++
 lib/dpif-provider.h     |    4 +++
 11 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 85b349621..4c4ef4b2b 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,10 @@ Post-v3.1.0
        in order to create OVSDB sockets with access mode of 0770.
    - QoS:
      * Added new configuration option 'jitter' for a linux-netem QoS type.
+   - ovs-appctl:
+     * New commands "dpctl/{ct-get-sweep-next-run,ct-set-sweep-next-run}" that
+       allow to get and set, for the userspace datapath, the next run interval
+       for the conntrack garbage collector.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index fae8b3a9b..bb326868e 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -224,6 +224,7 @@ struct conntrack {
     struct ipf *ipf; /* Fragmentation handling context. */
     uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
     atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
+    atomic_uint32_t sweep_ms; /* Next sweep interval. */
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5029b2cda..9356c1282 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -320,6 +320,7 @@ conntrack_init(void)
     atomic_count_init(&ct->n_conn, 0);
     atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
     atomic_init(&ct->tcp_seq_chk, true);
+    atomic_init(&ct->sweep_ms, 20000);
     latch_init(&ct->clean_thread_exit);
     ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
     ct->ipf = ipf_init();
@@ -1480,6 +1481,21 @@ set_label(struct dp_packet *pkt, struct conn *conn,
 }
 
 
+int
+conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms)
+{
+    atomic_store_relaxed(&ct->sweep_ms, ms);
+    return 0;
+}
+
+uint32_t
+conntrack_get_sweep_next_run(struct conntrack *ct)
+{
+    uint32_t ms;
+    atomic_read_relaxed(&ct->sweep_ms, &ms);
+    return ms;
+}
+
 static size_t
 ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
     OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -1504,7 +1520,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
 static long long
 conntrack_clean(struct conntrack *ct, long long now)
 {
-    long long next_wakeup = now + 20 * 1000;
+    long long next_wakeup = now + conntrack_get_sweep_next_run(ct);
     unsigned int n_conn_limit, i;
     size_t clean_end, count = 0;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index b064abc9f..2306cf375 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -139,6 +139,8 @@ int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns);
 int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
 int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
 int conntrack_set_tcp_seq_chk(struct conntrack *ct, bool enabled);
+int conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms);
+uint32_t conntrack_get_sweep_next_run(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
 struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index d3b2783ce..0a08eb11c 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -368,6 +368,20 @@ ct_dpif_del_limits(struct dpif *dpif, const struct ovs_list *zone_limits)
             : EOPNOTSUPP);
 }
 
+int
+ct_dpif_sweep(struct dpif *dpif, uint32_t *ms)
+{
+    if (*ms) {
+        return (dpif->dpif_class->ct_set_sweep_next_run
+                ? dpif->dpif_class->ct_set_sweep_next_run(dpif, *ms)
+                : EOPNOTSUPP);
+    } else {
+        return (dpif->dpif_class->ct_get_sweep_next_run
+                ? dpif->dpif_class->ct_get_sweep_next_run(dpif, ms)
+                : EOPNOTSUPP);
+    }
+}
+
 int
 ct_dpif_ipf_set_enabled(struct dpif *dpif, bool v6, bool enable)
 {
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 5edbbfd3b..

Re: [ovs-dev] [PATCH 1/2] cli: add option to display the version from Cargo.toml.

2023-02-27 Thread Paolo Valerio
Sorry for the noise, this local test was sent unintentionally.

Please ignore it.

Paolo Valerio  writes:

> Signed-off-by: Paolo Valerio 
> ---
>  src/cli/cli.rs |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/cli/cli.rs b/src/cli/cli.rs
> index a5b08e6..f8593e1 100644
> --- a/src/cli/cli.rs
> +++ b/src/cli/cli.rs
> @@ -73,6 +73,7 @@ impl Debug for dyn SubCommand {
>  ///
>  /// packet-tracer is a tool for capturing networking-related events from the system using ebpf and analyzing them.
>  #[derive(Args, Default, Debug)]
> +#[command(version)]
>  pub(crate) struct MainConfig {}
>  
>  /// ThinCli handles the first (a.k.a "thin") round of Command Line Interface parsing.



[ovs-dev] [PATCH 2/2] WIP

2023-02-27 Thread Paolo Valerio
Signed-off-by: Paolo Valerio 

Signed-off-by: Paolo Valerio 
---
 src/main.rs |1 +
 1 file changed, 1 insertion(+)

diff --git a/src/main.rs b/src/main.rs
index c922fae..c28a07f 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -2,6 +2,7 @@ use anyhow::Result;
 use log::error;
 use simplelog::{Config, LevelFilter, SimpleLogger};
 
+
 mod cli;
 mod collect;
 mod core;



[ovs-dev] [PATCH 1/2] cli: add option to display the version from Cargo.toml.

2023-02-27 Thread Paolo Valerio
Signed-off-by: Paolo Valerio 
---
 src/cli/cli.rs |1 +
 1 file changed, 1 insertion(+)

diff --git a/src/cli/cli.rs b/src/cli/cli.rs
index a5b08e6..f8593e1 100644
--- a/src/cli/cli.rs
+++ b/src/cli/cli.rs
@@ -73,6 +73,7 @@ impl Debug for dyn SubCommand {
 ///
 /// packet-tracer is a tool for capturing networking-related events from the system using ebpf and analyzing them.
 #[derive(Args, Default, Debug)]
+#[command(version)]
 pub(crate) struct MainConfig {}
 
 /// ThinCli handles the first (a.k.a "thin") round of Command Line Interface parsing.



[ovs-dev] [PATCH] ovs-dpctl: Add new command dpctl/ct-sweep-next-run.

2023-02-27 Thread Paolo Valerio
Since 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists
with rculists.") the sweep interval changed, as did the constraints
related to the sweeper.
Being able to change the default reschedule time may be convenient in
some situations, such as debugging.
This patch introduces new commands to get and set the sweep next run
interval in milliseconds.

Signed-off-by: Paolo Valerio 
---
 NEWS                    |    4 +++
 lib/conntrack-private.h |    1 +
 lib/conntrack.c         |   18 +-
 lib/conntrack.h         |    2 ++
 lib/ct-dpif.c           |   14 +++
 lib/ct-dpif.h           |    1 +
 lib/dpctl.c             |   61 +++
 lib/dpctl.man           |    8 ++
 lib/dpif-netdev.c       |   17 +
 lib/dpif-netlink.c      |    2 ++
 lib/dpif-provider.h     |    4 +++
 11 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 391badd7c..c80f44429 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,10 @@ Post-v3.1.0
      * OVS now collects per-interface upcall statistics that can be obtained
        via 'ovs-appctl dpctl/show -s' or the interface's statistics column
        in OVSDB.  Available with upstream kernel 6.2+.
+   - ovs-appctl:
+     * New commands "dpctl/{ct-get-sweep-next-run,ct-set-sweep-next-run}" that
+       allow to get and set, for the userspace datapath, the next run interval
+       for the conntrack garbage collector.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index fae8b3a9b..3438c3554 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -224,6 +224,7 @@ struct conntrack {
     struct ipf *ipf; /* Fragmentation handling context. */
     uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
     atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
+    atomic_uint32_t sweep_ms;
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 524670e45..e9a37f2c1 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -320,6 +320,7 @@ conntrack_init(void)
     atomic_count_init(&ct->n_conn, 0);
     atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
     atomic_init(&ct->tcp_seq_chk, true);
+    atomic_init(&ct->sweep_ms, 20000);
     latch_init(&ct->clean_thread_exit);
     ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
     ct->ipf = ipf_init();
@@ -1480,6 +1481,21 @@ set_label(struct dp_packet *pkt, struct conn *conn,
 }
 
 
+int
+conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms)
+{
+    atomic_store_relaxed(&ct->sweep_ms, ms);
+    return 0;
+}
+
+uint32_t
+conntrack_get_sweep_next_run(struct conntrack *ct)
+{
+    uint32_t ms;
+    atomic_read_relaxed(&ct->sweep_ms, &ms);
+    return ms;
+}
+
 static size_t
 ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
     OVS_NO_THREAD_SAFETY_ANALYSIS
@@ -1504,7 +1520,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
 static long long
 conntrack_clean(struct conntrack *ct, long long now)
 {
-    long long next_wakeup = now + 20 * 1000;
+    long long next_wakeup = now + conntrack_get_sweep_next_run(ct);
     unsigned int n_conn_limit, i;
     size_t clean_end, count = 0;
 
diff --git a/lib/conntrack.h b/lib/conntrack.h
index b064abc9f..2306cf375 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -139,6 +139,8 @@ int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns);
 int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
 int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
 int conntrack_set_tcp_seq_chk(struct conntrack *ct, bool enabled);
+int conntrack_set_sweep_next_run(struct conntrack *ct, uint32_t ms);
+uint32_t conntrack_get_sweep_next_run(struct conntrack *ct);
 bool conntrack_get_tcp_seq_chk(struct conntrack *ct);
 struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
 struct conntrack_zone_limit zone_limit_get(struct conntrack *ct,
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index d3b2783ce..0a08eb11c 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -368,6 +368,20 @@ ct_dpif_del_limits(struct dpif *dpif, const struct ovs_list *zone_limits)
             : EOPNOTSUPP);
 }
 
+int
+ct_dpif_sweep(struct dpif *dpif, uint32_t *ms)
+{
+    if (*ms) {
+        return (dpif->dpif_class->ct_set_sweep_next_run
+                ? dpif->dpif_class->ct_set_sweep_next_run(dpif, *ms)
+                : EOPNOTSUPP);
+    } else {
+        return (dpif->dpif_class->ct_get_sweep_next_run
+                ? dpif->dpif_class->ct_get_sweep_next_run(dpif, ms)
+                : EOPNOTSUPP);
+    }
+}
+
 int
 ct_dpif_ipf_set_enabled(struct dpif *dpif, bool v6, bool enable)
 {
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 5edbbfd3b..1e265604f 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -298

[ovs-dev] [PATCH] system-traffic.at: Add icmp error tests while dnatting address and port.

2023-02-27 Thread Paolo Valerio
The two tests verify, for both ICMP and ICMPv6, that the correct port
translation happens in the inner packet when an error is received in
the reply direction.

Signed-off-by: Paolo Valerio 
---
 tests/system-traffic.at |   72 +++
 1 file changed, 72 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 3a15b88a2..02fd0ee1b 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3561,6 +3561,42 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.0.3)], [0], [dnl
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMP related NAT with single port])
+AT_SKIP_IF([test $HAVE_NC = no])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24", "f0:00:00:01:01:02")
+
+NS_CHECK_EXEC([at_ns0], [ip neigh add 10.1.1.240 lladdr f0:00:00:01:01:02 dev p0])
+NS_CHECK_EXEC([at_ns1], [ip neigh add 10.1.1.1 lladdr f0:00:00:01:01:01 dev p1])
+
+AT_DATA([flows.txt], [dnl
+table=0,ip,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,udp,ct_state=+trk+new,actions=ct(commit,nat(dst=10.1.1.2:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], [tcpdump0.pid])
+NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc $NC_EOF_OPT -p 1234 -u 10.1.1.240 80"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1," | sort], [0], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.240,sport=1234,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq "f0010101f0010102080045c00045[[[:xdigit:]]]{4}4001[[[:xdigit:]]]{4}0a0101f00a010101030314164529[[[:xdigit:]]]{4}40004011[[[:xdigit:]]]{4}0a0101010a0101f004d2005000156b24646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv4 fragmentation])
 CHECK_CONNTRACK()
 OVS_TRAFFIC_VSWITCHD_START()
@@ -6555,6 +6591,42 @@ udp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - ICMPv6 related NAT with single port])
+AT_SKIP_IF([test $HAVE_NC = no])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+CHECK_CONNTRACK_NAT()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "fc00::1/96", "f0:00:00:01:01:01", [], "nodad")
+ADD_VETH(p1, at_ns1, br0, "fc00::2/96", "f0:00:00:01:01:02", [], "nodad")
+
+NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::240 lladdr f0:00:00:01:01:02 dev p0])
+NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::1 lladdr f0:00:00:01:01:01 dev p1])
+
+AT_DATA([flows.txt], [dnl
+table=0,ipv6,ct_state=-trk,actions=ct(table=0,nat)
+table=0,in_port=ovs-p0,udp6,ct_state=+trk+new,actions=ct(commit,nat(dst=[[fc00::2]]:8080)),ovs-p1
+table=0,in_port=ovs-p1,ct_state=+trk+rel+rpl,icmp6,actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+rm p0.pcap
+NETNS_DAEMONIZE([at_ns0], [tcpdump -l -U -i p0 -w p0.pcap 2>tcpdump0_err], 
[tcpdump0.pid])
+NS_CHECK_EXEC([at_ns0], [bash -c "echo dest_unreach | nc -6 $NC_EOF_OPT -p 
1234 -u fc00::240 80"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=fc00::1," | sort], [0], [dnl
+udp,orig=(src=fc00::1,dst=fc00::240,sport=1234,dport=80),reply=(src=fc00::2,dst=fc00::1,sport=8080,dport=1234)
+])
+
+OVS_WAIT_UNTIL([ovs-pcap p0.pcap | grep -Eq "f0010101f001010286dd60[[[:xdigit:]]]{6}00453a40fc000240fc010104[[[:xdigit:]]]{4}60[[[:xdigit:]]]{6}00151140fc01fc00024004d20050001587d4646573745f756e72656163680a"])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([conntrack - IPv6 FTP with SNAT])
 AT_SKIP_IF([test $HAVE_FTP = no])
 CHECK_CONNTRACK()

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] conntrack: fix conntrack_clean may access the same exp_list each time be called

2023-02-21 Thread Paolo Valerio
Liang Mancang  writes:

> When an exp_list contains more than clean_end nodes and these nodes
> will not expire immediately, every call to conntrack_clean uses the
> same next_sweep to get the exp_list.
>
> Instead, i should be incremented every time after ct_sweep is called.
>
> v2: delete unnecessary line.
>

It's better to place the changelog after "---" at the bottom of the commit
message. I don't know if it's worth a new version only for this. If no
other respin is needed, it could perhaps be removed while applying.

Other than that, the change looks good to me.
Thanks for fixing this:

Acked-by: Paolo Valerio 


> Fixes: 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.")
> Signed-off-by: Liang Mancang 
> ---
>  lib/conntrack.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 524670e45..8cf7779c6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1512,12 +1512,12 @@ conntrack_clean(struct conntrack *ct, long long now)
>      clean_end = n_conn_limit / 64;
>  
>      for (i = ct->next_sweep; i < N_EXP_LISTS; i++) {
> -        count += ct_sweep(ct, &ct->exp_lists[i], now);
> -
>          if (count > clean_end) {
>              next_wakeup = 0;
>              break;
>          }
> +
> +        count += ct_sweep(ct, &ct->exp_lists[i], now);
>      }
>  
>      ct->next_sweep = (i < N_EXP_LISTS) ? i : 0;
> -- 
> 2.30.0.windows.2
>



Re: [ovs-dev] [PATCH] conntrack:fix conntrack_clean may access the same exp_list each time be called

2023-02-21 Thread Paolo Valerio
Liang Mancang  writes:

> On Mon, Feb 20, 2023 at 07:38:39PM +0100, Paolo Valerio wrote:
>> Paolo Valerio  writes:
>> 
>> > Hello Liang,
>> >
>> > Liang Mancang  writes:
>> >
>> >> When an exp_list contains more than clean_end nodes and these nodes
>> >> will not expire immediately, every call to conntrack_clean uses the
>> >> same next_sweep to get the exp_list.
>> >>
>> >
>> > Yes, in general, if the previous count exceeds the clean_end, it should
>> > not make the sweeper restart from a list just swept, but it should not
>> > happen that a single list contains more than n_conn_limit / 64.
>> >
>> > Did you observe a single exp_list containing more than n_conn_limit / 64
>> > entries?
> We only select the exp_list for a conntrack entry when creating it, but
> never move entries when updating their expiration or deleting them. So the
> number of entries in each exp_list becomes unbalanced after running for a
> long time.

Of course, if the lists are not balanced, that could happen.

>> >
>> >> Instead, i should be incremented every time after ct_sweep is called.
>> >>
>> >> Signed-off-by: Liang Mancang 
>> >> ---
>> >>  lib/conntrack.c | 5 +++--
>> >>  1 file changed, 3 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> >> index 524670e45..5029b2cda 100644
>> >> --- a/lib/conntrack.c
>> >> +++ b/lib/conntrack.c
>> >> @@ -1512,12 +1512,13 @@ conntrack_clean(struct conntrack *ct, long long now)
>> >>      clean_end = n_conn_limit / 64;
>> >>  
>> >>      for (i = ct->next_sweep; i < N_EXP_LISTS; i++) {
>> >> -        count += ct_sweep(ct, &ct->exp_lists[i], now);
>> >> -
>> >>          if (count > clean_end) {
>> >>              next_wakeup = 0;
>> >> +
>> >
>> > This new line is not needed, and a Fixes tag could be added:
>> >
>> > Fixes: 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.")
>> >
>> > The patch LGTM, 
>> >
>> 
>> Sorry, the last line slipped out. Please consider my question and the
>> other comments. I will explicitly tag the patch once we're done.
>> 
> I sent v2 for this.

Thanks.

>> >>              break;
>> >>          }
>> >> +
>> >> +        count += ct_sweep(ct, &ct->exp_lists[i], now);
>> >>      }
>> >>  
>> >>      ct->next_sweep = (i < N_EXP_LISTS) ? i : 0;
>> >> -- 
>> >> 2.30.0.windows.2
>> >>



Re: [ovs-dev] [PATCH] conntrack:fix conntrack_clean may access the same exp_list each time be called

2023-02-20 Thread Paolo Valerio
Paolo Valerio  writes:

> Hello Liang,
>
> Liang Mancang  writes:
>
>> When an exp_list contains more than clean_end nodes and these nodes
>> will not expire immediately, every call to conntrack_clean uses the
>> same next_sweep to get the exp_list.
>>
>
> Yes, in general, if the previous count exceeds the clean_end, it should
> not make the sweeper restart from a list just swept, but it should not
> happen that a single list contains more than n_conn_limit / 64.
>
> Did you observe a single exp_list containing more than n_conn_limit / 64
> entries?
>
>> Instead, i should be incremented every time after ct_sweep is called.
>>
>> Signed-off-by: Liang Mancang 
>> ---
>>  lib/conntrack.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 524670e45..5029b2cda 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -1512,12 +1512,13 @@ conntrack_clean(struct conntrack *ct, long long now)
>>      clean_end = n_conn_limit / 64;
>>  
>>      for (i = ct->next_sweep; i < N_EXP_LISTS; i++) {
>> -        count += ct_sweep(ct, &ct->exp_lists[i], now);
>> -
>>          if (count > clean_end) {
>>              next_wakeup = 0;
>> +
>
> This new line is not needed, and a Fixes tag could be added:
>
> Fixes: 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.")
>
> The patch LGTM, 
>

Sorry, the last line slipped out. Please consider my question and the
other comments. I will explicitly tag the patch once we're done.

>>              break;
>>          }
>> +
>> +        count += ct_sweep(ct, &ct->exp_lists[i], now);
>>      }
>>  
>>      ct->next_sweep = (i < N_EXP_LISTS) ? i : 0;
>> -- 
>> 2.30.0.windows.2
>>


