from:"Matthieu Baerts"

Recently, we had an issue where getting info about the memory would have
helped better understanding what went wrong.

Let add it just in case for later.

Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh 
b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 
91a1d3b76e664bd95fc36310ac2e2c89bfba1aa1..051e289d79676c5feb9f46da67a08116548f4b47
 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -118,11 +118,11 @@ mptcp_lib_pr_err_stats() {
echo -en "${MPTCP_LIB_COLOR_RED}"
{
printf "\nnetns %s (listener) socket stat for %d:\n" "${lns}" 
"${port}"
-   ip netns exec "${lns}" ss -Menita -o "sport = :${port}"
+   ip netns exec "${lns}" ss -Menitam -o "sport = :${port}"
cat "${lstat}"
 
printf "\nnetns %s (connector) socket stat for %d:\n" "${cns}" 
"${port}"
-   ip netns exec "${cns}" ss -Menita -o "dport = :${port}"
+   ip netns exec "${cns}" ss -Menitam -o "dport = :${port}"
[ "${lstat}" != "${cstat}" ] && cat "${cstat}"
} 1>&2
echo -en "${MPTCP_LIB_COLOR_RESET}"

-- 
2.47.1

Re: [PATCH 0/3] selftests: mptcp: Fix various issues in main_loop

2025-01-14 Thread Matthieu Baerts

Hi Cong Liu,

On 13/01/2025 09:52, Cong Liu wrote:
> Fix several issues in the mptcp connect test's main_loop function.
> 
>  - Fix a bug where the wrong file descriptor was being checked for errors
>  - Fix the input file descriptor lifecycle in the reconnection loop to
>prevent use of invalid fd
>  - Add proper resource cleanup in error paths

Thank you for these fixes!

Please note that when sending patches to the Netdev mailing list, it is
asked to follow some specific rules, e.g.

- designate your patch to a tree - [PATCH net] or [PATCH net-next]
- for fixes the Fixes: tag is required, regardless of the tree

https://docs.kernel.org/process/maintainer-netdev.html

If I'm not mistaken, it looks like you have here fixes for net, and the
Fixes tag is missing.

Do you mind sending a v2 with this being fixed please? Because these
fixes are not urgent, do you mind sending them only to the MPTCP ML for
the moment, and not to the other ones (and no other people in CC).

> Cong Liu (3):
>   selftests: mptcp: Fix incorrect file descriptor check in main_loop
>   selftests: mptcp: Fix input fd lifecycle in reconnection loop
>   selftests: mptcp: Clean up resources properly in main_loop
> 
>  .../selftests/net/mptcp/mptcp_connect.c| 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> 
> base-commit: 2b88851f583d3c4e40bcd40cfe1965241ec229dd

Note that this base-commit doesn't seem to exist. Because of that, our
MPTCP CI was not able to automatically apply this commit.

Talking about our CI, it looks like that 'mptcp_connect' now crashes in
some cases with:

  free(): double free detected in tcache 2

Do you mind checking this please?

https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12751035845

pw-bot: cr

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

[PATCH net-next 6/6] selftests: mptcp: connect: better display the files size

'du' will print the name of the file, which was already displayed
before, e.g.

  Created /tmp/tmp.UOyy0ghfmQ (size 4703740/tmp/tmp.UOyy0ghfmQ) containing data 
sent by client
  Created /tmp/tmp.xq3zvFinGo (size 1391724/tmp/tmp.xq3zvFinGo) containing data 
sent by server

'stat' can be used instead, to display this instead:

  Created /tmp/tmp.UOyy0ghfmQ (size 4703740 B) containing data sent by client
  Created /tmp/tmp.xq3zvFinGo (size 1391724 B) containing data sent by server

So easier to spot the file sizes.

Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 
e508d356fcdaebbfb95750bba0fa834a8463e32a..5e3c56253274a1f938d2ed9986c4290fcea8b96b
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -582,7 +582,7 @@ make_file()
mptcp_lib_make_file $name 1024 $ksize
dd if=/dev/urandom conv=notrunc of="$name" oflag=append bs=1 count=$rem 
2> /dev/null
 
-   echo "Created $name (size $(du -b "$name")) containing data sent by 
$who"
+   echo "Created $name (size $(stat -c "%s" "$name") B) containing data 
sent by $who"
 }
 
 run_tests_lo()

-- 
2.47.1

[PATCH net-next 5/6] selftests: mptcp: connect: remove unused variable

'cin_disconnect' is used in run_tests_disconnect(), but not
'cout_disconnect', so it is safe to drop it.

Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 
bfdaecd0a6a0564020530345daf91bed296bc15c..e508d356fcdaebbfb95750bba0fa834a8463e32a
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -137,7 +137,7 @@ TEST_GROUP=""
 #shellcheck disable=SC2317
 cleanup()
 {
-   rm -f "$cin_disconnect" "$cout_disconnect"
+   rm -f "$cin_disconnect"
rm -f "$cin" "$cout"
rm -f "$sin" "$sout"
rm -f "$capout"
@@ -155,7 +155,6 @@ cin=$(mktemp)
 cout=$(mktemp)
 capout=$(mktemp)
 cin_disconnect="$cin".disconnect
-cout_disconnect="$cout".disconnect
 trap cleanup EXIT
 
 mptcp_lib_ns_init ns1 ns2 ns3 ns4

-- 
2.47.1

[PATCH net-next 3/6] selftests: mptcp: move stats info in case of errors to lib.sh

A few MPTCP selftests are using the same code to print stats in case of
error. This code can then be moved to mptcp_lib.sh.

No behaviour changes intended, except to print the error in red and to
stderr, like most error messages.

Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh |  8 ++--
 tools/testing/selftests/net/mptcp/mptcp_join.sh|  9 ++---
 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 21 +
 tools/testing/selftests/net/mptcp/mptcp_sockopt.sh |  8 ++--
 tools/testing/selftests/net/mptcp/simult_flows.sh  |  8 ++--
 5 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 
b48b4e56826a9cfdb3501242b707ae2ebe29b220..bfdaecd0a6a0564020530345daf91bed296bc15c
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -445,12 +445,8 @@ do_transfer()
printf "(duration %05sms) " "${duration}"
if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
mptcp_lib_pr_fail "client exit code $retc, server $rets"
-   echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2
-   ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port"
-   cat /tmp/${listener_ns}.out
-   echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2
-   ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = 
:$port"
-   [ ${listener_ns} != ${connector_ns} ] && cat 
/tmp/${connector_ns}.out
+   mptcp_lib_pr_err_stats "${listener_ns}" "${connector_ns}" 
"${port}" \
+   "/tmp/${listener_ns}.out" "/tmp/${connector_ns}.out"
 
echo
cat "$capout"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh 
b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 
c07e2bd3a315aac9c422fed85c3196ec46e060f7..13a3b68181ee14eb628a858e5738094c3c936b74
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -1039,13 +1039,8 @@ do_transfer()
 
if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
fail_test "client exit code $retc, server $rets"
-   echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2
-   ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port"
-   cat /tmp/${listener_ns}.out
-   echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2
-   ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = 
:$port"
-   cat /tmp/${connector_ns}.out
-
+   mptcp_lib_pr_err_stats "${listener_ns}" "${connector_ns}" 
"${port}" \
+   "/tmp/${listener_ns}.out" "/tmp/${connector_ns}.out"
return 1
fi
 
diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh 
b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 
975d4d4c862afff2e685e86dc08a892dbd09d783..91a1d3b76e664bd95fc36310ac2e2c89bfba1aa1
 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -107,6 +107,27 @@ mptcp_lib_pr_info() {
mptcp_lib_print_info "INFO: ${*}"
 }
 
+# $1-2: listener/connector ns ; $3 port ; $4-5 listener/connector stat file
+mptcp_lib_pr_err_stats() {
+   local lns="${1}"
+   local cns="${2}"
+   local port="${3}"
+   local lstat="${4}"
+   local cstat="${5}"
+
+   echo -en "${MPTCP_LIB_COLOR_RED}"
+   {
+   printf "\nnetns %s (listener) socket stat for %d:\n" "${lns}" 
"${port}"
+   ip netns exec "${lns}" ss -Menita -o "sport = :${port}"
+   cat "${lstat}"
+
+   printf "\nnetns %s (connector) socket stat for %d:\n" "${cns}" 
"${port}"
+   ip netns exec "${cns}" ss -Menita -o "dport = :${port}"
+   [ "${lstat}" != "${cstat}" ] && cat "${cstat}"
+   } 1>&2
+   echo -en "${MPTCP_LIB_COLOR_RESET}"
+}
+
 # SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var can be set when validating 
all
 # features using the last version of the kernel and the selftests to make sure
 # a test is not being skipped by mistake.
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh 
b/tools/testin

[PATCH net-next 2/6] selftests: mptcp: sockopt: save nstat infos

From: Geliang Tang 

Similar to the way nstat information is stored in mptcp_connect.sh
and mptcp_join.sh scripts, this patch adds a similar way for
mptcp_sockopt.sh and displays the nstat information when errors
occur.

Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh 
b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
index 
5e8d5b83e2d092879efc179f1a450542be4e575e..9a78bfdc3d5e27fdf6859d34f8d313bd08dd4457
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
@@ -169,6 +169,11 @@ do_transfer()
cmsg+=",TCPINQ"
fi
 
+   NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \
+   nstat -n
+   NSTAT_HISTORY=/tmp/${connector_ns}.nstat ip netns exec ${connector_ns} \
+   nstat -n
+
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
$mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s 
${srv_proto} -c "${cmsg}" \
@@ -189,14 +194,20 @@ do_transfer()
wait $spid
local rets=$?
 
+   NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \
+   nstat | grep Tcp > /tmp/${listener_ns}.out
+   NSTAT_HISTORY=/tmp/${connector_ns}.nstat ip netns exec ${connector_ns} \
+   nstat | grep Tcp > /tmp/${connector_ns}.out
+
print_title "Transfer ${ip:2}"
if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
mptcp_lib_pr_fail "client exit code $retc, server $rets"
echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2
ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port"
-
+   cat /tmp/${listener_ns}.out
echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2
ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = 
:$port"
+   cat /tmp/${connector_ns}.out
 
mptcp_lib_result_fail "transfer ${ip}"
 

-- 
2.47.1

[PATCH net-next 1/6] selftests: mptcp: simult_flows: unify errors msgs

In order to unify what is printed in case of error, similar to what is
done in mptcp_connect.sh and mptcp_join.sh, it is interesting to do the
following modifications in simult_flows.sh:

- Print the rc errors at the end of the line.

- Print the MIB counters.

- Use the same ss options: add -M (MPTCP sockets) and -e (detailed
  socket information).

While at it, also print of the 'max' time only in case of success,
because 'mptcp_connect.c' will already print this info in case of error,
e.g.:

  transfer slower than expected! runtime 11948 ms, expected 11921 ms

Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/simult_flows.sh | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh 
b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 
8fa77c8e9b651171a34c89bfd5c9ded0288a5bde..e98e5907d52c2d0e9c0152efda82176861905cf1
 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -155,6 +155,11 @@ do_transfer()
sleep 1
fi
 
+   NSTAT_HISTORY=/tmp/${ns3}.nstat ip netns exec ${ns3} \
+   nstat -n
+   NSTAT_HISTORY=/tmp/${ns1}.nstat ip netns exec ${ns1} \
+   nstat -n
+
timeout ${timeout_test} \
ip netns exec ${ns3} \
./mptcp_connect -jt ${timeout_poll} -l -p $port -T 
$max_time \
@@ -180,25 +185,31 @@ do_transfer()
kill ${cappid_connector}
fi
 
+   NSTAT_HISTORY=/tmp/${ns3}.nstat ip netns exec ${ns3} \
+   nstat | grep Tcp > /tmp/${ns3}.out
+   NSTAT_HISTORY=/tmp/${ns1}.nstat ip netns exec ${ns1} \
+   nstat | grep Tcp > /tmp/${ns1}.out
+
cmp $sin $cout > /dev/null 2>&1
local cmps=$?
cmp $cin $sout > /dev/null 2>&1
local cmpc=$?
 
-   printf "%-16s" " max $max_time "
if [ $retc -eq 0 ] && [ $rets -eq 0 ] && \
   [ $cmpc -eq 0 ] && [ $cmps -eq 0 ]; then
+   printf "%-16s" " max $max_time "
mptcp_lib_pr_ok
cat "$capout"
return 0
fi
 
-   mptcp_lib_pr_fail
-   echo "client exit code $retc, server $rets" 1>&2
+   mptcp_lib_pr_fail "client exit code $retc, server $rets"
echo -e "\nnetns ${ns3} socket stat for $port:" 1>&2
-   ip netns exec ${ns3} ss -nita 1>&2 -o "sport = :$port"
+   ip netns exec ${ns3} ss -Menita 1>&2 -o "sport = :$port"
+   cat /tmp/${ns3}.out
echo -e "\nnetns ${ns1} socket stat for $port:" 1>&2
-   ip netns exec ${ns1} ss -nita 1>&2 -o "dport = :$port"
+   ip netns exec ${ns1} ss -Menita 1>&2 -o "dport = :$port"
+   cat /tmp/${ns1}.out
ls -l $sin $cout
ls -l $cin $sout
 

-- 
2.47.1

[PATCH net-next 0/6] mptcp: selftests: more debug in case of errors

Here are just a bunch of small improvements for the MPTCP selftests:

Patch 1: Unify errors messages in simult_flows: print MIB and 'ss -Me'.

Patch 2: Unify errors messages in sockopt: print MIB.

Patch 3: Move common code to print debug info to mptcp_lib.sh.

Patch 4: Use 'ss' with '-m' in case of errors.

Patch 5: Remove an unused variable.

Patch 6: Print only the size instead of size + filename again.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Geliang Tang (1):
  selftests: mptcp: sockopt: save nstat infos

Matthieu Baerts (NGI0) (5):
  selftests: mptcp: simult_flows: unify errors msgs
  selftests: mptcp: move stats info in case of errors to lib.sh
  selftests: mptcp: add -m with ss in case of errors
  selftests: mptcp: connect: remove unused variable
  selftests: mptcp: connect: better display the files size

 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 13 -
 tools/testing/selftests/net/mptcp/mptcp_join.sh|  9 ++---
 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 21 +
 tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 17 -
 tools/testing/selftests/net/mptcp/simult_flows.sh  | 21 ++---
 5 files changed, 53 insertions(+), 28 deletions(-)
---
base-commit: 9c7ad35632297edc08d0f2c7b599137e9fb5f9ff
change-id: 20250114-net-next-mptcp-st-more-debug-err-3f3f1aa15a10

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH net 3/3] selftests: mptcp: avoid spurious errors on disconnect

From: Paolo Abeni 

The disconnect test-case generates spurious errors:

  INFO: disconnect
  INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI
  01 ns1 MPTCP -> ns1 (10.0.1.1:1  ) MPTCP (duration 140ms) [FAIL]
  file received by server does not match (in, out):
  Unexpected revents: POLLERR/POLLNVAL(19)
  -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect
  Trailing bytes are:
  ��\R���!8��u2��5N%
  -rw--- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1
  Trailing bytes are:
  ��\R���!8��u2��5N%
  02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ]
  03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP   (duration  31ms) [ OK ]
  04 ns1 TCP   -> ns1 (dead:beef:1::1:10003) MPTCP (duration  26ms) [ OK ]
  [FAIL] Tests of the full disconnection have failed
  Time: 2 seconds

The root cause is actually in the user-space bits: the test program
currently disconnects as soon as all the pending data has been spooled,
generating an FASTCLOSE. If such option reaches the peer before the
latter has reached the closed status, the msk socket will report an
error to the user-space, as per protocol specification, causing the
above failure.

Address the issue explicitly waiting for all the relevant sockets to
reach a closed status before performing the disconnect.

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Abeni 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +--
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c 
b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 
4209b95690394b7565f330a37606bf39b6d2d228..414addef9a4514c489ecd09249143fe0ce2af649
 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 #include 
 
@@ -1211,23 +1213,42 @@ static void parse_setsock_options(const char *name)
exit(1);
 }
 
-void xdisconnect(int fd, int addrlen)
+void xdisconnect(int fd)
 {
-   struct sockaddr_storage empty;
+   socklen_t addrlen = sizeof(struct sockaddr_storage);
+   struct sockaddr_storage addr, empty;
int msec_sleep = 10;
-   int queued = 1;
-   int i;
+   void *raw_addr;
+   int i, cmdlen;
+   char cmd[128];
+
+   /* get the local address and convert it to string */
+   if (getsockname(fd, (struct sockaddr *)&addr, &addrlen) < 0)
+   xerror("getsockname");
+
+   if (addr.ss_family == AF_INET)
+   raw_addr = &(((struct sockaddr_in *)&addr)->sin_addr);
+   else if (addr.ss_family == AF_INET6)
+   raw_addr = &(((struct sockaddr_in6 *)&addr)->sin6_addr);
+   else
+   xerror("bad family");
+
+   strcpy(cmd, "ss -M | grep -q ");
+   cmdlen = strlen(cmd);
+   if (!inet_ntop(addr.ss_family, raw_addr, &cmd[cmdlen],
+  sizeof(cmd) - cmdlen))
+   xerror("inet_ntop");
 
shutdown(fd, SHUT_WR);
 
-   /* while until the pending data is completely flushed, the later
+   /*
+* wait until the pending data is completely flushed and all
+* the MPTCP sockets reached the closed status.
 * disconnect will bypass/ignore/drop any pending data.
 */
for (i = 0; ; i += msec_sleep) {
-   if (ioctl(fd, SIOCOUTQ, &queued) < 0)
-   xerror("can't query out socket queue: %d", errno);
-
-   if (!queued)
+   /* closed socket are not listed by 'ss' */
+   if (system(cmd) != 0)
break;
 
if (i > poll_timeout)
@@ -1281,9 +1302,9 @@ int main_loop(void)
return ret;
 
if (cfg_truncate > 0) {
-   xdisconnect(fd, peer->ai_addrlen);
+   xdisconnect(fd);
} else if (--cfg_repeat > 0) {
-   xdisconnect(fd, peer->ai_addrlen);
+   xdisconnect(fd);
 
/* the socket could be unblocking at this point, we need the
 * connect to be blocking

-- 
2.47.1

[PATCH net 2/3] mptcp: fix spurious wake-up on under memory pressure

From: Paolo Abeni 

The wake-up condition currently implemented by mptcp_epollin_ready()
is wrong, as it could mark the MPTCP socket as readable even when
no data are present and the system is under memory pressure.

Explicitly check for some data being available in the receive queue.

Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love")
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Abeni 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/protocol.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 
a93e661ef5c435155066ce9cc109092661f0711c..73526f1d768fcb6ba5bcf1a43eb09ed5ff9d67bf
 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -760,10 +760,15 @@ static inline u64 mptcp_data_avail(const struct 
mptcp_sock *msk)
 
 static inline bool mptcp_epollin_ready(const struct sock *sk)
 {
+   u64 data_avail = mptcp_data_avail(mptcp_sk(sk));
+
+   if (!data_avail)
+   return false;
+
/* mptcp doesn't have to deal with small skbs in the receive queue,
-* at it can always coalesce them
+* as it can always coalesce them
 */
-   return (mptcp_data_avail(mptcp_sk(sk)) >= sk->sk_rcvlowat) ||
+   return (data_avail >= sk->sk_rcvlowat) ||
   (mem_cgroup_sockets_enabled && sk->sk_memcg &&
mem_cgroup_under_socket_pressure(sk->sk_memcg)) ||
   READ_ONCE(tcp_memory_pressure);

-- 
2.47.1

[PATCH net 1/3] mptcp: be sure to send ack when mptcp-level window re-opens

From: Paolo Abeni 

mptcp_cleanup_rbuf() is responsible to send acks when the user-space
reads enough data to update the receive windows significantly.

It tries hard to avoid acquiring the subflow sockets locks by checking
conditions similar to the ones implemented at the TCP level.

To avoid too much code duplication - the MPTCP protocol can't reuse the
TCP helpers as part of the relevant status is maintained into the msk
socket - and multiple costly window size computation, mptcp_cleanup_rbuf
uses a rough estimate for the most recently advertised window size:
the MPTCP receive free space, as recorded as at last-ack time.

Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect
a zero to non-zero win change in some corner cases, skipping the
tcp_cleanup_rbuf call and leaving the peer stuck.

After commit ea66758c1795 ("tcp: allow MPTCP to update the announced
window"), MPTCP has actually cheap access to the announced window value.
Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.

Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
Cc: sta...@vger.kernel.org
Reported-by: Jakub Kicinski 
Closes: https://lore.kernel.org/20250107131845.5e5de...@kernel.org
Signed-off-by: Paolo Abeni 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/options.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 
a62bc874bf1e17c4ee3b4d8d94eef4d0e7c9f272..123f3f2972841aab9fc2ef338687b4c4758efd4a
 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -607,7 +607,6 @@ static bool mptcp_established_options_dss(struct sock *sk, 
struct sk_buff *skb,
}
opts->ext_copy.use_ack = 1;
opts->suboptions = OPTION_MPTCP_DSS;
-   WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
 
/* Add kind/length/subtype/flag overhead if mapping is not populated */
if (dss_size == 0)
@@ -1288,7 +1287,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct 
tcphdr *th)
}
MPTCP_INC_STATS(sock_net(ssk), 
MPTCP_MIB_RCVWNDCONFLICT);
}
-   return;
+   goto update_wspace;
}
 
if (rcv_wnd_new != rcv_wnd_old) {
@@ -1313,6 +1312,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct 
tcphdr *th)
th->window = htons(new_win);
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED);
}
+
+update_wspace:
+   WRITE_ONCE(msk->old_wspace, tp->rcv_wnd);
 }
 
 __sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum 
sum)

-- 
2.47.1

[PATCH net 0/3] mptcp: fixes for connect selftest flakes

Last week, Jakub reported [1] that the MPTCP Connect selftest was
unstable. It looked like it started after the introduction of some fixes
[2]. After analysis from Paolo, these patches revealed existing bugs,
that should be fixed by the following patches.

- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In
  some corner cases, the other peer was not notified when more data
  could be sent. A fix for v5.11, but depending on a feature introduced
  in v5.19.

- Patch 2: Fix spurious wake-up under memory pressure. In this
  situation, the userspace could be invited to read data not being there
  yet. A fix for v6.7.

- Patch 3: Fix a false positive error when running the MPTCP Connect
  selftest with the "disconnect" cases. The userspace could disconnect
  the socket too soon, which would reset (MP_FASTCLOSE) the connection,
  interpreted as an error by the test. A fix for v5.17.

Link: https://lore.kernel.org/20250107131845.5e5de...@kernel.org [1]
Link: 
https://lore.kernel.org/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434...@kernel.org
 [2]
Signed-off-by: Matthieu Baerts (NGI0) 
---
Paolo Abeni (3):
  mptcp: be sure to send ack when mptcp-level window re-opens
  mptcp: fix spurious wake-up on under memory pressure
  selftests: mptcp: avoid spurious errors on disconnect

 net/mptcp/options.c   |  6 ++--
 net/mptcp/protocol.h  |  9 +++--
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +--
 3 files changed, 43 insertions(+), 15 deletions(-)
---
base-commit: 76201b5979768500bca362871db66d77cb4c225e
change-id: 20250113-net-mptcp-connect-st-flakes-4af6389808de

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH bpf-next/net v2 7/7] selftests/bpf: Add mptcp_subflow bpf_iter subtest

From: Geliang Tang 

This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the
newly added mptcp_subflow bpf_iter.

Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy,
bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h.

Use bpf_mptcp_sock_acquire() to acquire the msk, then use bpf_for_each()
to walk the subflow list of this msk. From there, future MPTCP-specific
kfunc can be called in the loop. Because they are not there yet, this
test doesn't do anything very useful for the moment, but it focuses on
validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it
simply adds all subflow ids to local variable local_ids to make sure all
subflows have been seen, then invoke mptcp_subflow_tcp_sock() in the
loop to pick the subflow context.

Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context
of the picked subflow context and do some verifications. Finally, assign
local_ids to global variable ids so that the application can obtain this
value, and release the msk.

A related subtest called test_iters_subflow is added to load and verify
the newly added mptcp_subflow type bpf_iter example in test_mptcp. The
endpoint_init() helper is used to add 3 new subflow endpoints. Then one
byte of message is sent to trigger the creation of new subflows.
getsockopt() is invoked once the subflows have been created to trigger
the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is
then checked to make sure it equals 10, the sum of each subflow ID: we
should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the
bpf_iter loop did the job as expected.

Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2:
   - explicit sk protocol checks are no longer needed, implicitly done
 in bpf_skc_to_mptcp_sock().
   - use bpf_skc_to_mptcp_sock() instead of bpf_mptcp_sk(), and
 mptcp_subflow_tcp_sock() instead of bpf_mptcp_subflow_tcp_sock().
   - bpf_mptcp_subflow_ctx() can now return NULL.
---
 tools/testing/selftests/bpf/bpf_experimental.h |  8 +++
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 73 ++
 tools/testing/selftests/bpf/progs/mptcp_bpf.h  |  9 +++
 .../testing/selftests/bpf/progs/mptcp_bpf_iters.c  | 63 +++
 4 files changed, 153 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h 
b/tools/testing/selftests/bpf/bpf_experimental.h
index 
cd8ecd39c3f3c68d40c6e3e1465b42ed66537027..2ab3f0063c0fd6091ee19da4787671b89b5661f0
 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
 extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) 
__weak __ksym;
 extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
 
+struct bpf_iter_mptcp_subflow;
+extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+ struct mptcp_sock *msk) __weak __ksym;
+extern struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+extern void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak 
__ksym;
+
 extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) 
__weak __ksym;
 extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
 extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 
85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d
 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -11,6 +11,7 @@
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
 #include "mptcp_subflow.skel.h"
+#include "mptcp_bpf_iters.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
@@ -33,6 +34,9 @@
 #ifndef MPTCP_INFO
 #define MPTCP_INFO 1
 #endif
+#ifndef TCP_IS_MPTCP
+#define TCP_IS_MPTCP   43  /* Is MPTCP being used? */
+#endif
 #ifndef MPTCP_INFO_FLAG_FALLBACK
 #define MPTCP_INFO_FLAG_FALLBACK   _BITUL(0)
 #endif
@@ -480,6 +484,73 @@ static void test_subflow(void)
close(cgroup_fd);
 }
 
+static void run_iters_subflow(void)
+{
+   int server_fd, client_fd;
+   int is_mptcp, err;
+   socklen_t len;
+
+   server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+   if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+   return;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (!ASSERT_OK_FD(client_fd, "connect_to_

[PATCH bpf-next/net v2 6/7] selftests/bpf: More endpoints for endpoint_init

From: Geliang Tang 

This patch changes ADDR_2 from "10.0.1.2" to "10.0.2.1", and adds two more
IPv4 test addresses ADDR_3 - ADDR_4, four IPv6 addresses ADDR6_1 - ADDR6_4.
Add a new helper address_init() to initialize all these addresses.

Add a new parameter "endpoints" for endpoint_init() to control how many
endpoints are used for the tests. This makes it more flexible. Update the
parameters of endpoint_init() in test_subflow().

Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 56 +++---
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 
f8eb7f9d4fd20bbb7ee018728f7ae0f0a09d4d30..85f3d4119802a85c86cde7b74a0b857252bad8b8
 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -14,7 +14,13 @@
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
-#define ADDR_2 "10.0.1.2"
+#define ADDR_2 "10.0.2.1"
+#define ADDR_3 "10.0.3.1"
+#define ADDR_4 "10.0.4.1"
+#define ADDR6_1"dead:beef:1::1"
+#define ADDR6_2"dead:beef:2::1"
+#define ADDR6_3"dead:beef:3::1"
+#define ADDR6_4"dead:beef:4::1"
 #define PORT_1 10001
 
 #ifndef IPPROTO_MPTCP
@@ -322,22 +328,60 @@ static void test_mptcpify(void)
close(cgroup_fd);
 }
 
-static int endpoint_init(char *flags)
+static int address_init(void)
 {
SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", 
NS_TEST);
SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth1 nodad", NS_TEST, 
ADDR6_1);
SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth2 nodad", NS_TEST, 
ADDR6_2);
SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
-   if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, 
flags)) {
+
+   SYS(fail, "ip -net %s link add veth3 type veth peer name veth4", 
NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth3", NS_TEST, ADDR_3);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth3 nodad", NS_TEST, 
ADDR6_3);
+   SYS(fail, "ip -net %s link set dev veth3 up", NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth4", NS_TEST, ADDR_4);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth4 nodad", NS_TEST, 
ADDR6_4);
+   SYS(fail, "ip -net %s link set dev veth4 up", NS_TEST);
+
+   return 0;
+fail:
+   return -1;
+}
+
+static int endpoint_add(char *addr, char *flags)
+{
+   return SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, addr, 
flags);
+}
+
+static int endpoint_init(char *flags, u8 endpoints)
+{
+   int ret = -1;
+
+   if (!endpoints || endpoints > 4)
+   goto fail;
+
+   if (address_init())
+   goto fail;
+
+   if (SYS_NOFAIL("ip -net %s mptcp limits set add_addr_accepted 4 
subflows 4",
+  NS_TEST)) {
printf("'ip mptcp' not supported, skip this test.\n");
test__skip();
goto fail;
}
 
-   return 0;
+   if (endpoints > 1)
+   ret = endpoint_add(ADDR_2, flags);
+   if (endpoints > 2)
+   ret = ret ?: endpoint_add(ADDR_3, flags);
+   if (endpoints > 3)
+   ret = ret ?: endpoint_add(ADDR_4, flags);
+
 fail:
-   return -1;
+   return ret;
 }
 
 static void wait_for_new_subflows(int fd)
@@ -423,7 +467,7 @@ static void test_subflow(void)
if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_subflow"))
goto skel_destroy;
 
-   if (endpoint_init("subflow") < 0)
+   if (endpoint_init("subflow", 2) < 0)
goto close_netns;
 
run_subflow();

-- 
2.47.1

[PATCH bpf-next/net v2 5/7] bpf: Acquire and release mptcp socket

From: Geliang Tang 

The KF_TRUSTED_ARGS flag is used for bpf_iter_mptcp_subflow_new, it
indicates that the all pointer arguments are valid. It's necessary to
add a KF_ACQUIRE helper to get valid "msk".

This patch adds bpf_mptcp_sock_acquire() and bpf_mptcp_sock_release()
helpers for this. Increase sk->sk_refcnt in _acquire() and decrease it
in _release(). Register them with KF_ACQUIRE flag and KF_RELEASE flag.

Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/bpf.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
e39f0e4fb683c1aa31ee075281daee218dac5878..d50bd1ea7f6d0ff1abff32deef9a98b98ee8f42c
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -97,6 +97,23 @@ bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow 
*it)
 {
 }
 
+__bpf_kfunc static struct
+mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk)
+{
+   struct sock *sk = (struct sock *)msk;
+
+   if (sk && refcount_inc_not_zero(&sk->sk_refcnt))
+   return msk;
+   return NULL;
+}
+
+__bpf_kfunc static void bpf_mptcp_sock_release(struct mptcp_sock *msk)
+{
+   struct sock *sk = (struct sock *)msk;
+
+   WARN_ON_ONCE(!sk || !refcount_dec_not_one(&sk->sk_refcnt));
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
@@ -104,6 +121,8 @@ BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_mptcp_sock_acquire, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_mptcp_sock_release, KF_RELEASE)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 
2.47.1

[PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter

From: Geliang Tang 

It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call kfunc to modify the fields of each subflow. In
kernel space, mptcp_for_each_subflow() helper is used for this:

mptcp_for_each_subflow(msk, subflow)
kfunc(subflow);

But in the MPTCP BPF program, this has not yet been implemented. As
Martin suggested recently, this conn_list walking + modify-by-kfunc
usage fits the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow" to do
this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
_destroy(). And register these bpf_iter mptcp_subflow into mptcp
common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
in BPF program like this:

bpf_for_each(mptcp_subflow, subflow, msk)
kfunc(subflow);

Suggested-by: Martin KaFai Lau 
Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2:
   - Add BUILD_BUG_ON() checks, similar to the ones done with other
 bpf_iter_(...) helpers.
   - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
 !spin_is_locked() (Martin).

A few versions of this single patch have been previously posted to the
BPF mailing list by Geliang, before continuing to the MPTCP mailing list
only, with other patches of this series. The version of the whole series
has been reset to 1, but here is the ChangeLog for the previous ones:
 - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
   drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
 - v3: drop bpf_iter__mptcp_subflow
 - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
   it in _next() (Andrii)
 - v5: use list_is_last() instead of list_entry_is_head() add
   KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
 - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
---
 net/mptcp/bpf.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = 
{
.set   = &bpf_mptcp_fmodret_ids,
 };
 
+struct bpf_iter_mptcp_subflow {
+   __u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_mptcp_subflow_kern {
+   struct mptcp_sock *msk;
+   struct list_head *pos;
+} __aligned(8);
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc static struct mptcp_subflow_context *
@@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk)
return NULL;
 }
 
+__bpf_kfunc static int
+bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+  struct mptcp_sock *msk)
+{
+   struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+   struct sock *sk = (struct sock *)msk;
+
+   BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
+sizeof(struct bpf_iter_mptcp_subflow));
+   BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
+__alignof__(struct bpf_iter_mptcp_subflow));
+
+   kit->msk = msk;
+   if (!msk)
+   return -EINVAL;
+
+   if (!sock_owned_by_user_nocheck(sk) &&
+   !spin_is_locked(&sk->sk_lock.slock))
+   return -EINVAL;
+
+   kit->pos = &msk->conn_list;
+   return 0;
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+   struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+   if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
+   return NULL;
+
+   kit->pos = kit->pos->next;
+   return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+
+__bpf_kfunc static void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 
2.47.1

[PATCH bpf-next/net v2 3/7] bpf: Register mptcp common kfunc set

From: Geliang Tang 

MPTCP helper mptcp_sk() is used to convert struct sock to mptcp_sock.
Helpers mptcp_subflow_ctx() and mptcp_subflow_tcp_sock() are used to
convert between struct mptcp_subflow_context and sock. They all will
be used in MPTCP BPF programs too.

This patch defines corresponding wrappers of them, and put the
wrappers into mptcp common kfunc set and register the set with the
flag BPF_PROG_TYPE_UNSPEC to let them accessible to all types of BPF
programs.

Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2:
   - Thanks to the two new previous patches, bpf_mptcp_sk() and
 bpf_mptcp_subflow_tcp_sock() are no longer needed.
   - bpf_mptcp_subflow_ctx(): make sure the socket is an MPTCP subflow,
 and add KF_RET_NULL (Martin).
   - Restrict this kfunc to BPF_PROG_TYPE_CGROUP_SOCKOPT for the moment.
---
 net/mptcp/bpf.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
988b22a06331ac293b36f3c6f9bc29ff0f6db048..c5bfd84c16c43230d9d8e1fd8ff781a767e647b5
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -35,8 +35,37 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = 
{
.set   = &bpf_mptcp_fmodret_ids,
 };
 
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_mptcp_subflow_ctx(const struct sock *sk)
+{
+   if (sk && sk_fullsock(sk) &&
+   sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
+   return mptcp_subflow_ctx(sk);
+
+   return NULL;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL)
+BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
+   .owner  = THIS_MODULE,
+   .set= &bpf_mptcp_common_kfunc_ids,
+};
+
 static int __init bpf_mptcp_kfunc_init(void)
 {
-   return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+   int ret;
+
+   ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+   ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCKOPT,
+  &bpf_mptcp_common_kfunc_set);
+
+   return ret;
 }
 late_initcall(bpf_mptcp_kfunc_init);

-- 
2.47.1

[PATCH bpf-next/net v2 2/7] bpf: Allow use of skc_to_mptcp_sock in cg_sockopt

From: Geliang Tang 

Currently, bpf_skc_to_mptcp_sock() helper is not allowed to be used
in cg_sockopt. This patch adds this permission.

Thanks to the previous patch allowing skc_to_mptcp_sock() to be used
with MPTCP sockets, this permission allows this helper to be use it in
CGroup BPF hooks, e.g. [gs]etsocktopt.

Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2: new patch.
---
 kernel/bpf/cgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 
46e5db65dbc8d8c6591b53dfc77bb689357f33ea..1ca22e4842cf6b6c6bdbc37bf415187e706b2b69
 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -2358,6 +2358,8 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
 #ifdef CONFIG_INET
case BPF_FUNC_tcp_sock:
return &bpf_tcp_sock_proto;
+   case BPF_FUNC_skc_to_mptcp_sock:
+   return &bpf_skc_to_mptcp_sock_proto;
 #endif
case BPF_FUNC_perf_event_output:
return &bpf_event_output_data_proto;

-- 
2.47.1

[PATCH bpf-next/net v2 1/7] bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock

From: Geliang Tang 

Currently, bpf_skc_to_mptcp_sock() can only be used with sockets that
are MPTCP subflows: TCP sockets with tp->is_mptcp, created by the kernel
from an MPTCP socket (IPPROTO_MPTCP). Typically used with BPF sock_ops
operators.

Here, this helper is extended to support MPTCP sockets, the ones created
by the userspace (IPPROTO_MPTCP). This is useful for BPF hooks involving
these sockets, e.g. [gs]etsocktopt.

bpf_skc_to_mptcp_sock() uses bpf_mptcp_sock_from_subflow(). The former
suggests any MPTCP type/subtype can be used, but the latter only accepts
subflow ones. So bpf_mptcp_sock_from_subflow is modified here to support
MPTCP socket, and renamed to avoid confusions.

Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2: new patch.
---
 include/net/mptcp.h |  4 ++--
 net/core/filter.c   |  2 +-
 net/mptcp/bpf.c | 10 --
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 
814b5f2e3ed5e3e474a2bac5e4cca5a89abcfe1c..94d5976f7b8d8eb8b82f298f7b33ecd653937dc0
 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -322,9 +322,9 @@ static inline void mptcpv6_handle_mapped(struct sock *sk, 
bool mapped) { }
 #endif
 
 #if defined(CONFIG_MPTCP) && defined(CONFIG_BPF_SYSCALL)
-struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk);
+struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk);
 #else
-static inline struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) 
{ return NULL; }
+static inline struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk) { 
return NULL; }
 #endif
 
 #if !IS_ENABLED(CONFIG_MPTCP)
diff --git a/net/core/filter.c b/net/core/filter.c
index 
6625b3f563a4a3110a76b9c12a46340828e16b6e..40ed3854a899bf9d11847ec0e8aee174923f03d2
 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11833,7 +11833,7 @@ const struct bpf_func_proto bpf_skc_to_unix_sock_proto 
= {
 BPF_CALL_1(bpf_skc_to_mptcp_sock, struct sock *, sk)
 {
BTF_TYPE_EMIT(struct mptcp_sock);
-   return (unsigned long)bpf_mptcp_sock_from_subflow(sk);
+   return (unsigned long)bpf_mptcp_sock_from_sock(sk);
 }
 
 const struct bpf_func_proto bpf_skc_to_mptcp_sock_proto = {
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
8a16672b94e2384f5263e1432296cbca1236bb30..988b22a06331ac293b36f3c6f9bc29ff0f6db048
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -12,9 +12,15 @@
 #include 
 #include "protocol.h"
 
-struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk)
+struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk)
 {
-   if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && 
sk_is_mptcp(sk))
+   if (unlikely(!sk || !sk_fullsock(sk)))
+   return NULL;
+
+   if (sk->sk_protocol == IPPROTO_MPTCP)
+   return mptcp_sk(sk);
+
+   if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
 
return NULL;

-- 
2.47.1

[PATCH bpf-next/net v2 0/7] bpf: Add mptcp_subflow bpf_iter support

Here is a series from Geliang, adding mptcp_subflow bpf_iter support.

We are working on extending MPTCP with BPF, e.g. to control the path
manager -- in charge of the creation, deletion, and announcements of
subflows (paths) -- and the packet scheduler -- in charge of selecting
which available path the next data will be sent to. These extensions
need to iterate over the list of subflows attached to an MPTCP
connection, and do some specific actions via some new kfunc that will be
added later on.

This preparation work is split in different patches:

- Patch 1: extend bpf_skc_to_mptcp_sock() to be called with msk.

- Patch 2: allow using skc_to_mptcp_sock() in CGroup sockopt hooks.

- Patch 3: register some "basic" MPTCP kfunc.

- Patch 4: add mptcp_subflow bpf_iter support. Note that previous
versions of this single patch have already been shared to the
BPF mailing list. The changelog has been kept with a comment,
but the version number has been reset to avoid confusions.

- Patch 5: add kfunc to make sure the msk is valid

- Patch 6: add more MPTCP endpoints in the selftests, in order to create
more than 2 subflows.

- Patch 7: add a very simple test validating mptcp_subflow bpf_iter
support. This test could be written without the new bpf_iter,
but it is there only to make sure this specific feature works
as expected.

Signed-off-by: Matthieu Baerts (NGI0)
---
Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1:
https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0-cf1695303...@kernel.org

---
Geliang Tang (7):
bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock
bpf: Allow use of skc_to_mptcp_sock in cg_sockopt
bpf: Register mptcp common kfunc set
bpf: Add mptcp_subflow bpf_iter
bpf: Acquire and release mptcp socket
selftests/bpf: More endpoints for endpoint_init
selftests/bpf: Add mptcp_subflow bpf_iter subtest

include/net/mptcp.h| 4 +-
kernel/bpf/cgroup.c| 2 +
net/core/filter.c | 2 +-
net/mptcp/bpf.c| 113 +-
tools/testing/selftests/bpf/bpf_experimental.h | 8 ++
tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 -
tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 ++
.../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 ++
8 files changed, 318 insertions(+), 12 deletions(-)
---
base-commit: dad704ebe38642cd405e15b9c51263356391355c
change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e

Best regards,
--
Matthieu Baerts (NGI0)

Re: [PATCH net-next v12 11/22] ovpn: implement TCP transport

2024-12-09 Thread Matthieu Baerts

On 09/12/2024 15:08, Antonio Quartulli wrote:
> On 09/12/2024 12:31, Matthieu Baerts wrote:
>> On 09/12/2024 11:58, Antonio Quartulli wrote:
>>> On 09/12/2024 11:46, Matthieu Baerts wrote:
>>>> Hi Antonio,
>>>>
>>>> Thank you for working on this, and sharing your work here!
>>>>
>>>> On 05/12/2024 00:09, Antonio Quartulli wrote:
>>>>> On 04/12/2024 23:52, Antonio Quartulli wrote:
>>>>>> Paolo,
>>>>>>
>>>>>> On 04/12/2024 12:15, Antonio Quartulli wrote:
>>>>>> [...]
>>>>>>>>> +    mutex_lock(&tcp6_prot_mutex);
>>>>>>>>> +    if (!ovpn_tcp6_prot.recvmsg)
>>>>>>>>> +    ovpn_tcp_build_protos(&ovpn_tcp6_prot,
>>>>>>>>> &ovpn_tcp6_ops,
>>>>>>>>> +  sock->sk->sk_prot,
>>>>>>>>> +  sock->sk->sk_socket->ops);
>>>>>>>>> +    mutex_unlock(&tcp6_prot_mutex);
>>>>>>>>
>>>>>>>> This looks like an hack to avoid a build dependency on IPV6, I
>>>>>>>> think
>>>>>>>> the
>>>>>>>> explicit
>>>>>>>
>>>>>>> I happily copied this approach from
>>>>>>> espintcp.c:espintcp_init_sk() :-D
>>>>>>>
>>>>>>>>
>>>>>>>> #if IS_ENABLED(CONFIG_IPV6)
>>>>>>>>
>>>>>>>> at init time should be preferable
>>>>>>
>>>>>> To get this done at init time I need inet6_stream_ops to be
>>>>>> accessible, but it seems there is no EXPORT_SYMBOL() for this object.
>>>>>>
>>>>>> However, I see that mptcp/protocol.c is happily accessing it.
>>>>>> Any clue how this is possible?
>>>>>
>>>>> I answer myself: mptcp is not tristate and it can only be compiled as
>>>>> built-in.
>>>>
>>>> Indeed, that's why.
>>>>
>>>> Talking about MPTCP, by chance, do you plan to support it later on? :)
>>>
>>> Hi Matthieu,
>>>
>>> It is not on our current roadmap (TCP doesn't get much love in the VPN
>>> world), but I agree it could be an interesting option to explore!
>>
>> I understand, it makes sense not to recommend using TCP for the
>> transport layer for tunnelling solutions.
>>
>>> I have to admit that I haven't played much with MPTCP myself yet, but I
>>> am more than happy to talk about potential advantages for the ovpn use
>>> case.
>>
>> Some people told me they were interested in using OpenVPN with MPTCP to
>> use multiple (low-capacity) network links at the same time. I think
>> intercepting and proxying TCP traffic would always be the best in terms
>> of performances, but using OpenVPN with MPTCP seems to be enough for
>> some, especially when they want to "improve" some type of UDP traffic
>> that cannot be intercepted: QUIC, VPN, etc.
>>
>> I don't have numbers to share, but I can understand this feature can
>> help in some cases.
> 
> Yeah, some people may definitely benefit from this feature.
> I'll have a look at MPTCP once ovpn is merged.

Thank you :)

Don't hesitate to email the ML, or open an issue on the GitHub repo if
needed!

(More details: https://www.mptcp.dev/#communication)

>> (This reminds me this: https://github.com/OpenVPN/ovpn-dco/issues/60)
>> (and this: https://github.com/arinc9/openvpn/pull/1)
> 
> Right, this definitely shows some interest and it means we should easily
> find people willing to test :-)
Indeed!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net-next v12 11/22] ovpn: implement TCP transport

2024-12-09 Thread Matthieu Baerts

On 09/12/2024 11:58, Antonio Quartulli wrote:
> On 09/12/2024 11:46, Matthieu Baerts wrote:
>> Hi Antonio,
>>
>> Thank you for working on this, and sharing your work here!
>>
>> On 05/12/2024 00:09, Antonio Quartulli wrote:
>>> On 04/12/2024 23:52, Antonio Quartulli wrote:
>>>> Paolo,
>>>>
>>>> On 04/12/2024 12:15, Antonio Quartulli wrote:
>>>> [...]
>>>>>>> +    mutex_lock(&tcp6_prot_mutex);
>>>>>>> +    if (!ovpn_tcp6_prot.recvmsg)
>>>>>>> +    ovpn_tcp_build_protos(&ovpn_tcp6_prot, &ovpn_tcp6_ops,
>>>>>>> +  sock->sk->sk_prot,
>>>>>>> +  sock->sk->sk_socket->ops);
>>>>>>> +    mutex_unlock(&tcp6_prot_mutex);
>>>>>>
>>>>>> This looks like an hack to avoid a build dependency on IPV6, I think
>>>>>> the
>>>>>> explicit
>>>>>
>>>>> I happily copied this approach from espintcp.c:espintcp_init_sk() :-D
>>>>>
>>>>>>
>>>>>> #if IS_ENABLED(CONFIG_IPV6)
>>>>>>
>>>>>> at init time should be preferable
>>>>
>>>> To get this done at init time I need inet6_stream_ops to be
>>>> accessible, but it seems there is no EXPORT_SYMBOL() for this object.
>>>>
>>>> However, I see that mptcp/protocol.c is happily accessing it.
>>>> Any clue how this is possible?
>>>
>>> I answer myself: mptcp is not tristate and it can only be compiled as
>>> built-in.
>>
>> Indeed, that's why.
>>
>> Talking about MPTCP, by chance, do you plan to support it later on? :)
> 
> Hi Matthieu,
> 
> It is not on our current roadmap (TCP doesn't get much love in the VPN
> world), but I agree it could be an interesting option to explore!

I understand, it makes sense not to recommend using TCP for the
transport layer for tunnelling solutions.

> I have to admit that I haven't played much with MPTCP myself yet, but I
> am more than happy to talk about potential advantages for the ovpn use
> case.

Some people told me they were interested in using OpenVPN with MPTCP to
use multiple (low-capacity) network links at the same time. I think
intercepting and proxying TCP traffic would always be the best in terms
of performances, but using OpenVPN with MPTCP seems to be enough for
some, especially when they want to "improve" some type of UDP traffic
that cannot be intercepted: QUIC, VPN, etc.

I don't have numbers to share, but I can understand this feature can
help in some cases.

(This reminds me this: https://github.com/OpenVPN/ovpn-dco/issues/60)
(and this: https://github.com/arinc9/openvpn/pull/1)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net-next v12 11/22] ovpn: implement TCP transport

2024-12-09 Thread Matthieu Baerts

Hi Antonio,

Thank you for working on this, and sharing your work here!

On 05/12/2024 00:09, Antonio Quartulli wrote:
> On 04/12/2024 23:52, Antonio Quartulli wrote:
>> Paolo,
>>
>> On 04/12/2024 12:15, Antonio Quartulli wrote:
>> [...]
> +    mutex_lock(&tcp6_prot_mutex);
> +    if (!ovpn_tcp6_prot.recvmsg)
> +    ovpn_tcp_build_protos(&ovpn_tcp6_prot, &ovpn_tcp6_ops,
> +  sock->sk->sk_prot,
> +  sock->sk->sk_socket->ops);
> +    mutex_unlock(&tcp6_prot_mutex);

 This looks like an hack to avoid a build dependency on IPV6, I think
 the
 explicit
>>>
>>> I happily copied this approach from espintcp.c:espintcp_init_sk() :-D
>>>

 #if IS_ENABLED(CONFIG_IPV6)

 at init time should be preferable
>>
>> To get this done at init time I need inet6_stream_ops to be
>> accessible, but it seems there is no EXPORT_SYMBOL() for this object.
>>
>> However, I see that mptcp/protocol.c is happily accessing it.
>> Any clue how this is possible?
> 
> I answer myself: mptcp is not tristate and it can only be compiled as
> built-in.

Indeed, that's why.

Talking about MPTCP, by chance, do you plan to support it later on? :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: LKFT CI: improving Networking selftests results when validating stable kernels

2024-11-15 Thread Matthieu Baerts

Hi Dan,

On 15/11/2024 14:07, Dan Carpenter wrote:
> On Fri, Nov 15, 2024 at 01:43:14PM +0100, Matthieu Baerts wrote:
>> Regarding the other questions from my previous email -- skipped tests
>> (e.g. I think Netfilter tests are no longer validated), KVM,
>> notifications -- do you know who at Linaro could eventually look at them?
>>
> 
> The skip tests were because they lead to hangs.  We're going to look at those
> again to see if they're still an issue.  And we're also going to try enable 
> the
> other tests you mentioned.

Great, thank you!

For KVM (or similar), I guess it is not available, right? Some
time-sensitive tests might be unstable in such environment, and need to
be skipped.

For the notifications, do not hesitate to contact the corresponding
maintainers, the last people who modified the problematic selftests and
the netdev list. These "net" selftests are now better maintained, and
they are regularly validated on the development branches:

  https://netdev.bots.linux.dev/status.html

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: LKFT CI: improving Networking selftests results when validating stable kernels

2024-11-15 Thread Matthieu Baerts

Hi Dan,

Thank you for your reply!

On 13/11/2024 18:08, Dan Carpenter wrote:
> On Fri, Nov 08, 2024 at 07:21:59PM +0100, Matthieu Baerts wrote:
>> KSelftests from the same version
>> 
>>
>> According to the doc [2], kselftests should support all previous kernel
>> versions. The LKFT CI is then using the kselftests from the last stable
>> release to validate all stable versions. Even if there are good reasons
>> to do that, we would like to ask for an opt-out for this policy for the
>> networking tests: this is hard to maintain with the increased
>> complexity, hard to validate on all stable kernels before applying
>> patches, and hard to put in place in some situations. As a result, many
>> tests are failing on older kernels, and it looks like it is a lot of
>> work to support older kernels, and to maintain this.
>>
>> Many networking tests are validating the internal behaviour that is not
>> exposed to the userspace. A typical example: some tests look at the raw
>> packets being exchanged during a test, and this behaviour can change
>> without modifying how the userspace is interacting with the kernel. The
>> kernel could expose capabilities, but that's not something that seems
>> natural to put in place for internal behaviours that are not exposed to
>> end users. Maybe workarounds could be used, e.g. looking at kernel
>> symbols, etc. Nut that doesn't always work, increase the complexity, and
>> often "false positive" issue will be noticed only after a patch hits
>> stable, and will cause a bunch of tests to be ignored.
>>
>> Regarding fixes, ideally they will come with a new or modified test that
>> can also be backported. So the coverage can continue to grow in stable
>> versions too.
>>
>> Do you think that from the kernel v6.12 (or before?), the LKFT CI could
>> run the networking kselftests from the version that is being validated,
>> and not from a newer one? So validating the selftests from v6.12.1 on a
>> v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.
>>
> 
> These kinds of decisions are something that Greg and Shuah need to decide on.

Thank you, it makes sense.

> You would still need some way to automatically detect that kselftest is 
> running
> on an old kernel and disable the networking checks.  Otherwise when random
> people on the internet try to run selftests they would run into issues.


Indeed. I guess we can always add a warning when the kernel and
selftests versions are different. I suppose the selftests are built
using the same kernel version, then executed on older versions: we could
then compare the kernel versions at build time and run time, no?


Regarding the other questions from my previous email -- skipped tests
(e.g. I think Netfilter tests are no longer validated), KVM,
notifications -- do you know who at Linaro could eventually look at them?


Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: LKFT CI: improving Networking selftests results when validating stable kernels

2024-11-15 Thread Matthieu Baerts

Hi Shuah, Greg,

Thank you for your reply!

On 13/11/2024 19:33, Shuah Khan wrote:
> On 11/8/24 11:21, Matthieu Baerts wrote:
>> Hello LKFT maintainers, CI operators,
>>
>> First, I would like to say thank you to the people behind the LKFT
>> project for validating stable kernels (and more), and including some
>> Network selftests in their tests suites.
>>
>> A lot of improvements around the networking kselftests have been done
>> this year. At the last Netconf [1], we discussed how these tests were
>> validated on stable kernels from CIs like the LKFT one, and we have some
>> suggestions to improve the situation.
>>
>>
>> KSelftests from the same version
>> 
>>
>> According to the doc [2], kselftests should support all previous kernel
>> versions. The LKFT CI is then using the kselftests from the last stable
>> release to validate all stable versions. Even if there are good reasons
>> to do that, we would like to ask for an opt-out for this policy for the
>> networking tests: this is hard to maintain with the increased
>> complexity, hard to validate on all stable kernels before applying
>> patches, and hard to put in place in some situations. As a result, many
>> tests are failing on older kernels, and it looks like it is a lot of
>> work to support older kernels, and to maintain this.
>>
> 
> This is from the Documentation/dev-tools/kselftest.rst:
> 
> Kselftest from mainline can be run on older stable kernels. Running tests
> from mainline offers the best coverage. Several test rings run mainline
> kselftest suite on stable releases. The reason is that when a new test
> gets added to test existing code to regression test a bug, we should be
> able to run that test on an older kernel. Hence, it is important to keep
> code that can still test an older kernel and make sure it skips the test
> gracefully on newer releases.
> 
> 
> As it states, running tests from mainline increases the coverage when new
> tests are added to regression test an existing kernel feature in a stable
> release.
> 
> It also says that when mainline tests are running on an older kernel, the
> test should detect missing features and report skips.
> 
> The above paragraph addresses test developers and users. I would say the
> policy regarding the test development will not change. We want to keep
> it the same, continuing to take measures to skip tests when a feature
> isn't supported in the kernel the tests are running on. This addresses
> not just a kernel and test revision mismatch, but also when a feature
> isn't enabled when kernel and test revisions match.
> 
> This policy helps us find bugs in the tests failing when they should
> skip. If test rings move to a new policy, our ability to find bugs
> like this goes down.
> 
> As per users and test ring maintainers, they need to be aware of the
> reduced coverage if they revision match kernel and tests.
> Revision matching example: 6.11.8 tests on 6.11.8 stable
> 
> Greg KH and other stable maintainers can weigh in on whether they would
> like LKFT to go from running mainline tests on stable releases to
> revision matching.

I appreciate these explanations. When we discussed this subject at
Netconf, we looked at the documentation, and we understood the
advantages of running newer kselftests on older kernels. But the issue
we have is to "detect missing features and report skips": that's hard to
maintain, because it increases the code complexity, and it is hard to
validate before applying patches.

One of the reasons is that many networking selftests are validating
internal behaviours that are not exposed to the userspace. That makes it
hard to detect what behaviour to expect, and checking the kernel version
doesn't seem to be the right thing to do here. Or does it mean that
these essential tests should not validate the internal behaviours, e.g.
checking that the packets sent on the wire are formatted correctly?

A compromise could be to mark the tests checking internal behaviours,
and warn testers that they should be executed on the same version. Or
even run all the tests twice: once with the kselftests from the same
version, and once using the kselftests from the latest stable version. WDYT?

The main problem we saw when using kselftests from a newer version is
that the code coverage of many 'net' tests might even decrease over
time. In this subsystem, it is common to have "big" selftests running
many subtests. When a new feature is added, a new subtest might be added
in an existing selftest. When one subtest fails -- e.g. because the test
is not skipped on older kernels -- the whole selftest is marked as
failed. In a situation

LKFT CI: improving Networking selftests results when validating stable kernels

2024-11-08 Thread Matthieu Baerts

Hello LKFT maintainers, CI operators,

First, I would like to say thank you to the people behind the LKFT
project for validating stable kernels (and more), and including some
Network selftests in their tests suites.

A lot of improvements around the networking kselftests have been done
this year. At the last Netconf [1], we discussed how these tests were
validated on stable kernels from CIs like the LKFT one, and we have some
suggestions to improve the situation.

KSelftests from the same version

According to the doc [2], kselftests should support all previous kernel
versions. The LKFT CI is then using the kselftests from the last stable
release to validate all stable versions. Even if there are good reasons
to do that, we would like to ask for an opt-out for this policy for the
networking tests: this is hard to maintain with the increased
complexity, hard to validate on all stable kernels before applying
patches, and hard to put in place in some situations. As a result, many
tests are failing on older kernels, and it looks like it is a lot of
work to support older kernels, and to maintain this.

Many networking tests are validating the internal behaviour that is not
exposed to the userspace. A typical example: some tests look at the raw
packets being exchanged during a test, and this behaviour can change
without modifying how the userspace is interacting with the kernel. The
kernel could expose capabilities, but that's not something that seems
natural to put in place for internal behaviours that are not exposed to
end users. Maybe workarounds could be used, e.g. looking at kernel
symbols, etc. Nut that doesn't always work, increase the complexity, and
often "false positive" issue will be noticed only after a patch hits
stable, and will cause a bunch of tests to be ignored.

Regarding fixes, ideally they will come with a new or modified test that
can also be backported. So the coverage can continue to grow in stable
versions too.

Do you think that from the kernel v6.12 (or before?), the LKFT CI could
run the networking kselftests from the version that is being validated,
and not from a newer one? So validating the selftests from v6.12.1 on a
v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.

Skipped tests
-

It looks like many tests are skipped:

- Some have been in a skip file [3] for a while: maybe they can be removed?

- Some are skipped because of missing tools: maybe they can be added?
e.g. iputils, tshark, ipv6toolkit, etc.

- Some tests are in 'net', but in subdirectories, and hence not tested,
e.g. forwarding, packetdrill, netfilter, tcp_ao. Could they be tested too?

How can we change this to increase the code coverage using existing tests?

KVM
---

It looks like different VMs are being used to execute the different
tests. Do these VMs benefit from any accelerations like KVM? If not,
some tests might fail because the environment is too slow.

The KSFT_MACHINE_SLOW=yes env var can be set to increase some
tolerances, timeout or to skip some parts, but that might not be enough
for some tests.

Notifications
-

In case of new regressions, who is being notified? Are the people from
the MAINTAINERS file, and linked to the corresponding selftests being
notified or do they need to do the monitoring on their side?

Looking forward to improving the networking selftests results when
validating stable kernels!

[1] https://netdev.bots.linux.dev/netconf/2024/
[2] https://docs.kernel.org/dev-tools/kselftest.html
[3]
https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/skipfile-lkft.yaml

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.

[PATCH bpf-next/net 0/5] bpf: Add mptcp_subflow bpf_iter support

Here is a series from Geliang, adding mptcp_subflow bpf_iter support.

We are working on extending MPTCP with BPF, e.g. to control the path
manager -- in charge of the creation, deletion, and announcements of
subflows (paths) -- and the packet scheduler -- in charge of selecting
which available path the next data will be sent to. These extensions
need to iterate over the list of subflows attached to an MPTCP
connection, and do some specific actions via some new kfunc that will be
added later on.

This preparation work is split in different patches:

- Patch 1: register some "basic" MPTCP kfunc.

- Patch 2: add mptcp_subflow bpf_iter support. Note that previous
   versions of this single patch have already been shared to the
   BPF mailing list. The changelog has been kept with a comment,
   but the version number has been reset to avoid confusions.

- Patch 3: add kfunc to make sure the msk is valid

- Patch 4: add more MPTCP endpoints in the selftests, in order to create
   more than 2 subflows.

- Patch 5: add a very simple test validating mptcp_subflow bpf_iter
   support. This test could be written without the new bpf_iter,
   but it is there only to make sure this specific feature works
   as expected.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Geliang Tang (5):
  bpf: Register mptcp common kfunc set
  bpf: Add mptcp_subflow bpf_iter
  bpf: Acquire and release mptcp socket
  selftests/bpf: More endpoints for endpoint_init
  selftests/bpf: Add mptcp_subflow bpf_iter subtest

 net/mptcp/bpf.c| 104 -
 tools/testing/selftests/bpf/bpf_experimental.h |   8 ++
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 -
 tools/testing/selftests/bpf/progs/mptcp_bpf.h  |  10 ++
 .../testing/selftests/bpf/progs/mptcp_bpf_iters.c  |  64 ++
 5 files changed, 308 insertions(+), 7 deletions(-)
---
base-commit: 141b4d6a8049cecdc8124f87e044b83a9e80730d
change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH bpf-next/net 5/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest

From: Geliang Tang 

This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the
newly added mptcp_subflow bpf_iter.

Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy,
bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h.

Use bpf_mptcp_sock_acquire() to acquire the msk, then use bpf_for_each()
to walk the subflow list of this msk. From there, future MPTCP-specific
kfunc can be called in the loop. Because they are not there yet, this
test doesn't do anything very useful for the moment, but it focuses on
validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it
simply adds all subflow ids to local variable local_ids to make sure all
subflows have been seen, then invoke the kfunc bpf_mptcp_subflow_tcp_sock()
in the loop to pick a subsocket.

Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context
of the picked subsocket and do some verifications. Finally, assign
local_ids to global variable ids so that the application can obtain this
value, and release the msk.

A related subtest called test_iters_subflow is added to load and verify
the newly added mptcp_subflow type bpf_iter example in test_mptcp. The
endpoint_init() helper is used to add 3 new subflow endpoints. Then one
byte of message is sent to trigger the creation of new subflows.
getsockopt() is invoked once the subflows have been created to trigger
the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is
then checked to make sure it equals 10, the sum of each subflow ID: we
should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the
bpf_iter loop did the job as expected.

Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/bpf/bpf_experimental.h |  8 +++
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 73 ++
 tools/testing/selftests/bpf/progs/mptcp_bpf.h  | 10 +++
 .../testing/selftests/bpf/progs/mptcp_bpf_iters.c  | 64 +++
 4 files changed, 155 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h 
b/tools/testing/selftests/bpf/bpf_experimental.h
index 
b0668f29f7b394eb5294b6c9cade28fc1b17265a..08eaa431aafd758117f8e818410c4a3e7fd0b70c
 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
 extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) 
__weak __ksym;
 extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
 
+struct bpf_iter_mptcp_subflow;
+extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+ struct mptcp_sock *msk) __weak __ksym;
+extern struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+extern void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak 
__ksym;
+
 extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) 
__weak __ksym;
 extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
 extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 
85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d
 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -11,6 +11,7 @@
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
 #include "mptcp_subflow.skel.h"
+#include "mptcp_bpf_iters.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
@@ -33,6 +34,9 @@
 #ifndef MPTCP_INFO
 #define MPTCP_INFO 1
 #endif
+#ifndef TCP_IS_MPTCP
+#define TCP_IS_MPTCP   43  /* Is MPTCP being used? */
+#endif
 #ifndef MPTCP_INFO_FLAG_FALLBACK
 #define MPTCP_INFO_FLAG_FALLBACK   _BITUL(0)
 #endif
@@ -480,6 +484,73 @@ static void test_subflow(void)
close(cgroup_fd);
 }
 
+static void run_iters_subflow(void)
+{
+   int server_fd, client_fd;
+   int is_mptcp, err;
+   socklen_t len;
+
+   server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+   if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+   return;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+   goto close_server;
+
+   send_byte(client_fd);
+   wait_for_new_subflows(client_fd);
+
+   len = sizeof(is_mptcp);
+   /* mainly to trigger the BPF program */
+   err = getsockopt(client_fd, SOL_TCP, TCP_IS_MPTCP, &is_mptcp, &len);
+   if (ASSERT_OK(err,

[PATCH bpf-next/net 4/5] selftests/bpf: More endpoints for endpoint_init

From: Geliang Tang 

This patch changes ADDR_2 from "10.0.1.2" to "10.0.2.1", and adds two more
IPv4 test addresses ADDR_3 - ADDR_4, four IPv6 addresses ADDR6_1 - ADDR6_4.
Add a new helper address_init() to initialize all these addresses.

Add a new parameter "endpoints" for endpoint_init() to control how many
endpoints are used for the tests. This makes it more flexible. Update the
parameters of endpoint_init() in test_subflow().

Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 56 +++---
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 
f8eb7f9d4fd20bbb7ee018728f7ae0f0a09d4d30..85f3d4119802a85c86cde7b74a0b857252bad8b8
 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -14,7 +14,13 @@
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1 "10.0.1.1"
-#define ADDR_2 "10.0.1.2"
+#define ADDR_2 "10.0.2.1"
+#define ADDR_3 "10.0.3.1"
+#define ADDR_4 "10.0.4.1"
+#define ADDR6_1"dead:beef:1::1"
+#define ADDR6_2"dead:beef:2::1"
+#define ADDR6_3"dead:beef:3::1"
+#define ADDR6_4"dead:beef:4::1"
 #define PORT_1 10001
 
 #ifndef IPPROTO_MPTCP
@@ -322,22 +328,60 @@ static void test_mptcpify(void)
close(cgroup_fd);
 }
 
-static int endpoint_init(char *flags)
+static int address_init(void)
 {
SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", 
NS_TEST);
SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth1 nodad", NS_TEST, 
ADDR6_1);
SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth2 nodad", NS_TEST, 
ADDR6_2);
SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
-   if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, 
flags)) {
+
+   SYS(fail, "ip -net %s link add veth3 type veth peer name veth4", 
NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth3", NS_TEST, ADDR_3);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth3 nodad", NS_TEST, 
ADDR6_3);
+   SYS(fail, "ip -net %s link set dev veth3 up", NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth4", NS_TEST, ADDR_4);
+   SYS(fail, "ip -net %s addr add %s/64 dev veth4 nodad", NS_TEST, 
ADDR6_4);
+   SYS(fail, "ip -net %s link set dev veth4 up", NS_TEST);
+
+   return 0;
+fail:
+   return -1;
+}
+
+static int endpoint_add(char *addr, char *flags)
+{
+   return SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, addr, 
flags);
+}
+
+static int endpoint_init(char *flags, u8 endpoints)
+{
+   int ret = -1;
+
+   if (!endpoints || endpoints > 4)
+   goto fail;
+
+   if (address_init())
+   goto fail;
+
+   if (SYS_NOFAIL("ip -net %s mptcp limits set add_addr_accepted 4 
subflows 4",
+  NS_TEST)) {
printf("'ip mptcp' not supported, skip this test.\n");
test__skip();
goto fail;
}
 
-   return 0;
+   if (endpoints > 1)
+   ret = endpoint_add(ADDR_2, flags);
+   if (endpoints > 2)
+   ret = ret ?: endpoint_add(ADDR_3, flags);
+   if (endpoints > 3)
+   ret = ret ?: endpoint_add(ADDR_4, flags);
+
 fail:
-   return -1;
+   return ret;
 }
 
 static void wait_for_new_subflows(int fd)
@@ -423,7 +467,7 @@ static void test_subflow(void)
if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_subflow"))
goto skel_destroy;
 
-   if (endpoint_init("subflow") < 0)
+   if (endpoint_init("subflow", 2) < 0)
goto close_netns;
 
run_subflow();

-- 
2.45.2

[PATCH bpf-next/net 3/5] bpf: Acquire and release mptcp socket

From: Geliang Tang 

The KF_TRUSTED_ARGS flag is used for bpf_iter_mptcp_subflow_new, it
indicates that the all pointer arguments are valid. It's necessary to
add a KF_ACQUIRE helper to get valid "msk".

This patch adds bpf_mptcp_sock_acquire() and bpf_mptcp_sock_release()
helpers for this. Increase sk->sk_refcnt in _acquire() and decrease it
in _release(). Register them with KF_ACQUIRE flag and KF_RELEASE flag.

Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/bpf.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
d107c2865e97e6ccffb9e0720dfbbd232b63a3b8..5bd04548e846b4dc120dbc83725a604821fac772
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -90,6 +90,23 @@ bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow 
*it)
 {
 }
 
+__bpf_kfunc static struct
+mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk)
+{
+   struct sock *sk = (struct sock *)msk;
+
+   if (sk && refcount_inc_not_zero(&sk->sk_refcnt))
+   return msk;
+   return NULL;
+}
+
+__bpf_kfunc static void bpf_mptcp_sock_release(struct mptcp_sock *msk)
+{
+   struct sock *sk = (struct sock *)msk;
+
+   WARN_ON_ONCE(!sk || !refcount_dec_not_one(&sk->sk_refcnt));
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
@@ -99,6 +116,8 @@ BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock)
 BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_mptcp_sock_acquire, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_mptcp_sock_release, KF_RELEASE)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 
2.45.2

[PATCH bpf-next/net 2/5] bpf: Add mptcp_subflow bpf_iter

From: Geliang Tang 

It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call kfunc to modify the fields of each subflow. In
kernel space, mptcp_for_each_subflow() helper is used for this:

mptcp_for_each_subflow(msk, subflow)
kfunc(subflow);

But in the MPTCP BPF program, this has not yet been implemented. As
Martin suggested recently, this conn_list walking + modify-by-kfunc
usage fits the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow" to do
this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
_destroy(). And register these bpf_iter mptcp_subflow into mptcp
common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
in BPF program like this:

bpf_for_each(mptcp_subflow, subflow, msk)
kfunc(subflow);

Suggested-by: Martin KaFai Lau 
Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
A few versions of this single patch have been previously posted to the
BPF mailing list by Geliang, before continuing to the MPTCP mailing list
only, with other patches of this series. The version of the whole series
has been reset to 1, but here is the ChangeLog for this patch here:
 - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
   drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
 - v3: drop bpf_iter__mptcp_subflow
 - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
   it in _next() (Andrii)
 - v5: use list_is_last() instead of list_entry_is_head() add
   KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
 - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
---
 net/mptcp/bpf.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
6f96a5927fd371f8ea92cbf96c875edef9272b98..d107c2865e97e6ccffb9e0720dfbbd232b63a3b8
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = 
{
.set   = &bpf_mptcp_fmodret_ids,
 };
 
+struct bpf_iter_mptcp_subflow {
+   __u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_mptcp_subflow_kern {
+   struct mptcp_sock *msk;
+   struct list_head *pos;
+} __aligned(8);
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk)
@@ -48,12 +57,48 @@ bpf_mptcp_subflow_tcp_sock(const struct 
mptcp_subflow_context *subflow)
return mptcp_subflow_tcp_sock(subflow);
 }
 
+__bpf_kfunc static int
+bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+  struct mptcp_sock *msk)
+{
+   struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+   kit->msk = msk;
+   if (!msk)
+   return -EINVAL;
+
+   msk_owned_by_me(msk);
+
+   kit->pos = &msk->conn_list;
+   return 0;
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+   struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+   if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
+   return NULL;
+
+   kit->pos = kit->pos->next;
+   return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+
+__bpf_kfunc static void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_mptcp_sk)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 
2.45.2

[PATCH bpf-next/net 1/5] bpf: Register mptcp common kfunc set

From: Geliang Tang 

MPTCP helper mptcp_sk() is used to convert struct sock to mptcp_sock.
Helpers mptcp_subflow_ctx() and mptcp_subflow_tcp_sock() are used to
convert between struct mptcp_subflow_context and sock. They all will
be used in MPTCP BPF programs too.

This patch defines corresponding wrappers of them, and put the
wrappers into mptcp common kfunc set and register the set with the
flag BPF_PROG_TYPE_UNSPEC to let them accessible to all types of BPF
programs.

Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/bpf.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 
8a16672b94e2384f5263e1432296cbca1236bb30..6f96a5927fd371f8ea92cbf96c875edef9272b98
 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,8 +29,46 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = 
{
.set   = &bpf_mptcp_fmodret_ids,
 };
 
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk)
+{
+   return mptcp_sk(sk);
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_mptcp_subflow_ctx(const struct sock *sk)
+{
+   return mptcp_subflow_ctx(sk);
+}
+
+__bpf_kfunc static struct sock *
+bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
+{
+   return mptcp_subflow_tcp_sock(subflow);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_mptcp_sk)
+BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx)
+BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock)
+BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
+   .owner  = THIS_MODULE,
+   .set= &bpf_mptcp_common_kfunc_ids,
+};
+
 static int __init bpf_mptcp_kfunc_init(void)
 {
-   return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+   int ret;
+
+   ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set);
+   ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC,
+  &bpf_mptcp_common_kfunc_set);
+
+   return ret;
 }
 late_initcall(bpf_mptcp_kfunc_init);

-- 
2.45.2

[PATCH net-next] selftests: net: include lib/sh/*.sh with lib.sh

2024-11-04 Thread Matthieu Baerts (NGI0)

Recently, the net/lib.sh file has been modified to include defer.sh from
net/lib/sh/ directory. The Makefile from net/lib has been modified
accordingly, but not the ones from the sub-targets using net/lib.sh.

Because of that, the new file is not installed as expected when
installing the Forwarding, MPTCP, and Netfilter targets, e.g.

  # make -C tools/testing/selftests TARGETS=net/mptcp install \
INSTALL_PATH=/tmp/kself
  # cd /tmp/kself/
  # ./run_kselftest.sh -c net/mptcp
TAP version 13
1..7
# timeout set to 1800
# selftests: net/mptcp: mptcp_connect.sh
# ./../lib.sh: line 5: /tmp/kself/net/lib/sh/defer.sh: No such file
  or directory
# (...)

This can be fixed simply by adding all the .sh files from net/lib/sh
directory to the TEST_INCLUDES variable in the different Makefile's.

Fixes: a6e263f125cd ("selftests: net: lib: Introduce deferred commands")
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/forwarding/Makefile | 3 ++-
 tools/testing/selftests/net/mptcp/Makefile  | 2 +-
 tools/testing/selftests/net/netfilter/Makefile  | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/Makefile 
b/tools/testing/selftests/net/forwarding/Makefile
index 
224346426ef2220470bf2dbef66198eae9331f56..7d885cff8d79bc6882d2341d0a1a59891fc1cb2e
 100644
--- a/tools/testing/selftests/net/forwarding/Makefile
+++ b/tools/testing/selftests/net/forwarding/Makefile
@@ -126,6 +126,7 @@ TEST_FILES := devlink_lib.sh \
tc_common.sh
 
 TEST_INCLUDES := \
-   ../lib.sh
+   ../lib.sh \
+   $(wildcard ../lib/sh/*.sh)
 
 include ../../lib.mk
diff --git a/tools/testing/selftests/net/mptcp/Makefile 
b/tools/testing/selftests/net/mptcp/Makefile
index 
5d796622e73099d0a0331c1bc8a41f09e1d36b4d..8e3fc05a539797c5e0feab72be69db623ef3fa96
 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -11,7 +11,7 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt 
mptcp_inq
 
 TEST_FILES := mptcp_lib.sh settings
 
-TEST_INCLUDES := ../lib.sh ../net_helper.sh
+TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh) ../net_helper.sh
 
 EXTRA_CLEAN := *.pcap
 
diff --git a/tools/testing/selftests/net/netfilter/Makefile 
b/tools/testing/selftests/net/netfilter/Makefile
index 
542f7886a0bc2ac016f41d8b70357a8e0c1d271b..9d009f74cfc2da66428b1e23d5884e2c3bc4a85d
 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -55,4 +55,5 @@ TEST_FILES := lib.sh
 TEST_FILES += packetdrill
 
 TEST_INCLUDES := \
-   ../lib.sh
+   ../lib.sh \
+   $(wildcard ../lib/sh/*.sh)

---
base-commit: ecf99864ea6b1843773589a935bb026951bf12dd
change-id: 20241104-net-next-selftests-lib-sh-deps-cc359ca5602f

Best regards,
-- 
Matthieu Baerts (NGI0)

Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support

2024-10-30 Thread Matthieu Baerts

Hi Tao, BPF maintainers,

On 30/10/2024 12:12, Tao Chen wrote:
> 在 2024/10/30 18:49, Matthieu Baerts 写道:
>> Hi Tao Chen,
>>
>> Thank you for having shared this patch.
>>
>> On 30/10/2024 11:01, Tao Chen wrote:
>>> Fix compile error when MPTCP feature not support, though eBPF core check
>>> already done which seems invalid in this situation, the error info like:
>>> progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct
>>> tcp_sock'
>>>     49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ?
>>>
>>> The filed created in new definitions with eBPF core feature to solve
>>> this build problem, and test case result still ok in MPTCP kernel.
>>>
>>> 176/1   mptcp/base:OK
>>> 176/2   mptcp/mptcpify:OK
>>> 176 mptcp:OK
>>> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
>>>
>>> Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base")
>>
>> The commit you mentioned here is more than 2 years old, and as far as I
>> can see, nobody else reported this compilation issue. I guess that's
>> because people used tools/testing/selftests/bpf/config file as expected
>> to populate the kernel config, and I suppose you didn't, right?
>>
> 
> Hi Matt, thank you for your reply, as you said, i did not use tools/
> testing/selftests/bpf/config to compile kernel, i will use this helpful
> feature.
> 
>> I don't think other BPF selftests check for missing kernel config if
>> they are specified in the 'config' file, but even if it is the case, I
>> think it would be better to skip all the MPTCP tests, and not try to
>> have them checking something that doesn't exist: no need to validate
>> these tests if the expected kernel config has not been enabled.
>>
> 
> If i use the kernel not support MPTCP, the compile error still exists,
> and i can not build the bpf test successfully. Maybe skill the test case
> seems better when kernel not support. Now that bpf_core_field_exists
> check already used in the code, i think it is better to use new
> definition mode.

I understand it would be better, but it means more code to maintain to
handle that (and remembering that in future test cases). If that's not
necessary, then no need to do the effort.

@BPF maintainers: do we need to support kernels not respecting the
tools/testing/selftests/bpf/config file? Should we detect when a
required kernel config is not set and skip some tests?

>> But again, please correct me if I'm wrong, but I don't think there is
>> anything to change here to fix your compilation issue: simply make sure
>> to use this tools/testing/selftests/bpf/config file to generate your
>> kernel config, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support

2024-10-30 Thread Matthieu Baerts

Hi Tao Chen,

Thank you for having shared this patch.

On 30/10/2024 11:01, Tao Chen wrote:
> Fix compile error when MPTCP feature not support, though eBPF core check
> already done which seems invalid in this situation, the error info like:
> progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct
> tcp_sock'
>49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ?
> 
> The filed created in new definitions with eBPF core feature to solve
> this build problem, and test case result still ok in MPTCP kernel.
> 
> 176/1   mptcp/base:OK
> 176/2   mptcp/mptcpify:OK
> 176 mptcp:OK
> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
> 
> Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base")

The commit you mentioned here is more than 2 years old, and as far as I
can see, nobody else reported this compilation issue. I guess that's
because people used tools/testing/selftests/bpf/config file as expected
to populate the kernel config, and I suppose you didn't, right?

I don't think other BPF selftests check for missing kernel config if
they are specified in the 'config' file, but even if it is the case, I
think it would be better to skip all the MPTCP tests, and not try to
have them checking something that doesn't exist: no need to validate
these tests if the expected kernel config has not been enabled.

But again, please correct me if I'm wrong, but I don't think there is
anything to change here to fix your compilation issue: simply make sure
to use this tools/testing/selftests/bpf/config file to generate your
kernel config, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect

2024-10-28 Thread Matthieu Baerts

Hi Jakub,

On 28/10/2024 18:22, Jakub Kicinski wrote:
> On Wed, 16 Oct 2024 17:18:47 +0200 Matthieu Baerts wrote:
>> https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-k...@kernel.org/
>>
>> If the impact is important, it might be better to target linux-next
>> first, no?
> 
> Thanks for testing! I didn't anticipate it to be so effective.
> 
> Looks like it's not in -next, yet, and we got an Ack from Paul 
> and co. so I'll toss it into net-next.

Thanks!

Please note that the 3 issues I mentioned have been fixed somewhere:

- MPTCP: patches have been sent to the Netdev ML [1]
- Netfilter: patches have been sent to the Netfilter ML [2]
- perf (VM shutdown): patches have been applied in perf/urgent [3]

[1]
https://lore.kernel.org/20241021-net-mptcp-sched-lock-v1-1-637759cf0...@kernel.org
[2] https://lore.kernel.org/20241025133230.22491-2...@strlen.de
[3]
https://lore.kernel.org/20240913162340.2142976-1-kan.li...@linux.intel.com

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet

2024-10-28 Thread Matthieu Baerts

Hi Breno,

On 28/10/2024 16:48, Breno Leitao wrote:
> Use a less populated IP range to run the tests, as suggested by Petr in
> Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/.

It looks like this is the same version as the one you sent on Friday,
without the modification suggested by Petr:

  https://lore.kernel.org/20241025161415.238215-1-lei...@debian.org

I supposed these new patches have been sent by accident, right?

(BTW: it is often better to include a cover letter when there is more
than one patch: some CIs might not take patches sent without it.)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net 2/3] mptcp: remove unneeded lock when listing scheds

2024-10-23 Thread Matthieu Baerts

Hi Simon,

Thank you for the reviews!

On 23/10/2024 14:21, Simon Horman wrote:
> On Mon, Oct 21, 2024 at 12:25:27PM +0200, Matthieu Baerts (NGI0) wrote:
>> mptcp_get_available_schedulers() needs to iterate over the schedulers'
>> list only to read the names: it doesn't modify anything there.
>>
>> In this case, it is enough to hold the RCU read lock, no need to combine
>> this with the associated spin lock.
>>
>> Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
>> Cc: sta...@vger.kernel.org
>> Suggested-by: Paolo Abeni 
>> Reviewed-by: Geliang Tang 
>> Signed-off-by: Matthieu Baerts (NGI0) 
> 
> I do wonder if it would be more appropriate to route this via net-next
> (without a fixes tag) rather than via net. But either way this looks good
> to me.
Good point. On one hand, I marked it as a fix, because when working on
the patch 1/3, we noticed these spin_(un)lock() were not supposed to be
there in the first place. On the other hand, even it's fixing a small
performance issue, it is not fixing a regression.

I think it is easier to route this via -net, but I'm fine if it is
applied in net-next.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

[PATCH net 3/3] selftests: mptcp: list sysctl data

Listing all the values linked to the MPTCP sysctl knobs was not
exercised in MPTCP test suite.

Let's do that to avoid any regressions, but also to have a kernel with a
debug kconfig verifying more assumptions. For the moment, we are not
interested by the output, only to avoid crashes and warnings.

Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 
57325d57e4c6e3653019db2de09620d692143683..b48b4e56826a9cfdb3501242b707ae2ebe29b220
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -259,6 +259,15 @@ check_mptcp_disabled()
mptcp_lib_ns_init disabled_ns
 
print_larger_title "New MPTCP socket can be blocked via sysctl"
+
+   # mainly to cover more code
+   if ! ip netns exec ${disabled_ns} sysctl net.mptcp >/dev/null; then
+   mptcp_lib_pr_fail "not able to list net.mptcp sysctl knobs"
+   mptcp_lib_result_fail "not able to list net.mptcp sysctl knobs"
+   ret=${KSFT_FAIL}
+   return 1
+   fi
+
# net.mptcp.enabled should be enabled by default
if [ "$(ip netns exec ${disabled_ns} sysctl net.mptcp.enabled | awk '{ 
print $3 }')" -ne 1 ]; then
mptcp_lib_pr_fail "net.mptcp.enabled sysctl is not 1 by default"

-- 
2.45.2

[PATCH net 2/3] mptcp: remove unneeded lock when listing scheds

mptcp_get_available_schedulers() needs to iterate over the schedulers'
list only to read the names: it doesn't modify anything there.

In this case, it is enough to hold the RCU read lock, no need to combine
this with the associated spin lock.

Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
Cc: sta...@vger.kernel.org
Suggested-by: Paolo Abeni 
Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/sched.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c
index 
78ed508ebc1b8dd9f0e020cca1bdd86f24f0afeb..df7dbcfa3b71370cc4d7e4e4f16cc1e41a50dddf
 100644
--- a/net/mptcp/sched.c
+++ b/net/mptcp/sched.c
@@ -60,7 +60,6 @@ void mptcp_get_available_schedulers(char *buf, size_t maxlen)
size_t offs = 0;
 
rcu_read_lock();
-   spin_lock(&mptcp_sched_list_lock);
list_for_each_entry_rcu(sched, &mptcp_sched_list, list) {
offs += snprintf(buf + offs, maxlen - offs,
 "%s%s",
@@ -69,7 +68,6 @@ void mptcp_get_available_schedulers(char *buf, size_t maxlen)
if (WARN_ON_ONCE(offs >= maxlen))
break;
}
-   spin_unlock(&mptcp_sched_list_lock);
rcu_read_unlock();
 }
 

-- 
2.45.2

[PATCH net 1/3] mptcp: init: protect sched with rcu_read_lock

Enabling CONFIG_PROVE_RCU_LIST with its dependence CONFIG_RCU_EXPERT
creates this splat when an MPTCP socket is created:

  =
  WARNING: suspicious RCU usage
  6.12.0-rc2+ #11 Not tainted
  -
  net/mptcp/sched.c:44 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  no locks held by mptcp_connect/176.

  stack backtrace:
  CPU: 0 UID: 0 PID: 176 Comm: mptcp_connect Not tainted 6.12.0-rc2+ #11
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
   
   dump_stack_lvl (lib/dump_stack.c:123)
   lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
   mptcp_sched_find (net/mptcp/sched.c:44 (discriminator 7))
   mptcp_init_sock (net/mptcp/protocol.c:2867 (discriminator 1))
   ? sock_init_data_uid (arch/x86/include/asm/atomic.h:28)
   inet_create.part.0.constprop.0 (net/ipv4/af_inet.c:386)
   ? __sock_create (include/linux/rcupdate.h:347 (discriminator 1))
   __sock_create (net/socket.c:1576)
   __sys_socket (net/socket.c:1671)
   ? __pfx___sys_socket (net/socket.c:1712)
   ? do_user_addr_fault (arch/x86/mm/fault.c:1419 (discriminator 1))
   __x64_sys_socket (net/socket.c:1728)
   do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1))
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

That's because when the socket is initialised, rcu_read_lock() is not
used despite the explicit comment written above the declaration of
mptcp_sched_find() in sched.c. Adding the missing lock/unlock avoids the
warning.

Fixes: 1730b2b2c5a5 ("mptcp: add sched in mptcp_sock")
Cc: sta...@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/523
Reviewed-by: Geliang Tang 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/protocol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 
6d0e201c3eb26aa6ca0ff27e5a65cb6f911012f2..d263091659e076587bc3406dfdcb4409adb3247e
 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2864,8 +2864,10 @@ static int mptcp_init_sock(struct sock *sk)
if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net))
return -ENOMEM;
 
+   rcu_read_lock();
ret = mptcp_init_sched(mptcp_sk(sk),
   mptcp_sched_find(mptcp_get_scheduler(net)));
+   rcu_read_unlock();
if (ret)
return ret;
 

-- 
2.45.2

[PATCH net 0/3] mptcp: sched: fix some lock issues

Two small fixes related to the MPTCP packets scheduler:

- Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6.

- Patch 2: remove unneeded lock when listing packets schedulers. A fix
  for >= 6.10.

And some modifications in the MPTCP selftests:

- Patch 3: a small addition to the MPTCP selftests to cover more code.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Matthieu Baerts (NGI0) (3):
  mptcp: init: protect sched with rcu_read_lock
  mptcp: remove unneeded lock when listing scheds
  selftests: mptcp: list sysctl data

 net/mptcp/protocol.c   | 2 ++
 net/mptcp/sched.c  | 2 --
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +
 3 files changed, 11 insertions(+), 2 deletions(-)
---
base-commit: 3b05b9c36ddd01338e1352588f2ec1ea23f97d43
change-id: 20241021-net-mptcp-sched-lock-10dfc75d1e00

Best regards,
-- 
Matthieu Baerts (NGI0)

Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect

2024-10-16 Thread Matthieu Baerts

On 16/10/2024 17:18, Matthieu Baerts wrote:
> Hi Jakub,
> 
> On 16/10/2024 03:11, Jakub Kicinski wrote:
>> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
>> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
>> but RCU_EXPERT is not set, and it's a dependency for
>> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
>> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
>> indicate that it does catch bugs.
> 
> Good catch! I confirm it fixes the issue:
> 
> Acked-by: Matthieu Baerts (NGI0) 
> 
>> Signed-off-by: Jakub Kicinski 
>> ---
>> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> Good idea, it sounds like a fix because I guess if it was on the list,
> it was supposed to be used. But be careful it might detect quite a few
> issues: I just enabled it on MPTCP tree, and it found issues.
Sorry, I forgot to mention that it found 3 issues: one specific to
MPTCP, one in Netfilter, but also one when shutting down the VM:


> # /usr/lib/klibc/bin/poweroff
> [ 2360.588763][T11825] 
> [ 2360.589006][T11825] =
> [ 2360.589424][T11825] WARNING: suspicious RCU usage
> [ 2360.589952][T11825] 6.12.0-rc2+ #1 Tainted: G N
> [ 2360.590522][T11825] -
> [ 2360.590896][T11825] kernel/events/core.c:13962 RCU-list traversed in 
> non-reader section!!
> [ 2360.592341][T11825] 
> [ 2360.592341][T11825] other info that might help us debug this:
> [ 2360.592341][T11825] 
> [ 2360.593343][T11825] 
> [ 2360.593343][T11825] rcu_scheduler_active = 2, debug_locks = 1
> [ 2360.593951][T11825] 3 locks held by poweroff/11825:
> [2360.594481][T11825] #0: 89641e28 
> (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot 
> (kernel/reboot.c:770) 
> [2360.594997][T11825] #1: 8963eab0 
> ((reboot_notifier_list).rwsem){}-{3:3}, at: blocking_notifier_call_chain 
> (kernel/notifier.c:388) 
> [2360.595859][T11825] #2: 898d1a68 (pmus_lock){+.+.}-{3:3}, at: 
> perf_event_exit_cpu_context (kernel/events/core.c:13983) 
> [ 2360.596645][T11825] 
> [ 2360.596645][T11825] stack backtrace:
> [ 2360.597476][T11825] CPU: 3 UID: 0 PID: 11825 Comm: poweroff Tainted: G 
> N 6.12.0-rc2+ #1
> [ 2360.597987][T11825] Tainted: [N]=TEST
> [ 2360.598291][T11825] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 2360.598890][T11825] Call Trace:
> [ 2360.599295][T11825]  
> [2360.599568][T11825] dump_stack_lvl (lib/dump_stack.c:123) 
> [2360.600032][T11825] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) 
> [2360.600518][T11825] perf_event_clear_cpumask (kernel/events/core.c:13962 
> (discriminator 9)) 
> [2360.600972][T11825] ? __pfx_perf_event_clear_cpumask 
> (kernel/events/core.c:13939) 
> [2360.601607][T11825] ? acpi_execute_simple_method (drivers/acpi/utils.c:679) 
> [2360.601988][T11825] ? __pfx_acpi_execute_simple_method 
> (drivers/acpi/utils.c:679) 
> [2360.602577][T11825] ? md_notify_reboot (drivers/md/md.c:9860) 
> [2360.603043][T11825] perf_event_exit_cpu_context (kernel/events/core.c:13984 
> (discriminator 1)) 
> [2360.603543][T11825] perf_reboot (kernel/events/core.c:14066 (discriminator 
> 3)) 
> [2360.603979][T11825] ? trace_notifier_run 
> (include/trace/events/notifier.h:59 (discriminator 2)) 
> [2360.604454][T11825] notifier_call_chain (kernel/notifier.c:93) 
> [2360.604833][T11825] blocking_notifier_call_chain (kernel/notifier.c:389) 
> [2360.605522][T11825] kernel_power_off (kernel/reboot.c:294) 
> [2360.605908][T11825] __do_sys_reboot (kernel/reboot.c:771) 
> [2360.606320][T11825] ? __pfx___do_sys_reboot (kernel/reboot.c:717) 
> [2360.606739][T11825] ? __pfx_ksys_sync (fs/sync.c:98) 
> [2360.607113][T11825] do_syscall_64 (arch/x86/entry/common.c:52 
> (discriminator 1)) 
> [2360.607551][T11825] entry_SYSCALL_64_after_hwframe 
> (arch/x86/entry/entry_64.S:130) 
> [ 2360.607911][T11825] RIP: 0033:0x20e505
> [ 2360.608320][T11825] Code: 48 89 77 38 c3 89 f0 48 8b 1f 48 8b 67 08 48 8b 
> 6f 10 4c 8b 67 18 4c 8b 6f 20 4c 8b 77 28 4c 8b 7f 30 ff 67 38 49 89 ca 0f 05 
> <48> 3d 01 f0 ff ff 73 01 c3 f7 d8 89 05 d6 8e 00 00 48 83 c8 ff c3
> All code
> 
>0: 48 89 77 38 mov%rsi,0x38(%rdi)
>4: c3  ret
>5: 89 f0   mov%esi,%eax
>7: 48 8b 1fmov(%rdi),%rbx
>a: 48 8b 67 08 mov0x8(%rdi),%rsp
>e: 48 8b 6f 10 mov0x10(%rdi),%rbp
>   12: 4c 8b 67 18 mov0x18(%rdi),%r12
>   16: 4c 8b 6f 20 mov0x20(%rdi),%r13
>   1a: 4c 8b 77 28 mov0x28(%rdi),%r14
>   1e: 4c 8b 7f 30

Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect

2024-10-16 Thread Matthieu Baerts

Hi Jakub,

On 16/10/2024 03:11, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.

Good catch! I confirm it fixes the issue:

Acked-by: Matthieu Baerts (NGI0) 

> Signed-off-by: Jakub Kicinski 
> ---
> I'd be slightly tempted to still send it to Linux for v6.12.

Good idea, it sounds like a fix because I guess if it was on the list,
it was supposed to be used. But be careful it might detect quite a few
issues: I just enabled it on MPTCP tree, and it found issues.

It might be good to check the impact first, e.g. at least enabling it
when running the Networking tests. Please note that the CI didn't pick
up this patch, because it is marked for RCU (PATCH rcu):

https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-k...@kernel.org/

If the impact is important, it might be better to target linux-next
first, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

[PATCH net v2 2/2] selftests: mptcp: join: test for prohibited MPC to port-based endp

2024-10-14 Thread Matthieu Baerts (NGI0)

From: Paolo Abeni 

Explicitly verify that MPC connection attempts towards a port-based
signal endpoint fail with a reset.

Note that this new test is a bit different from the other ones, not
using 'run_tests'. It is then needed to add the capture capability, and
the picking the right port which have been extracted into three new
helpers. The info about the capture can also be printed from a single
point, which simplifies the exit paths in do_transfer().

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: sta...@vger.kernel.org
Co-developed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
Signed-off-by: Paolo Abeni 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +---
 1 file changed, 86 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh 
b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 
e8d0a01b4144264615d92b953a69ebd934ce468e..c07e2bd3a315aac9c422fed85c3196ec46e060f7
 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -23,6 +23,7 @@ tmpfile=""
 cout=""
 err=""
 capout=""
+cappid=""
 ns1=""
 ns2=""
 iptables="iptables"
@@ -887,6 +888,44 @@ check_cestab()
fi
 }
 
+cond_start_capture()
+{
+   local ns="$1"
+
+   :> "$capout"
+
+   if $capture; then
+   local capuser capfile
+   if [ -z $SUDO_USER ]; then
+   capuser=""
+   else
+   capuser="-Z $SUDO_USER"
+   fi
+
+   capfile=$(printf "mp_join-%02u-%s.pcap" 
"$MPTCP_LIB_TEST_COUNTER" "$ns")
+
+   echo "Capturing traffic for test $MPTCP_LIB_TEST_COUNTER into 
$capfile"
+   ip netns exec "$ns" tcpdump -i any -s 65535 -B 32768 $capuser 
-w "$capfile" > "$capout" 2>&1 &
+   cappid=$!
+
+   sleep 1
+   fi
+}
+
+cond_stop_capture()
+{
+   if $capture; then
+   sleep 1
+   kill $cappid
+   cat "$capout"
+   fi
+}
+
+get_port()
+{
+   echo "$((1 + MPTCP_LIB_TEST_COUNTER - 1))"
+}
+
 do_transfer()
 {
local listener_ns="$1"
@@ -894,33 +933,17 @@ do_transfer()
local cl_proto="$3"
local srv_proto="$4"
local connect_addr="$5"
+   local port
 
-   local port=$((1 + MPTCP_LIB_TEST_COUNTER - 1))
-   local cappid
local FAILING_LINKS=${FAILING_LINKS:-""}
local fastclose=${fastclose:-""}
local speed=${speed:-"fast"}
+   port=$(get_port)
 
:> "$cout"
:> "$sout"
-   :> "$capout"
 
-   if $capture; then
-   local capuser
-   if [ -z $SUDO_USER ] ; then
-   capuser=""
-   else
-   capuser="-Z $SUDO_USER"
-   fi
-
-   capfile=$(printf "mp_join-%02u-%s.pcap" 
"$MPTCP_LIB_TEST_COUNTER" "${listener_ns}")
-
-   echo "Capturing traffic for test $MPTCP_LIB_TEST_COUNTER into 
$capfile"
-   ip netns exec ${listener_ns} tcpdump -i any -s 65535 -B 32768 
$capuser -w $capfile > "$capout" 2>&1 &
-   cappid=$!
-
-   sleep 1
-   fi
+   cond_start_capture ${listener_ns}
 
NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \
nstat -n
@@ -1007,10 +1030,7 @@ do_transfer()
wait $spid
local rets=$?
 
-   if $capture; then
-   sleep 1
-   kill $cappid
-   fi
+   cond_stop_capture
 
NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \
nstat | grep Tcp > /tmp/${listener_ns}.out
@@ -1026,7 +1046,6 @@ do_transfer()
ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = 
:$port"
cat /tmp/${connector_ns}.out
 
-   cat "$capout"
return 1
fi
 
@@ -1043,13 +1062,7 @@ do_transfer()
fi
rets=$?
 
-   if [ $retc -eq 0 ] && [ $rets -eq 0 ];then
-   cat "$capout"
-   return 0
-   fi
-
-   cat "$capout"
-   return 1
+

[PATCH net v2 1/2] mptcp: prevent MPC handshake on port-based signal endpoints

2024-10-14 Thread Matthieu Baerts (NGI0)

:540 [inline]
   ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
   ip_local_out net/ipv4/ip_output.c:129 [inline]
   __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
   __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
   tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
   tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
   tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
   sk_backlog_rcv include/net/sock.h: [inline]
   __release_sock+0x214/0x350 net/core/sock.c:3004
   release_sock+0x61/0x1f0 net/core/sock.c:3558
   mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
   mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x1a6/0x270 net/socket.c:745
   sys_sendmsg+0x525/0x7d0 net/socket.c:2597
   ___sys_sendmsg net/socket.c:2651 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
   __do_sys_sendmmsg net/socket.c:2766 [inline]
   __se_sys_sendmmsg net/socket.c:2763 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f04fb13a6b9
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:7ffd651f42d8 EFLAGS: 0246 ORIG_RAX: 0133
  RAX: ffda RBX: 0003 RCX: 7f04fb13a6b9
  RDX: 0001 RSI: 2d00 RDI: 0004
  RBP: 7ffd651f4310 R08: 0001 R09: 0001
  R10: 2080 R11: 0246 R12: 000f4240
  R13: 7f04fb187449 R14: 7ffd651f42f4 R15: 7ffd651f4300
   

As noted by Cong Wang, the splat is false positive, but the code
path leading to the report is an unexpected one: a client is
attempting an MPC handshake towards the in-kernel listener created
by the in-kernel PM for a port based signal endpoint.

Such connection will be never accepted; many of them can make the
listener queue full and preventing the creation of MPJ subflow via
such listener - its intended role.

Explicitly detect this scenario at initial-syn time and drop the
incoming MPC request.

Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port")
Cc: sta...@vger.kernel.org
Reported-by: syzbot+f4aacdfef2c6a6529...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e
Cc: Cong Wang 
Signed-off-by: Paolo Abeni 
Reviewed-by: Matthieu Baerts (NGI0) 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 net/mptcp/mib.c|  1 +
 net/mptcp/mib.h|  1 +
 net/mptcp/pm_netlink.c |  1 +
 net/mptcp/protocol.h   |  1 +
 net/mptcp/subflow.c| 11 +++
 5 files changed, 15 insertions(+)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 
ad88bd3c58dffed8335eedb43ca6290418e3c4f4..19eb9292bd6093a760b41f98c1774fd2490c48e3
 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -17,6 +17,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
SNMP_MIB_ITEM("MPCapableFallbackSYNACK", 
MPTCP_MIB_MPCAPABLEACTIVEFALLBACK),
SNMP_MIB_ITEM("MPCapableSYNTXDrop", MPTCP_MIB_MPCAPABLEACTIVEDROP),
SNMP_MIB_ITEM("MPCapableSYNTXDisabled", 
MPTCP_MIB_MPCAPABLEACTIVEDISABLED),
+   SNMP_MIB_ITEM("MPCapableEndpAttempt", MPTCP_MIB_MPCAPABLEENDPATTEMPT),
SNMP_MIB_ITEM("MPFallbackTokenInit", MPTCP_MIB_TOKENFALLBACKINIT),
SNMP_MIB_ITEM("MPTCPRetrans", MPTCP_MIB_RETRANSSEGS),
SNMP_MIB_ITEM("MPJoinNoTokenFound", MPTCP_MIB_JOINNOTOKEN),
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 
3206cdda8bb1067f9a8354fd45deed86b67ac7da..128282982843a07614a46f9b2c2f7c708306c769
 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -12,6 +12,7 @@ enum linux_mptcp_mib_field {
MPTCP_MIB_MPCAPABLEACTIVEFALLBACK, /* Client-side fallback during 3-way 
handshake */
MPTCP_MIB_MPCAPABLEACTIVEDROP,  /* Client-side fallback due to a MPC 
drop */
MPTCP_MIB_MPCAPABLEACTIVEDISABLED, /* Client-side disabled due to past 
issues */
+   MPTCP_MIB_MPCAPABLEENDPATTEMPT, /* Prohibited MPC to port-based endp */
MPTCP_MIB_TOKENFALLBACKINIT,/* Could not init/allocate token */
MPTCP_MIB_RETRANSSEGS,  /* Segments retransmitted at the 
MPTCP-level */
MPTCP_MIB_JOINNOTOKEN,  /* Received MP_JOIN but the token was 
not found */
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 
f6f0a38a0750f82bc909f02a75beec980d951f1f..1a78998fe1f49510ef1f487c8cd5b369a37fb20d
 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1121,6 +1121,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock

[PATCH net v2 0/2] mptcp: prevent MPC handshake on port-based signal endpoints

2024-10-14 Thread Matthieu Baerts (NGI0)

MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.

- Patch 1: Explicitly reject such requests. A fix for >= v5.12.

- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Changes in v2:
- This new version fixes the root cause for the issue Cong Wang sent a
  patch for a few weeks ago, see the v1, and the explanations below. The
  new version is very different from the v1, from a different author.
  Thanks to Cong Wang for the first analysis, and to Paolo for having
  spot the root cause, and sent a fix for it.
- Link to v1: 
https://lore.kernel.org/r/20240908180620.822579-1-xiyou.wangc...@gmail.com
- Link: 
https://lore.kernel.org/r/a5289a0d-2557-40b8-9575-6f1a0bbf0...@redhat.com

---
Paolo Abeni (2):
  mptcp: prevent MPC handshake on port-based signal endpoints
  selftests: mptcp: join: test for prohibited MPC to port-based endp

 net/mptcp/mib.c |   1 +
 net/mptcp/mib.h |   1 +
 net/mptcp/pm_netlink.c  |   1 +
 net/mptcp/protocol.h|   1 +
 net/mptcp/subflow.c |  11 +++
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +---
 6 files changed, 101 insertions(+), 31 deletions(-)
---
base-commit: 174714f0e505070a16be6fbede30d32b81df789f
change-id: 20241014-net-mptcp-mpc-port-endp-4f88bd428ec7

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH bpf-next/net v7 2/3] selftests/bpf: Add getsockopt to inspect mptcp subflow

From: Geliang Tang 

This patch adds a "cgroup/getsockopt" way to inspect the subflows of an
MPTCP socket, and verify the modifications done by the same BPF program
in the previous commit: a different mark per subflow, and a different
TCP CC set on the second one. This new hook will be used by the next
commit to verify the socket options set on each subflow.

This extra "cgroup/getsockopt" prog walks the msk->conn_list and use
bpf_core_cast to cast a pointer for readonly. It allows to inspect all
the fields of a structure.

Note that on the kernel side, the MPTCP socket stores a list of subflows
under 'msk->conn_list'. They can be iterated using the generic 'list'
helpers. They have been imported here, with a small difference:
list_for_each_entry() uses 'can_loop' to limit the number of iterations,
and ease its use. Because only data need to be read here, it is enough
to use this technique. It is planned to use bpf_iter, when BPF programs
will be used to modify data from the different subflows.
mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also
be imported.

Suggested-by: Martin KaFai Lau 
Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
  - v5: new patch, instead of using 'ss' in the following patch
  - v7: use 'can_loop' instead of 'cond_break'. (Martin)
---
 MAINTAINERS   |  2 +-
 tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 
3bce6cc05553dad53db5f06d36e6957061886dd0..8817aa26b2fc0ba3581576d040f5093124cc60a7
 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16097,7 +16097,7 @@ F:  include/net/mptcp.h
 F: include/trace/events/mptcp.h
 F: include/uapi/linux/mptcp*.h
 F: net/mptcp/
-F: tools/testing/selftests/bpf/*/*mptcp*.c
+F: tools/testing/selftests/bpf/*/*mptcp*.[ch]
 F: tools/testing/selftests/net/mptcp/
 
 NETWORKING [TCP]
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h 
b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
new file mode 100644
index 
..3b188ccdcc4041acb4f7ed38ae8ddf5a7305466a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __MPTCP_BPF_H__
+#define __MPTCP_BPF_H__
+
+#include "bpf_experimental.h"
+
+/* list helpers from include/linux/list.h */
+static inline int list_is_head(const struct list_head *list,
+  const struct list_head *head)
+{
+   return list == head;
+}
+
+#define list_entry(ptr, type, member)  \
+   container_of(ptr, type, member)
+
+#define list_first_entry(ptr, type, member)\
+   list_entry((ptr)->next, type, member)
+
+#define list_next_entry(pos, member)   \
+   list_entry((pos)->member.next, typeof(*(pos)), member)
+
+#define list_entry_is_head(pos, head, member)  \
+   list_is_head(&pos->member, (head))
+
+/* small difference: 'can_loop' has been added in the conditions */
+#define list_for_each_entry(pos, head, member) \
+   for (pos = list_first_entry(head, typeof(*pos), member);\
+!list_entry_is_head(pos, head, member) && can_loop;\
+pos = list_next_entry(pos, member))
+
+/* mptcp helpers from protocol.h */
+#define mptcp_for_each_subflow(__msk, __subflow)   \
+   list_for_each_entry(__subflow, &((__msk)->conn_list), node)
+
+static __always_inline struct sock *
+mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
+{
+   return subflow->tcp_sock;
+}
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c 
b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
index 
2e28f4a215b5469fcbc31168071887687ca34792..70302477e326eecaef6aad4ecf899aa3d6606f23
 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -4,10 +4,12 @@
 
 /* vmlinux.h, bpf_helpers.h and other 'define' */
 #include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
 
 char _license[] SEC("license") = "GPL";
 
 char cc[TCP_CA_NAME_MAX] = "reno";
+int pid;
 
 /* Associate a subflow counter to each token */
 struct {
@@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops)
 
return 1;
 }
+
+static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct 
bpf_sockopt *ctx)
+{
+   struct mptcp

[PATCH bpf-next/net v7 3/3] selftests/bpf: Add mptcp subflow subtest

From: Geliang Tang 

This patch adds a subtest named test_subflow in test_mptcp to load and
verify the newly added MPTCP subflow BPF program. To goal is to make
sure it is possible to set different socket options per subflows, while
the userspace socket interface only lets the application to set the same
socket options for the whole MPTCP connection and its multiple subflows.

To check that, a client and a server are started in a dedicated netns,
with veth interfaces to simulate multiple paths. They will exchange data
to allow the creation of an additional subflow.

When the different subflows are being created, the new MPTCP subflow BPF
program will set some socket options: marks and TCP CC. The validation
is done by the same program, when the userspace checks the value of the
modified socket options. On the userspace side, it will see that the
default values are still being used on the MPTCP connection, while the
BPF program will see different options set per subflow of the same MPTCP
connection.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2 -> v3:
   - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the
 BPF CI running IPRoute 5.5.0.
   - Use SYS_NOFAIL() in _ss_search() instead of calling system()
 - v3 -> v4:
   - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is
 not supported.
 - v4 -> v5:
   - Note that this new test is no longer skipped on the BPF CI, because
 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to
 22.04.
   - Update the commit message, reflecting the latest version.
   - The validations are no longer done using 'ss', but using the new
 BPF program added in the previous patch, to reduce the use of
 external dependences. (Martin)
 - v5 -> v6:
   - Use usleep() instead of sleep().
 - v6 -> v7:
   - Drop mptcp_subflow__attach(), use bpf_program__attach_cgroup()
 instead of bpf_prog_attach(), plus assign the returned value to
 skel->links.* directly. (Martin)
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 121 +
 1 file changed, 121 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index 
d2ca32fa3b21e686d6ef2673b5953d5417edfedb..b61f26b8cdf2540a34e28ddb8a5f1f2378cf8c06
 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -5,12 +5,17 @@
 #include 
 #include 
 #include 
+#include 
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
+#include "mptcp_subflow.skel.h"
 
 #define NS_TEST "mptcp_ns"
+#define ADDR_1 "10.0.1.1"
+#define ADDR_2 "10.0.1.2"
+#define PORT_1 10001
 
 #ifndef IPPROTO_MPTCP
 #define IPPROTO_MPTCP 262
@@ -335,10 +340,126 @@ static void test_mptcpify(void)
close(cgroup_fd);
 }
 
+static int endpoint_init(char *flags)
+{
+   SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", 
NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
+   SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
+   SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
+   if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, 
flags)) {
+   printf("'ip mptcp' not supported, skip this test.\n");
+   test__skip();
+   goto fail;
+   }
+
+   return 0;
+fail:
+   return -1;
+}
+
+static void wait_for_new_subflows(int fd)
+{
+   socklen_t len;
+   u8 subflows;
+   int err, i;
+
+   len = sizeof(subflows);
+   /* Wait max 1 sec for new subflows to be created */
+   for (i = 0; i < 10; i++) {
+   err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len);
+   if (!err && subflows > 0)
+   break;
+
+   usleep(10); /* 0.1s */
+   }
+}
+
+static void run_subflow(void)
+{
+   int server_fd, client_fd, err;
+   char new[TCP_CA_NAME_MAX];
+   char cc[TCP_CA_NAME_MAX];
+   unsigned int mark;
+   socklen_t len;
+
+   server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+   if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+   return;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+   goto close_server;
+
+   send_byte(client_fd);
+   wait_for_new_subflows(client_fd);

[PATCH bpf-next/net v7 1/3] selftests/bpf: Add mptcp subflow example

From: Nicolas Rybowski 

Move Nicolas' patch into bpf selftests directory. This example adds a
different mark (SO_MARK) on each subflow, and changes the TCP CC only on
the first subflow.

>From the userspace, an application can do a setsockopt() on an MPTCP
socket, and typically the same value will be propagated to all subflows
(paths). If someone wants to have different values per subflow, the
recommended way is to use BPF. So it is good to add such example here,
and make sure there is no regressions.

This example shows how it is possible to:

Identify the parent msk of an MPTCP subflow.
Put different sockopt for each subflow of a same MPTCP connection.

Here especially, two different behaviours are implemented:

A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
MPTCP connection. The order of creation of the current subflow defines
its mark. The TCP CC algorithm of the very first subflow of an MPTCP
connection is set to "reno".

This is just to show it is possible to identify an MPTCP connection, and
set socket options, from different SOL levels, per subflow. "reno" has
been picked because it is built-in and usually not set as default one.
It is easy to verify with 'ss' that these modifications have been
applied correctly. That's what the next patch is going to do.

Nicolas' code comes from:

commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c")

from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the
"scripts" branch), and it has been adapted by Geliang.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Co-developed-by: Geliang Tang 
Signed-off-by: Geliang Tang 
Signed-off-by: Nicolas Rybowski 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v1 -> v2:
   - The commit message has been updated: why setting multiple socket
 options, why reno, the verification is done in a later patch
 (different author). (Alexei)
 - v2 -> v3:
   - Only #include "bpf_tracing_net.h", linked to:
 https://lore.kernel.org/20240509175026.3423614-1-martin@linux.dev
 - v4 -> v5:
   - Set reno as TCP cc on the second subflow, not to influence the
 getsockopt() done from the userspace, which will return the one
 from the first subflow, the default TCP cc then, not the modified
 one.
---
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c 
b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
new file mode 100644
index 
..2e28f4a215b5469fcbc31168071887687ca34792
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Tessares SA. */
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+
+char _license[] SEC("license") = "GPL";
+
+char cc[TCP_CA_NAME_MAX] = "reno";
+
+/* Associate a subflow counter to each token */
+struct {
+   __uint(type, BPF_MAP_TYPE_HASH);
+   __uint(key_size, sizeof(__u32));
+   __uint(value_size, sizeof(__u32));
+   __uint(max_entries, 100);
+} mptcp_sf SEC(".maps");
+
+SEC("sockops")
+int mptcp_subflow(struct bpf_sock_ops *skops)
+{
+   __u32 init = 1, key, mark, *cnt;
+   struct mptcp_sock *msk;
+   struct bpf_sock *sk;
+   int err;
+
+   if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
+   return 1;
+
+   sk = skops->sk;
+   if (!sk)
+   return 1;
+
+   msk = bpf_skc_to_mptcp_sock(sk);
+   if (!msk)
+   return 1;
+
+   key = msk->token;
+   cnt = bpf_map_lookup_elem(&mptcp_sf, &key);
+   if (cnt) {
+   /* A new subflow is added to an existing MPTCP connection */
+   __sync_fetch_and_add(cnt, 1);
+   mark = *cnt;
+   } else {
+   /* A new MPTCP connection is just initiated and this is its 
primary subflow */
+   bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY);
+   mark = init;
+   }
+
+   /* Set the mark of the subflow's socket based on appearance order */
+   err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
+   if (err < 0)
+   return 1;
+   if (mark == 2)
+   err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, 
TCP_CA_NAME_MAX);
+
+   return 1;
+}

-- 
2.45.2

[PATCH bpf-next/net v7 0/3] selftests/bpf: new MPTCP subflow subtest

In this series from Geliang, modifying MPTCP BPF selftests, we have:

- A new MPTCP subflow BPF program setting socket options per subflow: it
  looks better to have this old test program in the BPF selftests to
  track regressions and to serve as example.

  Note: Nicolas is no longer working at Tessares, but he did this work
  while working for them, and his email address is no longer available.

- A new hook in the same BPF program to do the verification step.

- A new MPTCP BPF subtest validating the new BPF program added in the
  first patch, with the help of the new hook added in the second patch.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Changes in v7:
- Patch 2/3: use 'can_loop' instead of 'cond_break'. (Martin)
- Patch 3/3: use bpf_program__attach_cgroup(). (Martin)
- Link to v6: 
https://lore.kernel.org/r/20240911-upstream-bpf-next-20240506-mptcp-subflow-test-v6-0-7872294c4...@kernel.org

Changes in v6:
- Patch 3/3: use usleep() instead of sleep()
- Series: rebased on top of bpf-next/net
- Link to v5: 
https://lore.kernel.org/r/20240910-upstream-bpf-next-20240506-mptcp-subflow-test-v5-0-2c664a7da...@kernel.org

Changes in v5:
- See the individual changelog for more details about them
- Patch 1/3: set TCP on the 2nd subflow
- Patch 2/3: new
- Patch 3/3: use the BPF program from patch 2/3 to do the validation
 instead of using ss.
- Series: rebased on top of bpf-next/net
- Link to v4: 
https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-test-v4-0-2b4ca6994...@kernel.org

Changes in v4:
- Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file:
  - I will check later if it is possible to avoid having duplicated
header files in tools/include/uapi, but no need to block this series
for that. Patch 2/3 can be added later if needed.
- Patch 2/2: skip the test if 'ip mptcp' is not available.
- Link to v3: 
https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-test-v3-0-ebdc2d494...@kernel.org

Changes in v3:
- Sorry for the delay between v2 and v3, this series was conflicting
  with the "add netns helpers", but it looks like it is on hold:
  https://lore.kernel.org/cover.1715821541.git.tanggeli...@kylinos.cn
- Patch 1/3 includes "bpf_tracing_net.h", introduced in between.
- New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link".
- Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper.
- Link to v2: 
https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-test-v2-0-4048c2948...@kernel.org

Changes in v2:
- Previous patches 1/4 and 2/4 have been dropped from this series:
  - 1/4: "selftests/bpf: Handle SIGINT when creating netns":
- A new version, more generic and no longer specific to MPTCP BPF
  selftest will be sent later, as part of a new series. (Alexei)
  - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro":
- Removed, not to hide helper functions in macros. (Alexei)
- The commit message of patch 1/2 has been clarified to avoid some
  possible confusions spot by Alexei.
- Link to v1: 
https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-0-e2bcbdf49...@kernel.org

---
Geliang Tang (2):
  selftests/bpf: Add getsockopt to inspect mptcp subflow
  selftests/bpf: Add mptcp subflow subtest

Nicolas Rybowski (1):
  selftests/bpf: Add mptcp subflow example

 MAINTAINERS   |   2 +-
 tools/testing/selftests/bpf/prog_tests/mptcp.c| 121 
 tools/testing/selftests/bpf/progs/mptcp_bpf.h |  42 +++
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++
 4 files changed, 292 insertions(+), 1 deletion(-)
---
base-commit: 151ac45348afc5b56baa584c7cd4876addf461ff
change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH bpf-next/net v6 1/3] selftests/bpf: Add mptcp subflow example

From: Nicolas Rybowski 

Move Nicolas' patch into bpf selftests directory. This example adds a
different mark (SO_MARK) on each subflow, and changes the TCP CC only on
the first subflow.

>From the userspace, an application can do a setsockopt() on an MPTCP
socket, and typically the same value will be propagated to all subflows
(paths). If someone wants to have different values per subflow, the
recommended way is to use BPF. So it is good to add such example here,
and make sure there is no regressions.

This example shows how it is possible to:

Identify the parent msk of an MPTCP subflow.
Put different sockopt for each subflow of a same MPTCP connection.

Here especially, two different behaviours are implemented:

A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
MPTCP connection. The order of creation of the current subflow defines
its mark. The TCP CC algorithm of the very first subflow of an MPTCP
connection is set to "reno".

This is just to show it is possible to identify an MPTCP connection, and
set socket options, from different SOL levels, per subflow. "reno" has
been picked because it is built-in and usually not set as default one.
It is easy to verify with 'ss' that these modifications have been
applied correctly. That's what the next patch is going to do.

Nicolas' code comes from:

commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c")

from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the
"scripts" branch), and it has been adapted by Geliang.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Co-developed-by: Geliang Tang 
Signed-off-by: Geliang Tang 
Signed-off-by: Nicolas Rybowski 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v1 -> v2:
   - The commit message has been updated: why setting multiple socket
 options, why reno, the verification is done in a later patch
 (different author). (Alexei)
 - v2 -> v3:
   - Only #include "bpf_tracing_net.h", linked to:
 https://lore.kernel.org/20240509175026.3423614-1-martin@linux.dev
 - v4 -> v5:
   - Set reno as TCP cc on the second subflow, not to influence the
 getsockopt() done from the userspace, which will return the one
 from the first subflow, the default TCP cc then, not the modified
 one.
---
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c 
b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
new file mode 100644
index ..2e28f4a215b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Tessares SA. */
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+
+char _license[] SEC("license") = "GPL";
+
+char cc[TCP_CA_NAME_MAX] = "reno";
+
+/* Associate a subflow counter to each token */
+struct {
+   __uint(type, BPF_MAP_TYPE_HASH);
+   __uint(key_size, sizeof(__u32));
+   __uint(value_size, sizeof(__u32));
+   __uint(max_entries, 100);
+} mptcp_sf SEC(".maps");
+
+SEC("sockops")
+int mptcp_subflow(struct bpf_sock_ops *skops)
+{
+   __u32 init = 1, key, mark, *cnt;
+   struct mptcp_sock *msk;
+   struct bpf_sock *sk;
+   int err;
+
+   if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
+   return 1;
+
+   sk = skops->sk;
+   if (!sk)
+   return 1;
+
+   msk = bpf_skc_to_mptcp_sock(sk);
+   if (!msk)
+   return 1;
+
+   key = msk->token;
+   cnt = bpf_map_lookup_elem(&mptcp_sf, &key);
+   if (cnt) {
+   /* A new subflow is added to an existing MPTCP connection */
+   __sync_fetch_and_add(cnt, 1);
+   mark = *cnt;
+   } else {
+   /* A new MPTCP connection is just initiated and this is its 
primary subflow */
+   bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY);
+   mark = init;
+   }
+
+   /* Set the mark of the subflow's socket based on appearance order */
+   err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
+   if (err < 0)
+   return 1;
+   if (mark == 2)
+   err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, 
TCP_CA_NAME_MAX);
+
+   return 1;
+}

-- 
2.45.2

[PATCH bpf-next/net v6 3/3] selftests/bpf: Add mptcp subflow subtest

From: Geliang Tang 

This patch adds a subtest named test_subflow in test_mptcp to load and
verify the newly added MPTCP subflow BPF program. To goal is to make
sure it is possible to set different socket options per subflows, while
the userspace socket interface only lets the application to set the same
socket options for the whole MPTCP connection and its multiple subflows.

To check that, a client and a server are started in a dedicated netns,
with veth interfaces to simulate multiple paths. They will exchange data
to allow the creation of an additional subflow.

When the different subflows are being created, the new MPTCP subflow BPF
program will set some socket options: marks and TCP CC. The validation
is done by the same program, when the userspace checks the value of the
modified socket options. On the userspace side, it will see that the
default values are still being used on the MPTCP connection, while the
BPF program will see different options set per subflow of the same MPTCP
connection.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2 -> v3:
   - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the
 BPF CI running IPRoute 5.5.0.
   - Use SYS_NOFAIL() in _ss_search() instead of calling system()
 - v3 -> v4:
   - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is
 not supported.
 - v4 -> v5:
   - Note that this new test is no longer skipped on the BPF CI, because
 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to
 22.04.
   - Update the commit message, reflecting the latest version.
   - The validations are no longer done using 'ss', but using the new
 BPF program added in the previous patch, to reduce the use of
 external dependences. (Martin)
 - v5 -> v6:
   - Use usleep() instead of sleep().
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 127 +
 1 file changed, 127 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index d2ca32fa3b21..c76a0d8c8f93 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -5,12 +5,17 @@
 #include 
 #include 
 #include 
+#include 
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
+#include "mptcp_subflow.skel.h"
 
 #define NS_TEST "mptcp_ns"
+#define ADDR_1 "10.0.1.1"
+#define ADDR_2 "10.0.1.2"
+#define PORT_1 10001
 
 #ifndef IPPROTO_MPTCP
 #define IPPROTO_MPTCP 262
@@ -335,10 +340,132 @@ static void test_mptcpify(void)
close(cgroup_fd);
 }
 
+static int endpoint_init(char *flags)
+{
+   SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", 
NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
+   SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
+   SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
+   if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, 
flags)) {
+   printf("'ip mptcp' not supported, skip this test.\n");
+   test__skip();
+   goto fail;
+   }
+
+   return 0;
+fail:
+   return -1;
+}
+
+static void wait_for_new_subflows(int fd)
+{
+   socklen_t len;
+   u8 subflows;
+   int err, i;
+
+   len = sizeof(subflows);
+   /* Wait max 1 sec for new subflows to be created */
+   for (i = 0; i < 10; i++) {
+   err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len);
+   if (!err && subflows > 0)
+   break;
+
+   usleep(10); /* 0.1s */
+   }
+}
+
+static void run_subflow(void)
+{
+   int server_fd, client_fd, err;
+   char new[TCP_CA_NAME_MAX];
+   char cc[TCP_CA_NAME_MAX];
+   unsigned int mark;
+   socklen_t len;
+
+   server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+   if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+   return;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+   goto close_server;
+
+   send_byte(client_fd);
+   wait_for_new_subflows(client_fd);
+
+   len = sizeof(mark);
+   err = getsockopt(client_fd, SOL_SOCKET, SO_MARK, &mark, &len);
+   if (ASSERT_OK(err, "getsockopt(client_fd, SO_MARK)"))
+   ASSERT_EQ(mark, 0, "mark");
+
+   len = siz

[PATCH bpf-next/net v6 2/3] selftests/bpf: Add getsockopt to inspect mptcp subflow

From: Geliang Tang 

This patch adds a "cgroup/getsockopt" way to inspect the subflows of an
MPTCP socket, and verify the modifications done by the same BPF program
in the previous commit: a different mark per subflow, and a different
TCP CC set on the second one. This new hook will be used by the next
commit to verify the socket options set on each subflow.

This extra "cgroup/getsockopt" prog walks the msk->conn_list and use
bpf_core_cast to cast a pointer for readonly. It allows to inspect all
the fields of a structure.

Note that on the kernel side, the MPTCP socket stores a list of subflows
under 'msk->conn_list'. They can be iterated using the generic 'list'
helpers. They have been imported here, with a small difference:
list_for_each_entry() uses 'cond_break' to limit the number of
iterations, and ease its use. Because only data need to be read here,
it is enough to use this technique. It is planned to use bpf_iter, when
BPF programs will be used to modify data from the different subflows.
mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also
be imported.

Suggested-by: Martin KaFai Lau 
Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
  - v5: new patch, instead of using 'ss' in the following patch
---
 MAINTAINERS   |  2 +-
 tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 30a9b9450e11..2674b86209d1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16058,7 +16058,7 @@ F:  include/net/mptcp.h
 F: include/trace/events/mptcp.h
 F: include/uapi/linux/mptcp*.h
 F: net/mptcp/
-F: tools/testing/selftests/bpf/*/*mptcp*.c
+F: tools/testing/selftests/bpf/*/*mptcp*.[ch]
 F: tools/testing/selftests/net/mptcp/
 
 NETWORKING [TCP]
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h 
b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
new file mode 100644
index ..179b74c1205f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __MPTCP_BPF_H__
+#define __MPTCP_BPF_H__
+
+#include "bpf_experimental.h"
+
+/* list helpers from include/linux/list.h */
+static inline int list_is_head(const struct list_head *list,
+  const struct list_head *head)
+{
+   return list == head;
+}
+
+#define list_entry(ptr, type, member)  \
+   container_of(ptr, type, member)
+
+#define list_first_entry(ptr, type, member)\
+   list_entry((ptr)->next, type, member)
+
+#define list_next_entry(pos, member)   \
+   list_entry((pos)->member.next, typeof(*(pos)), member)
+
+#define list_entry_is_head(pos, head, member)  \
+   list_is_head(&pos->member, (head))
+
+/* small difference: 'cond_break' has been added in the conditions */
+#define list_for_each_entry(pos, head, member) \
+   for (pos = list_first_entry(head, typeof(*pos), member);\
+cond_break, !list_entry_is_head(pos, head, member);\
+pos = list_next_entry(pos, member))
+
+/* mptcp helpers from protocol.h */
+#define mptcp_for_each_subflow(__msk, __subflow)   \
+   list_for_each_entry(__subflow, &((__msk)->conn_list), node)
+
+static __always_inline struct sock *
+mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
+{
+   return subflow->tcp_sock;
+}
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c 
b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
index 2e28f4a215b5..70302477e326 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -4,10 +4,12 @@
 
 /* vmlinux.h, bpf_helpers.h and other 'define' */
 #include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
 
 char _license[] SEC("license") = "GPL";
 
 char cc[TCP_CA_NAME_MAX] = "reno";
+int pid;
 
 /* Associate a subflow counter to each token */
 struct {
@@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops)
 
return 1;
 }
+
+static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct 
bpf_sockopt *ctx)
+{
+   struct mptcp_subflow_context *subflow;
+   int i = 0;
+
+   mptcp_for_each_subflow(msk, subflow) {
+   struct sock *ssk;
+
+   ssk = mptcp_subflow_tcp_sock(bpf_core_cast(subflow,
+  struct 
mptcp_subflow_context));
+
+

[PATCH bpf-next/net v6 0/3] selftests/bpf: new MPTCP subflow subtest

In this series from Geliang, modifying MPTCP BPF selftests, we have:

- A new MPTCP subflow BPF program setting socket options per subflow: it
  looks better to have this old test program in the BPF selftests to
  track regressions and to serve as example.

  Note: Nicolas is no longer working at Tessares, but he did this work
  while working for them, and his email address is no longer available.

- A new hook in the same BPF program to do the verification step.

- A new MPTCP BPF subtest validating the new BPF program added in the
  first patch, with the help of the new hook added in the second patch.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Changes in v6:
- Patch 3/3: use usleep() instead of sleep()
- Series: rebased on top of bpf-next/net
- Link to v5: 
https://lore.kernel.org/r/20240910-upstream-bpf-next-20240506-mptcp-subflow-test-v5-0-2c664a7da...@kernel.org

Changes in v5:
- See the individual changelog for more details about them
- Patch 1/3: set TCP on the 2nd subflow
- Patch 2/3: new
- Patch 3/3: use the BPF program from patch 2/3 to do the validation
 instead of using ss.
- Series: rebased on top of bpf-next/net
- Link to v4: 
https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-test-v4-0-2b4ca6994...@kernel.org

Changes in v4:
- Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file:
  - I will check later if it is possible to avoid having duplicated
header files in tools/include/uapi, but no need to block this series
for that. Patch 2/3 can be added later if needed.
- Patch 2/2: skip the test if 'ip mptcp' is not available.
- Link to v3: 
https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-test-v3-0-ebdc2d494...@kernel.org

Changes in v3:
- Sorry for the delay between v2 and v3, this series was conflicting
  with the "add netns helpers", but it looks like it is on hold:
  https://lore.kernel.org/cover.1715821541.git.tanggeli...@kylinos.cn
- Patch 1/3 includes "bpf_tracing_net.h", introduced in between.
- New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link".
- Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper.
- Link to v2: 
https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-test-v2-0-4048c2948...@kernel.org

Changes in v2:
- Previous patches 1/4 and 2/4 have been dropped from this series:
  - 1/4: "selftests/bpf: Handle SIGINT when creating netns":
- A new version, more generic and no longer specific to MPTCP BPF
  selftest will be sent later, as part of a new series. (Alexei)
  - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro":
- Removed, not to hide helper functions in macros. (Alexei)
- The commit message of patch 1/2 has been clarified to avoid some
  possible confusions spot by Alexei.
- Link to v1: 
https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-0-e2bcbdf49...@kernel.org

---
Geliang Tang (2):
  selftests/bpf: Add getsockopt to inspect mptcp subflow
  selftests/bpf: Add mptcp subflow subtest

Nicolas Rybowski (1):
  selftests/bpf: Add mptcp subflow example

 MAINTAINERS   |   2 +-
 tools/testing/selftests/bpf/prog_tests/mptcp.c| 127 +
 tools/testing/selftests/bpf/progs/mptcp_bpf.h |  42 +++
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++
 4 files changed, 298 insertions(+), 1 deletion(-)
---
base-commit: 23dc9867329c72b48e5039ac93fbf50d9099cdb3
change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH net 3/3] selftests: mptcp: include net_helper.sh file

Similar to the previous commit, the net_helper.sh file from the parent
directory is used by the MPTCP selftests and it needs to be present when
running the tests.

This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:

  make -C tools/testing/selftests \
  TARGETS=net/mptcp \
  install INSTALL_PATH=$KSFT_INSTALL_PATH

  cd $KSFT_INSTALL_PATH
  ./run_kselftest.sh -c net/mptcp

Fixes: 1af3bc912eac ("selftests: mptcp: lib: use wait_local_port_listen helper")
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/Makefile 
b/tools/testing/selftests/net/mptcp/Makefile
index 68924053a1e6..5d796622e730 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -11,7 +11,7 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt 
mptcp_inq
 
 TEST_FILES := mptcp_lib.sh settings
 
-TEST_INCLUDES := ../lib.sh
+TEST_INCLUDES := ../lib.sh ../net_helper.sh
 
 EXTRA_CLEAN := *.pcap
 

-- 
2.45.2

[PATCH net 2/3] selftests: mptcp: include lib.sh file

The lib.sh file from the parent directory is used by the MPTCP selftests
and it needs to be present when running the tests.

This file then needs to be listed in the Makefile to be included when
exporting or installing the tests, e.g. with:

  make -C tools/testing/selftests \
  TARGETS=net/mptcp \
  install INSTALL_PATH=$KSFT_INSTALL_PATH

  cd $KSFT_INSTALL_PATH
  ./run_kselftest.sh -c net/mptcp

Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/Makefile 
b/tools/testing/selftests/net/mptcp/Makefile
index 7b936a926859..68924053a1e6 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -11,6 +11,8 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt 
mptcp_inq
 
 TEST_FILES := mptcp_lib.sh settings
 
+TEST_INCLUDES := ../lib.sh
+
 EXTRA_CLEAN := *.pcap
 
 include ../../lib.mk

-- 
2.45.2

[PATCH net 1/3] selftests: mptcp: join: restrict fullmesh endp on 1st sf

A new endpoint using the IP of the initial subflow has been recently
added to increase the code coverage. But it breaks the test when using
old kernels not having commit 86e39e04482b ("mptcp: keep track of local
endpoint still available for each msk"), e.g. on v5.15.

Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local
endpoint being tracked or not"), it is possible to add the new endpoint
conditionally, by checking if "mptcp_pm_subflow_check_next" is present
in kallsyms: this is not directly linked to the commit introducing this
symbol but for the parent one which is linked anyway. So we can know in
advance what will be the expected behaviour, and add the new endpoint
only when it makes sense to do so.

Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf")
Cc: sta...@vger.kernel.org
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh 
b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index a4762c49a878..cde041c93906 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3064,7 +3064,9 @@ fullmesh_tests()
pm_nl_set_limits $ns1 1 3
pm_nl_set_limits $ns2 1 3
pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
-   pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh
+   if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then
+   pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh
+   fi
fullmesh=1 speed=slow \
run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 3 3 3

-- 
2.45.2

[PATCH net 0/3] selftests: mptcp: misc. small fixes

Here are some various fixes for the MPTCP selftests.

Patch 1 fixes a recently modified test to continue to work as expected
on older kernels. This is a fix for a recent fix that can be backported
up to v5.15.

Patch 2 and 3 include dependences when exporting or installing the
tests. Two fixes for v6.11-rc1.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Matthieu Baerts (NGI0) (3):
  selftests: mptcp: join: restrict fullmesh endp on 1st sf
  selftests: mptcp: include lib.sh file
  selftests: mptcp: include net_helper.sh file

 tools/testing/selftests/net/mptcp/Makefile  | 2 ++
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)
---
base-commit: 48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
change-id: 20240910-net-selftests-mptcp-fix-install-2b82ae5a99c8

Best regards,
-- 
Matthieu Baerts (NGI0)

Re: [PATCH bpf-next/net v5 3/3] selftests/bpf: Add mptcp subflow subtest

2024-09-10 Thread Matthieu Baerts

Hello,

On 10/09/2024 16:13, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang 
> 
> This patch adds a subtest named test_subflow in test_mptcp to load and
> verify the newly added MPTCP subflow BPF program. To goal is to make
> sure it is possible to set different socket options per subflows, while
> the userspace socket interface only lets the application to set the same
> socket options for the whole MPTCP connection and its multiple subflows.
> 
> To check that, a client and a server are started in a dedicated netns,
> with veth interfaces to simulate multiple paths. They will exchange data
> to allow the creation of an additional subflow.

(...)

> diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
> b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> index d2ca32fa3b21..c30f032edaca 100644
> --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> @@ -335,10 +339,132 @@ static void test_mptcpify(void)
>   close(cgroup_fd);
>  }

(...)

> +static void wait_for_new_subflows(int fd)
> +{
> + socklen_t len;
> + u8 subflows;
> + int err, i;
> +
> + len = sizeof(subflows);
> + /* Wait max 1 sec for new subflows to be created */
> + for (i = 0; i < 10; i++) {
> + err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len);
> + if (!err && subflows > 0)
> + break;
> +
> + sleep(0.1);

As reported by the CI, we are not in Python, usleep() should be used
here. I missed that one during my review. I will send a new version with
the fix tomorrow. Sorry for the noise.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

[PATCH bpf-next/net v5 3/3] selftests/bpf: Add mptcp subflow subtest

From: Geliang Tang 

This patch adds a subtest named test_subflow in test_mptcp to load and
verify the newly added MPTCP subflow BPF program. To goal is to make
sure it is possible to set different socket options per subflows, while
the userspace socket interface only lets the application to set the same
socket options for the whole MPTCP connection and its multiple subflows.

To check that, a client and a server are started in a dedicated netns,
with veth interfaces to simulate multiple paths. They will exchange data
to allow the creation of an additional subflow.

When the different subflows are being created, the new MPTCP subflow BPF
program will set some socket options: marks and TCP CC. The validation
is done by the same program, when the userspace checks the value of the
modified socket options. On the userspace side, it will see that the
default values are still being used on the MPTCP connection, while the
BPF program will see different options set per subflow of the same MPTCP
connection.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Signed-off-by: Geliang Tang 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v2 -> v3:
   - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the
 BPF CI running IPRoute 5.5.0.
   - Use SYS_NOFAIL() in _ss_search() instead of calling system()
 - v3 -> v4:
   - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is
 not supported.
 - v4 -> v5:
   - Note that this new test is no longer skipped on the BPF CI, because
 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to
 22.04.
   - Update the commit message, reflecting the latest version.
   - The validations are no longer done using 'ss', but using the new
 BPF program added in the previous patch, to reduce the use of
 external dependences.
---
 tools/testing/selftests/bpf/prog_tests/mptcp.c | 126 +
 1 file changed, 126 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c 
b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index d2ca32fa3b21..c30f032edaca 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -9,8 +9,12 @@
 #include "network_helpers.h"
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
+#include "mptcp_subflow.skel.h"
 
 #define NS_TEST "mptcp_ns"
+#define ADDR_1 "10.0.1.1"
+#define ADDR_2 "10.0.1.2"
+#define PORT_1 10001
 
 #ifndef IPPROTO_MPTCP
 #define IPPROTO_MPTCP 262
@@ -335,10 +339,132 @@ static void test_mptcpify(void)
close(cgroup_fd);
 }
 
+static int endpoint_init(char *flags)
+{
+   SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", 
NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1);
+   SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST);
+   SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2);
+   SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST);
+   if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, 
flags)) {
+   printf("'ip mptcp' not supported, skip this test.\n");
+   test__skip();
+   goto fail;
+   }
+
+   return 0;
+fail:
+   return -1;
+}
+
+static void wait_for_new_subflows(int fd)
+{
+   socklen_t len;
+   u8 subflows;
+   int err, i;
+
+   len = sizeof(subflows);
+   /* Wait max 1 sec for new subflows to be created */
+   for (i = 0; i < 10; i++) {
+   err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len);
+   if (!err && subflows > 0)
+   break;
+
+   sleep(0.1);
+   }
+}
+
+static void run_subflow(void)
+{
+   int server_fd, client_fd, err;
+   char new[TCP_CA_NAME_MAX];
+   char cc[TCP_CA_NAME_MAX];
+   unsigned int mark;
+   socklen_t len;
+
+   server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0);
+   if (!ASSERT_OK_FD(server_fd, "start_mptcp_server"))
+   return;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (!ASSERT_OK_FD(client_fd, "connect_to_fd"))
+   goto close_server;
+
+   send_byte(client_fd);
+   wait_for_new_subflows(client_fd);
+
+   len = sizeof(mark);
+   err = getsockopt(client_fd, SOL_SOCKET, SO_MARK, &mark, &len);
+   if (ASSERT_OK(err, "getsockopt(client_fd, SO_MARK)"))
+   ASSERT_EQ(mark, 0, "mark");
+
+   len = sizeof(new);
+   err = getsockopt(client_fd, SOL_TCP, TCP_CONGESTION, new, &len);
+   if (ASSERT_OK(err, "getsockopt(client_fd, TCP_CONGESTION)")) {
+

[PATCH bpf-next/net v5 2/3] selftests/bpf: Add getsockopt to inspect mptcp subflow

From: Geliang Tang 

This patch adds a "cgroup/getsockopt" way to inspect the subflows of an
MPTCP socket, and verify the modifications done by the same BPF program
in the previous commit: a different mark per subflow, and a different
TCP CC set on the second one. This new hook will be used by the next
commit to verify the socket options set on each subflow.

This extra "cgroup/getsockopt" prog walks the msk->conn_list and use
bpf_core_cast to cast a pointer for readonly. It allows to inspect all
the fields of a structure.

Note that on the kernel side, the MPTCP socket stores a list of subflows
under 'msk->conn_list'. They can be iterated using the generic 'list'
helpers. They have been imported here, with a small difference:
list_for_each_entry() uses 'cond_break' to limit the number of
iterations, and ease its use. Because only data need to be read here,
it is enough to use this technique. It is planned to use bpf_iter, when
BPF programs will be used to modify data from the different subflows.
mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also
be imported.

Suggested-by: Martin KaFai Lau 
Signed-off-by: Geliang Tang 
Reviewed-by: Matthieu Baerts (NGI0) 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
  - v5: new patch, instead of using 'ss' in the following patch
---
 MAINTAINERS   |  2 +-
 tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 30a9b9450e11..2674b86209d1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16058,7 +16058,7 @@ F:  include/net/mptcp.h
 F: include/trace/events/mptcp.h
 F: include/uapi/linux/mptcp*.h
 F: net/mptcp/
-F: tools/testing/selftests/bpf/*/*mptcp*.c
+F: tools/testing/selftests/bpf/*/*mptcp*.[ch]
 F: tools/testing/selftests/net/mptcp/
 
 NETWORKING [TCP]
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h 
b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
new file mode 100644
index ..179b74c1205f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __MPTCP_BPF_H__
+#define __MPTCP_BPF_H__
+
+#include "bpf_experimental.h"
+
+/* list helpers from include/linux/list.h */
+static inline int list_is_head(const struct list_head *list,
+  const struct list_head *head)
+{
+   return list == head;
+}
+
+#define list_entry(ptr, type, member)  \
+   container_of(ptr, type, member)
+
+#define list_first_entry(ptr, type, member)\
+   list_entry((ptr)->next, type, member)
+
+#define list_next_entry(pos, member)   \
+   list_entry((pos)->member.next, typeof(*(pos)), member)
+
+#define list_entry_is_head(pos, head, member)  \
+   list_is_head(&pos->member, (head))
+
+/* small difference: 'cond_break' has been added in the conditions */
+#define list_for_each_entry(pos, head, member) \
+   for (pos = list_first_entry(head, typeof(*pos), member);\
+cond_break, !list_entry_is_head(pos, head, member);\
+pos = list_next_entry(pos, member))
+
+/* mptcp helpers from protocol.h */
+#define mptcp_for_each_subflow(__msk, __subflow)   \
+   list_for_each_entry(__subflow, &((__msk)->conn_list), node)
+
+static __always_inline struct sock *
+mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
+{
+   return subflow->tcp_sock;
+}
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c 
b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
index 2e28f4a215b5..70302477e326 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -4,10 +4,12 @@
 
 /* vmlinux.h, bpf_helpers.h and other 'define' */
 #include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
 
 char _license[] SEC("license") = "GPL";
 
 char cc[TCP_CA_NAME_MAX] = "reno";
+int pid;
 
 /* Associate a subflow counter to each token */
 struct {
@@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops)
 
return 1;
 }
+
+static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct 
bpf_sockopt *ctx)
+{
+   struct mptcp_subflow_context *subflow;
+   int i = 0;
+
+   mptcp_for_each_subflow(msk, subflow) {
+   struct sock *ssk;
+
+   ssk = mptcp_subflow_tcp_sock(bpf_core_cast(subflow,
+  struct 
mptcp_subflow_context));
+
+

[PATCH bpf-next/net v5 1/3] selftests/bpf: Add mptcp subflow example

From: Nicolas Rybowski 

Move Nicolas' patch into bpf selftests directory. This example adds a
different mark (SO_MARK) on each subflow, and changes the TCP CC only on
the first subflow.

>From the userspace, an application can do a setsockopt() on an MPTCP
socket, and typically the same value will be propagated to all subflows
(paths). If someone wants to have different values per subflow, the
recommended way is to use BPF. So it is good to add such example here,
and make sure there is no regressions.

This example shows how it is possible to:

Identify the parent msk of an MPTCP subflow.
Put different sockopt for each subflow of a same MPTCP connection.

Here especially, two different behaviours are implemented:

A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
MPTCP connection. The order of creation of the current subflow defines
its mark. The TCP CC algorithm of the very first subflow of an MPTCP
connection is set to "reno".

This is just to show it is possible to identify an MPTCP connection, and
set socket options, from different SOL levels, per subflow. "reno" has
been picked because it is built-in and usually not set as default one.
It is easy to verify with 'ss' that these modifications have been
applied correctly. That's what the next patch is going to do.

Nicolas' code comes from:

commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c")

from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the
"scripts" branch), and it has been adapted by Geliang.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Co-developed-by: Geliang Tang 
Signed-off-by: Geliang Tang 
Signed-off-by: Nicolas Rybowski 
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
Notes:
 - v1 -> v2:
   - The commit message has been updated: why setting multiple socket
 options, why reno, the verification is done in a later patch
 (different author). (Alexei)
 - v2 -> v3:
   - Only #include "bpf_tracing_net.h", linked to:
 https://lore.kernel.org/20240509175026.3423614-1-martin@linux.dev
 - v4 -> v5:
   - Set reno as TCP cc on the second subflow, not to influence the
 getsockopt() done from the userspace, which will return the one
 from the first subflow, the default TCP cc then, not the modified
 one.
---
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c 
b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
new file mode 100644
index ..2e28f4a215b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Tessares SA. */
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+
+char _license[] SEC("license") = "GPL";
+
+char cc[TCP_CA_NAME_MAX] = "reno";
+
+/* Associate a subflow counter to each token */
+struct {
+   __uint(type, BPF_MAP_TYPE_HASH);
+   __uint(key_size, sizeof(__u32));
+   __uint(value_size, sizeof(__u32));
+   __uint(max_entries, 100);
+} mptcp_sf SEC(".maps");
+
+SEC("sockops")
+int mptcp_subflow(struct bpf_sock_ops *skops)
+{
+   __u32 init = 1, key, mark, *cnt;
+   struct mptcp_sock *msk;
+   struct bpf_sock *sk;
+   int err;
+
+   if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
+   return 1;
+
+   sk = skops->sk;
+   if (!sk)
+   return 1;
+
+   msk = bpf_skc_to_mptcp_sock(sk);
+   if (!msk)
+   return 1;
+
+   key = msk->token;
+   cnt = bpf_map_lookup_elem(&mptcp_sf, &key);
+   if (cnt) {
+   /* A new subflow is added to an existing MPTCP connection */
+   __sync_fetch_and_add(cnt, 1);
+   mark = *cnt;
+   } else {
+   /* A new MPTCP connection is just initiated and this is its 
primary subflow */
+   bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY);
+   mark = init;
+   }
+
+   /* Set the mark of the subflow's socket based on appearance order */
+   err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
+   if (err < 0)
+   return 1;
+   if (mark == 2)
+   err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, 
TCP_CA_NAME_MAX);
+
+   return 1;
+}

-- 
2.45.2

[PATCH bpf-next/net v5 0/3] selftests/bpf: new MPTCP subflow subtest

In this series from Geliang, modifying MPTCP BPF selftests, we have:

- A new MPTCP subflow BPF program setting socket options per subflow: it
  looks better to have this old test program in the BPF selftests to
  track regressions and to serve as example.

  Note: Nicolas is no longer working at Tessares, but he did this work
  while working for them, and his email address is no longer available.

- A new hook in the same BPF program to do the verification step.

- A new MPTCP BPF subtest validating the new BPF program added in the
  first patch, with the help of the new hook added in the second patch.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Changes in v5:
- See the individual changelog for more details about them
- Patch 1/3: set TCP on the 2nd subflow
- Patch 2/3: new
- Patch 3/3: use the BPF program from patch 2/3 to do the validation
 instead of using ss.
- Link to v4: 
https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-test-v4-0-2b4ca6994...@kernel.org

Changes in v4:
- Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file:
  - I will check later if it is possible to avoid having duplicated
header files in tools/include/uapi, but no need to block this series
for that. Patch 2/3 can be added later if needed.
- Patch 2/2: skip the test if 'ip mptcp' is not available.
- Link to v3: 
https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-test-v3-0-ebdc2d494...@kernel.org

Changes in v3:
- Sorry for the delay between v2 and v3, this series was conflicting
  with the "add netns helpers", but it looks like it is on hold:
  https://lore.kernel.org/cover.1715821541.git.tanggeli...@kylinos.cn
- Patch 1/3 includes "bpf_tracing_net.h", introduced in between.
- New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link".
- Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper.
- Link to v2: 
https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-test-v2-0-4048c2948...@kernel.org

Changes in v2:
- Previous patches 1/4 and 2/4 have been dropped from this series:
  - 1/4: "selftests/bpf: Handle SIGINT when creating netns":
- A new version, more generic and no longer specific to MPTCP BPF
  selftest will be sent later, as part of a new series. (Alexei)
  - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro":
- Removed, not to hide helper functions in macros. (Alexei)
- The commit message of patch 1/2 has been clarified to avoid some
  possible confusions spot by Alexei.
- Link to v1: 
https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-0-e2bcbdf49...@kernel.org

---
Geliang Tang (2):
  selftests/bpf: Add getsockopt to inspect mptcp subflow
  selftests/bpf: Add mptcp subflow subtest

Nicolas Rybowski (1):
  selftests/bpf: Add mptcp subflow example

 MAINTAINERS   |   2 +-
 tools/testing/selftests/bpf/prog_tests/mptcp.c| 126 +
 tools/testing/selftests/bpf/progs/mptcp_bpf.h |  42 +++
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++
 4 files changed, 297 insertions(+), 1 deletion(-)
---
base-commit: 6b083650a37318112fb60c65fbb6070584f53d93
change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3

Best regards,
-- 
Matthieu Baerts (NGI0)

[PATCH net-next v2 5/5] selftests: mptcp: connect: remove duplicated spaces in TAP output

It is nice to have a visual alignment in the test output to present the
different results, but it makes less sense in the TAP output that is
there for computers.

It sounds then better to remove the duplicated whitespaces in the TAP
output, also because it can cause some issues with TAP parsers expecting
only one space around the directive delimiter (#).

While at it, change the variable name (result_msg) to something more
explicit.

Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 49d90c4dbc01..57325d57e4c6 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -345,9 +345,11 @@ do_transfer()
 
local addr_port
addr_port=$(printf "%s:%d" ${connect_addr} ${port})
-   local result_msg
-   result_msg="$(printf "%.3s %-5s -> %.3s (%-20s) %-5s" ${connector_ns} 
${cl_proto} ${listener_ns} ${addr_port} ${srv_proto})"
-   mptcp_lib_print_title "${result_msg}"
+   local pretty_title
+   pretty_title="$(printf "%.3s %-5s -> %.3s (%-20s) %-5s" ${connector_ns} 
${cl_proto} ${listener_ns} ${addr_port} ${srv_proto})"
+   mptcp_lib_print_title "${pretty_title}"
+
+   local tap_title="${connector_ns:0:3} ${cl_proto} -> ${listener_ns:0:3} 
(${addr_port}) ${srv_proto}"
 
if $capture; then
local capuser
@@ -443,7 +445,7 @@ do_transfer()
 
echo
cat "$capout"
-   mptcp_lib_result_fail "${TEST_GROUP}: ${result_msg}"
+   mptcp_lib_result_fail "${TEST_GROUP}: ${tap_title}"
return 1
fi
 
@@ -543,12 +545,12 @@ do_transfer()
 
if [ $retc -eq 0 ] && [ $rets -eq 0 ]; then
mptcp_lib_pr_ok "${extra:1}"
-   mptcp_lib_result_pass "${TEST_GROUP}: ${result_msg}"
+   mptcp_lib_result_pass "${TEST_GROUP}: ${tap_title}"
else
if [ -n "${extra}" ]; then
mptcp_lib_print_warn "${extra:1}"
fi
-   mptcp_lib_result_fail "${TEST_GROUP}: ${result_msg}"
+   mptcp_lib_result_fail "${TEST_GROUP}: ${tap_title}"
fi
 
cat "$capout"

-- 
2.45.2

[PATCH net-next v2 4/5] selftests: mptcp: diag: remove trailing whitespace

It doesn't need to be there, and it can cause some issues with TAP
parsers expecting only one space around the directive delimiter (#).

Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/diag.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/diag.sh 
b/tools/testing/selftests/net/mptcp/diag.sh
index 776d43a6922d..2bd0c1eb70c5 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -284,7 +284,7 @@ echo "b" | \
./mptcp_connect -p 1 -r 0 -t ${timeout_poll} -w 20 \
127.0.0.1 >/dev/null &
 wait_connected $ns 1
-chk_msk_nr 2 "after MPC handshake "
+chk_msk_nr 2 "after MPC handshake"
 chk_last_time_info 1
 chk_msk_remote_key_nr 2 "chk remote_key"
 chk_msk_fallback_nr 0 "chk no fallback"

-- 
2.45.2

[PATCH net-next v2 3/5] selftests: mptcp: reset the last TS before the first test

Just to slightly improve the precision of the duration of the first
test.

In mptcp_join.sh, the last append_prev_results is now done as soon as
the last test is over: this will add the last result in the list, and
get a more precise time for this last test.

Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 ++
 tools/testing/selftests/net/mptcp/mptcp_join.sh| 3 ++-
 tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 1 +
 tools/testing/selftests/net/mptcp/pm_netlink.sh| 2 ++
 tools/testing/selftests/net/mptcp/simult_flows.sh  | 1 +
 tools/testing/selftests/net/mptcp/userspace_pm.sh  | 1 +
 6 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index f61e2f5870ea..49d90c4dbc01 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -847,6 +847,8 @@ stop_if_error()
 make_file "$cin" "client"
 make_file "$sin" "server"
 
+mptcp_lib_subtests_last_ts_reset
+
 check_mptcp_disabled
 
 stop_if_error "The kernel configuration is not valid for MPTCP"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh 
b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 43f8a9bd84c4..3564cd06643c 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3959,9 +3959,11 @@ if [ ${#tests[@]} -eq 0 ]; then
tests=("${all_tests_names[@]}")
 fi
 
+mptcp_lib_subtests_last_ts_reset
 for subtests in "${tests[@]}"; do
"${subtests}"
 done
+append_prev_results
 
 if [ ${ret} -ne 0 ]; then
echo
@@ -3972,7 +3974,6 @@ if [ ${ret} -ne 0 ]; then
echo
 fi
 
-append_prev_results
 mptcp_lib_result_print_all_tap
 
 exit $ret
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh 
b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
index 68899a303a1a..5e8d5b83e2d0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
@@ -349,6 +349,7 @@ init
 make_file "$cin" "client" 1
 make_file "$sin" "server" 1
 trap cleanup EXIT
+mptcp_lib_subtests_last_ts_reset
 
 run_tests $ns1 $ns2 10.0.1.1
 run_tests $ns1 $ns2 dead:beef:1::1
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh 
b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index 2757378b1b13..2e6648a2b2c0 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -137,6 +137,8 @@ check()
fi
 }
 
+mptcp_lib_subtests_last_ts_reset
+
 check "show_endpoints" "" "defaults addr list"
 
 default_limits="$(get_limits)"
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh 
b/tools/testing/selftests/net/mptcp/simult_flows.sh
index f74e1c3c126d..8fa77c8e9b65 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -286,6 +286,7 @@ while getopts "bcdhi" option;do
 done
 
 setup
+mptcp_lib_subtests_last_ts_reset
 run_test 10 10 0 0 "balanced bwidth"
 run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
 
diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh 
b/tools/testing/selftests/net/mptcp/userspace_pm.sh
index 9cb05978269d..3651f73451cf 100755
--- a/tools/testing/selftests/net/mptcp/userspace_pm.sh
+++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh
@@ -150,6 +150,7 @@ mptcp_lib_events "${ns2}" "${client_evts}" client_evts_pid
 server_evts=$(mktemp)
 mptcp_lib_events "${ns1}" "${server_evts}" server_evts_pid
 sleep 0.5
+mptcp_lib_subtests_last_ts_reset
 
 print_title "Init"
 print_test "Created network namespaces ns1, ns2"

-- 
2.45.2

[PATCH net-next v2 2/5] selftests: mptcp: connect: remote time in TAP output

It is now added by the MPTCP lib automatically, see the parent commit.

The time in the TAP output might be slightly different from the one
displayed before, but that's OK.

Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
v2:
  - Fixed typo in the commit message (Jakub)
---
 tools/testing/selftests/net/mptcp/mptcp_connect.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh 
b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index b77fb7065bfb..f61e2f5870ea 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -431,7 +431,6 @@ do_transfer()
 
local duration
duration=$((stop-start))
-   result_msg+=" # time=${duration}ms"
printf "(duration %05sms) " "${duration}"
if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
mptcp_lib_pr_fail "client exit code $retc, server $rets"

-- 
2.45.2

[PATCH net-next v2 1/5] selftests: mptcp: lib: add time per subtests in TAP output

It adds 'time=ms' in the diagnostic data of the TAP output, e.g.

  ok 1 - pm_netlink: defaults addr list # time=9ms

This addition is useful to quickly identify which subtests are taking a
longer time than the others, or more than expected.

Note that there are no specific formats to follow to show this time
according to the TAP 13 [1], TAP 14 [2] and KTAP [3] specifications.
Let's then define this one here.

Link: https://testanything.org/tap-version-13-specification.html [1]
Link: https://testanything.org/tap-version-14-specification.html [2]
Link: https://docs.kernel.org/dev-tools/ktap.html [3]
Reviewed-by: Mat Martineau 
Signed-off-by: Matthieu Baerts (NGI0) 
---
 tools/testing/selftests/net/mptcp/mptcp_lib.sh | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh 
b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 4578a331041e..975d4d4c862a 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -29,6 +29,7 @@ declare -rx MPTCP_LIB_AF_INET6=10
 MPTCP_LIB_SUBTESTS=()
 MPTCP_LIB_SUBTESTS_DUPLICATED=0
 MPTCP_LIB_SUBTEST_FLAKY=0
+MPTCP_LIB_SUBTESTS_LAST_TS_MS=
 MPTCP_LIB_TEST_COUNTER=0
 MPTCP_LIB_TEST_FORMAT="%02u %-50s"
 MPTCP_LIB_IP_MPTCP=0
@@ -205,6 +206,11 @@ mptcp_lib_kversion_ge() {
mptcp_lib_fail_if_expected_feature "kernel version ${1} lower than ${v}"
 }
 
+mptcp_lib_subtests_last_ts_reset() {
+   MPTCP_LIB_SUBTESTS_LAST_TS_MS="$(date +%s%3N)"
+}
+mptcp_lib_subtests_last_ts_reset
+
 __mptcp_lib_result_check_duplicated() {
local subtest
 
@@ -219,13 +225,22 @@ __mptcp_lib_result_check_duplicated() {
 
 __mptcp_lib_result_add() {
local result="${1}"
+   local time="time="
+   local ts_prev_ms
shift
 
local id=$((${#MPTCP_LIB_SUBTESTS[@]} + 1))
 
__mptcp_lib_result_check_duplicated "${*}"
 
-   MPTCP_LIB_SUBTESTS+=("${result} ${id} - ${KSFT_TEST}: ${*}")
+   # not to add two '#'
+   [[ "${*}" != *"#"* ]] && time="# ${time}"
+
+   ts_prev_ms="${MPTCP_LIB_SUBTESTS_LAST_TS_MS}"
+   mptcp_lib_subtests_last_ts_reset
+   time+="$((MPTCP_LIB_SUBTESTS_LAST_TS_MS - ts_prev_ms))ms"
+
+   MPTCP_LIB_SUBTESTS+=("${result} ${id} - ${KSFT_TEST}: ${*} ${time}")
 }
 
 # $1: test name

-- 
2.45.2

[PATCH net-next v2 0/5] selftests: mptcp: add time per subtests in TAP output