[PATCH net] selftests: mptcp: connect: -f: no reconnect
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE by closing the connection with unread bytes in the receive queue. The xdisconnect() helper was used to stop the connection, but it does more than that: it will shut it down, then wait before reconnecting to the same address. This causes the mptcp_join's "fastclose test" to fail all the time. This failure is due to a recent change, with commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), but that went unnoticed because the test is currently ignored. The recent modification only shown an existing issue: xdisconnect() doesn't need to be used here, only the shutdown() part is needed. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: sta...@vger.kernel.org Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - The failure was not clearly visible on NIPA, because the results for the two impacted sub-tests are currently ignored (unstable). Still, it looks important to fix that, as this will help when the tests will be improved not to be unstable any more. --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c index 414addef9a4514c489ecd09249143fe0ce2af649..d240d02fa443a1cd802f0e705ab36db5c22063a8 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -1302,7 +1302,7 @@ int main_loop(void) return ret; if (cfg_truncate > 0) { - xdisconnect(fd); + shutdown(fd, SHUT_WR); } else if (--cfg_repeat > 0) { xdisconnect(fd); --- base-commit: 4241a702e0d0c2ca9364cfac08dbf134264962de change-id: 20250204-net-mptcp-sft-conn-f-d1c14274ba66 Best regards, -- Matthieu Baerts (NGI0)
Re: [PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter
Hi Martin, Thank you for your review! Sorry for the delay here, Geliang started to work on a new version, but it might take a bit of time as he is currently off for a few days. On 24/01/2025 01:47, Martin KaFai Lau wrote: > On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote: >> From: Geliang Tang >> >> It's necessary to traverse all subflows on the conn_list of an MPTCP >> socket and then call kfunc to modify the fields of each subflow. In >> kernel space, mptcp_for_each_subflow() helper is used for this: >> >> mptcp_for_each_subflow(msk, subflow) >> kfunc(subflow); >> >> But in the MPTCP BPF program, this has not yet been implemented. As >> Martin suggested recently, this conn_list walking + modify-by-kfunc >> usage fits the bpf_iter use case. >> >> So this patch adds a new bpf_iter type named "mptcp_subflow" to do >> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ >> _destroy(). And register these bpf_iter mptcp_subflow into mptcp >> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used >> in BPF program like this: >> >> bpf_for_each(mptcp_subflow, subflow, msk) >> kfunc(subflow); (...) >> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c >> index >> c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 >> 100644 >> --- a/net/mptcp/bpf.c >> +++ b/net/mptcp/bpf.c (...) >> @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk) >> return NULL; >> } >> +__bpf_kfunc static int >> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, >> + struct mptcp_sock *msk) >> +{ >> + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; >> + struct sock *sk = (struct sock *)msk; >> + >> + BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) > >> + sizeof(struct bpf_iter_mptcp_subflow)); >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) != >> + __alignof__(struct bpf_iter_mptcp_subflow)); >> + >> + kit->msk = msk; >> + if (!msk) > > NULL check is not needed. verifier should have rejected it for > KF_TRUSTED_ARGS. > >> + return -EINVAL; >> + >> + if (!sock_owned_by_user_nocheck(sk) && >> + !spin_is_locked(&sk->sk_lock.slock)) > > I could have missed something. If it is to catch bug, should it be > sock_owned_by_me() that has the lockdep splat? For the cg get/setsockopt > hook here, the lock should have already been held earlier in the kernel. Good point. Because in this series, the kfunc is currently restricted to CG [gs]etsockopt hooks, we should use msk_owned_by_me(msk) here. > This set is only showing the cg sockopt bpf prog but missing the major > struct_ops piece. It is hard to comment. I assumed the lock situation is > the same for the struct_ops where the lock will be held before calling > the struct_ops prog? I understand it is hard to comment on that point. In the 'struct_ops' we are designing, the lock will indeed be held before calling the stuct_ops program. So we will just need to make sure this assumption is correct for all callbacks of our struct_ops. Also, if I understood correctly, it is possible to restrict a kfunc to some specific struct_ops, e.g. not to call this kfunc for the TCP CA struct_ops. So these checks should indeed not be needed, but I will double-check that with Geliang. Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH bpf-next/net v2 0/7] bpf: Add mptcp_subflow bpf_iter support
Hello BPF maintainers and reviewers, On 19/12/2024 16:46, Matthieu Baerts (NGI0) wrote: > Here is a series from Geliang, adding mptcp_subflow bpf_iter support. > > We are working on extending MPTCP with BPF, e.g. to control the path > manager -- in charge of the creation, deletion, and announcements of > subflows (paths) -- and the packet scheduler -- in charge of selecting > which available path the next data will be sent to. These extensions > need to iterate over the list of subflows attached to an MPTCP > connection, and do some specific actions via some new kfunc that will be > added later on. (...) > Changes in v2: > - Patches 1-2: new ones. > - Patch 3: remove two kfunc, more restrictions. (Martin) > - Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin) > - Patch 7: adaptations due to modifications in patches 1-4. > - Link to v1: > https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0-cf1695303...@kernel.org The v2 of this series didn't get any reviews, probably because it has been sent the week before Xmas. Do you prefer if I resend it? There is no hurry, I can also re-send it later if "now" is not a good time. Cheers, Matt -- Sponsored by the NGI0 Core fund.
[PATCH net-next 4/6] selftests: mptcp: add -m with ss in case of errors
Recently, we had an issue where getting info about the memory would have helped better understanding what went wrong. Let add it just in case for later. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh index 91a1d3b76e664bd95fc36310ac2e2c89bfba1aa1..051e289d79676c5feb9f46da67a08116548f4b47 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -118,11 +118,11 @@ mptcp_lib_pr_err_stats() { echo -en "${MPTCP_LIB_COLOR_RED}" { printf "\nnetns %s (listener) socket stat for %d:\n" "${lns}" "${port}" - ip netns exec "${lns}" ss -Menita -o "sport = :${port}" + ip netns exec "${lns}" ss -Menitam -o "sport = :${port}" cat "${lstat}" printf "\nnetns %s (connector) socket stat for %d:\n" "${cns}" "${port}" - ip netns exec "${cns}" ss -Menita -o "dport = :${port}" + ip netns exec "${cns}" ss -Menitam -o "dport = :${port}" [ "${lstat}" != "${cstat}" ] && cat "${cstat}" } 1>&2 echo -en "${MPTCP_LIB_COLOR_RESET}" -- 2.47.1
Re: [PATCH 0/3] selftests: mptcp: Fix various issues in main_loop
Hi Cong Liu, On 13/01/2025 09:52, Cong Liu wrote: > Fix several issues in the mptcp connect test's main_loop function. > > - Fix a bug where the wrong file descriptor was being checked for errors > - Fix the input file descriptor lifecycle in the reconnection loop to >prevent use of invalid fd > - Add proper resource cleanup in error paths Thank you for these fixes! Please note that when sending patches to the Netdev mailing list, it is asked to follow some specific rules, e.g. - designate your patch to a tree - [PATCH net] or [PATCH net-next] - for fixes the Fixes: tag is required, regardless of the tree https://docs.kernel.org/process/maintainer-netdev.html If I'm not mistaken, it looks like you have here fixes for net, and the Fixes tag is missing. Do you mind sending a v2 with this being fixed please? Because these fixes are not urgent, do you mind sending them only to the MPTCP ML for the moment, and not to the other ones (and no other people in CC). > Cong Liu (3): > selftests: mptcp: Fix incorrect file descriptor check in main_loop > selftests: mptcp: Fix input fd lifecycle in reconnection loop > selftests: mptcp: Clean up resources properly in main_loop > > .../selftests/net/mptcp/mptcp_connect.c| 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > > base-commit: 2b88851f583d3c4e40bcd40cfe1965241ec229dd Note that this base-commit doesn't seem to exist. Because of that, our MPTCP CI was not able to automatically apply this commit. Talking about our CI, it looks like that 'mptcp_connect' now crashes in some cases with: free(): double free detected in tcache 2 Do you mind checking this please? https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12751035845 pw-bot: cr Cheers, Matt -- Sponsored by the NGI0 Core fund.
[PATCH net-next 6/6] selftests: mptcp: connect: better display the files size
'du' will print the name of the file, which was already displayed before, e.g. Created /tmp/tmp.UOyy0ghfmQ (size 4703740/tmp/tmp.UOyy0ghfmQ) containing data sent by client Created /tmp/tmp.xq3zvFinGo (size 1391724/tmp/tmp.xq3zvFinGo) containing data sent by server 'stat' can be used instead, to display this instead: Created /tmp/tmp.UOyy0ghfmQ (size 4703740 B) containing data sent by client Created /tmp/tmp.xq3zvFinGo (size 1391724 B) containing data sent by server So easier to spot the file sizes. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index e508d356fcdaebbfb95750bba0fa834a8463e32a..5e3c56253274a1f938d2ed9986c4290fcea8b96b 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -582,7 +582,7 @@ make_file() mptcp_lib_make_file $name 1024 $ksize dd if=/dev/urandom conv=notrunc of="$name" oflag=append bs=1 count=$rem 2> /dev/null - echo "Created $name (size $(du -b "$name")) containing data sent by $who" + echo "Created $name (size $(stat -c "%s" "$name") B) containing data sent by $who" } run_tests_lo() -- 2.47.1
[PATCH net-next 5/6] selftests: mptcp: connect: remove unused variable
'cin_disconnect' is used in run_tests_disconnect(), but not 'cout_disconnect', so it is safe to drop it. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index bfdaecd0a6a0564020530345daf91bed296bc15c..e508d356fcdaebbfb95750bba0fa834a8463e32a 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -137,7 +137,7 @@ TEST_GROUP="" #shellcheck disable=SC2317 cleanup() { - rm -f "$cin_disconnect" "$cout_disconnect" + rm -f "$cin_disconnect" rm -f "$cin" "$cout" rm -f "$sin" "$sout" rm -f "$capout" @@ -155,7 +155,6 @@ cin=$(mktemp) cout=$(mktemp) capout=$(mktemp) cin_disconnect="$cin".disconnect -cout_disconnect="$cout".disconnect trap cleanup EXIT mptcp_lib_ns_init ns1 ns2 ns3 ns4 -- 2.47.1
[PATCH net-next 3/6] selftests: mptcp: move stats info in case of errors to lib.sh
A few MPTCP selftests are using the same code to print stats in case of error. This code can then be moved to mptcp_lib.sh. No behaviour changes intended, except to print the error in red and to stderr, like most error messages. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 8 ++-- tools/testing/selftests/net/mptcp/mptcp_join.sh| 9 ++--- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 21 + tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 8 ++-- tools/testing/selftests/net/mptcp/simult_flows.sh | 8 ++-- 5 files changed, 29 insertions(+), 25 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index b48b4e56826a9cfdb3501242b707ae2ebe29b220..bfdaecd0a6a0564020530345daf91bed296bc15c 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -445,12 +445,8 @@ do_transfer() printf "(duration %05sms) " "${duration}" if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then mptcp_lib_pr_fail "client exit code $retc, server $rets" - echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2 - ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port" - cat /tmp/${listener_ns}.out - echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2 - ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = :$port" - [ ${listener_ns} != ${connector_ns} ] && cat /tmp/${connector_ns}.out + mptcp_lib_pr_err_stats "${listener_ns}" "${connector_ns}" "${port}" \ + "/tmp/${listener_ns}.out" "/tmp/${connector_ns}.out" echo cat "$capout" diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index c07e2bd3a315aac9c422fed85c3196ec46e060f7..13a3b68181ee14eb628a858e5738094c3c936b74 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -1039,13 +1039,8 @@ do_transfer() if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then fail_test "client exit code $retc, server $rets" - echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2 - ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port" - cat /tmp/${listener_ns}.out - echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2 - ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = :$port" - cat /tmp/${connector_ns}.out - + mptcp_lib_pr_err_stats "${listener_ns}" "${connector_ns}" "${port}" \ + "/tmp/${listener_ns}.out" "/tmp/${connector_ns}.out" return 1 fi diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh index 975d4d4c862afff2e685e86dc08a892dbd09d783..91a1d3b76e664bd95fc36310ac2e2c89bfba1aa1 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -107,6 +107,27 @@ mptcp_lib_pr_info() { mptcp_lib_print_info "INFO: ${*}" } +# $1-2: listener/connector ns ; $3 port ; $4-5 listener/connector stat file +mptcp_lib_pr_err_stats() { + local lns="${1}" + local cns="${2}" + local port="${3}" + local lstat="${4}" + local cstat="${5}" + + echo -en "${MPTCP_LIB_COLOR_RED}" + { + printf "\nnetns %s (listener) socket stat for %d:\n" "${lns}" "${port}" + ip netns exec "${lns}" ss -Menita -o "sport = :${port}" + cat "${lstat}" + + printf "\nnetns %s (connector) socket stat for %d:\n" "${cns}" "${port}" + ip netns exec "${cns}" ss -Menita -o "dport = :${port}" + [ "${lstat}" != "${cstat}" ] && cat "${cstat}" + } 1>&2 + echo -en "${MPTCP_LIB_COLOR_RESET}" +} + # SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var can be set when validating all # features using the last version of the kernel and the selftests to make sure # a test is not being skipped by mistake. diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testin
[PATCH net-next 2/6] selftests: mptcp: sockopt: save nstat infos
From: Geliang Tang Similar to the way nstat information is stored in mptcp_connect.sh and mptcp_join.sh scripts, this patch adds a similar way for mptcp_sockopt.sh and displays the nstat information when errors occur. Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh index 5e8d5b83e2d092879efc179f1a450542be4e575e..9a78bfdc3d5e27fdf6859d34f8d313bd08dd4457 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh @@ -169,6 +169,11 @@ do_transfer() cmsg+=",TCPINQ" fi + NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \ + nstat -n + NSTAT_HISTORY=/tmp/${connector_ns}.nstat ip netns exec ${connector_ns} \ + nstat -n + timeout ${timeout_test} \ ip netns exec ${listener_ns} \ $mptcp_connect -t ${timeout_poll} -l -M 1 -p $port -s ${srv_proto} -c "${cmsg}" \ @@ -189,14 +194,20 @@ do_transfer() wait $spid local rets=$? + NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \ + nstat | grep Tcp > /tmp/${listener_ns}.out + NSTAT_HISTORY=/tmp/${connector_ns}.nstat ip netns exec ${connector_ns} \ + nstat | grep Tcp > /tmp/${connector_ns}.out + print_title "Transfer ${ip:2}" if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then mptcp_lib_pr_fail "client exit code $retc, server $rets" echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2 ip netns exec ${listener_ns} ss -Menita 1>&2 -o "sport = :$port" - + cat /tmp/${listener_ns}.out echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2 ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = :$port" + cat /tmp/${connector_ns}.out mptcp_lib_result_fail "transfer ${ip}" -- 2.47.1
[PATCH net-next 1/6] selftests: mptcp: simult_flows: unify errors msgs
In order to unify what is printed in case of error, similar to what is done in mptcp_connect.sh and mptcp_join.sh, it is interesting to do the following modifications in simult_flows.sh: - Print the rc errors at the end of the line. - Print the MIB counters. - Use the same ss options: add -M (MPTCP sockets) and -e (detailed socket information). While at it, also print of the 'max' time only in case of success, because 'mptcp_connect.c' will already print this info in case of error, e.g.: transfer slower than expected! runtime 11948 ms, expected 11921 ms Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/simult_flows.sh | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh index 8fa77c8e9b651171a34c89bfd5c9ded0288a5bde..e98e5907d52c2d0e9c0152efda82176861905cf1 100755 --- a/tools/testing/selftests/net/mptcp/simult_flows.sh +++ b/tools/testing/selftests/net/mptcp/simult_flows.sh @@ -155,6 +155,11 @@ do_transfer() sleep 1 fi + NSTAT_HISTORY=/tmp/${ns3}.nstat ip netns exec ${ns3} \ + nstat -n + NSTAT_HISTORY=/tmp/${ns1}.nstat ip netns exec ${ns1} \ + nstat -n + timeout ${timeout_test} \ ip netns exec ${ns3} \ ./mptcp_connect -jt ${timeout_poll} -l -p $port -T $max_time \ @@ -180,25 +185,31 @@ do_transfer() kill ${cappid_connector} fi + NSTAT_HISTORY=/tmp/${ns3}.nstat ip netns exec ${ns3} \ + nstat | grep Tcp > /tmp/${ns3}.out + NSTAT_HISTORY=/tmp/${ns1}.nstat ip netns exec ${ns1} \ + nstat | grep Tcp > /tmp/${ns1}.out + cmp $sin $cout > /dev/null 2>&1 local cmps=$? cmp $cin $sout > /dev/null 2>&1 local cmpc=$? - printf "%-16s" " max $max_time " if [ $retc -eq 0 ] && [ $rets -eq 0 ] && \ [ $cmpc -eq 0 ] && [ $cmps -eq 0 ]; then + printf "%-16s" " max $max_time " mptcp_lib_pr_ok cat "$capout" return 0 fi - mptcp_lib_pr_fail - echo "client exit code $retc, server $rets" 1>&2 + mptcp_lib_pr_fail "client exit code $retc, server $rets" echo -e "\nnetns ${ns3} socket stat for $port:" 1>&2 - ip netns exec ${ns3} ss -nita 1>&2 -o "sport = :$port" + ip netns exec ${ns3} ss -Menita 1>&2 -o "sport = :$port" + cat /tmp/${ns3}.out echo -e "\nnetns ${ns1} socket stat for $port:" 1>&2 - ip netns exec ${ns1} ss -nita 1>&2 -o "dport = :$port" + ip netns exec ${ns1} ss -Menita 1>&2 -o "dport = :$port" + cat /tmp/${ns1}.out ls -l $sin $cout ls -l $cin $sout -- 2.47.1
[PATCH net-next 0/6] mptcp: selftests: more debug in case of errors
Here are just a bunch of small improvements for the MPTCP selftests: Patch 1: Unify errors messages in simult_flows: print MIB and 'ss -Me'. Patch 2: Unify errors messages in sockopt: print MIB. Patch 3: Move common code to print debug info to mptcp_lib.sh. Patch 4: Use 'ss' with '-m' in case of errors. Patch 5: Remove an unused variable. Patch 6: Print only the size instead of size + filename again. Signed-off-by: Matthieu Baerts (NGI0) --- Geliang Tang (1): selftests: mptcp: sockopt: save nstat infos Matthieu Baerts (NGI0) (5): selftests: mptcp: simult_flows: unify errors msgs selftests: mptcp: move stats info in case of errors to lib.sh selftests: mptcp: add -m with ss in case of errors selftests: mptcp: connect: remove unused variable selftests: mptcp: connect: better display the files size tools/testing/selftests/net/mptcp/mptcp_connect.sh | 13 - tools/testing/selftests/net/mptcp/mptcp_join.sh| 9 ++--- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 21 + tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 17 - tools/testing/selftests/net/mptcp/simult_flows.sh | 21 ++--- 5 files changed, 53 insertions(+), 28 deletions(-) --- base-commit: 9c7ad35632297edc08d0f2c7b599137e9fb5f9ff change-id: 20250114-net-next-mptcp-st-more-debug-err-3f3f1aa15a10 Best regards, -- Matthieu Baerts (NGI0)
[PATCH net 3/3] selftests: mptcp: avoid spurious errors on disconnect
From: Paolo Abeni The disconnect test-case generates spurious errors: INFO: disconnect INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI 01 ns1 MPTCP -> ns1 (10.0.1.1:1 ) MPTCP (duration 140ms) [FAIL] file received by server does not match (in, out): Unexpected revents: POLLERR/POLLNVAL(19) -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect Trailing bytes are: ��\R���!8��u2��5N% -rw--- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1 Trailing bytes are: ��\R���!8��u2��5N% 02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ] 03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ] 04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ] [FAIL] Tests of the full disconnection have failed Time: 2 seconds The root cause is actually in the user-space bits: the test program currently disconnects as soon as all the pending data has been spooled, generating an FASTCLOSE. If such option reaches the peer before the latter has reached the closed status, the msk socket will report an error to the user-space, as per protocol specification, causing the above failure. Address the issue explicitly waiting for all the relevant sockets to reach a closed status before performing the disconnect. Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: sta...@vger.kernel.org Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +-- 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c index 4209b95690394b7565f330a37606bf39b6d2d228..414addef9a4514c489ecd09249143fe0ce2af649 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -25,6 +25,8 @@ #include #include +#include + #include #include @@ -1211,23 +1213,42 @@ static void parse_setsock_options(const char *name) exit(1); } -void xdisconnect(int fd, int addrlen) +void xdisconnect(int fd) { - struct sockaddr_storage empty; + socklen_t addrlen = sizeof(struct sockaddr_storage); + struct sockaddr_storage addr, empty; int msec_sleep = 10; - int queued = 1; - int i; + void *raw_addr; + int i, cmdlen; + char cmd[128]; + + /* get the local address and convert it to string */ + if (getsockname(fd, (struct sockaddr *)&addr, &addrlen) < 0) + xerror("getsockname"); + + if (addr.ss_family == AF_INET) + raw_addr = &(((struct sockaddr_in *)&addr)->sin_addr); + else if (addr.ss_family == AF_INET6) + raw_addr = &(((struct sockaddr_in6 *)&addr)->sin6_addr); + else + xerror("bad family"); + + strcpy(cmd, "ss -M | grep -q "); + cmdlen = strlen(cmd); + if (!inet_ntop(addr.ss_family, raw_addr, &cmd[cmdlen], + sizeof(cmd) - cmdlen)) + xerror("inet_ntop"); shutdown(fd, SHUT_WR); - /* while until the pending data is completely flushed, the later + /* +* wait until the pending data is completely flushed and all +* the MPTCP sockets reached the closed status. * disconnect will bypass/ignore/drop any pending data. */ for (i = 0; ; i += msec_sleep) { - if (ioctl(fd, SIOCOUTQ, &queued) < 0) - xerror("can't query out socket queue: %d", errno); - - if (!queued) + /* closed socket are not listed by 'ss' */ + if (system(cmd) != 0) break; if (i > poll_timeout) @@ -1281,9 +1302,9 @@ int main_loop(void) return ret; if (cfg_truncate > 0) { - xdisconnect(fd, peer->ai_addrlen); + xdisconnect(fd); } else if (--cfg_repeat > 0) { - xdisconnect(fd, peer->ai_addrlen); + xdisconnect(fd); /* the socket could be unblocking at this point, we need the * connect to be blocking -- 2.47.1
[PATCH net 2/3] mptcp: fix spurious wake-up on under memory pressure
From: Paolo Abeni The wake-up condition currently implemented by mptcp_epollin_ready() is wrong, as it could mark the MPTCP socket as readable even when no data are present and the system is under memory pressure. Explicitly check for some data being available in the receive queue. Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love") Cc: sta...@vger.kernel.org Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.h | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a93e661ef5c435155066ce9cc109092661f0711c..73526f1d768fcb6ba5bcf1a43eb09ed5ff9d67bf 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -760,10 +760,15 @@ static inline u64 mptcp_data_avail(const struct mptcp_sock *msk) static inline bool mptcp_epollin_ready(const struct sock *sk) { + u64 data_avail = mptcp_data_avail(mptcp_sk(sk)); + + if (!data_avail) + return false; + /* mptcp doesn't have to deal with small skbs in the receive queue, -* at it can always coalesce them +* as it can always coalesce them */ - return (mptcp_data_avail(mptcp_sk(sk)) >= sk->sk_rcvlowat) || + return (data_avail >= sk->sk_rcvlowat) || (mem_cgroup_sockets_enabled && sk->sk_memcg && mem_cgroup_under_socket_pressure(sk->sk_memcg)) || READ_ONCE(tcp_memory_pressure); -- 2.47.1
[PATCH net 1/3] mptcp: be sure to send ack when mptcp-level window re-opens
From: Paolo Abeni mptcp_cleanup_rbuf() is responsible to send acks when the user-space reads enough data to update the receive windows significantly. It tries hard to avoid acquiring the subflow sockets locks by checking conditions similar to the ones implemented at the TCP level. To avoid too much code duplication - the MPTCP protocol can't reuse the TCP helpers as part of the relevant status is maintained into the msk socket - and multiple costly window size computation, mptcp_cleanup_rbuf uses a rough estimate for the most recently advertised window size: the MPTCP receive free space, as recorded as at last-ack time. Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect a zero to non-zero win change in some corner cases, skipping the tcp_cleanup_rbuf call and leaving the peer stuck. After commit ea66758c1795 ("tcp: allow MPTCP to update the announced window"), MPTCP has actually cheap access to the announced window value. Use it in mptcp_cleanup_rbuf() for a more accurate ack generation. Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: sta...@vger.kernel.org Reported-by: Jakub Kicinski Closes: https://lore.kernel.org/20250107131845.5e5de...@kernel.org Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/options.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index a62bc874bf1e17c4ee3b4d8d94eef4d0e7c9f272..123f3f2972841aab9fc2ef338687b4c4758efd4a 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -607,7 +607,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, } opts->ext_copy.use_ack = 1; opts->suboptions = OPTION_MPTCP_DSS; - WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk)); /* Add kind/length/subtype/flag overhead if mapping is not populated */ if (dss_size == 0) @@ -1288,7 +1287,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th) } MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT); } - return; + goto update_wspace; } if (rcv_wnd_new != rcv_wnd_old) { @@ -1313,6 +1312,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th) th->window = htons(new_win); MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED); } + +update_wspace: + WRITE_ONCE(msk->old_wspace, tp->rcv_wnd); } __sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum) -- 2.47.1
[PATCH net 0/3] mptcp: fixes for connect selftest flakes
Last week, Jakub reported [1] that the MPTCP Connect selftest was unstable. It looked like it started after the introduction of some fixes [2]. After analysis from Paolo, these patches revealed existing bugs, that should be fixed by the following patches. - Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In some corner cases, the other peer was not notified when more data could be sent. A fix for v5.11, but depending on a feature introduced in v5.19. - Patch 2: Fix spurious wake-up under memory pressure. In this situation, the userspace could be invited to read data not being there yet. A fix for v6.7. - Patch 3: Fix a false positive error when running the MPTCP Connect selftest with the "disconnect" cases. The userspace could disconnect the socket too soon, which would reset (MP_FASTCLOSE) the connection, interpreted as an error by the test. A fix for v5.17. Link: https://lore.kernel.org/20250107131845.5e5de...@kernel.org [1] Link: https://lore.kernel.org/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434...@kernel.org [2] Signed-off-by: Matthieu Baerts (NGI0) --- Paolo Abeni (3): mptcp: be sure to send ack when mptcp-level window re-opens mptcp: fix spurious wake-up on under memory pressure selftests: mptcp: avoid spurious errors on disconnect net/mptcp/options.c | 6 ++-- net/mptcp/protocol.h | 9 +++-- tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +-- 3 files changed, 43 insertions(+), 15 deletions(-) --- base-commit: 76201b5979768500bca362871db66d77cb4c225e change-id: 20250113-net-mptcp-connect-st-flakes-4af6389808de Best regards, -- Matthieu Baerts (NGI0)
[PATCH bpf-next/net v2 7/7] selftests/bpf: Add mptcp_subflow bpf_iter subtest
From: Geliang Tang This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the newly added mptcp_subflow bpf_iter. Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy, bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h. Use bpf_mptcp_sock_acquire() to acquire the msk, then use bpf_for_each() to walk the subflow list of this msk. From there, future MPTCP-specific kfunc can be called in the loop. Because they are not there yet, this test doesn't do anything very useful for the moment, but it focuses on validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it simply adds all subflow ids to local variable local_ids to make sure all subflows have been seen, then invoke mptcp_subflow_tcp_sock() in the loop to pick the subflow context. Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context of the picked subflow context and do some verifications. Finally, assign local_ids to global variable ids so that the application can obtain this value, and release the msk. A related subtest called test_iters_subflow is added to load and verify the newly added mptcp_subflow type bpf_iter example in test_mptcp. The endpoint_init() helper is used to add 3 new subflow endpoints. Then one byte of message is sent to trigger the creation of new subflows. getsockopt() is invoked once the subflows have been created to trigger the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is then checked to make sure it equals 10, the sum of each subflow ID: we should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the bpf_iter loop did the job as expected. Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2: - explicit sk protocol checks are no longer needed, implicitly done in bpf_skc_to_mptcp_sock(). - use bpf_skc_to_mptcp_sock() instead of bpf_mptcp_sk(), and mptcp_subflow_tcp_sock() instead of bpf_mptcp_subflow_tcp_sock(). - bpf_mptcp_subflow_ctx() can now return NULL. --- tools/testing/selftests/bpf/bpf_experimental.h | 8 +++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 73 ++ tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 +++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 +++ 4 files changed, 153 insertions(+) diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index cd8ecd39c3f3c68d40c6e3e1465b42ed66537027..2ab3f0063c0fd6091ee19da4787671b89b5661f0 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it, extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym; extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym; +struct bpf_iter_mptcp_subflow; +extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, + struct mptcp_sock *msk) __weak __ksym; +extern struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym; +extern void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym; + extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym; extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym; extern int bpf_wq_set_callback_impl(struct bpf_wq *wq, diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -11,6 +11,7 @@ #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" #include "mptcp_subflow.skel.h" +#include "mptcp_bpf_iters.skel.h" #define NS_TEST "mptcp_ns" #define ADDR_1 "10.0.1.1" @@ -33,6 +34,9 @@ #ifndef MPTCP_INFO #define MPTCP_INFO 1 #endif +#ifndef TCP_IS_MPTCP +#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */ +#endif #ifndef MPTCP_INFO_FLAG_FALLBACK #define MPTCP_INFO_FLAG_FALLBACK _BITUL(0) #endif @@ -480,6 +484,73 @@ static void test_subflow(void) close(cgroup_fd); } +static void run_iters_subflow(void) +{ + int server_fd, client_fd; + int is_mptcp, err; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_
[PATCH bpf-next/net v2 6/7] selftests/bpf: More endpoints for endpoint_init
From: Geliang Tang This patch changes ADDR_2 from "10.0.1.2" to "10.0.2.1", and adds two more IPv4 test addresses ADDR_3 - ADDR_4, four IPv6 addresses ADDR6_1 - ADDR6_4. Add a new helper address_init() to initialize all these addresses. Add a new parameter "endpoints" for endpoint_init() to control how many endpoints are used for the tests. This makes it more flexible. Update the parameters of endpoint_init() in test_subflow(). Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 56 +++--- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index f8eb7f9d4fd20bbb7ee018728f7ae0f0a09d4d30..85f3d4119802a85c86cde7b74a0b857252bad8b8 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -14,7 +14,13 @@ #define NS_TEST "mptcp_ns" #define ADDR_1 "10.0.1.1" -#define ADDR_2 "10.0.1.2" +#define ADDR_2 "10.0.2.1" +#define ADDR_3 "10.0.3.1" +#define ADDR_4 "10.0.4.1" +#define ADDR6_1"dead:beef:1::1" +#define ADDR6_2"dead:beef:2::1" +#define ADDR6_3"dead:beef:3::1" +#define ADDR6_4"dead:beef:4::1" #define PORT_1 10001 #ifndef IPPROTO_MPTCP @@ -322,22 +328,60 @@ static void test_mptcpify(void) close(cgroup_fd); } -static int endpoint_init(char *flags) +static int address_init(void) { SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s addr add %s/64 dev veth1 nodad", NS_TEST, ADDR6_1); SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s addr add %s/64 dev veth2 nodad", NS_TEST, ADDR6_2); SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); - if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + + SYS(fail, "ip -net %s link add veth3 type veth peer name veth4", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth3", NS_TEST, ADDR_3); + SYS(fail, "ip -net %s addr add %s/64 dev veth3 nodad", NS_TEST, ADDR6_3); + SYS(fail, "ip -net %s link set dev veth3 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth4", NS_TEST, ADDR_4); + SYS(fail, "ip -net %s addr add %s/64 dev veth4 nodad", NS_TEST, ADDR6_4); + SYS(fail, "ip -net %s link set dev veth4 up", NS_TEST); + + return 0; +fail: + return -1; +} + +static int endpoint_add(char *addr, char *flags) +{ + return SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, addr, flags); +} + +static int endpoint_init(char *flags, u8 endpoints) +{ + int ret = -1; + + if (!endpoints || endpoints > 4) + goto fail; + + if (address_init()) + goto fail; + + if (SYS_NOFAIL("ip -net %s mptcp limits set add_addr_accepted 4 subflows 4", + NS_TEST)) { printf("'ip mptcp' not supported, skip this test.\n"); test__skip(); goto fail; } - return 0; + if (endpoints > 1) + ret = endpoint_add(ADDR_2, flags); + if (endpoints > 2) + ret = ret ?: endpoint_add(ADDR_3, flags); + if (endpoints > 3) + ret = ret ?: endpoint_add(ADDR_4, flags); + fail: - return -1; + return ret; } static void wait_for_new_subflows(int fd) @@ -423,7 +467,7 @@ static void test_subflow(void) if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_subflow")) goto skel_destroy; - if (endpoint_init("subflow") < 0) + if (endpoint_init("subflow", 2) < 0) goto close_netns; run_subflow(); -- 2.47.1
[PATCH bpf-next/net v2 5/7] bpf: Acquire and release mptcp socket
From: Geliang Tang The KF_TRUSTED_ARGS flag is used for bpf_iter_mptcp_subflow_new, it indicates that the all pointer arguments are valid. It's necessary to add a KF_ACQUIRE helper to get valid "msk". This patch adds bpf_mptcp_sock_acquire() and bpf_mptcp_sock_release() helpers for this. Increase sk->sk_refcnt in _acquire() and decrease it in _release(). Register them with KF_ACQUIRE flag and KF_RELEASE flag. Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/bpf.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index e39f0e4fb683c1aa31ee075281daee218dac5878..d50bd1ea7f6d0ff1abff32deef9a98b98ee8f42c 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -97,6 +97,23 @@ bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) { } +__bpf_kfunc static struct +mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + + if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) + return msk; + return NULL; +} + +__bpf_kfunc static void bpf_mptcp_sock_release(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + + WARN_ON_ONCE(!sk || !refcount_dec_not_one(&sk->sk_refcnt)); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) @@ -104,6 +121,8 @@ BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_mptcp_sock_acquire, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_mptcp_sock_release, KF_RELEASE) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { -- 2.47.1
[PATCH bpf-next/net v2 4/7] bpf: Add mptcp_subflow bpf_iter
From: Geliang Tang It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this: mptcp_for_each_subflow(msk, subflow) kfunc(subflow); But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case. So this patch adds a new bpf_iter type named "mptcp_subflow" to do this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ _destroy(). And register these bpf_iter mptcp_subflow into mptcp common kfunc set. Then bpf_for_each() for mptcp_subflow can be used in BPF program like this: bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow); Suggested-by: Martin KaFai Lau Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2: - Add BUILD_BUG_ON() checks, similar to the ones done with other bpf_iter_(...) helpers. - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and !spin_is_locked() (Martin). A few versions of this single patch have been previously posted to the BPF mailing list by Geliang, before continuing to the MPTCP mailing list only, with other patches of this series. The version of the whole series has been reset to 1, but here is the ChangeLog for the previous ones: - v2: remove msk->pm.lock in _new() and _destroy() (Martin) drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii) - v3: drop bpf_iter__mptcp_subflow - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in _next() (Andrii) - v5: use list_is_last() instead of list_entry_is_head() add KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new() - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin) --- net/mptcp/bpf.c | 53 + 1 file changed, 53 insertions(+) diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, }; +struct bpf_iter_mptcp_subflow { + __u64 __opaque[2]; +} __aligned(8); + +struct bpf_iter_mptcp_subflow_kern { + struct mptcp_sock *msk; + struct list_head *pos; +} __aligned(8); + __bpf_kfunc_start_defs(); __bpf_kfunc static struct mptcp_subflow_context * @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk) return NULL; } +__bpf_kfunc static int +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, + struct mptcp_sock *msk) +{ + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; + struct sock *sk = (struct sock *)msk; + + BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) > +sizeof(struct bpf_iter_mptcp_subflow)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) != +__alignof__(struct bpf_iter_mptcp_subflow)); + + kit->msk = msk; + if (!msk) + return -EINVAL; + + if (!sock_owned_by_user_nocheck(sk) && + !spin_is_locked(&sk->sk_lock.slock)) + return -EINVAL; + + kit->pos = &msk->conn_list; + return 0; +} + +__bpf_kfunc static struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) +{ + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; + + if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list)) + return NULL; + + kit->pos = kit->pos->next; + return list_entry(kit->pos, struct mptcp_subflow_context, node); +} + +__bpf_kfunc static void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) +{ +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { -- 2.47.1
[PATCH bpf-next/net v2 3/7] bpf: Register mptcp common kfunc set
From: Geliang Tang MPTCP helper mptcp_sk() is used to convert struct sock to mptcp_sock. Helpers mptcp_subflow_ctx() and mptcp_subflow_tcp_sock() are used to convert between struct mptcp_subflow_context and sock. They all will be used in MPTCP BPF programs too. This patch defines corresponding wrappers of them, and put the wrappers into mptcp common kfunc set and register the set with the flag BPF_PROG_TYPE_UNSPEC to let them accessible to all types of BPF programs. Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2: - Thanks to the two new previous patches, bpf_mptcp_sk() and bpf_mptcp_subflow_tcp_sock() are no longer needed. - bpf_mptcp_subflow_ctx(): make sure the socket is an MPTCP subflow, and add KF_RET_NULL (Martin). - Restrict this kfunc to BPF_PROG_TYPE_CGROUP_SOCKOPT for the moment. --- net/mptcp/bpf.c | 31 ++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 988b22a06331ac293b36f3c6f9bc29ff0f6db048..c5bfd84c16c43230d9d8e1fd8ff781a767e647b5 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -35,8 +35,37 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, }; +__bpf_kfunc_start_defs(); + +__bpf_kfunc static struct mptcp_subflow_context * +bpf_mptcp_subflow_ctx(const struct sock *sk) +{ + if (sk && sk_fullsock(sk) && + sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) + return mptcp_subflow_ctx(sk); + + return NULL; +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) +BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) +BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) + +static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { + .owner = THIS_MODULE, + .set= &bpf_mptcp_common_kfunc_ids, +}; + static int __init bpf_mptcp_kfunc_init(void) { - return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); + int ret; + + ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCKOPT, + &bpf_mptcp_common_kfunc_set); + + return ret; } late_initcall(bpf_mptcp_kfunc_init); -- 2.47.1
[PATCH bpf-next/net v2 2/7] bpf: Allow use of skc_to_mptcp_sock in cg_sockopt
From: Geliang Tang Currently, bpf_skc_to_mptcp_sock() helper is not allowed to be used in cg_sockopt. This patch adds this permission. Thanks to the previous patch allowing skc_to_mptcp_sock() to be used with MPTCP sockets, this permission allows this helper to be use it in CGroup BPF hooks, e.g. [gs]etsocktopt. Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2: new patch. --- kernel/bpf/cgroup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 46e5db65dbc8d8c6591b53dfc77bb689357f33ea..1ca22e4842cf6b6c6bdbc37bf415187e706b2b69 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -2358,6 +2358,8 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) #ifdef CONFIG_INET case BPF_FUNC_tcp_sock: return &bpf_tcp_sock_proto; + case BPF_FUNC_skc_to_mptcp_sock: + return &bpf_skc_to_mptcp_sock_proto; #endif case BPF_FUNC_perf_event_output: return &bpf_event_output_data_proto; -- 2.47.1
[PATCH bpf-next/net v2 1/7] bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock
From: Geliang Tang Currently, bpf_skc_to_mptcp_sock() can only be used with sockets that are MPTCP subflows: TCP sockets with tp->is_mptcp, created by the kernel from an MPTCP socket (IPPROTO_MPTCP). Typically used with BPF sock_ops operators. Here, this helper is extended to support MPTCP sockets, the ones created by the userspace (IPPROTO_MPTCP). This is useful for BPF hooks involving these sockets, e.g. [gs]etsocktopt. bpf_skc_to_mptcp_sock() uses bpf_mptcp_sock_from_subflow(). The former suggests any MPTCP type/subtype can be used, but the latter only accepts subflow ones. So bpf_mptcp_sock_from_subflow is modified here to support MPTCP socket, and renamed to avoid confusions. Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2: new patch. --- include/net/mptcp.h | 4 ++-- net/core/filter.c | 2 +- net/mptcp/bpf.c | 10 -- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 814b5f2e3ed5e3e474a2bac5e4cca5a89abcfe1c..94d5976f7b8d8eb8b82f298f7b33ecd653937dc0 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -322,9 +322,9 @@ static inline void mptcpv6_handle_mapped(struct sock *sk, bool mapped) { } #endif #if defined(CONFIG_MPTCP) && defined(CONFIG_BPF_SYSCALL) -struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk); +struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk); #else -static inline struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) { return NULL; } +static inline struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk) { return NULL; } #endif #if !IS_ENABLED(CONFIG_MPTCP) diff --git a/net/core/filter.c b/net/core/filter.c index 6625b3f563a4a3110a76b9c12a46340828e16b6e..40ed3854a899bf9d11847ec0e8aee174923f03d2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -11833,7 +11833,7 @@ const struct bpf_func_proto bpf_skc_to_unix_sock_proto = { BPF_CALL_1(bpf_skc_to_mptcp_sock, struct sock *, sk) { BTF_TYPE_EMIT(struct mptcp_sock); - return (unsigned long)bpf_mptcp_sock_from_subflow(sk); + return (unsigned long)bpf_mptcp_sock_from_sock(sk); } const struct bpf_func_proto bpf_skc_to_mptcp_sock_proto = { diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 8a16672b94e2384f5263e1432296cbca1236bb30..988b22a06331ac293b36f3c6f9bc29ff0f6db048 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -12,9 +12,15 @@ #include #include "protocol.h" -struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) +struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk) { - if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) + if (unlikely(!sk || !sk_fullsock(sk))) + return NULL; + + if (sk->sk_protocol == IPPROTO_MPTCP) + return mptcp_sk(sk); + + if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) return mptcp_sk(mptcp_subflow_ctx(sk)->conn); return NULL; -- 2.47.1
[PATCH bpf-next/net v2 0/7] bpf: Add mptcp_subflow bpf_iter support
Here is a series from Geliang, adding mptcp_subflow bpf_iter support. We are working on extending MPTCP with BPF, e.g. to control the path manager -- in charge of the creation, deletion, and announcements of subflows (paths) -- and the packet scheduler -- in charge of selecting which available path the next data will be sent to. These extensions need to iterate over the list of subflows attached to an MPTCP connection, and do some specific actions via some new kfunc that will be added later on. This preparation work is split in different patches: - Patch 1: extend bpf_skc_to_mptcp_sock() to be called with msk. - Patch 2: allow using skc_to_mptcp_sock() in CGroup sockopt hooks. - Patch 3: register some "basic" MPTCP kfunc. - Patch 4: add mptcp_subflow bpf_iter support. Note that previous versions of this single patch have already been shared to the BPF mailing list. The changelog has been kept with a comment, but the version number has been reset to avoid confusions. - Patch 5: add kfunc to make sure the msk is valid - Patch 6: add more MPTCP endpoints in the selftests, in order to create more than 2 subflows. - Patch 7: add a very simple test validating mptcp_subflow bpf_iter support. This test could be written without the new bpf_iter, but it is there only to make sure this specific feature works as expected. Signed-off-by: Matthieu Baerts (NGI0) --- Changes in v2: - Patches 1-2: new ones. - Patch 3: remove two kfunc, more restrictions. (Martin) - Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin) - Patch 7: adaptations due to modifications in patches 1-4. - Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0-cf1695303...@kernel.org --- Geliang Tang (7): bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock bpf: Allow use of skc_to_mptcp_sock in cg_sockopt bpf: Register mptcp common kfunc set bpf: Add mptcp_subflow bpf_iter bpf: Acquire and release mptcp socket selftests/bpf: More endpoints for endpoint_init selftests/bpf: Add mptcp_subflow bpf_iter subtest include/net/mptcp.h| 4 +- kernel/bpf/cgroup.c| 2 + net/core/filter.c | 2 +- net/mptcp/bpf.c| 113 +- tools/testing/selftests/bpf/bpf_experimental.h | 8 ++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 - tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 ++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 ++ 8 files changed, 318 insertions(+), 12 deletions(-) --- base-commit: dad704ebe38642cd405e15b9c51263356391355c change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e Best regards, -- Matthieu Baerts (NGI0)
Re: [PATCH net-next v12 11/22] ovpn: implement TCP transport
On 09/12/2024 15:08, Antonio Quartulli wrote: > On 09/12/2024 12:31, Matthieu Baerts wrote: >> On 09/12/2024 11:58, Antonio Quartulli wrote: >>> On 09/12/2024 11:46, Matthieu Baerts wrote: >>>> Hi Antonio, >>>> >>>> Thank you for working on this, and sharing your work here! >>>> >>>> On 05/12/2024 00:09, Antonio Quartulli wrote: >>>>> On 04/12/2024 23:52, Antonio Quartulli wrote: >>>>>> Paolo, >>>>>> >>>>>> On 04/12/2024 12:15, Antonio Quartulli wrote: >>>>>> [...] >>>>>>>>> + mutex_lock(&tcp6_prot_mutex); >>>>>>>>> + if (!ovpn_tcp6_prot.recvmsg) >>>>>>>>> + ovpn_tcp_build_protos(&ovpn_tcp6_prot, >>>>>>>>> &ovpn_tcp6_ops, >>>>>>>>> + sock->sk->sk_prot, >>>>>>>>> + sock->sk->sk_socket->ops); >>>>>>>>> + mutex_unlock(&tcp6_prot_mutex); >>>>>>>> >>>>>>>> This looks like an hack to avoid a build dependency on IPV6, I >>>>>>>> think >>>>>>>> the >>>>>>>> explicit >>>>>>> >>>>>>> I happily copied this approach from >>>>>>> espintcp.c:espintcp_init_sk() :-D >>>>>>> >>>>>>>> >>>>>>>> #if IS_ENABLED(CONFIG_IPV6) >>>>>>>> >>>>>>>> at init time should be preferable >>>>>> >>>>>> To get this done at init time I need inet6_stream_ops to be >>>>>> accessible, but it seems there is no EXPORT_SYMBOL() for this object. >>>>>> >>>>>> However, I see that mptcp/protocol.c is happily accessing it. >>>>>> Any clue how this is possible? >>>>> >>>>> I answer myself: mptcp is not tristate and it can only be compiled as >>>>> built-in. >>>> >>>> Indeed, that's why. >>>> >>>> Talking about MPTCP, by chance, do you plan to support it later on? :) >>> >>> Hi Matthieu, >>> >>> It is not on our current roadmap (TCP doesn't get much love in the VPN >>> world), but I agree it could be an interesting option to explore! >> >> I understand, it makes sense not to recommend using TCP for the >> transport layer for tunnelling solutions. >> >>> I have to admit that I haven't played much with MPTCP myself yet, but I >>> am more than happy to talk about potential advantages for the ovpn use >>> case. >> >> Some people told me they were interested in using OpenVPN with MPTCP to >> use multiple (low-capacity) network links at the same time. I think >> intercepting and proxying TCP traffic would always be the best in terms >> of performances, but using OpenVPN with MPTCP seems to be enough for >> some, especially when they want to "improve" some type of UDP traffic >> that cannot be intercepted: QUIC, VPN, etc. >> >> I don't have numbers to share, but I can understand this feature can >> help in some cases. > > Yeah, some people may definitely benefit from this feature. > I'll have a look at MPTCP once ovpn is merged. Thank you :) Don't hesitate to email the ML, or open an issue on the GitHub repo if needed! (More details: https://www.mptcp.dev/#communication) >> (This reminds me this: https://github.com/OpenVPN/ovpn-dco/issues/60) >> (and this: https://github.com/arinc9/openvpn/pull/1) > > Right, this definitely shows some interest and it means we should easily > find people willing to test :-) Indeed! Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH net-next v12 11/22] ovpn: implement TCP transport
On 09/12/2024 11:58, Antonio Quartulli wrote: > On 09/12/2024 11:46, Matthieu Baerts wrote: >> Hi Antonio, >> >> Thank you for working on this, and sharing your work here! >> >> On 05/12/2024 00:09, Antonio Quartulli wrote: >>> On 04/12/2024 23:52, Antonio Quartulli wrote: >>>> Paolo, >>>> >>>> On 04/12/2024 12:15, Antonio Quartulli wrote: >>>> [...] >>>>>>> + mutex_lock(&tcp6_prot_mutex); >>>>>>> + if (!ovpn_tcp6_prot.recvmsg) >>>>>>> + ovpn_tcp_build_protos(&ovpn_tcp6_prot, &ovpn_tcp6_ops, >>>>>>> + sock->sk->sk_prot, >>>>>>> + sock->sk->sk_socket->ops); >>>>>>> + mutex_unlock(&tcp6_prot_mutex); >>>>>> >>>>>> This looks like an hack to avoid a build dependency on IPV6, I think >>>>>> the >>>>>> explicit >>>>> >>>>> I happily copied this approach from espintcp.c:espintcp_init_sk() :-D >>>>> >>>>>> >>>>>> #if IS_ENABLED(CONFIG_IPV6) >>>>>> >>>>>> at init time should be preferable >>>> >>>> To get this done at init time I need inet6_stream_ops to be >>>> accessible, but it seems there is no EXPORT_SYMBOL() for this object. >>>> >>>> However, I see that mptcp/protocol.c is happily accessing it. >>>> Any clue how this is possible? >>> >>> I answer myself: mptcp is not tristate and it can only be compiled as >>> built-in. >> >> Indeed, that's why. >> >> Talking about MPTCP, by chance, do you plan to support it later on? :) > > Hi Matthieu, > > It is not on our current roadmap (TCP doesn't get much love in the VPN > world), but I agree it could be an interesting option to explore! I understand, it makes sense not to recommend using TCP for the transport layer for tunnelling solutions. > I have to admit that I haven't played much with MPTCP myself yet, but I > am more than happy to talk about potential advantages for the ovpn use > case. Some people told me they were interested in using OpenVPN with MPTCP to use multiple (low-capacity) network links at the same time. I think intercepting and proxying TCP traffic would always be the best in terms of performances, but using OpenVPN with MPTCP seems to be enough for some, especially when they want to "improve" some type of UDP traffic that cannot be intercepted: QUIC, VPN, etc. I don't have numbers to share, but I can understand this feature can help in some cases. (This reminds me this: https://github.com/OpenVPN/ovpn-dco/issues/60) (and this: https://github.com/arinc9/openvpn/pull/1) Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH net-next v12 11/22] ovpn: implement TCP transport
Hi Antonio, Thank you for working on this, and sharing your work here! On 05/12/2024 00:09, Antonio Quartulli wrote: > On 04/12/2024 23:52, Antonio Quartulli wrote: >> Paolo, >> >> On 04/12/2024 12:15, Antonio Quartulli wrote: >> [...] > + mutex_lock(&tcp6_prot_mutex); > + if (!ovpn_tcp6_prot.recvmsg) > + ovpn_tcp_build_protos(&ovpn_tcp6_prot, &ovpn_tcp6_ops, > + sock->sk->sk_prot, > + sock->sk->sk_socket->ops); > + mutex_unlock(&tcp6_prot_mutex); This looks like an hack to avoid a build dependency on IPV6, I think the explicit >>> >>> I happily copied this approach from espintcp.c:espintcp_init_sk() :-D >>> #if IS_ENABLED(CONFIG_IPV6) at init time should be preferable >> >> To get this done at init time I need inet6_stream_ops to be >> accessible, but it seems there is no EXPORT_SYMBOL() for this object. >> >> However, I see that mptcp/protocol.c is happily accessing it. >> Any clue how this is possible? > > I answer myself: mptcp is not tristate and it can only be compiled as > built-in. Indeed, that's why. Talking about MPTCP, by chance, do you plan to support it later on? :) Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: LKFT CI: improving Networking selftests results when validating stable kernels
Hi Dan, On 15/11/2024 14:07, Dan Carpenter wrote: > On Fri, Nov 15, 2024 at 01:43:14PM +0100, Matthieu Baerts wrote: >> Regarding the other questions from my previous email -- skipped tests >> (e.g. I think Netfilter tests are no longer validated), KVM, >> notifications -- do you know who at Linaro could eventually look at them? >> > > The skip tests were because they lead to hangs. We're going to look at those > again to see if they're still an issue. And we're also going to try enable > the > other tests you mentioned. Great, thank you! For KVM (or similar), I guess it is not available, right? Some time-sensitive tests might be unstable in such environment, and need to be skipped. For the notifications, do not hesitate to contact the corresponding maintainers, the last people who modified the problematic selftests and the netdev list. These "net" selftests are now better maintained, and they are regularly validated on the development branches: https://netdev.bots.linux.dev/status.html Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: LKFT CI: improving Networking selftests results when validating stable kernels
Hi Dan, Thank you for your reply! On 13/11/2024 18:08, Dan Carpenter wrote: > On Fri, Nov 08, 2024 at 07:21:59PM +0100, Matthieu Baerts wrote: >> KSelftests from the same version >> >> >> According to the doc [2], kselftests should support all previous kernel >> versions. The LKFT CI is then using the kselftests from the last stable >> release to validate all stable versions. Even if there are good reasons >> to do that, we would like to ask for an opt-out for this policy for the >> networking tests: this is hard to maintain with the increased >> complexity, hard to validate on all stable kernels before applying >> patches, and hard to put in place in some situations. As a result, many >> tests are failing on older kernels, and it looks like it is a lot of >> work to support older kernels, and to maintain this. >> >> Many networking tests are validating the internal behaviour that is not >> exposed to the userspace. A typical example: some tests look at the raw >> packets being exchanged during a test, and this behaviour can change >> without modifying how the userspace is interacting with the kernel. The >> kernel could expose capabilities, but that's not something that seems >> natural to put in place for internal behaviours that are not exposed to >> end users. Maybe workarounds could be used, e.g. looking at kernel >> symbols, etc. Nut that doesn't always work, increase the complexity, and >> often "false positive" issue will be noticed only after a patch hits >> stable, and will cause a bunch of tests to be ignored. >> >> Regarding fixes, ideally they will come with a new or modified test that >> can also be backported. So the coverage can continue to grow in stable >> versions too. >> >> Do you think that from the kernel v6.12 (or before?), the LKFT CI could >> run the networking kselftests from the version that is being validated, >> and not from a newer one? So validating the selftests from v6.12.1 on a >> v6.12.1, and not the ones from a future v6.16.y on a v6.12.42. >> > > These kinds of decisions are something that Greg and Shuah need to decide on. Thank you, it makes sense. > You would still need some way to automatically detect that kselftest is > running > on an old kernel and disable the networking checks. Otherwise when random > people on the internet try to run selftests they would run into issues. Indeed. I guess we can always add a warning when the kernel and selftests versions are different. I suppose the selftests are built using the same kernel version, then executed on older versions: we could then compare the kernel versions at build time and run time, no? Regarding the other questions from my previous email -- skipped tests (e.g. I think Netfilter tests are no longer validated), KVM, notifications -- do you know who at Linaro could eventually look at them? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: LKFT CI: improving Networking selftests results when validating stable kernels
Hi Shuah, Greg, Thank you for your reply! On 13/11/2024 19:33, Shuah Khan wrote: > On 11/8/24 11:21, Matthieu Baerts wrote: >> Hello LKFT maintainers, CI operators, >> >> First, I would like to say thank you to the people behind the LKFT >> project for validating stable kernels (and more), and including some >> Network selftests in their tests suites. >> >> A lot of improvements around the networking kselftests have been done >> this year. At the last Netconf [1], we discussed how these tests were >> validated on stable kernels from CIs like the LKFT one, and we have some >> suggestions to improve the situation. >> >> >> KSelftests from the same version >> >> >> According to the doc [2], kselftests should support all previous kernel >> versions. The LKFT CI is then using the kselftests from the last stable >> release to validate all stable versions. Even if there are good reasons >> to do that, we would like to ask for an opt-out for this policy for the >> networking tests: this is hard to maintain with the increased >> complexity, hard to validate on all stable kernels before applying >> patches, and hard to put in place in some situations. As a result, many >> tests are failing on older kernels, and it looks like it is a lot of >> work to support older kernels, and to maintain this. >> > > This is from the Documentation/dev-tools/kselftest.rst: > > Kselftest from mainline can be run on older stable kernels. Running tests > from mainline offers the best coverage. Several test rings run mainline > kselftest suite on stable releases. The reason is that when a new test > gets added to test existing code to regression test a bug, we should be > able to run that test on an older kernel. Hence, it is important to keep > code that can still test an older kernel and make sure it skips the test > gracefully on newer releases. > > > As it states, running tests from mainline increases the coverage when new > tests are added to regression test an existing kernel feature in a stable > release. > > It also says that when mainline tests are running on an older kernel, the > test should detect missing features and report skips. > > The above paragraph addresses test developers and users. I would say the > policy regarding the test development will not change. We want to keep > it the same, continuing to take measures to skip tests when a feature > isn't supported in the kernel the tests are running on. This addresses > not just a kernel and test revision mismatch, but also when a feature > isn't enabled when kernel and test revisions match. > > This policy helps us find bugs in the tests failing when they should > skip. If test rings move to a new policy, our ability to find bugs > like this goes down. > > As per users and test ring maintainers, they need to be aware of the > reduced coverage if they revision match kernel and tests. > Revision matching example: 6.11.8 tests on 6.11.8 stable > > Greg KH and other stable maintainers can weigh in on whether they would > like LKFT to go from running mainline tests on stable releases to > revision matching. I appreciate these explanations. When we discussed this subject at Netconf, we looked at the documentation, and we understood the advantages of running newer kselftests on older kernels. But the issue we have is to "detect missing features and report skips": that's hard to maintain, because it increases the code complexity, and it is hard to validate before applying patches. One of the reasons is that many networking selftests are validating internal behaviours that are not exposed to the userspace. That makes it hard to detect what behaviour to expect, and checking the kernel version doesn't seem to be the right thing to do here. Or does it mean that these essential tests should not validate the internal behaviours, e.g. checking that the packets sent on the wire are formatted correctly? A compromise could be to mark the tests checking internal behaviours, and warn testers that they should be executed on the same version. Or even run all the tests twice: once with the kselftests from the same version, and once using the kselftests from the latest stable version. WDYT? The main problem we saw when using kselftests from a newer version is that the code coverage of many 'net' tests might even decrease over time. In this subsystem, it is common to have "big" selftests running many subtests. When a new feature is added, a new subtest might be added in an existing selftest. When one subtest fails -- e.g. because the test is not skipped on older kernels -- the whole selftest is marked as failed. In a situation
LKFT CI: improving Networking selftests results when validating stable kernels
Hello LKFT maintainers, CI operators, First, I would like to say thank you to the people behind the LKFT project for validating stable kernels (and more), and including some Network selftests in their tests suites. A lot of improvements around the networking kselftests have been done this year. At the last Netconf [1], we discussed how these tests were validated on stable kernels from CIs like the LKFT one, and we have some suggestions to improve the situation. KSelftests from the same version According to the doc [2], kselftests should support all previous kernel versions. The LKFT CI is then using the kselftests from the last stable release to validate all stable versions. Even if there are good reasons to do that, we would like to ask for an opt-out for this policy for the networking tests: this is hard to maintain with the increased complexity, hard to validate on all stable kernels before applying patches, and hard to put in place in some situations. As a result, many tests are failing on older kernels, and it looks like it is a lot of work to support older kernels, and to maintain this. Many networking tests are validating the internal behaviour that is not exposed to the userspace. A typical example: some tests look at the raw packets being exchanged during a test, and this behaviour can change without modifying how the userspace is interacting with the kernel. The kernel could expose capabilities, but that's not something that seems natural to put in place for internal behaviours that are not exposed to end users. Maybe workarounds could be used, e.g. looking at kernel symbols, etc. Nut that doesn't always work, increase the complexity, and often "false positive" issue will be noticed only after a patch hits stable, and will cause a bunch of tests to be ignored. Regarding fixes, ideally they will come with a new or modified test that can also be backported. So the coverage can continue to grow in stable versions too. Do you think that from the kernel v6.12 (or before?), the LKFT CI could run the networking kselftests from the version that is being validated, and not from a newer one? So validating the selftests from v6.12.1 on a v6.12.1, and not the ones from a future v6.16.y on a v6.12.42. Skipped tests - It looks like many tests are skipped: - Some have been in a skip file [3] for a while: maybe they can be removed? - Some are skipped because of missing tools: maybe they can be added? e.g. iputils, tshark, ipv6toolkit, etc. - Some tests are in 'net', but in subdirectories, and hence not tested, e.g. forwarding, packetdrill, netfilter, tcp_ao. Could they be tested too? How can we change this to increase the code coverage using existing tests? KVM --- It looks like different VMs are being used to execute the different tests. Do these VMs benefit from any accelerations like KVM? If not, some tests might fail because the environment is too slow. The KSFT_MACHINE_SLOW=yes env var can be set to increase some tolerances, timeout or to skip some parts, but that might not be enough for some tests. Notifications - In case of new regressions, who is being notified? Are the people from the MAINTAINERS file, and linked to the corresponding selftests being notified or do they need to do the monitoring on their side? Looking forward to improving the networking selftests results when validating stable kernels! [1] https://netdev.bots.linux.dev/netconf/2024/ [2] https://docs.kernel.org/dev-tools/kselftest.html [3] https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/skipfile-lkft.yaml Cheers, Matt -- Sponsored by the NGI0 Core fund.
[PATCH bpf-next/net 0/5] bpf: Add mptcp_subflow bpf_iter support
Here is a series from Geliang, adding mptcp_subflow bpf_iter support. We are working on extending MPTCP with BPF, e.g. to control the path manager -- in charge of the creation, deletion, and announcements of subflows (paths) -- and the packet scheduler -- in charge of selecting which available path the next data will be sent to. These extensions need to iterate over the list of subflows attached to an MPTCP connection, and do some specific actions via some new kfunc that will be added later on. This preparation work is split in different patches: - Patch 1: register some "basic" MPTCP kfunc. - Patch 2: add mptcp_subflow bpf_iter support. Note that previous versions of this single patch have already been shared to the BPF mailing list. The changelog has been kept with a comment, but the version number has been reset to avoid confusions. - Patch 3: add kfunc to make sure the msk is valid - Patch 4: add more MPTCP endpoints in the selftests, in order to create more than 2 subflows. - Patch 5: add a very simple test validating mptcp_subflow bpf_iter support. This test could be written without the new bpf_iter, but it is there only to make sure this specific feature works as expected. Signed-off-by: Matthieu Baerts (NGI0) --- Geliang Tang (5): bpf: Register mptcp common kfunc set bpf: Add mptcp_subflow bpf_iter bpf: Acquire and release mptcp socket selftests/bpf: More endpoints for endpoint_init selftests/bpf: Add mptcp_subflow bpf_iter subtest net/mptcp/bpf.c| 104 - tools/testing/selftests/bpf/bpf_experimental.h | 8 ++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 - tools/testing/selftests/bpf/progs/mptcp_bpf.h | 10 ++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 64 ++ 5 files changed, 308 insertions(+), 7 deletions(-) --- base-commit: 141b4d6a8049cecdc8124f87e044b83a9e80730d change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e Best regards, -- Matthieu Baerts (NGI0)
[PATCH bpf-next/net 5/5] selftests/bpf: Add mptcp_subflow bpf_iter subtest
From: Geliang Tang This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the newly added mptcp_subflow bpf_iter. Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy, bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h. Use bpf_mptcp_sock_acquire() to acquire the msk, then use bpf_for_each() to walk the subflow list of this msk. From there, future MPTCP-specific kfunc can be called in the loop. Because they are not there yet, this test doesn't do anything very useful for the moment, but it focuses on validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it simply adds all subflow ids to local variable local_ids to make sure all subflows have been seen, then invoke the kfunc bpf_mptcp_subflow_tcp_sock() in the loop to pick a subsocket. Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context of the picked subsocket and do some verifications. Finally, assign local_ids to global variable ids so that the application can obtain this value, and release the msk. A related subtest called test_iters_subflow is added to load and verify the newly added mptcp_subflow type bpf_iter example in test_mptcp. The endpoint_init() helper is used to add 3 new subflow endpoints. Then one byte of message is sent to trigger the creation of new subflows. getsockopt() is invoked once the subflows have been created to trigger the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is then checked to make sure it equals 10, the sum of each subflow ID: we should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the bpf_iter loop did the job as expected. Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/bpf/bpf_experimental.h | 8 +++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 73 ++ tools/testing/selftests/bpf/progs/mptcp_bpf.h | 10 +++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 64 +++ 4 files changed, 155 insertions(+) diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index b0668f29f7b394eb5294b6c9cade28fc1b17265a..08eaa431aafd758117f8e818410c4a3e7fd0b70c 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it, extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym; extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym; +struct bpf_iter_mptcp_subflow; +extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, + struct mptcp_sock *msk) __weak __ksym; +extern struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym; +extern void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym; + extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym; extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym; extern int bpf_wq_set_callback_impl(struct bpf_wq *wq, diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -11,6 +11,7 @@ #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" #include "mptcp_subflow.skel.h" +#include "mptcp_bpf_iters.skel.h" #define NS_TEST "mptcp_ns" #define ADDR_1 "10.0.1.1" @@ -33,6 +34,9 @@ #ifndef MPTCP_INFO #define MPTCP_INFO 1 #endif +#ifndef TCP_IS_MPTCP +#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */ +#endif #ifndef MPTCP_INFO_FLAG_FALLBACK #define MPTCP_INFO_FLAG_FALLBACK _BITUL(0) #endif @@ -480,6 +484,73 @@ static void test_subflow(void) close(cgroup_fd); } +static void run_iters_subflow(void) +{ + int server_fd, client_fd; + int is_mptcp, err; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_fd")) + goto close_server; + + send_byte(client_fd); + wait_for_new_subflows(client_fd); + + len = sizeof(is_mptcp); + /* mainly to trigger the BPF program */ + err = getsockopt(client_fd, SOL_TCP, TCP_IS_MPTCP, &is_mptcp, &len); + if (ASSERT_OK(err,
[PATCH bpf-next/net 4/5] selftests/bpf: More endpoints for endpoint_init
From: Geliang Tang This patch changes ADDR_2 from "10.0.1.2" to "10.0.2.1", and adds two more IPv4 test addresses ADDR_3 - ADDR_4, four IPv6 addresses ADDR6_1 - ADDR6_4. Add a new helper address_init() to initialize all these addresses. Add a new parameter "endpoints" for endpoint_init() to control how many endpoints are used for the tests. This makes it more flexible. Update the parameters of endpoint_init() in test_subflow(). Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 56 +++--- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index f8eb7f9d4fd20bbb7ee018728f7ae0f0a09d4d30..85f3d4119802a85c86cde7b74a0b857252bad8b8 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -14,7 +14,13 @@ #define NS_TEST "mptcp_ns" #define ADDR_1 "10.0.1.1" -#define ADDR_2 "10.0.1.2" +#define ADDR_2 "10.0.2.1" +#define ADDR_3 "10.0.3.1" +#define ADDR_4 "10.0.4.1" +#define ADDR6_1"dead:beef:1::1" +#define ADDR6_2"dead:beef:2::1" +#define ADDR6_3"dead:beef:3::1" +#define ADDR6_4"dead:beef:4::1" #define PORT_1 10001 #ifndef IPPROTO_MPTCP @@ -322,22 +328,60 @@ static void test_mptcpify(void) close(cgroup_fd); } -static int endpoint_init(char *flags) +static int address_init(void) { SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s addr add %s/64 dev veth1 nodad", NS_TEST, ADDR6_1); SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s addr add %s/64 dev veth2 nodad", NS_TEST, ADDR6_2); SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); - if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + + SYS(fail, "ip -net %s link add veth3 type veth peer name veth4", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth3", NS_TEST, ADDR_3); + SYS(fail, "ip -net %s addr add %s/64 dev veth3 nodad", NS_TEST, ADDR6_3); + SYS(fail, "ip -net %s link set dev veth3 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth4", NS_TEST, ADDR_4); + SYS(fail, "ip -net %s addr add %s/64 dev veth4 nodad", NS_TEST, ADDR6_4); + SYS(fail, "ip -net %s link set dev veth4 up", NS_TEST); + + return 0; +fail: + return -1; +} + +static int endpoint_add(char *addr, char *flags) +{ + return SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, addr, flags); +} + +static int endpoint_init(char *flags, u8 endpoints) +{ + int ret = -1; + + if (!endpoints || endpoints > 4) + goto fail; + + if (address_init()) + goto fail; + + if (SYS_NOFAIL("ip -net %s mptcp limits set add_addr_accepted 4 subflows 4", + NS_TEST)) { printf("'ip mptcp' not supported, skip this test.\n"); test__skip(); goto fail; } - return 0; + if (endpoints > 1) + ret = endpoint_add(ADDR_2, flags); + if (endpoints > 2) + ret = ret ?: endpoint_add(ADDR_3, flags); + if (endpoints > 3) + ret = ret ?: endpoint_add(ADDR_4, flags); + fail: - return -1; + return ret; } static void wait_for_new_subflows(int fd) @@ -423,7 +467,7 @@ static void test_subflow(void) if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_subflow")) goto skel_destroy; - if (endpoint_init("subflow") < 0) + if (endpoint_init("subflow", 2) < 0) goto close_netns; run_subflow(); -- 2.45.2
[PATCH bpf-next/net 3/5] bpf: Acquire and release mptcp socket
From: Geliang Tang The KF_TRUSTED_ARGS flag is used for bpf_iter_mptcp_subflow_new, it indicates that the all pointer arguments are valid. It's necessary to add a KF_ACQUIRE helper to get valid "msk". This patch adds bpf_mptcp_sock_acquire() and bpf_mptcp_sock_release() helpers for this. Increase sk->sk_refcnt in _acquire() and decrease it in _release(). Register them with KF_ACQUIRE flag and KF_RELEASE flag. Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/bpf.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index d107c2865e97e6ccffb9e0720dfbbd232b63a3b8..5bd04548e846b4dc120dbc83725a604821fac772 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -90,6 +90,23 @@ bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) { } +__bpf_kfunc static struct +mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + + if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) + return msk; + return NULL; +} + +__bpf_kfunc static void bpf_mptcp_sock_release(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + + WARN_ON_ONCE(!sk || !refcount_dec_not_one(&sk->sk_refcnt)); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) @@ -99,6 +116,8 @@ BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_mptcp_sock_acquire, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_mptcp_sock_release, KF_RELEASE) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { -- 2.45.2
[PATCH bpf-next/net 2/5] bpf: Add mptcp_subflow bpf_iter
From: Geliang Tang It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this: mptcp_for_each_subflow(msk, subflow) kfunc(subflow); But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case. So this patch adds a new bpf_iter type named "mptcp_subflow" to do this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ _destroy(). And register these bpf_iter mptcp_subflow into mptcp common kfunc set. Then bpf_for_each() for mptcp_subflow can be used in BPF program like this: bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow); Suggested-by: Martin KaFai Lau Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: A few versions of this single patch have been previously posted to the BPF mailing list by Geliang, before continuing to the MPTCP mailing list only, with other patches of this series. The version of the whole series has been reset to 1, but here is the ChangeLog for this patch here: - v2: remove msk->pm.lock in _new() and _destroy() (Martin) drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii) - v3: drop bpf_iter__mptcp_subflow - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in _next() (Andrii) - v5: use list_is_last() instead of list_entry_is_head() add KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new() - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin) --- net/mptcp/bpf.c | 45 + 1 file changed, 45 insertions(+) diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 6f96a5927fd371f8ea92cbf96c875edef9272b98..d107c2865e97e6ccffb9e0720dfbbd232b63a3b8 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, }; +struct bpf_iter_mptcp_subflow { + __u64 __opaque[2]; +} __aligned(8); + +struct bpf_iter_mptcp_subflow_kern { + struct mptcp_sock *msk; + struct list_head *pos; +} __aligned(8); + __bpf_kfunc_start_defs(); __bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk) @@ -48,12 +57,48 @@ bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) return mptcp_subflow_tcp_sock(subflow); } +__bpf_kfunc static int +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, + struct mptcp_sock *msk) +{ + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; + + kit->msk = msk; + if (!msk) + return -EINVAL; + + msk_owned_by_me(msk); + + kit->pos = &msk->conn_list; + return 0; +} + +__bpf_kfunc static struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) +{ + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; + + if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list)) + return NULL; + + kit->pos = kit->pos->next; + return list_entry(kit->pos, struct mptcp_subflow_context, node); +} + +__bpf_kfunc static void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) +{ +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_mptcp_sk) BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx) BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { -- 2.45.2
[PATCH bpf-next/net 1/5] bpf: Register mptcp common kfunc set
From: Geliang Tang MPTCP helper mptcp_sk() is used to convert struct sock to mptcp_sock. Helpers mptcp_subflow_ctx() and mptcp_subflow_tcp_sock() are used to convert between struct mptcp_subflow_context and sock. They all will be used in MPTCP BPF programs too. This patch defines corresponding wrappers of them, and put the wrappers into mptcp common kfunc set and register the set with the flag BPF_PROG_TYPE_UNSPEC to let them accessible to all types of BPF programs. Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/bpf.c | 40 +++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 8a16672b94e2384f5263e1432296cbca1236bb30..6f96a5927fd371f8ea92cbf96c875edef9272b98 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -29,8 +29,46 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, }; +__bpf_kfunc_start_defs(); + +__bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk) +{ + return mptcp_sk(sk); +} + +__bpf_kfunc static struct mptcp_subflow_context * +bpf_mptcp_subflow_ctx(const struct sock *sk) +{ + return mptcp_subflow_ctx(sk); +} + +__bpf_kfunc static struct sock * +bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) +{ + return mptcp_subflow_tcp_sock(subflow); +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) +BTF_ID_FLAGS(func, bpf_mptcp_sk) +BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx) +BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock) +BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) + +static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { + .owner = THIS_MODULE, + .set= &bpf_mptcp_common_kfunc_ids, +}; + static int __init bpf_mptcp_kfunc_init(void) { - return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); + int ret; + + ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, + &bpf_mptcp_common_kfunc_set); + + return ret; } late_initcall(bpf_mptcp_kfunc_init); -- 2.45.2
[PATCH net-next] selftests: net: include lib/sh/*.sh with lib.sh
Recently, the net/lib.sh file has been modified to include defer.sh from net/lib/sh/ directory. The Makefile from net/lib has been modified accordingly, but not the ones from the sub-targets using net/lib.sh. Because of that, the new file is not installed as expected when installing the Forwarding, MPTCP, and Netfilter targets, e.g. # make -C tools/testing/selftests TARGETS=net/mptcp install \ INSTALL_PATH=/tmp/kself # cd /tmp/kself/ # ./run_kselftest.sh -c net/mptcp TAP version 13 1..7 # timeout set to 1800 # selftests: net/mptcp: mptcp_connect.sh # ./../lib.sh: line 5: /tmp/kself/net/lib/sh/defer.sh: No such file or directory # (...) This can be fixed simply by adding all the .sh files from net/lib/sh directory to the TEST_INCLUDES variable in the different Makefile's. Fixes: a6e263f125cd ("selftests: net: lib: Introduce deferred commands") Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/forwarding/Makefile | 3 ++- tools/testing/selftests/net/mptcp/Makefile | 2 +- tools/testing/selftests/net/netfilter/Makefile | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/Makefile b/tools/testing/selftests/net/forwarding/Makefile index 224346426ef2220470bf2dbef66198eae9331f56..7d885cff8d79bc6882d2341d0a1a59891fc1cb2e 100644 --- a/tools/testing/selftests/net/forwarding/Makefile +++ b/tools/testing/selftests/net/forwarding/Makefile @@ -126,6 +126,7 @@ TEST_FILES := devlink_lib.sh \ tc_common.sh TEST_INCLUDES := \ - ../lib.sh + ../lib.sh \ + $(wildcard ../lib/sh/*.sh) include ../../lib.mk diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index 5d796622e73099d0a0331c1bc8a41f09e1d36b4d..8e3fc05a539797c5e0feab72be69db623ef3fa96 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -11,7 +11,7 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq TEST_FILES := mptcp_lib.sh settings -TEST_INCLUDES := ../lib.sh ../net_helper.sh +TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh) ../net_helper.sh EXTRA_CLEAN := *.pcap diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile index 542f7886a0bc2ac016f41d8b70357a8e0c1d271b..9d009f74cfc2da66428b1e23d5884e2c3bc4a85d 100644 --- a/tools/testing/selftests/net/netfilter/Makefile +++ b/tools/testing/selftests/net/netfilter/Makefile @@ -55,4 +55,5 @@ TEST_FILES := lib.sh TEST_FILES += packetdrill TEST_INCLUDES := \ - ../lib.sh + ../lib.sh \ + $(wildcard ../lib/sh/*.sh) --- base-commit: ecf99864ea6b1843773589a935bb026951bf12dd change-id: 20241104-net-next-selftests-lib-sh-deps-cc359ca5602f Best regards, -- Matthieu Baerts (NGI0)
Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support
Hi Tao, BPF maintainers, On 30/10/2024 12:12, Tao Chen wrote: > 在 2024/10/30 18:49, Matthieu Baerts 写道: >> Hi Tao Chen, >> >> Thank you for having shared this patch. >> >> On 30/10/2024 11:01, Tao Chen wrote: >>> Fix compile error when MPTCP feature not support, though eBPF core check >>> already done which seems invalid in this situation, the error info like: >>> progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct >>> tcp_sock' >>> 49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ? >>> >>> The filed created in new definitions with eBPF core feature to solve >>> this build problem, and test case result still ok in MPTCP kernel. >>> >>> 176/1 mptcp/base:OK >>> 176/2 mptcp/mptcpify:OK >>> 176 mptcp:OK >>> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED >>> >>> Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base") >> >> The commit you mentioned here is more than 2 years old, and as far as I >> can see, nobody else reported this compilation issue. I guess that's >> because people used tools/testing/selftests/bpf/config file as expected >> to populate the kernel config, and I suppose you didn't, right? >> > > Hi Matt, thank you for your reply, as you said, i did not use tools/ > testing/selftests/bpf/config to compile kernel, i will use this helpful > feature. > >> I don't think other BPF selftests check for missing kernel config if >> they are specified in the 'config' file, but even if it is the case, I >> think it would be better to skip all the MPTCP tests, and not try to >> have them checking something that doesn't exist: no need to validate >> these tests if the expected kernel config has not been enabled. >> > > If i use the kernel not support MPTCP, the compile error still exists, > and i can not build the bpf test successfully. Maybe skill the test case > seems better when kernel not support. Now that bpf_core_field_exists > check already used in the code, i think it is better to use new > definition mode. I understand it would be better, but it means more code to maintain to handle that (and remembering that in future test cases). If that's not necessary, then no need to do the effort. @BPF maintainers: do we need to support kernels not respecting the tools/testing/selftests/bpf/config file? Should we detect when a required kernel config is not set and skip some tests? >> But again, please correct me if I'm wrong, but I don't think there is >> anything to change here to fix your compilation issue: simply make sure >> to use this tools/testing/selftests/bpf/config file to generate your >> kernel config, no? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH bpf-next] selftests/bpf: Fix compile error when MPTCP not support
Hi Tao Chen, Thank you for having shared this patch. On 30/10/2024 11:01, Tao Chen wrote: > Fix compile error when MPTCP feature not support, though eBPF core check > already done which seems invalid in this situation, the error info like: > progs/mptcp_sock.c:49:40: error: no member named 'is_mptcp' in 'struct > tcp_sock' >49 | is_mptcp = bpf_core_field_exists(tsk->is_mptcp) ? > > The filed created in new definitions with eBPF core feature to solve > this build problem, and test case result still ok in MPTCP kernel. > > 176/1 mptcp/base:OK > 176/2 mptcp/mptcpify:OK > 176 mptcp:OK > Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED > > Fixes: 8039d353217c ("selftests/bpf: Add MPTCP test base") The commit you mentioned here is more than 2 years old, and as far as I can see, nobody else reported this compilation issue. I guess that's because people used tools/testing/selftests/bpf/config file as expected to populate the kernel config, and I suppose you didn't, right? I don't think other BPF selftests check for missing kernel config if they are specified in the 'config' file, but even if it is the case, I think it would be better to skip all the MPTCP tests, and not try to have them checking something that doesn't exist: no need to validate these tests if the expected kernel config has not been enabled. But again, please correct me if I'm wrong, but I don't think there is anything to change here to fix your compilation issue: simply make sure to use this tools/testing/selftests/bpf/config file to generate your kernel config, no? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Hi Jakub, On 28/10/2024 18:22, Jakub Kicinski wrote: > On Wed, 16 Oct 2024 17:18:47 +0200 Matthieu Baerts wrote: >> https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-k...@kernel.org/ >> >> If the impact is important, it might be better to target linux-next >> first, no? > > Thanks for testing! I didn't anticipate it to be so effective. > > Looks like it's not in -next, yet, and we got an Ack from Paul > and co. so I'll toss it into net-next. Thanks! Please note that the 3 issues I mentioned have been fixed somewhere: - MPTCP: patches have been sent to the Netdev ML [1] - Netfilter: patches have been sent to the Netfilter ML [2] - perf (VM shutdown): patches have been applied in perf/urgent [3] [1] https://lore.kernel.org/20241021-net-mptcp-sched-lock-v1-1-637759cf0...@kernel.org [2] https://lore.kernel.org/20241025133230.22491-2...@strlen.de [3] https://lore.kernel.org/20240913162340.2142976-1-kan.li...@linux.intel.com Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH net-next 1/2] net: netconsole: selftests: Change the IP subnet
Hi Breno, On 28/10/2024 16:48, Breno Leitao wrote: > Use a less populated IP range to run the tests, as suggested by Petr in > Link: https://lore.kernel.org/netdev/87ikvukv3s@nvidia.com/. It looks like this is the same version as the one you sent on Friday, without the modification suggested by Petr: https://lore.kernel.org/20241025161415.238215-1-lei...@debian.org I supposed these new patches have been sent by accident, right? (BTW: it is often better to include a cover letter when there is more than one patch: some CIs might not take patches sent without it.) Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH net 2/3] mptcp: remove unneeded lock when listing scheds
Hi Simon, Thank you for the reviews! On 23/10/2024 14:21, Simon Horman wrote: > On Mon, Oct 21, 2024 at 12:25:27PM +0200, Matthieu Baerts (NGI0) wrote: >> mptcp_get_available_schedulers() needs to iterate over the schedulers' >> list only to read the names: it doesn't modify anything there. >> >> In this case, it is enough to hold the RCU read lock, no need to combine >> this with the associated spin lock. >> >> Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") >> Cc: sta...@vger.kernel.org >> Suggested-by: Paolo Abeni >> Reviewed-by: Geliang Tang >> Signed-off-by: Matthieu Baerts (NGI0) > > I do wonder if it would be more appropriate to route this via net-next > (without a fixes tag) rather than via net. But either way this looks good > to me. Good point. On one hand, I marked it as a fix, because when working on the patch 1/3, we noticed these spin_(un)lock() were not supposed to be there in the first place. On the other hand, even it's fixing a small performance issue, it is not fixing a regression. I think it is easier to route this via -net, but I'm fine if it is applied in net-next. Cheers, Matt -- Sponsored by the NGI0 Core fund.
[PATCH net 3/3] selftests: mptcp: list sysctl data
Listing all the values linked to the MPTCP sysctl knobs was not exercised in MPTCP test suite. Let's do that to avoid any regressions, but also to have a kernel with a debug kconfig verifying more assumptions. For the moment, we are not interested by the output, only to avoid crashes and warnings. Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index 57325d57e4c6e3653019db2de09620d692143683..b48b4e56826a9cfdb3501242b707ae2ebe29b220 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -259,6 +259,15 @@ check_mptcp_disabled() mptcp_lib_ns_init disabled_ns print_larger_title "New MPTCP socket can be blocked via sysctl" + + # mainly to cover more code + if ! ip netns exec ${disabled_ns} sysctl net.mptcp >/dev/null; then + mptcp_lib_pr_fail "not able to list net.mptcp sysctl knobs" + mptcp_lib_result_fail "not able to list net.mptcp sysctl knobs" + ret=${KSFT_FAIL} + return 1 + fi + # net.mptcp.enabled should be enabled by default if [ "$(ip netns exec ${disabled_ns} sysctl net.mptcp.enabled | awk '{ print $3 }')" -ne 1 ]; then mptcp_lib_pr_fail "net.mptcp.enabled sysctl is not 1 by default" -- 2.45.2
[PATCH net 2/3] mptcp: remove unneeded lock when listing scheds
mptcp_get_available_schedulers() needs to iterate over the schedulers' list only to read the names: it doesn't modify anything there. In this case, it is enough to hold the RCU read lock, no need to combine this with the associated spin lock. Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: sta...@vger.kernel.org Suggested-by: Paolo Abeni Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/sched.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/mptcp/sched.c b/net/mptcp/sched.c index 78ed508ebc1b8dd9f0e020cca1bdd86f24f0afeb..df7dbcfa3b71370cc4d7e4e4f16cc1e41a50dddf 100644 --- a/net/mptcp/sched.c +++ b/net/mptcp/sched.c @@ -60,7 +60,6 @@ void mptcp_get_available_schedulers(char *buf, size_t maxlen) size_t offs = 0; rcu_read_lock(); - spin_lock(&mptcp_sched_list_lock); list_for_each_entry_rcu(sched, &mptcp_sched_list, list) { offs += snprintf(buf + offs, maxlen - offs, "%s%s", @@ -69,7 +68,6 @@ void mptcp_get_available_schedulers(char *buf, size_t maxlen) if (WARN_ON_ONCE(offs >= maxlen)) break; } - spin_unlock(&mptcp_sched_list_lock); rcu_read_unlock(); } -- 2.45.2
[PATCH net 1/3] mptcp: init: protect sched with rcu_read_lock
Enabling CONFIG_PROVE_RCU_LIST with its dependence CONFIG_RCU_EXPERT creates this splat when an MPTCP socket is created: = WARNING: suspicious RCU usage 6.12.0-rc2+ #11 Not tainted - net/mptcp/sched.c:44 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by mptcp_connect/176. stack backtrace: CPU: 0 UID: 0 PID: 176 Comm: mptcp_connect Not tainted 6.12.0-rc2+ #11 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: dump_stack_lvl (lib/dump_stack.c:123) lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) mptcp_sched_find (net/mptcp/sched.c:44 (discriminator 7)) mptcp_init_sock (net/mptcp/protocol.c:2867 (discriminator 1)) ? sock_init_data_uid (arch/x86/include/asm/atomic.h:28) inet_create.part.0.constprop.0 (net/ipv4/af_inet.c:386) ? __sock_create (include/linux/rcupdate.h:347 (discriminator 1)) __sock_create (net/socket.c:1576) __sys_socket (net/socket.c:1671) ? __pfx___sys_socket (net/socket.c:1712) ? do_user_addr_fault (arch/x86/mm/fault.c:1419 (discriminator 1)) __x64_sys_socket (net/socket.c:1728) do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) That's because when the socket is initialised, rcu_read_lock() is not used despite the explicit comment written above the declaration of mptcp_sched_find() in sched.c. Adding the missing lock/unlock avoids the warning. Fixes: 1730b2b2c5a5 ("mptcp: add sched in mptcp_sock") Cc: sta...@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/523 Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 6d0e201c3eb26aa6ca0ff27e5a65cb6f911012f2..d263091659e076587bc3406dfdcb4409adb3247e 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2864,8 +2864,10 @@ static int mptcp_init_sock(struct sock *sk) if (unlikely(!net->mib.mptcp_statistics) && !mptcp_mib_alloc(net)) return -ENOMEM; + rcu_read_lock(); ret = mptcp_init_sched(mptcp_sk(sk), mptcp_sched_find(mptcp_get_scheduler(net))); + rcu_read_unlock(); if (ret) return ret; -- 2.45.2
[PATCH net 0/3] mptcp: sched: fix some lock issues
Two small fixes related to the MPTCP packets scheduler: - Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6. - Patch 2: remove unneeded lock when listing packets schedulers. A fix for >= 6.10. And some modifications in the MPTCP selftests: - Patch 3: a small addition to the MPTCP selftests to cover more code. Signed-off-by: Matthieu Baerts (NGI0) --- Matthieu Baerts (NGI0) (3): mptcp: init: protect sched with rcu_read_lock mptcp: remove unneeded lock when listing scheds selftests: mptcp: list sysctl data net/mptcp/protocol.c | 2 ++ net/mptcp/sched.c | 2 -- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 + 3 files changed, 11 insertions(+), 2 deletions(-) --- base-commit: 3b05b9c36ddd01338e1352588f2ec1ea23f97d43 change-id: 20241021-net-mptcp-sched-lock-10dfc75d1e00 Best regards, -- Matthieu Baerts (NGI0)
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
On 16/10/2024 17:18, Matthieu Baerts wrote: > Hi Jakub, > > On 16/10/2024 03:11, Jakub Kicinski wrote: >> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") >> added CONFIG_PROVE_RCU_LIST=y to the common CI config, >> but RCU_EXPERT is not set, and it's a dependency for >> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage >> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking >> indicate that it does catch bugs. > > Good catch! I confirm it fixes the issue: > > Acked-by: Matthieu Baerts (NGI0) > >> Signed-off-by: Jakub Kicinski >> --- >> I'd be slightly tempted to still send it to Linux for v6.12. > > Good idea, it sounds like a fix because I guess if it was on the list, > it was supposed to be used. But be careful it might detect quite a few > issues: I just enabled it on MPTCP tree, and it found issues. Sorry, I forgot to mention that it found 3 issues: one specific to MPTCP, one in Netfilter, but also one when shutting down the VM: > # /usr/lib/klibc/bin/poweroff > [ 2360.588763][T11825] > [ 2360.589006][T11825] = > [ 2360.589424][T11825] WARNING: suspicious RCU usage > [ 2360.589952][T11825] 6.12.0-rc2+ #1 Tainted: G N > [ 2360.590522][T11825] - > [ 2360.590896][T11825] kernel/events/core.c:13962 RCU-list traversed in > non-reader section!! > [ 2360.592341][T11825] > [ 2360.592341][T11825] other info that might help us debug this: > [ 2360.592341][T11825] > [ 2360.593343][T11825] > [ 2360.593343][T11825] rcu_scheduler_active = 2, debug_locks = 1 > [ 2360.593951][T11825] 3 locks held by poweroff/11825: > [2360.594481][T11825] #0: 89641e28 > (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot > (kernel/reboot.c:770) > [2360.594997][T11825] #1: 8963eab0 > ((reboot_notifier_list).rwsem){}-{3:3}, at: blocking_notifier_call_chain > (kernel/notifier.c:388) > [2360.595859][T11825] #2: 898d1a68 (pmus_lock){+.+.}-{3:3}, at: > perf_event_exit_cpu_context (kernel/events/core.c:13983) > [ 2360.596645][T11825] > [ 2360.596645][T11825] stack backtrace: > [ 2360.597476][T11825] CPU: 3 UID: 0 PID: 11825 Comm: poweroff Tainted: G > N 6.12.0-rc2+ #1 > [ 2360.597987][T11825] Tainted: [N]=TEST > [ 2360.598291][T11825] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 2360.598890][T11825] Call Trace: > [ 2360.599295][T11825] > [2360.599568][T11825] dump_stack_lvl (lib/dump_stack.c:123) > [2360.600032][T11825] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) > [2360.600518][T11825] perf_event_clear_cpumask (kernel/events/core.c:13962 > (discriminator 9)) > [2360.600972][T11825] ? __pfx_perf_event_clear_cpumask > (kernel/events/core.c:13939) > [2360.601607][T11825] ? acpi_execute_simple_method (drivers/acpi/utils.c:679) > [2360.601988][T11825] ? __pfx_acpi_execute_simple_method > (drivers/acpi/utils.c:679) > [2360.602577][T11825] ? md_notify_reboot (drivers/md/md.c:9860) > [2360.603043][T11825] perf_event_exit_cpu_context (kernel/events/core.c:13984 > (discriminator 1)) > [2360.603543][T11825] perf_reboot (kernel/events/core.c:14066 (discriminator > 3)) > [2360.603979][T11825] ? trace_notifier_run > (include/trace/events/notifier.h:59 (discriminator 2)) > [2360.604454][T11825] notifier_call_chain (kernel/notifier.c:93) > [2360.604833][T11825] blocking_notifier_call_chain (kernel/notifier.c:389) > [2360.605522][T11825] kernel_power_off (kernel/reboot.c:294) > [2360.605908][T11825] __do_sys_reboot (kernel/reboot.c:771) > [2360.606320][T11825] ? __pfx___do_sys_reboot (kernel/reboot.c:717) > [2360.606739][T11825] ? __pfx_ksys_sync (fs/sync.c:98) > [2360.607113][T11825] do_syscall_64 (arch/x86/entry/common.c:52 > (discriminator 1)) > [2360.607551][T11825] entry_SYSCALL_64_after_hwframe > (arch/x86/entry/entry_64.S:130) > [ 2360.607911][T11825] RIP: 0033:0x20e505 > [ 2360.608320][T11825] Code: 48 89 77 38 c3 89 f0 48 8b 1f 48 8b 67 08 48 8b > 6f 10 4c 8b 67 18 4c 8b 6f 20 4c 8b 77 28 4c 8b 7f 30 ff 67 38 49 89 ca 0f 05 > <48> 3d 01 f0 ff ff 73 01 c3 f7 d8 89 05 d6 8e 00 00 48 83 c8 ff c3 > All code > >0: 48 89 77 38 mov%rsi,0x38(%rdi) >4: c3 ret >5: 89 f0 mov%esi,%eax >7: 48 8b 1fmov(%rdi),%rbx >a: 48 8b 67 08 mov0x8(%rdi),%rsp >e: 48 8b 6f 10 mov0x10(%rdi),%rbp > 12: 4c 8b 67 18 mov0x18(%rdi),%r12 > 16: 4c 8b 6f 20 mov0x20(%rdi),%r13 > 1a: 4c 8b 77 28 mov0x28(%rdi),%r14 > 1e: 4c 8b 7f 30
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Hi Jakub, On 16/10/2024 03:11, Jakub Kicinski wrote: > Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") > added CONFIG_PROVE_RCU_LIST=y to the common CI config, > but RCU_EXPERT is not set, and it's a dependency for > CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage > of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking > indicate that it does catch bugs. Good catch! I confirm it fixes the issue: Acked-by: Matthieu Baerts (NGI0) > Signed-off-by: Jakub Kicinski > --- > I'd be slightly tempted to still send it to Linux for v6.12. Good idea, it sounds like a fix because I guess if it was on the list, it was supposed to be used. But be careful it might detect quite a few issues: I just enabled it on MPTCP tree, and it found issues. It might be good to check the impact first, e.g. at least enabling it when running the Networking tests. Please note that the CI didn't pick up this patch, because it is marked for RCU (PATCH rcu): https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-k...@kernel.org/ If the impact is important, it might be better to target linux-next first, no? Cheers, Matt -- Sponsored by the NGI0 Core fund.
[PATCH net v2 2/2] selftests: mptcp: join: test for prohibited MPC to port-based endp
From: Paolo Abeni Explicitly verify that MPC connection attempts towards a port-based signal endpoint fail with a reset. Note that this new test is a bit different from the other ones, not using 'run_tests'. It is then needed to add the capture capability, and the picking the right port which have been extracted into three new helpers. The info about the capture can also be printed from a single point, which simplifies the exit paths in do_transfer(). The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: sta...@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +--- 1 file changed, 86 insertions(+), 31 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index e8d0a01b4144264615d92b953a69ebd934ce468e..c07e2bd3a315aac9c422fed85c3196ec46e060f7 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -23,6 +23,7 @@ tmpfile="" cout="" err="" capout="" +cappid="" ns1="" ns2="" iptables="iptables" @@ -887,6 +888,44 @@ check_cestab() fi } +cond_start_capture() +{ + local ns="$1" + + :> "$capout" + + if $capture; then + local capuser capfile + if [ -z $SUDO_USER ]; then + capuser="" + else + capuser="-Z $SUDO_USER" + fi + + capfile=$(printf "mp_join-%02u-%s.pcap" "$MPTCP_LIB_TEST_COUNTER" "$ns") + + echo "Capturing traffic for test $MPTCP_LIB_TEST_COUNTER into $capfile" + ip netns exec "$ns" tcpdump -i any -s 65535 -B 32768 $capuser -w "$capfile" > "$capout" 2>&1 & + cappid=$! + + sleep 1 + fi +} + +cond_stop_capture() +{ + if $capture; then + sleep 1 + kill $cappid + cat "$capout" + fi +} + +get_port() +{ + echo "$((1 + MPTCP_LIB_TEST_COUNTER - 1))" +} + do_transfer() { local listener_ns="$1" @@ -894,33 +933,17 @@ do_transfer() local cl_proto="$3" local srv_proto="$4" local connect_addr="$5" + local port - local port=$((1 + MPTCP_LIB_TEST_COUNTER - 1)) - local cappid local FAILING_LINKS=${FAILING_LINKS:-""} local fastclose=${fastclose:-""} local speed=${speed:-"fast"} + port=$(get_port) :> "$cout" :> "$sout" - :> "$capout" - if $capture; then - local capuser - if [ -z $SUDO_USER ] ; then - capuser="" - else - capuser="-Z $SUDO_USER" - fi - - capfile=$(printf "mp_join-%02u-%s.pcap" "$MPTCP_LIB_TEST_COUNTER" "${listener_ns}") - - echo "Capturing traffic for test $MPTCP_LIB_TEST_COUNTER into $capfile" - ip netns exec ${listener_ns} tcpdump -i any -s 65535 -B 32768 $capuser -w $capfile > "$capout" 2>&1 & - cappid=$! - - sleep 1 - fi + cond_start_capture ${listener_ns} NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \ nstat -n @@ -1007,10 +1030,7 @@ do_transfer() wait $spid local rets=$? - if $capture; then - sleep 1 - kill $cappid - fi + cond_stop_capture NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \ nstat | grep Tcp > /tmp/${listener_ns}.out @@ -1026,7 +1046,6 @@ do_transfer() ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = :$port" cat /tmp/${connector_ns}.out - cat "$capout" return 1 fi @@ -1043,13 +1062,7 @@ do_transfer() fi rets=$? - if [ $retc -eq 0 ] && [ $rets -eq 0 ];then - cat "$capout" - return 0 - fi - - cat "$capout" - return 1 +
[PATCH net v2 1/2] mptcp: prevent MPC handshake on port-based signal endpoints
:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h: [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:7ffd651f42d8 EFLAGS: 0246 ORIG_RAX: 0133 RAX: ffda RBX: 0003 RCX: 7f04fb13a6b9 RDX: 0001 RSI: 2d00 RDI: 0004 RBP: 7ffd651f4310 R08: 0001 R09: 0001 R10: 2080 R11: 0246 R12: 000f4240 R13: 7f04fb187449 R14: 7ffd651f42f4 R15: 7ffd651f4300 As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: sta...@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529...@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/mib.c| 1 + net/mptcp/mib.h| 1 + net/mptcp/pm_netlink.c | 1 + net/mptcp/protocol.h | 1 + net/mptcp/subflow.c| 11 +++ 5 files changed, 15 insertions(+) diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c index ad88bd3c58dffed8335eedb43ca6290418e3c4f4..19eb9292bd6093a760b41f98c1774fd2490c48e3 100644 --- a/net/mptcp/mib.c +++ b/net/mptcp/mib.c @@ -17,6 +17,7 @@ static const struct snmp_mib mptcp_snmp_list[] = { SNMP_MIB_ITEM("MPCapableFallbackSYNACK", MPTCP_MIB_MPCAPABLEACTIVEFALLBACK), SNMP_MIB_ITEM("MPCapableSYNTXDrop", MPTCP_MIB_MPCAPABLEACTIVEDROP), SNMP_MIB_ITEM("MPCapableSYNTXDisabled", MPTCP_MIB_MPCAPABLEACTIVEDISABLED), + SNMP_MIB_ITEM("MPCapableEndpAttempt", MPTCP_MIB_MPCAPABLEENDPATTEMPT), SNMP_MIB_ITEM("MPFallbackTokenInit", MPTCP_MIB_TOKENFALLBACKINIT), SNMP_MIB_ITEM("MPTCPRetrans", MPTCP_MIB_RETRANSSEGS), SNMP_MIB_ITEM("MPJoinNoTokenFound", MPTCP_MIB_JOINNOTOKEN), diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h index 3206cdda8bb1067f9a8354fd45deed86b67ac7da..128282982843a07614a46f9b2c2f7c708306c769 100644 --- a/net/mptcp/mib.h +++ b/net/mptcp/mib.h @@ -12,6 +12,7 @@ enum linux_mptcp_mib_field { MPTCP_MIB_MPCAPABLEACTIVEFALLBACK, /* Client-side fallback during 3-way handshake */ MPTCP_MIB_MPCAPABLEACTIVEDROP, /* Client-side fallback due to a MPC drop */ MPTCP_MIB_MPCAPABLEACTIVEDISABLED, /* Client-side disabled due to past issues */ + MPTCP_MIB_MPCAPABLEENDPATTEMPT, /* Prohibited MPC to port-based endp */ MPTCP_MIB_TOKENFALLBACKINIT,/* Could not init/allocate token */ MPTCP_MIB_RETRANSSEGS, /* Segments retransmitted at the MPTCP-level */ MPTCP_MIB_JOINNOTOKEN, /* Received MP_JOIN but the token was not found */ diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index f6f0a38a0750f82bc909f02a75beec980d951f1f..1a78998fe1f49510ef1f487c8cd5b369a37fb20d 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1121,6 +1121,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock
[PATCH net v2 0/2] mptcp: prevent MPC handshake on port-based signal endpoints
MPTCP connection requests toward a listening socket created by the in-kernel PM for a port based signal endpoint will never be accepted, they need to be explicitly rejected. - Patch 1: Explicitly reject such requests. A fix for >= v5.12. - Patch 2: Cover this case in the MPTCP selftests to avoid regressions. Signed-off-by: Matthieu Baerts (NGI0) --- Changes in v2: - This new version fixes the root cause for the issue Cong Wang sent a patch for a few weeks ago, see the v1, and the explanations below. The new version is very different from the v1, from a different author. Thanks to Cong Wang for the first analysis, and to Paolo for having spot the root cause, and sent a fix for it. - Link to v1: https://lore.kernel.org/r/20240908180620.822579-1-xiyou.wangc...@gmail.com - Link: https://lore.kernel.org/r/a5289a0d-2557-40b8-9575-6f1a0bbf0...@redhat.com --- Paolo Abeni (2): mptcp: prevent MPC handshake on port-based signal endpoints selftests: mptcp: join: test for prohibited MPC to port-based endp net/mptcp/mib.c | 1 + net/mptcp/mib.h | 1 + net/mptcp/pm_netlink.c | 1 + net/mptcp/protocol.h| 1 + net/mptcp/subflow.c | 11 +++ tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +--- 6 files changed, 101 insertions(+), 31 deletions(-) --- base-commit: 174714f0e505070a16be6fbede30d32b81df789f change-id: 20241014-net-mptcp-mpc-port-endp-4f88bd428ec7 Best regards, -- Matthieu Baerts (NGI0)
[PATCH bpf-next/net v7 2/3] selftests/bpf: Add getsockopt to inspect mptcp subflow
From: Geliang Tang This patch adds a "cgroup/getsockopt" way to inspect the subflows of an MPTCP socket, and verify the modifications done by the same BPF program in the previous commit: a different mark per subflow, and a different TCP CC set on the second one. This new hook will be used by the next commit to verify the socket options set on each subflow. This extra "cgroup/getsockopt" prog walks the msk->conn_list and use bpf_core_cast to cast a pointer for readonly. It allows to inspect all the fields of a structure. Note that on the kernel side, the MPTCP socket stores a list of subflows under 'msk->conn_list'. They can be iterated using the generic 'list' helpers. They have been imported here, with a small difference: list_for_each_entry() uses 'can_loop' to limit the number of iterations, and ease its use. Because only data need to be read here, it is enough to use this technique. It is planned to use bpf_iter, when BPF programs will be used to modify data from the different subflows. mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also be imported. Suggested-by: Martin KaFai Lau Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v5: new patch, instead of using 'ss' in the following patch - v7: use 'can_loop' instead of 'cond_break'. (Martin) --- MAINTAINERS | 2 +- tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++ 3 files changed, 112 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3bce6cc05553dad53db5f06d36e6957061886dd0..8817aa26b2fc0ba3581576d040f5093124cc60a7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16097,7 +16097,7 @@ F: include/net/mptcp.h F: include/trace/events/mptcp.h F: include/uapi/linux/mptcp*.h F: net/mptcp/ -F: tools/testing/selftests/bpf/*/*mptcp*.c +F: tools/testing/selftests/bpf/*/*mptcp*.[ch] F: tools/testing/selftests/net/mptcp/ NETWORKING [TCP] diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h new file mode 100644 index ..3b188ccdcc4041acb4f7ed38ae8ddf5a7305466a --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __MPTCP_BPF_H__ +#define __MPTCP_BPF_H__ + +#include "bpf_experimental.h" + +/* list helpers from include/linux/list.h */ +static inline int list_is_head(const struct list_head *list, + const struct list_head *head) +{ + return list == head; +} + +#define list_entry(ptr, type, member) \ + container_of(ptr, type, member) + +#define list_first_entry(ptr, type, member)\ + list_entry((ptr)->next, type, member) + +#define list_next_entry(pos, member) \ + list_entry((pos)->member.next, typeof(*(pos)), member) + +#define list_entry_is_head(pos, head, member) \ + list_is_head(&pos->member, (head)) + +/* small difference: 'can_loop' has been added in the conditions */ +#define list_for_each_entry(pos, head, member) \ + for (pos = list_first_entry(head, typeof(*pos), member);\ +!list_entry_is_head(pos, head, member) && can_loop;\ +pos = list_next_entry(pos, member)) + +/* mptcp helpers from protocol.h */ +#define mptcp_for_each_subflow(__msk, __subflow) \ + list_for_each_entry(__subflow, &((__msk)->conn_list), node) + +static __always_inline struct sock * +mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) +{ + return subflow->tcp_sock; +} + +#endif diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c index 2e28f4a215b5469fcbc31168071887687ca34792..70302477e326eecaef6aad4ecf899aa3d6606f23 100644 --- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -4,10 +4,12 @@ /* vmlinux.h, bpf_helpers.h and other 'define' */ #include "bpf_tracing_net.h" +#include "mptcp_bpf.h" char _license[] SEC("license") = "GPL"; char cc[TCP_CA_NAME_MAX] = "reno"; +int pid; /* Associate a subflow counter to each token */ struct { @@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops) return 1; } + +static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct bpf_sockopt *ctx) +{ + struct mptcp
[PATCH bpf-next/net v7 3/3] selftests/bpf: Add mptcp subflow subtest
From: Geliang Tang This patch adds a subtest named test_subflow in test_mptcp to load and verify the newly added MPTCP subflow BPF program. To goal is to make sure it is possible to set different socket options per subflows, while the userspace socket interface only lets the application to set the same socket options for the whole MPTCP connection and its multiple subflows. To check that, a client and a server are started in a dedicated netns, with veth interfaces to simulate multiple paths. They will exchange data to allow the creation of an additional subflow. When the different subflows are being created, the new MPTCP subflow BPF program will set some socket options: marks and TCP CC. The validation is done by the same program, when the userspace checks the value of the modified socket options. On the userspace side, it will see that the default values are still being used on the MPTCP connection, while the BPF program will see different options set per subflow of the same MPTCP connection. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2 -> v3: - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the BPF CI running IPRoute 5.5.0. - Use SYS_NOFAIL() in _ss_search() instead of calling system() - v3 -> v4: - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is not supported. - v4 -> v5: - Note that this new test is no longer skipped on the BPF CI, because 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to 22.04. - Update the commit message, reflecting the latest version. - The validations are no longer done using 'ss', but using the new BPF program added in the previous patch, to reduce the use of external dependences. (Martin) - v5 -> v6: - Use usleep() instead of sleep(). - v6 -> v7: - Drop mptcp_subflow__attach(), use bpf_program__attach_cgroup() instead of bpf_prog_attach(), plus assign the returned value to skel->links.* directly. (Martin) --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 121 + 1 file changed, 121 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index d2ca32fa3b21e686d6ef2673b5953d5417edfedb..b61f26b8cdf2540a34e28ddb8a5f1f2378cf8c06 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -5,12 +5,17 @@ #include #include #include +#include #include "cgroup_helpers.h" #include "network_helpers.h" #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" +#include "mptcp_subflow.skel.h" #define NS_TEST "mptcp_ns" +#define ADDR_1 "10.0.1.1" +#define ADDR_2 "10.0.1.2" +#define PORT_1 10001 #ifndef IPPROTO_MPTCP #define IPPROTO_MPTCP 262 @@ -335,10 +340,126 @@ static void test_mptcpify(void) close(cgroup_fd); } +static int endpoint_init(char *flags) +{ + SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); + if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + printf("'ip mptcp' not supported, skip this test.\n"); + test__skip(); + goto fail; + } + + return 0; +fail: + return -1; +} + +static void wait_for_new_subflows(int fd) +{ + socklen_t len; + u8 subflows; + int err, i; + + len = sizeof(subflows); + /* Wait max 1 sec for new subflows to be created */ + for (i = 0; i < 10; i++) { + err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len); + if (!err && subflows > 0) + break; + + usleep(10); /* 0.1s */ + } +} + +static void run_subflow(void) +{ + int server_fd, client_fd, err; + char new[TCP_CA_NAME_MAX]; + char cc[TCP_CA_NAME_MAX]; + unsigned int mark; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_fd")) + goto close_server; + + send_byte(client_fd); + wait_for_new_subflows(client_fd);
[PATCH bpf-next/net v7 1/3] selftests/bpf: Add mptcp subflow example
From: Nicolas Rybowski Move Nicolas' patch into bpf selftests directory. This example adds a different mark (SO_MARK) on each subflow, and changes the TCP CC only on the first subflow. >From the userspace, an application can do a setsockopt() on an MPTCP socket, and typically the same value will be propagated to all subflows (paths). If someone wants to have different values per subflow, the recommended way is to use BPF. So it is good to add such example here, and make sure there is no regressions. This example shows how it is possible to: Identify the parent msk of an MPTCP subflow. Put different sockopt for each subflow of a same MPTCP connection. Here especially, two different behaviours are implemented: A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same MPTCP connection. The order of creation of the current subflow defines its mark. The TCP CC algorithm of the very first subflow of an MPTCP connection is set to "reno". This is just to show it is possible to identify an MPTCP connection, and set socket options, from different SOL levels, per subflow. "reno" has been picked because it is built-in and usually not set as default one. It is easy to verify with 'ss' that these modifications have been applied correctly. That's what the next patch is going to do. Nicolas' code comes from: commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c") from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the "scripts" branch), and it has been adapted by Geliang. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Co-developed-by: Geliang Tang Signed-off-by: Geliang Tang Signed-off-by: Nicolas Rybowski Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v1 -> v2: - The commit message has been updated: why setting multiple socket options, why reno, the verification is done in a later patch (different author). (Alexei) - v2 -> v3: - Only #include "bpf_tracing_net.h", linked to: https://lore.kernel.org/20240509175026.3423614-1-martin@linux.dev - v4 -> v5: - Set reno as TCP cc on the second subflow, not to influence the getsockopt() done from the userspace, which will return the one from the first subflow, the default TCP cc then, not the modified one. --- tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++ 1 file changed, 59 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c new file mode 100644 index ..2e28f4a215b5469fcbc31168071887687ca34792 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020, Tessares SA. */ +/* Copyright (c) 2024, Kylin Software */ + +/* vmlinux.h, bpf_helpers.h and other 'define' */ +#include "bpf_tracing_net.h" + +char _license[] SEC("license") = "GPL"; + +char cc[TCP_CA_NAME_MAX] = "reno"; + +/* Associate a subflow counter to each token */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u32)); + __uint(max_entries, 100); +} mptcp_sf SEC(".maps"); + +SEC("sockops") +int mptcp_subflow(struct bpf_sock_ops *skops) +{ + __u32 init = 1, key, mark, *cnt; + struct mptcp_sock *msk; + struct bpf_sock *sk; + int err; + + if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB) + return 1; + + sk = skops->sk; + if (!sk) + return 1; + + msk = bpf_skc_to_mptcp_sock(sk); + if (!msk) + return 1; + + key = msk->token; + cnt = bpf_map_lookup_elem(&mptcp_sf, &key); + if (cnt) { + /* A new subflow is added to an existing MPTCP connection */ + __sync_fetch_and_add(cnt, 1); + mark = *cnt; + } else { + /* A new MPTCP connection is just initiated and this is its primary subflow */ + bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY); + mark = init; + } + + /* Set the mark of the subflow's socket based on appearance order */ + err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)); + if (err < 0) + return 1; + if (mark == 2) + err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, TCP_CA_NAME_MAX); + + return 1; +} -- 2.45.2
[PATCH bpf-next/net v7 0/3] selftests/bpf: new MPTCP subflow subtest
In this series from Geliang, modifying MPTCP BPF selftests, we have: - A new MPTCP subflow BPF program setting socket options per subflow: it looks better to have this old test program in the BPF selftests to track regressions and to serve as example. Note: Nicolas is no longer working at Tessares, but he did this work while working for them, and his email address is no longer available. - A new hook in the same BPF program to do the verification step. - A new MPTCP BPF subtest validating the new BPF program added in the first patch, with the help of the new hook added in the second patch. Signed-off-by: Matthieu Baerts (NGI0) --- Changes in v7: - Patch 2/3: use 'can_loop' instead of 'cond_break'. (Martin) - Patch 3/3: use bpf_program__attach_cgroup(). (Martin) - Link to v6: https://lore.kernel.org/r/20240911-upstream-bpf-next-20240506-mptcp-subflow-test-v6-0-7872294c4...@kernel.org Changes in v6: - Patch 3/3: use usleep() instead of sleep() - Series: rebased on top of bpf-next/net - Link to v5: https://lore.kernel.org/r/20240910-upstream-bpf-next-20240506-mptcp-subflow-test-v5-0-2c664a7da...@kernel.org Changes in v5: - See the individual changelog for more details about them - Patch 1/3: set TCP on the 2nd subflow - Patch 2/3: new - Patch 3/3: use the BPF program from patch 2/3 to do the validation instead of using ss. - Series: rebased on top of bpf-next/net - Link to v4: https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-test-v4-0-2b4ca6994...@kernel.org Changes in v4: - Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file: - I will check later if it is possible to avoid having duplicated header files in tools/include/uapi, but no need to block this series for that. Patch 2/3 can be added later if needed. - Patch 2/2: skip the test if 'ip mptcp' is not available. - Link to v3: https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-test-v3-0-ebdc2d494...@kernel.org Changes in v3: - Sorry for the delay between v2 and v3, this series was conflicting with the "add netns helpers", but it looks like it is on hold: https://lore.kernel.org/cover.1715821541.git.tanggeli...@kylinos.cn - Patch 1/3 includes "bpf_tracing_net.h", introduced in between. - New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link". - Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper. - Link to v2: https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-test-v2-0-4048c2948...@kernel.org Changes in v2: - Previous patches 1/4 and 2/4 have been dropped from this series: - 1/4: "selftests/bpf: Handle SIGINT when creating netns": - A new version, more generic and no longer specific to MPTCP BPF selftest will be sent later, as part of a new series. (Alexei) - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro": - Removed, not to hide helper functions in macros. (Alexei) - The commit message of patch 1/2 has been clarified to avoid some possible confusions spot by Alexei. - Link to v1: https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-0-e2bcbdf49...@kernel.org --- Geliang Tang (2): selftests/bpf: Add getsockopt to inspect mptcp subflow selftests/bpf: Add mptcp subflow subtest Nicolas Rybowski (1): selftests/bpf: Add mptcp subflow example MAINTAINERS | 2 +- tools/testing/selftests/bpf/prog_tests/mptcp.c| 121 tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 +++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++ 4 files changed, 292 insertions(+), 1 deletion(-) --- base-commit: 151ac45348afc5b56baa584c7cd4876addf461ff change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3 Best regards, -- Matthieu Baerts (NGI0)
[PATCH bpf-next/net v6 1/3] selftests/bpf: Add mptcp subflow example
From: Nicolas Rybowski Move Nicolas' patch into bpf selftests directory. This example adds a different mark (SO_MARK) on each subflow, and changes the TCP CC only on the first subflow. >From the userspace, an application can do a setsockopt() on an MPTCP socket, and typically the same value will be propagated to all subflows (paths). If someone wants to have different values per subflow, the recommended way is to use BPF. So it is good to add such example here, and make sure there is no regressions. This example shows how it is possible to: Identify the parent msk of an MPTCP subflow. Put different sockopt for each subflow of a same MPTCP connection. Here especially, two different behaviours are implemented: A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same MPTCP connection. The order of creation of the current subflow defines its mark. The TCP CC algorithm of the very first subflow of an MPTCP connection is set to "reno". This is just to show it is possible to identify an MPTCP connection, and set socket options, from different SOL levels, per subflow. "reno" has been picked because it is built-in and usually not set as default one. It is easy to verify with 'ss' that these modifications have been applied correctly. That's what the next patch is going to do. Nicolas' code comes from: commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c") from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the "scripts" branch), and it has been adapted by Geliang. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Co-developed-by: Geliang Tang Signed-off-by: Geliang Tang Signed-off-by: Nicolas Rybowski Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v1 -> v2: - The commit message has been updated: why setting multiple socket options, why reno, the verification is done in a later patch (different author). (Alexei) - v2 -> v3: - Only #include "bpf_tracing_net.h", linked to: https://lore.kernel.org/20240509175026.3423614-1-martin@linux.dev - v4 -> v5: - Set reno as TCP cc on the second subflow, not to influence the getsockopt() done from the userspace, which will return the one from the first subflow, the default TCP cc then, not the modified one. --- tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++ 1 file changed, 59 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c new file mode 100644 index ..2e28f4a215b5 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020, Tessares SA. */ +/* Copyright (c) 2024, Kylin Software */ + +/* vmlinux.h, bpf_helpers.h and other 'define' */ +#include "bpf_tracing_net.h" + +char _license[] SEC("license") = "GPL"; + +char cc[TCP_CA_NAME_MAX] = "reno"; + +/* Associate a subflow counter to each token */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u32)); + __uint(max_entries, 100); +} mptcp_sf SEC(".maps"); + +SEC("sockops") +int mptcp_subflow(struct bpf_sock_ops *skops) +{ + __u32 init = 1, key, mark, *cnt; + struct mptcp_sock *msk; + struct bpf_sock *sk; + int err; + + if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB) + return 1; + + sk = skops->sk; + if (!sk) + return 1; + + msk = bpf_skc_to_mptcp_sock(sk); + if (!msk) + return 1; + + key = msk->token; + cnt = bpf_map_lookup_elem(&mptcp_sf, &key); + if (cnt) { + /* A new subflow is added to an existing MPTCP connection */ + __sync_fetch_and_add(cnt, 1); + mark = *cnt; + } else { + /* A new MPTCP connection is just initiated and this is its primary subflow */ + bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY); + mark = init; + } + + /* Set the mark of the subflow's socket based on appearance order */ + err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)); + if (err < 0) + return 1; + if (mark == 2) + err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, TCP_CA_NAME_MAX); + + return 1; +} -- 2.45.2
[PATCH bpf-next/net v6 3/3] selftests/bpf: Add mptcp subflow subtest
From: Geliang Tang This patch adds a subtest named test_subflow in test_mptcp to load and verify the newly added MPTCP subflow BPF program. To goal is to make sure it is possible to set different socket options per subflows, while the userspace socket interface only lets the application to set the same socket options for the whole MPTCP connection and its multiple subflows. To check that, a client and a server are started in a dedicated netns, with veth interfaces to simulate multiple paths. They will exchange data to allow the creation of an additional subflow. When the different subflows are being created, the new MPTCP subflow BPF program will set some socket options: marks and TCP CC. The validation is done by the same program, when the userspace checks the value of the modified socket options. On the userspace side, it will see that the default values are still being used on the MPTCP connection, while the BPF program will see different options set per subflow of the same MPTCP connection. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2 -> v3: - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the BPF CI running IPRoute 5.5.0. - Use SYS_NOFAIL() in _ss_search() instead of calling system() - v3 -> v4: - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is not supported. - v4 -> v5: - Note that this new test is no longer skipped on the BPF CI, because 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to 22.04. - Update the commit message, reflecting the latest version. - The validations are no longer done using 'ss', but using the new BPF program added in the previous patch, to reduce the use of external dependences. (Martin) - v5 -> v6: - Use usleep() instead of sleep(). --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 127 + 1 file changed, 127 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index d2ca32fa3b21..c76a0d8c8f93 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -5,12 +5,17 @@ #include #include #include +#include #include "cgroup_helpers.h" #include "network_helpers.h" #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" +#include "mptcp_subflow.skel.h" #define NS_TEST "mptcp_ns" +#define ADDR_1 "10.0.1.1" +#define ADDR_2 "10.0.1.2" +#define PORT_1 10001 #ifndef IPPROTO_MPTCP #define IPPROTO_MPTCP 262 @@ -335,10 +340,132 @@ static void test_mptcpify(void) close(cgroup_fd); } +static int endpoint_init(char *flags) +{ + SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); + if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + printf("'ip mptcp' not supported, skip this test.\n"); + test__skip(); + goto fail; + } + + return 0; +fail: + return -1; +} + +static void wait_for_new_subflows(int fd) +{ + socklen_t len; + u8 subflows; + int err, i; + + len = sizeof(subflows); + /* Wait max 1 sec for new subflows to be created */ + for (i = 0; i < 10; i++) { + err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len); + if (!err && subflows > 0) + break; + + usleep(10); /* 0.1s */ + } +} + +static void run_subflow(void) +{ + int server_fd, client_fd, err; + char new[TCP_CA_NAME_MAX]; + char cc[TCP_CA_NAME_MAX]; + unsigned int mark; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_fd")) + goto close_server; + + send_byte(client_fd); + wait_for_new_subflows(client_fd); + + len = sizeof(mark); + err = getsockopt(client_fd, SOL_SOCKET, SO_MARK, &mark, &len); + if (ASSERT_OK(err, "getsockopt(client_fd, SO_MARK)")) + ASSERT_EQ(mark, 0, "mark"); + + len = siz
[PATCH bpf-next/net v6 2/3] selftests/bpf: Add getsockopt to inspect mptcp subflow
From: Geliang Tang This patch adds a "cgroup/getsockopt" way to inspect the subflows of an MPTCP socket, and verify the modifications done by the same BPF program in the previous commit: a different mark per subflow, and a different TCP CC set on the second one. This new hook will be used by the next commit to verify the socket options set on each subflow. This extra "cgroup/getsockopt" prog walks the msk->conn_list and use bpf_core_cast to cast a pointer for readonly. It allows to inspect all the fields of a structure. Note that on the kernel side, the MPTCP socket stores a list of subflows under 'msk->conn_list'. They can be iterated using the generic 'list' helpers. They have been imported here, with a small difference: list_for_each_entry() uses 'cond_break' to limit the number of iterations, and ease its use. Because only data need to be read here, it is enough to use this technique. It is planned to use bpf_iter, when BPF programs will be used to modify data from the different subflows. mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also be imported. Suggested-by: Martin KaFai Lau Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v5: new patch, instead of using 'ss' in the following patch --- MAINTAINERS | 2 +- tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++ 3 files changed, 112 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 30a9b9450e11..2674b86209d1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16058,7 +16058,7 @@ F: include/net/mptcp.h F: include/trace/events/mptcp.h F: include/uapi/linux/mptcp*.h F: net/mptcp/ -F: tools/testing/selftests/bpf/*/*mptcp*.c +F: tools/testing/selftests/bpf/*/*mptcp*.[ch] F: tools/testing/selftests/net/mptcp/ NETWORKING [TCP] diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h new file mode 100644 index ..179b74c1205f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __MPTCP_BPF_H__ +#define __MPTCP_BPF_H__ + +#include "bpf_experimental.h" + +/* list helpers from include/linux/list.h */ +static inline int list_is_head(const struct list_head *list, + const struct list_head *head) +{ + return list == head; +} + +#define list_entry(ptr, type, member) \ + container_of(ptr, type, member) + +#define list_first_entry(ptr, type, member)\ + list_entry((ptr)->next, type, member) + +#define list_next_entry(pos, member) \ + list_entry((pos)->member.next, typeof(*(pos)), member) + +#define list_entry_is_head(pos, head, member) \ + list_is_head(&pos->member, (head)) + +/* small difference: 'cond_break' has been added in the conditions */ +#define list_for_each_entry(pos, head, member) \ + for (pos = list_first_entry(head, typeof(*pos), member);\ +cond_break, !list_entry_is_head(pos, head, member);\ +pos = list_next_entry(pos, member)) + +/* mptcp helpers from protocol.h */ +#define mptcp_for_each_subflow(__msk, __subflow) \ + list_for_each_entry(__subflow, &((__msk)->conn_list), node) + +static __always_inline struct sock * +mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) +{ + return subflow->tcp_sock; +} + +#endif diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c index 2e28f4a215b5..70302477e326 100644 --- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -4,10 +4,12 @@ /* vmlinux.h, bpf_helpers.h and other 'define' */ #include "bpf_tracing_net.h" +#include "mptcp_bpf.h" char _license[] SEC("license") = "GPL"; char cc[TCP_CA_NAME_MAX] = "reno"; +int pid; /* Associate a subflow counter to each token */ struct { @@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops) return 1; } + +static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct bpf_sockopt *ctx) +{ + struct mptcp_subflow_context *subflow; + int i = 0; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk; + + ssk = mptcp_subflow_tcp_sock(bpf_core_cast(subflow, + struct mptcp_subflow_context)); + +
[PATCH bpf-next/net v6 0/3] selftests/bpf: new MPTCP subflow subtest
In this series from Geliang, modifying MPTCP BPF selftests, we have: - A new MPTCP subflow BPF program setting socket options per subflow: it looks better to have this old test program in the BPF selftests to track regressions and to serve as example. Note: Nicolas is no longer working at Tessares, but he did this work while working for them, and his email address is no longer available. - A new hook in the same BPF program to do the verification step. - A new MPTCP BPF subtest validating the new BPF program added in the first patch, with the help of the new hook added in the second patch. Signed-off-by: Matthieu Baerts (NGI0) --- Changes in v6: - Patch 3/3: use usleep() instead of sleep() - Series: rebased on top of bpf-next/net - Link to v5: https://lore.kernel.org/r/20240910-upstream-bpf-next-20240506-mptcp-subflow-test-v5-0-2c664a7da...@kernel.org Changes in v5: - See the individual changelog for more details about them - Patch 1/3: set TCP on the 2nd subflow - Patch 2/3: new - Patch 3/3: use the BPF program from patch 2/3 to do the validation instead of using ss. - Series: rebased on top of bpf-next/net - Link to v4: https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-test-v4-0-2b4ca6994...@kernel.org Changes in v4: - Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file: - I will check later if it is possible to avoid having duplicated header files in tools/include/uapi, but no need to block this series for that. Patch 2/3 can be added later if needed. - Patch 2/2: skip the test if 'ip mptcp' is not available. - Link to v3: https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-test-v3-0-ebdc2d494...@kernel.org Changes in v3: - Sorry for the delay between v2 and v3, this series was conflicting with the "add netns helpers", but it looks like it is on hold: https://lore.kernel.org/cover.1715821541.git.tanggeli...@kylinos.cn - Patch 1/3 includes "bpf_tracing_net.h", introduced in between. - New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link". - Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper. - Link to v2: https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-test-v2-0-4048c2948...@kernel.org Changes in v2: - Previous patches 1/4 and 2/4 have been dropped from this series: - 1/4: "selftests/bpf: Handle SIGINT when creating netns": - A new version, more generic and no longer specific to MPTCP BPF selftest will be sent later, as part of a new series. (Alexei) - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro": - Removed, not to hide helper functions in macros. (Alexei) - The commit message of patch 1/2 has been clarified to avoid some possible confusions spot by Alexei. - Link to v1: https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-0-e2bcbdf49...@kernel.org --- Geliang Tang (2): selftests/bpf: Add getsockopt to inspect mptcp subflow selftests/bpf: Add mptcp subflow subtest Nicolas Rybowski (1): selftests/bpf: Add mptcp subflow example MAINTAINERS | 2 +- tools/testing/selftests/bpf/prog_tests/mptcp.c| 127 + tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 +++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++ 4 files changed, 298 insertions(+), 1 deletion(-) --- base-commit: 23dc9867329c72b48e5039ac93fbf50d9099cdb3 change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3 Best regards, -- Matthieu Baerts (NGI0)
[PATCH net 3/3] selftests: mptcp: include net_helper.sh file
Similar to the previous commit, the net_helper.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: 1af3bc912eac ("selftests: mptcp: lib: use wait_local_port_listen helper") Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index 68924053a1e6..5d796622e730 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -11,7 +11,7 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq TEST_FILES := mptcp_lib.sh settings -TEST_INCLUDES := ../lib.sh +TEST_INCLUDES := ../lib.sh ../net_helper.sh EXTRA_CLEAN := *.pcap -- 2.45.2
[PATCH net 2/3] selftests: mptcp: include lib.sh file
The lib.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index 7b936a926859..68924053a1e6 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -11,6 +11,8 @@ TEST_GEN_FILES = mptcp_connect pm_nl_ctl mptcp_sockopt mptcp_inq TEST_FILES := mptcp_lib.sh settings +TEST_INCLUDES := ../lib.sh + EXTRA_CLEAN := *.pcap include ../../lib.mk -- 2.45.2
[PATCH net 1/3] selftests: mptcp: join: restrict fullmesh endp on 1st sf
A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: sta...@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index a4762c49a878..cde041c93906 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3064,7 +3064,9 @@ fullmesh_tests() pm_nl_set_limits $ns1 1 3 pm_nl_set_limits $ns2 1 3 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal - pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh + if mptcp_lib_kallsyms_has "mptcp_pm_subflow_check_next$"; then + pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh + fi fullmesh=1 speed=slow \ run_tests $ns1 $ns2 10.0.1.1 chk_join_nr 3 3 3 -- 2.45.2
[PATCH net 0/3] selftests: mptcp: misc. small fixes
Here are some various fixes for the MPTCP selftests. Patch 1 fixes a recently modified test to continue to work as expected on older kernels. This is a fix for a recent fix that can be backported up to v5.15. Patch 2 and 3 include dependences when exporting or installing the tests. Two fixes for v6.11-rc1. Signed-off-by: Matthieu Baerts (NGI0) --- Matthieu Baerts (NGI0) (3): selftests: mptcp: join: restrict fullmesh endp on 1st sf selftests: mptcp: include lib.sh file selftests: mptcp: include net_helper.sh file tools/testing/selftests/net/mptcp/Makefile | 2 ++ tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) --- base-commit: 48aa361c5db0b380c2b75c24984c0d3e7c1e8c09 change-id: 20240910-net-selftests-mptcp-fix-install-2b82ae5a99c8 Best regards, -- Matthieu Baerts (NGI0)
Re: [PATCH bpf-next/net v5 3/3] selftests/bpf: Add mptcp subflow subtest
Hello, On 10/09/2024 16:13, Matthieu Baerts (NGI0) wrote: > From: Geliang Tang > > This patch adds a subtest named test_subflow in test_mptcp to load and > verify the newly added MPTCP subflow BPF program. To goal is to make > sure it is possible to set different socket options per subflows, while > the userspace socket interface only lets the application to set the same > socket options for the whole MPTCP connection and its multiple subflows. > > To check that, a client and a server are started in a dedicated netns, > with veth interfaces to simulate multiple paths. They will exchange data > to allow the creation of an additional subflow. (...) > diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c > b/tools/testing/selftests/bpf/prog_tests/mptcp.c > index d2ca32fa3b21..c30f032edaca 100644 > --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c > +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c > @@ -335,10 +339,132 @@ static void test_mptcpify(void) > close(cgroup_fd); > } (...) > +static void wait_for_new_subflows(int fd) > +{ > + socklen_t len; > + u8 subflows; > + int err, i; > + > + len = sizeof(subflows); > + /* Wait max 1 sec for new subflows to be created */ > + for (i = 0; i < 10; i++) { > + err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len); > + if (!err && subflows > 0) > + break; > + > + sleep(0.1); As reported by the CI, we are not in Python, usleep() should be used here. I missed that one during my review. I will send a new version with the fix tomorrow. Sorry for the noise. Cheers, Matt -- Sponsored by the NGI0 Core fund.
[PATCH bpf-next/net v5 3/3] selftests/bpf: Add mptcp subflow subtest
From: Geliang Tang This patch adds a subtest named test_subflow in test_mptcp to load and verify the newly added MPTCP subflow BPF program. To goal is to make sure it is possible to set different socket options per subflows, while the userspace socket interface only lets the application to set the same socket options for the whole MPTCP connection and its multiple subflows. To check that, a client and a server are started in a dedicated netns, with veth interfaces to simulate multiple paths. They will exchange data to allow the creation of an additional subflow. When the different subflows are being created, the new MPTCP subflow BPF program will set some socket options: marks and TCP CC. The validation is done by the same program, when the userspace checks the value of the modified socket options. On the userspace side, it will see that the default values are still being used on the MPTCP connection, while the BPF program will see different options set per subflow of the same MPTCP connection. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Signed-off-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v2 -> v3: - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the BPF CI running IPRoute 5.5.0. - Use SYS_NOFAIL() in _ss_search() instead of calling system() - v3 -> v4: - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is not supported. - v4 -> v5: - Note that this new test is no longer skipped on the BPF CI, because 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to 22.04. - Update the commit message, reflecting the latest version. - The validations are no longer done using 'ss', but using the new BPF program added in the previous patch, to reduce the use of external dependences. --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 126 + 1 file changed, 126 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index d2ca32fa3b21..c30f032edaca 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -9,8 +9,12 @@ #include "network_helpers.h" #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" +#include "mptcp_subflow.skel.h" #define NS_TEST "mptcp_ns" +#define ADDR_1 "10.0.1.1" +#define ADDR_2 "10.0.1.2" +#define PORT_1 10001 #ifndef IPPROTO_MPTCP #define IPPROTO_MPTCP 262 @@ -335,10 +339,132 @@ static void test_mptcpify(void) close(cgroup_fd); } +static int endpoint_init(char *flags) +{ + SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); + if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + printf("'ip mptcp' not supported, skip this test.\n"); + test__skip(); + goto fail; + } + + return 0; +fail: + return -1; +} + +static void wait_for_new_subflows(int fd) +{ + socklen_t len; + u8 subflows; + int err, i; + + len = sizeof(subflows); + /* Wait max 1 sec for new subflows to be created */ + for (i = 0; i < 10; i++) { + err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len); + if (!err && subflows > 0) + break; + + sleep(0.1); + } +} + +static void run_subflow(void) +{ + int server_fd, client_fd, err; + char new[TCP_CA_NAME_MAX]; + char cc[TCP_CA_NAME_MAX]; + unsigned int mark; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_fd")) + goto close_server; + + send_byte(client_fd); + wait_for_new_subflows(client_fd); + + len = sizeof(mark); + err = getsockopt(client_fd, SOL_SOCKET, SO_MARK, &mark, &len); + if (ASSERT_OK(err, "getsockopt(client_fd, SO_MARK)")) + ASSERT_EQ(mark, 0, "mark"); + + len = sizeof(new); + err = getsockopt(client_fd, SOL_TCP, TCP_CONGESTION, new, &len); + if (ASSERT_OK(err, "getsockopt(client_fd, TCP_CONGESTION)")) { +
[PATCH bpf-next/net v5 2/3] selftests/bpf: Add getsockopt to inspect mptcp subflow
From: Geliang Tang This patch adds a "cgroup/getsockopt" way to inspect the subflows of an MPTCP socket, and verify the modifications done by the same BPF program in the previous commit: a different mark per subflow, and a different TCP CC set on the second one. This new hook will be used by the next commit to verify the socket options set on each subflow. This extra "cgroup/getsockopt" prog walks the msk->conn_list and use bpf_core_cast to cast a pointer for readonly. It allows to inspect all the fields of a structure. Note that on the kernel side, the MPTCP socket stores a list of subflows under 'msk->conn_list'. They can be iterated using the generic 'list' helpers. They have been imported here, with a small difference: list_for_each_entry() uses 'cond_break' to limit the number of iterations, and ease its use. Because only data need to be read here, it is enough to use this technique. It is planned to use bpf_iter, when BPF programs will be used to modify data from the different subflows. mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also be imported. Suggested-by: Martin KaFai Lau Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v5: new patch, instead of using 'ss' in the following patch --- MAINTAINERS | 2 +- tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++ 3 files changed, 112 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 30a9b9450e11..2674b86209d1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16058,7 +16058,7 @@ F: include/net/mptcp.h F: include/trace/events/mptcp.h F: include/uapi/linux/mptcp*.h F: net/mptcp/ -F: tools/testing/selftests/bpf/*/*mptcp*.c +F: tools/testing/selftests/bpf/*/*mptcp*.[ch] F: tools/testing/selftests/net/mptcp/ NETWORKING [TCP] diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h new file mode 100644 index ..179b74c1205f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __MPTCP_BPF_H__ +#define __MPTCP_BPF_H__ + +#include "bpf_experimental.h" + +/* list helpers from include/linux/list.h */ +static inline int list_is_head(const struct list_head *list, + const struct list_head *head) +{ + return list == head; +} + +#define list_entry(ptr, type, member) \ + container_of(ptr, type, member) + +#define list_first_entry(ptr, type, member)\ + list_entry((ptr)->next, type, member) + +#define list_next_entry(pos, member) \ + list_entry((pos)->member.next, typeof(*(pos)), member) + +#define list_entry_is_head(pos, head, member) \ + list_is_head(&pos->member, (head)) + +/* small difference: 'cond_break' has been added in the conditions */ +#define list_for_each_entry(pos, head, member) \ + for (pos = list_first_entry(head, typeof(*pos), member);\ +cond_break, !list_entry_is_head(pos, head, member);\ +pos = list_next_entry(pos, member)) + +/* mptcp helpers from protocol.h */ +#define mptcp_for_each_subflow(__msk, __subflow) \ + list_for_each_entry(__subflow, &((__msk)->conn_list), node) + +static __always_inline struct sock * +mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) +{ + return subflow->tcp_sock; +} + +#endif diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c index 2e28f4a215b5..70302477e326 100644 --- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -4,10 +4,12 @@ /* vmlinux.h, bpf_helpers.h and other 'define' */ #include "bpf_tracing_net.h" +#include "mptcp_bpf.h" char _license[] SEC("license") = "GPL"; char cc[TCP_CA_NAME_MAX] = "reno"; +int pid; /* Associate a subflow counter to each token */ struct { @@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops) return 1; } + +static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct bpf_sockopt *ctx) +{ + struct mptcp_subflow_context *subflow; + int i = 0; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk; + + ssk = mptcp_subflow_tcp_sock(bpf_core_cast(subflow, + struct mptcp_subflow_context)); + +
[PATCH bpf-next/net v5 1/3] selftests/bpf: Add mptcp subflow example
From: Nicolas Rybowski Move Nicolas' patch into bpf selftests directory. This example adds a different mark (SO_MARK) on each subflow, and changes the TCP CC only on the first subflow. >From the userspace, an application can do a setsockopt() on an MPTCP socket, and typically the same value will be propagated to all subflows (paths). If someone wants to have different values per subflow, the recommended way is to use BPF. So it is good to add such example here, and make sure there is no regressions. This example shows how it is possible to: Identify the parent msk of an MPTCP subflow. Put different sockopt for each subflow of a same MPTCP connection. Here especially, two different behaviours are implemented: A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same MPTCP connection. The order of creation of the current subflow defines its mark. The TCP CC algorithm of the very first subflow of an MPTCP connection is set to "reno". This is just to show it is possible to identify an MPTCP connection, and set socket options, from different SOL levels, per subflow. "reno" has been picked because it is built-in and usually not set as default one. It is easy to verify with 'ss' that these modifications have been applied correctly. That's what the next patch is going to do. Nicolas' code comes from: commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c") from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the "scripts" branch), and it has been adapted by Geliang. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Co-developed-by: Geliang Tang Signed-off-by: Geliang Tang Signed-off-by: Nicolas Rybowski Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- Notes: - v1 -> v2: - The commit message has been updated: why setting multiple socket options, why reno, the verification is done in a later patch (different author). (Alexei) - v2 -> v3: - Only #include "bpf_tracing_net.h", linked to: https://lore.kernel.org/20240509175026.3423614-1-martin@linux.dev - v4 -> v5: - Set reno as TCP cc on the second subflow, not to influence the getsockopt() done from the userspace, which will return the one from the first subflow, the default TCP cc then, not the modified one. --- tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++ 1 file changed, 59 insertions(+) diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c new file mode 100644 index ..2e28f4a215b5 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020, Tessares SA. */ +/* Copyright (c) 2024, Kylin Software */ + +/* vmlinux.h, bpf_helpers.h and other 'define' */ +#include "bpf_tracing_net.h" + +char _license[] SEC("license") = "GPL"; + +char cc[TCP_CA_NAME_MAX] = "reno"; + +/* Associate a subflow counter to each token */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u32)); + __uint(max_entries, 100); +} mptcp_sf SEC(".maps"); + +SEC("sockops") +int mptcp_subflow(struct bpf_sock_ops *skops) +{ + __u32 init = 1, key, mark, *cnt; + struct mptcp_sock *msk; + struct bpf_sock *sk; + int err; + + if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB) + return 1; + + sk = skops->sk; + if (!sk) + return 1; + + msk = bpf_skc_to_mptcp_sock(sk); + if (!msk) + return 1; + + key = msk->token; + cnt = bpf_map_lookup_elem(&mptcp_sf, &key); + if (cnt) { + /* A new subflow is added to an existing MPTCP connection */ + __sync_fetch_and_add(cnt, 1); + mark = *cnt; + } else { + /* A new MPTCP connection is just initiated and this is its primary subflow */ + bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY); + mark = init; + } + + /* Set the mark of the subflow's socket based on appearance order */ + err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)); + if (err < 0) + return 1; + if (mark == 2) + err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, TCP_CA_NAME_MAX); + + return 1; +} -- 2.45.2
[PATCH bpf-next/net v5 0/3] selftests/bpf: new MPTCP subflow subtest
In this series from Geliang, modifying MPTCP BPF selftests, we have: - A new MPTCP subflow BPF program setting socket options per subflow: it looks better to have this old test program in the BPF selftests to track regressions and to serve as example. Note: Nicolas is no longer working at Tessares, but he did this work while working for them, and his email address is no longer available. - A new hook in the same BPF program to do the verification step. - A new MPTCP BPF subtest validating the new BPF program added in the first patch, with the help of the new hook added in the second patch. Signed-off-by: Matthieu Baerts (NGI0) --- Changes in v5: - See the individual changelog for more details about them - Patch 1/3: set TCP on the 2nd subflow - Patch 2/3: new - Patch 3/3: use the BPF program from patch 2/3 to do the validation instead of using ss. - Link to v4: https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-test-v4-0-2b4ca6994...@kernel.org Changes in v4: - Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file: - I will check later if it is possible to avoid having duplicated header files in tools/include/uapi, but no need to block this series for that. Patch 2/3 can be added later if needed. - Patch 2/2: skip the test if 'ip mptcp' is not available. - Link to v3: https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-test-v3-0-ebdc2d494...@kernel.org Changes in v3: - Sorry for the delay between v2 and v3, this series was conflicting with the "add netns helpers", but it looks like it is on hold: https://lore.kernel.org/cover.1715821541.git.tanggeli...@kylinos.cn - Patch 1/3 includes "bpf_tracing_net.h", introduced in between. - New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link". - Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper. - Link to v2: https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-test-v2-0-4048c2948...@kernel.org Changes in v2: - Previous patches 1/4 and 2/4 have been dropped from this series: - 1/4: "selftests/bpf: Handle SIGINT when creating netns": - A new version, more generic and no longer specific to MPTCP BPF selftest will be sent later, as part of a new series. (Alexei) - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro": - Removed, not to hide helper functions in macros. (Alexei) - The commit message of patch 1/2 has been clarified to avoid some possible confusions spot by Alexei. - Link to v1: https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-0-e2bcbdf49...@kernel.org --- Geliang Tang (2): selftests/bpf: Add getsockopt to inspect mptcp subflow selftests/bpf: Add mptcp subflow subtest Nicolas Rybowski (1): selftests/bpf: Add mptcp subflow example MAINTAINERS | 2 +- tools/testing/selftests/bpf/prog_tests/mptcp.c| 126 + tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 +++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++ 4 files changed, 297 insertions(+), 1 deletion(-) --- base-commit: 6b083650a37318112fb60c65fbb6070584f53d93 change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3 Best regards, -- Matthieu Baerts (NGI0)
[PATCH net-next v2 5/5] selftests: mptcp: connect: remove duplicated spaces in TAP output
It is nice to have a visual alignment in the test output to present the different results, but it makes less sense in the TAP output that is there for computers. It sounds then better to remove the duplicated whitespaces in the TAP output, also because it can cause some issues with TAP parsers expecting only one space around the directive delimiter (#). While at it, change the variable name (result_msg) to something more explicit. Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index 49d90c4dbc01..57325d57e4c6 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -345,9 +345,11 @@ do_transfer() local addr_port addr_port=$(printf "%s:%d" ${connect_addr} ${port}) - local result_msg - result_msg="$(printf "%.3s %-5s -> %.3s (%-20s) %-5s" ${connector_ns} ${cl_proto} ${listener_ns} ${addr_port} ${srv_proto})" - mptcp_lib_print_title "${result_msg}" + local pretty_title + pretty_title="$(printf "%.3s %-5s -> %.3s (%-20s) %-5s" ${connector_ns} ${cl_proto} ${listener_ns} ${addr_port} ${srv_proto})" + mptcp_lib_print_title "${pretty_title}" + + local tap_title="${connector_ns:0:3} ${cl_proto} -> ${listener_ns:0:3} (${addr_port}) ${srv_proto}" if $capture; then local capuser @@ -443,7 +445,7 @@ do_transfer() echo cat "$capout" - mptcp_lib_result_fail "${TEST_GROUP}: ${result_msg}" + mptcp_lib_result_fail "${TEST_GROUP}: ${tap_title}" return 1 fi @@ -543,12 +545,12 @@ do_transfer() if [ $retc -eq 0 ] && [ $rets -eq 0 ]; then mptcp_lib_pr_ok "${extra:1}" - mptcp_lib_result_pass "${TEST_GROUP}: ${result_msg}" + mptcp_lib_result_pass "${TEST_GROUP}: ${tap_title}" else if [ -n "${extra}" ]; then mptcp_lib_print_warn "${extra:1}" fi - mptcp_lib_result_fail "${TEST_GROUP}: ${result_msg}" + mptcp_lib_result_fail "${TEST_GROUP}: ${tap_title}" fi cat "$capout" -- 2.45.2
[PATCH net-next v2 4/5] selftests: mptcp: diag: remove trailing whitespace
It doesn't need to be there, and it can cause some issues with TAP parsers expecting only one space around the directive delimiter (#). Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/diag.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh index 776d43a6922d..2bd0c1eb70c5 100755 --- a/tools/testing/selftests/net/mptcp/diag.sh +++ b/tools/testing/selftests/net/mptcp/diag.sh @@ -284,7 +284,7 @@ echo "b" | \ ./mptcp_connect -p 1 -r 0 -t ${timeout_poll} -w 20 \ 127.0.0.1 >/dev/null & wait_connected $ns 1 -chk_msk_nr 2 "after MPC handshake " +chk_msk_nr 2 "after MPC handshake" chk_last_time_info 1 chk_msk_remote_key_nr 2 "chk remote_key" chk_msk_fallback_nr 0 "chk no fallback" -- 2.45.2
[PATCH net-next v2 3/5] selftests: mptcp: reset the last TS before the first test
Just to slightly improve the precision of the duration of the first test. In mptcp_join.sh, the last append_prev_results is now done as soon as the last test is over: this will add the last result in the list, and get a more precise time for this last test. Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 ++ tools/testing/selftests/net/mptcp/mptcp_join.sh| 3 ++- tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 1 + tools/testing/selftests/net/mptcp/pm_netlink.sh| 2 ++ tools/testing/selftests/net/mptcp/simult_flows.sh | 1 + tools/testing/selftests/net/mptcp/userspace_pm.sh | 1 + 6 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index f61e2f5870ea..49d90c4dbc01 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -847,6 +847,8 @@ stop_if_error() make_file "$cin" "client" make_file "$sin" "server" +mptcp_lib_subtests_last_ts_reset + check_mptcp_disabled stop_if_error "The kernel configuration is not valid for MPTCP" diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 43f8a9bd84c4..3564cd06643c 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3959,9 +3959,11 @@ if [ ${#tests[@]} -eq 0 ]; then tests=("${all_tests_names[@]}") fi +mptcp_lib_subtests_last_ts_reset for subtests in "${tests[@]}"; do "${subtests}" done +append_prev_results if [ ${ret} -ne 0 ]; then echo @@ -3972,7 +3974,6 @@ if [ ${ret} -ne 0 ]; then echo fi -append_prev_results mptcp_lib_result_print_all_tap exit $ret diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh index 68899a303a1a..5e8d5b83e2d0 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh @@ -349,6 +349,7 @@ init make_file "$cin" "client" 1 make_file "$sin" "server" 1 trap cleanup EXIT +mptcp_lib_subtests_last_ts_reset run_tests $ns1 $ns2 10.0.1.1 run_tests $ns1 $ns2 dead:beef:1::1 diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh index 2757378b1b13..2e6648a2b2c0 100755 --- a/tools/testing/selftests/net/mptcp/pm_netlink.sh +++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh @@ -137,6 +137,8 @@ check() fi } +mptcp_lib_subtests_last_ts_reset + check "show_endpoints" "" "defaults addr list" default_limits="$(get_limits)" diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh index f74e1c3c126d..8fa77c8e9b65 100755 --- a/tools/testing/selftests/net/mptcp/simult_flows.sh +++ b/tools/testing/selftests/net/mptcp/simult_flows.sh @@ -286,6 +286,7 @@ while getopts "bcdhi" option;do done setup +mptcp_lib_subtests_last_ts_reset run_test 10 10 0 0 "balanced bwidth" run_test 10 10 1 25 "balanced bwidth with unbalanced delay" diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh b/tools/testing/selftests/net/mptcp/userspace_pm.sh index 9cb05978269d..3651f73451cf 100755 --- a/tools/testing/selftests/net/mptcp/userspace_pm.sh +++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh @@ -150,6 +150,7 @@ mptcp_lib_events "${ns2}" "${client_evts}" client_evts_pid server_evts=$(mktemp) mptcp_lib_events "${ns1}" "${server_evts}" server_evts_pid sleep 0.5 +mptcp_lib_subtests_last_ts_reset print_title "Init" print_test "Created network namespaces ns1, ns2" -- 2.45.2
[PATCH net-next v2 2/5] selftests: mptcp: connect: remote time in TAP output
It is now added by the MPTCP lib automatically, see the parent commit. The time in the TAP output might be slightly different from the one displayed before, but that's OK. Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- v2: - Fixed typo in the commit message (Jakub) --- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh index b77fb7065bfb..f61e2f5870ea 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -431,7 +431,6 @@ do_transfer() local duration duration=$((stop-start)) - result_msg+=" # time=${duration}ms" printf "(duration %05sms) " "${duration}" if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then mptcp_lib_pr_fail "client exit code $retc, server $rets" -- 2.45.2
[PATCH net-next v2 1/5] selftests: mptcp: lib: add time per subtests in TAP output
It adds 'time=ms' in the diagnostic data of the TAP output, e.g. ok 1 - pm_netlink: defaults addr list # time=9ms This addition is useful to quickly identify which subtests are taking a longer time than the others, or more than expected. Note that there are no specific formats to follow to show this time according to the TAP 13 [1], TAP 14 [2] and KTAP [3] specifications. Let's then define this one here. Link: https://testanything.org/tap-version-13-specification.html [1] Link: https://testanything.org/tap-version-14-specification.html [2] Link: https://docs.kernel.org/dev-tools/ktap.html [3] Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh index 4578a331041e..975d4d4c862a 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -29,6 +29,7 @@ declare -rx MPTCP_LIB_AF_INET6=10 MPTCP_LIB_SUBTESTS=() MPTCP_LIB_SUBTESTS_DUPLICATED=0 MPTCP_LIB_SUBTEST_FLAKY=0 +MPTCP_LIB_SUBTESTS_LAST_TS_MS= MPTCP_LIB_TEST_COUNTER=0 MPTCP_LIB_TEST_FORMAT="%02u %-50s" MPTCP_LIB_IP_MPTCP=0 @@ -205,6 +206,11 @@ mptcp_lib_kversion_ge() { mptcp_lib_fail_if_expected_feature "kernel version ${1} lower than ${v}" } +mptcp_lib_subtests_last_ts_reset() { + MPTCP_LIB_SUBTESTS_LAST_TS_MS="$(date +%s%3N)" +} +mptcp_lib_subtests_last_ts_reset + __mptcp_lib_result_check_duplicated() { local subtest @@ -219,13 +225,22 @@ __mptcp_lib_result_check_duplicated() { __mptcp_lib_result_add() { local result="${1}" + local time="time=" + local ts_prev_ms shift local id=$((${#MPTCP_LIB_SUBTESTS[@]} + 1)) __mptcp_lib_result_check_duplicated "${*}" - MPTCP_LIB_SUBTESTS+=("${result} ${id} - ${KSFT_TEST}: ${*}") + # not to add two '#' + [[ "${*}" != *"#"* ]] && time="# ${time}" + + ts_prev_ms="${MPTCP_LIB_SUBTESTS_LAST_TS_MS}" + mptcp_lib_subtests_last_ts_reset + time+="$((MPTCP_LIB_SUBTESTS_LAST_TS_MS - ts_prev_ms))ms" + + MPTCP_LIB_SUBTESTS+=("${result} ${id} - ${KSFT_TEST}: ${*} ${time}") } # $1: test name -- 2.45.2
[PATCH net-next v2 0/5] selftests: mptcp: add time per subtests in TAP output
Patches here add 'time=ms' in the diagnostic data of the TAP output, e.g. ok 1 - pm_netlink: defaults addr list # time=9ms This addition is useful to quickly identify which subtests are taking a longer time than the others, or more than expected. Note that there are no specific formats to follow to show this time according to the TAP 13, TAP 14 and KTAP specifications, but we follow the format being parsed by NIPA [1]. Patch 1 modifies mptcp_lib.sh to add this support to all MPTCP selftests. Patch 2 removes the now duplicated info in mptcp_connect.sh Patch 3 slightly improves the precision of the first subtests in all MPTCP subtests. Patches 4 and 5 remove duplicated spaces in TAP output, for the TAP parsers that cannot handle them properly. Link: https://github.com/linux-netdev/nipa/pull/36 Signed-off-by: Matthieu Baerts (NGI0) --- Changes in v2: - Typo in the commit message of patch 2 (Jakub) - Two additional patches to remove duplicated spaces in TAP output - Link to v1: https://lore.kernel.org/r/20240902-net-next-mptcp-ksft-subtest-time-v1-0-f1ed499a1...@kernel.org --- Matthieu Baerts (NGI0) (5): selftests: mptcp: lib: add time per subtests in TAP output selftests: mptcp: connect: remote time in TAP output selftests: mptcp: reset the last TS before the first test selftests: mptcp: diag: remove trailing whitespace selftests: mptcp: connect: remove duplicated spaces in TAP output tools/testing/selftests/net/mptcp/diag.sh | 2 +- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 17 ++--- tools/testing/selftests/net/mptcp/mptcp_join.sh| 3 ++- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 17 - tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 1 + tools/testing/selftests/net/mptcp/pm_netlink.sh| 2 ++ tools/testing/selftests/net/mptcp/simult_flows.sh | 1 + tools/testing/selftests/net/mptcp/userspace_pm.sh | 1 + 8 files changed, 34 insertions(+), 10 deletions(-) --- base-commit: 52fc70a32573707f70d6b1b5c5fe85cc91457393 change-id: 20240902-net-next-mptcp-ksft-subtest-time-a83cec43d894 Best regards, -- Matthieu Baerts (NGI0)
Re: [PATCH net-next 0/3] selftests: mptcp: add time per subtests in TAP output
Hi Jakub, Thank you for your reply! On 04/09/2024 21:40, Jakub Kicinski wrote: > On Wed, 4 Sep 2024 18:15:09 +0200 Matthieu Baerts wrote: >>> Best I could come up with is: >>> >>> diff --git a/contest/remote/vmksft-p.py b/contest/remote/vmksft-p.py >>> index fe9e87abdb5c..a37245bd5b30 100755 >>> --- a/contest/remote/vmksft-p.py >>> +++ b/contest/remote/vmksft-p.py >>> @@ -73,7 +73,7 @@ group3 testV skip >>> tests = [] >>> nested_tests = False >>> >>> -result_re = re.compile(r"(not )?ok (\d+)( -)? ([^#]*[^ ])( # )?([^ >>> ].*)?$") >>> +result_re = re.compile(r"(not )?ok (\d+)( -)? ([^#]*[^ ])( +# )?([^ >>> ].*)?$") >> >> Looks good to me. While at it, we can add a '+' for the spaces after the >> '#': >> >> ( +# +) > > 👍️ > >> I see you didn't commit the previous modification. I can open a PR if it >> helps. > > I was just playing with the regexps in the interpreter. If you could > send a PR that'd perfect. Sure, done: https://github.com/linux-netdev/nipa/pull/38 >>> time_re = re.compile(r"time=(\d+)ms") >>> >>> for line in full_run.split('\n'): >>> >>> Thoughts? >> >> In my v2, I will also strip these trailing whitespaces in the selftests, >> they don't need to be there. > > Up to you - it doesn't violate the KTAP format and the visual alignment > is nice. But it may trip up more regexps.. Yes, better be safe... I will keep the visual alignment for the non-TAP output. Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH net-next 0/3] selftests: mptcp: add time per subtests in TAP output
Hi Jakub, On 04/09/2024 01:22, Jakub Kicinski wrote: > On Mon, 02 Sep 2024 13:13:03 +0200 Matthieu Baerts (NGI0) wrote: >> Patches here add 'time=ms' in the diagnostic data of the TAP output, >> e.g. >> >> ok 1 - pm_netlink: defaults addr list # time=9ms > > Looking closer, this: > > # ok 3 - mptcp[...] MPTCP # time=7184ms > # ok 4 - mptcp[...] TCP # time=6458ms > > Makes NIPA unhappy. The match results for regexps look like this: > > (None, '4', ' -', 'mptcp[...] MPTCP', ' # ', 'time=6173ms') > (None, '4', ' -', 'mptcp[...] TC', None, 'P # time=6173ms') > > IOW the first one is neat, second one gepooped. The regex really wants > there to be no more than a single space before the #. Good catch! > KTAP definition > doesn't say that description must not have trailing white space. Indeed. Same for TAP 13. (TAP 14 is clearer about that and allows multiple spaces) > Best I could come up with is: > > diff --git a/contest/remote/vmksft-p.py b/contest/remote/vmksft-p.py > index fe9e87abdb5c..a37245bd5b30 100755 > --- a/contest/remote/vmksft-p.py > +++ b/contest/remote/vmksft-p.py > @@ -73,7 +73,7 @@ group3 testV skip > tests = [] > nested_tests = False > > -result_re = re.compile(r"(not )?ok (\d+)( -)? ([^#]*[^ ])( # )?([^ > ].*)?$") > +result_re = re.compile(r"(not )?ok (\d+)( -)? ([^#]*[^ ])( +# )?([^ > ].*)?$") Looks good to me. While at it, we can add a '+' for the spaces after the '#': ( +# +) I see you didn't commit the previous modification. I can open a PR if it helps. > time_re = re.compile(r"time=(\d+)ms") > > for line in full_run.split('\n'): > > Thoughts? In my v2, I will also strip these trailing whitespaces in the selftests, they don't need to be there. Cheers, Matt -- Sponsored by the NGI0 Core fund.
Re: [PATCH v2 5/6] kunit: mptcp: adhear to KUNIT formatting standard
Hi David, Nico, On 17/04/2021 06:24, David Gow wrote: Hi Matt, Like patch 1/6, I can apply it in MPTCP tree and send it later to net-next with other patches. Except if you guys prefer to apply it in KUnit tree and send it to linux-next? Given 1/6 is going to net-next, it makes sense to send this out that way too, then, IMHO. Great! Mat sent this patch to net-next and David already applied it (thanks!): https://git.kernel.org/netdev/net-next/c/3fcc8a25e391 The only slight concern I have is that the m68k test config patch in the series will get split from the others, but that should resolve itself when they pick up the last patch. I see. I guess for this MPTCP patch, we are fine because MPTCP_KUNIT_TESTS was not used in m68k config. > At the very least, this shouldn't cause any conflicts with anything > we're doing in the KUnit tree. Thanks for having checked! Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH v2 5/6] kunit: mptcp: adhear to KUNIT formatting standard
Hi David, Thank you for your very clear reply! On 15/04/2021 08:01, David Gow wrote: On Wed, Apr 14, 2021 at 5:25 PM Matthieu Baerts wrote: Up to the KUnit maintainers to decide ;-) To summarise my view: personally, I'd prefer things the way this patch works: have everything end in _KUNIT_TEST, even if that enables a couple of suites. The extra 'S' on the end isn't a huge problem if you have a good reason to particularly want to keep it, though: as long as you don't have something like _K_UNIT_VERIFICATION or something equally silly that'd break grepping for '_KUNIT_TEST', it's fine be me. Indeed it makes sense: we don't need to split nor to have a meta-Kconfig entry. We can then remove the extra 'S' and update our tests suite: Reviewed-by: Matthieu Baerts I see that the whole series has been marked as "Not Applicable" on Netdev's patchwork: https://patchwork.kernel.org/project/netdevbpf/patch/0fa191715b236766ad13c5f786d8daf92a9a0cf2.1618388989.git.npa...@redhat.com/ Like patch 1/6, I can apply it in MPTCP tree and send it later to net-next with other patches. Except if you guys prefer to apply it in KUnit tree and send it to linux-next? Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH v2 5/6] kunit: mptcp: adhear to KUNIT formatting standard
Hi Nico, On 14/04/2021 10:58, Nico Pache wrote: Drop 'S' from end of CONFIG_MPTCP_KUNIT_TESTS inorder to adhear to the KUNIT *_KUNIT_TEST config name format. For MPTCP, we have multiple KUnit tests: crypto and token. That's why we wrote TESTS with a S. I'm fine without S if we need to stick with KUnit' standard. But maybe the standard wants us to split the two tests and create MPTCP_TOKEN_KUNIT_TEST and MPTCP_TOKEN_KUNIT_TEST config? In this case, we could eventually keep MPTCP_KUNIT_TESTS which will in charge of selecting the two new ones. Up to the KUnit maintainers to decide ;-) Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
[tip: locking/core] static_call: Fix unused variable warn w/o MODULE
The following commit has been merged into the locking/core branch of tip: Commit-ID: 7d95f22798ecea513f37b792b39fec4bcf20fec3 Gitweb: https://git.kernel.org/tip/7d95f22798ecea513f37b792b39fec4bcf20fec3 Author:Matthieu Baerts AuthorDate:Fri, 26 Mar 2021 11:50:23 +01:00 Committer: Peter Zijlstra CommitterDate: Fri, 09 Apr 2021 13:22:12 +02:00 static_call: Fix unused variable warn w/o MODULE Here is the warning converted as error and reported by GCC: kernel/static_call.c: In function ‘__static_call_update’: kernel/static_call.c:153:18: error: unused variable ‘mod’ [-Werror=unused-variable] 153 | struct module *mod = site_mod->mod; | ^~~ cc1: all warnings being treated as errors make[1]: *** [scripts/Makefile.build:271: kernel/static_call.o] Error 1 This is simply because since recently, we no longer use 'mod' variable elsewhere if MODULE is unset. When using 'make tinyconfig' to generate the default kconfig, MODULE is unset. There are different ways to fix this warning. Here I tried to minimised the number of modified lines and not add more #ifdef. We could also move the declaration of the 'mod' variable inside the if-statement or directly use site_mod->mod. Fixes: 698bacefe993 ("static_call: Align static_call_is_init() patching condition") Signed-off-by: Matthieu Baerts Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210326105023.2058860-1-matthieu.bae...@tessares.net --- kernel/static_call.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/static_call.c b/kernel/static_call.c index 2c5950b..723fcc9 100644 --- a/kernel/static_call.c +++ b/kernel/static_call.c @@ -165,13 +165,13 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func) stop = __stop_static_call_sites; -#ifdef CONFIG_MODULES if (mod) { +#ifdef CONFIG_MODULES stop = mod->static_call_sites + mod->num_static_call_sites; init = mod->state == MODULE_STATE_COMING; - } #endif + } for (site = site_mod->sites; site < stop && static_call_key(site) == key; site++) {
[PATCH mptcp-next] static_call: fix unused variable warn w/o MODULE
Here is the warning converted as error and reported by GCC: kernel/static_call.c: In function ‘__static_call_update’: kernel/static_call.c:153:18: error: unused variable ‘mod’ [-Werror=unused-variable] 153 | struct module *mod = site_mod->mod; | ^~~ cc1: all warnings being treated as errors make[1]: *** [scripts/Makefile.build:271: kernel/static_call.o] Error 1 This is simply because since recently, we no longer use 'mod' variable elsewhere if MODULE is unset. When using 'make tinyconfig' to generate the default kconfig, MODULE is unset. There are different ways to fix this warning. Here I tried to minimised the number of modified lines and not add more #ifdef. We could also move the declaration of the 'mod' variable inside the if-statement or directly use site_mod->mod. Fixes: 698bacefe993 ("static_call: Align static_call_is_init() patching condition") Signed-off-by: Matthieu Baerts --- Notes: Feel free to modify this patch directly if you prefer. I can of course send a new version if needed. kernel/static_call.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/static_call.c b/kernel/static_call.c index 2c5950b0b90e..723fcc9d20db 100644 --- a/kernel/static_call.c +++ b/kernel/static_call.c @@ -165,13 +165,13 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func) stop = __stop_static_call_sites; -#ifdef CONFIG_MODULES if (mod) { +#ifdef CONFIG_MODULES stop = mod->static_call_sites + mod->num_static_call_sites; init = mod->state == MODULE_STATE_COMING; - } #endif + } for (site = site_mod->sites; site < stop && static_call_key(site) == key; site++) { -- 2.30.2
Re: possible deadlock in ipv6_sock_mc_close
Hi Chuck, (+ cc: MPTCP list) On 01/03/2021 15:52, Chuck Lever wrote: On Mar 1, 2021, at 8:49 AM, syzbot wrote: (...) syzbot found the following issue on: (...) Hi, thanks for the report. Initial analysis: c8e88e3aa738 ("NFSD: Replace READ* macros in nfsd4_decode_layoutget()" changes code several layers above the network layer. In addition, neither of the stack traces contain NFSD functions. And, repro.c does not appear to exercise any filesystem code. Therefore the bisect result looks implausible to me. I don't see any obvious connection between the lockdep splat and c8e88e3aa738. (If someone else does, please let me know where to look). From what I can see from the bisect logs, it looks like this issue is difficult to reproduce. But it really looks like the issue is on MPTCP side and not directly related to your patch. Thanks to the syzbot team for reporting this! It doesn't look very simple to fix but we are tracking this on our side: https://github.com/multipath-tcp/mptcp_net-next/issues/170 Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP] WARNING: net/mptcp/protocol.c:719 mptcp_reset_timer+0x40/0x50
Hi Naresh, On 19/11/2020 08:04, Naresh Kamboju wrote: While running kselftest net/mptcp: mptcp_join.sh on x86_64 device running linux next 20201118 tag the following warning was noticed. Thank you for testing MPTCP and having reported this bug! It looks like it is similar to what syzbot reported yesterday: https://lists.01.org/hyperkitty/list/mp...@lists.01.org/thread/N4IWIL6CYR6VSOOOXF25N3EWDW4GTQ6A/ If it is the same, a patch has already been sent to netdev ML: https://patchwork.kernel.org/project/netdevbpf/patch/1a72039f112cae048c44d398ffa14e0a1432db3d.1605737083.git.pab...@redhat.com/ If it is easy for you to reproduce the issue, please feel free to validate Paolo's patch :) Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH net-next v4] net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling
Hi Randy, On 16/11/2020 04:17, Randy Dunlap wrote: The previous Kconfig patch led to some other build errors as reported by the 0day bot and my own overnight build testing. These are all in when KCOV is enabled but SKB_EXTENSIONS is not enabled, so fix those by combining those conditions in the header file. Also, add stubs for skb_ext_add() and skb_ext_find() to reduce the amount of ifdef-ery. (Jakub) It makes sense, good idea! Thank you for the new version! --- linux-next-20201113.orig/include/linux/skbuff.h +++ linux-next-20201113/include/linux/skbuff.h @@ -4137,7 +4137,6 @@ static inline void skb_set_nfct(struct s #endif } -#ifdef CONFIG_SKB_EXTENSIONS enum skb_ext_id { #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) SKB_EXT_BRIDGE_NF, @@ -4151,12 +4150,11 @@ enum skb_ext_id { #if IS_ENABLED(CONFIG_MPTCP) SKB_EXT_MPTCP, #endif -#if IS_ENABLED(CONFIG_KCOV) SKB_EXT_KCOV_HANDLE, -#endif I don't think we should remove this #ifdef: the number of extensions are currently limited to 8, we might not want to always have KCOV there even if we don't want it. I think adding items in this enum only when needed was the intension of Florian (+cc) when creating these SKB extensions. Also, this will increase a tiny bit some structures, see "struct skb_ext()". But apart from that, I think we are fine, even if we add new extensions in the future after this kcov one. So if we think it is better to remove these #ifdef here, we should be OK. But if we prefer not to do that, we should then not add stubs for skb_ext_{add,find}() and keep the ones for skb_[gs]et_kcov_handle(). Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH net-next] net: linux/skbuff.h: combine NET + KCOV handling
Hi Randy, On 14/11/2020 02:11, Randy Dunlap wrote: The previous Kconfig patch led to some other build errors as reported by the 0day bot and my own overnight build testing. Thank you for looking at that! I had the same issue and I was going to propose a similar fix with one small difference, please see below. --- linux-next-20201113.orig/include/linux/skbuff.h +++ linux-next-20201113/include/linux/skbuff.h @@ -4608,7 +4608,7 @@ static inline void skb_reset_redirect(st #endif } -#ifdef CONFIG_KCOV +#if IS_ENABLED(CONFIG_KCOV) && IS_ENABLED(CONFIG_NET) static inline void skb_set_kcov_handle(struct sk_buff *skb, Should we have here CONFIG_SKB_EXTENSIONS instead of CONFIG_NET? It is valid to use NET thanks to your commit 85ce50d337d1 ("net: kcov: don't select SKB_EXTENSIONS when there is no NET") that links SKB_EXTENSIONS with NET for KCOV but it looks strange to me to use a "non direct" dependence :) I mean: here below, skb_ext_add() and skb_ext_find() are called but they are defined only if SKB_EXTENSIONS is enabled, not only NET. But as I said, this patch fixes the issue. It's fine for me if we prefer to use CONFIG_NET. @@ -4636,7 +4636,7 @@ static inline u64 skb_get_kcov_handle(st static inline void skb_set_kcov_handle(struct sk_buff *skb, const u64 kcov_handle) { } static inline u64 skb_get_kcov_handle(struct sk_buff *skb) { return 0; } -#endif /* CONFIG_KCOV */ +#endif /* CONFIG_KCOV && CONFIG_NET */ (Small detail if you post a v2: there is an extra space between "&&" and "CONFIG_NET") Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP][PATCH net 1/2] mptcp: fix static checker warnings in mptcp_pm_add_timer
Hi Jakub, 09 Nov 2020 21:57:05 Jakub Kicinski : > On Mon, 9 Nov 2020 17:28:54 +0100 Matthieu Baerts wrote: >> A small detail (I think): the Signed-off-by of the sender (Geliang) >> should be the last one in the list if I am not mistaken. >> But I guess this is not blocking. >> >> Reviewed-by: Matthieu Baerts > > I take it you'd like me to apply patch 1 directly to net? Sorry, I didn't know it was OK to apply only one patch of the series. Then yes, if you don't mind, please apply this patch :) Cheers, Matt Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP][PATCH net 2/2] mptcp: cleanup for mptcp_pm_alloc_anno_list
Hi Geliang, On 09/11/2020 14:59, Geliang Tang wrote: This patch added NULL pointer check for mptcp_pm_alloc_anno_list, and avoided similar static checker warnings in mptcp_pm_add_timer. Signed-off-by: Geliang Tang Reviewed-by: Dan Carpenter I think Dan reviewed the v1 of your patch -- without some modifications below -- but not the v2 nor this one. --- net/mptcp/pm_netlink.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 03f2c28f11f5..dfc1bed4a55f 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -266,7 +266,9 @@ static bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk, { struct mptcp_pm_add_entry *add_entry = NULL; struct sock *sk = (struct sock *)msk; - struct net *net = sock_net(sk); + + if (!msk) + return false; As Dan mentioned on MPTCP ML, this check is not required: "msk" cannot be NULL here. We can maybe keep the cleanup (only move sock_net() below) but I don't think we need or want this in -net. I am not even sure we want it in net-next but why not :) This could also be part of other refactors. Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP][PATCH net 1/2] mptcp: fix static checker warnings in mptcp_pm_add_timer
Hi Geliang, Dan, On 09/11/2020 14:59, Geliang Tang wrote: Fix the following Smatch complaint: Thanks for the report and the patch! net/mptcp/pm_netlink.c:213 mptcp_pm_add_timer() warn: variable dereferenced before check 'msk' (see line 208) net/mptcp/pm_netlink.c 207 struct mptcp_sock *msk = entry->sock; 208 struct sock *sk = (struct sock *)msk; 209 struct net *net = sock_net(sk); ^^ "msk" dereferenced here. 210 211 pr_debug("msk=%p", msk); 212 213 if (!msk) Too late. 214 return; 215 Fixes: 93f323b9 ("mptcp: add a new sysctl add_addr_timeout") Reported-by: Dan Carpenter Signed-off-by: Geliang Tang Reviewed-by: Dan Carpenter A small detail (I think): the Signed-off-by of the sender (Geliang) should be the last one in the list if I am not mistaken. But I guess this is not blocking. Reviewed-by: Matthieu Baerts Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP] Re: [selftests] f2ff7f11f9: WARNING:suspicious_RCU_usage
Hi Philip, On 28/10/2020 00:00, Philip Li wrote: On Tue, Oct 27, 2020 at 04:07:28PM +0100, Matthieu Baerts wrote: FYI, this was already reported earlier: https://github.com/multipath-tcp/mptcp_net-next/issues/102 Thanks for the info, we didn't notice this. We will take a look of reported issues in future to avoid duplicated report. Thanks but I also hope we would not have a lot of other similar issues :) In this case, duplicated report is not a big deal! Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP] [selftests] f2ff7f11f9: WARNING:suspicious_RCU_usage
Hi all, On 27/10/2020 14:16, kernel test robot wrote: Greeting, FYI, we noticed the following commit (built with gcc-9): commit: f2ff7f11f9a74842245db52d685bf9bc7ac2c4b1 ("selftests: mptcp: add ADD_ADDR IPv6 test cases") https://github.com/multipath-tcp/mptcp_net-next.git export Thanks for the maintainer of the kernel test robot at Intel for this bug report! FYI, this was already reported earlier: https://github.com/multipath-tcp/mptcp_net-next/issues/102 And Geliang is working on a fix (a v2 is available). Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH 5.9 580/757] selftests: mptcp: depends on built-in IPv6
Hi Greg, On 27/10/2020 14:53, Greg Kroah-Hartman wrote: From: Matthieu Baerts [ Upstream commit 287d35405989cfe0090e3059f7788dc531879a8d ] Recently, CONFIG_MPTCP_IPV6 no longer selects CONFIG_IPV6. As a consequence, if CONFIG_MPTCP_IPV6=y is added to the kconfig, it will no longer ensure CONFIG_IPV6=y. If it is not enabled, CONFIG_MPTCP_IPV6 will stay disabled and selftests will fail. I think Sasha wanted to drop this patch [1]. But as I said earlier [2], this patch is not wrong, it is just not needed in v5.9 because the "Fixes" commit 010b430d5df5 ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it") is not in v5.9. I guess it is certainly easier to keep it because it is not wrong and doesn't hurt. [1] https://lore.kernel.org/stable/20201026142329.GJ4060117@sasha-vm/ [2] https://lore.kernel.org/stable/1de2bf78-4b47-21b0-9d56-3c8063cdf...@tessares.net/ Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
[PATCH net] selftests: mptcp: depends on built-in IPv6
Recently, CONFIG_MPTCP_IPV6 no longer selects CONFIG_IPV6. As a consequence, if CONFIG_MPTCP_IPV6=y is added to the kconfig, it will no longer ensure CONFIG_IPV6=y. If it is not enabled, CONFIG_MPTCP_IPV6 will stay disabled and selftests will fail. We also need CONFIG_IPV6 to be built-in. For more details, please see commit 0ed37ac586c0 ("mptcp: depends on IPV6 but not as a module"). Note that 'make kselftest-merge' will take all 'config' files found in 'tools/testsing/selftests'. Because some of them already set CONFIG_IPV6=y, MPTCP selftests were still passing. But they will fail if MPTCP selftests are launched manually after having executed this command to prepare the kernel config: ./scripts/kconfig/merge_config.sh -m .config \ ./tools/testing/selftests/net/mptcp/config Fixes: 010b430d5df5 ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it") Signed-off-by: Matthieu Baerts --- tools/testing/selftests/net/mptcp/config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config index 8df5cb8f71ff..741a1c4f4ae8 100644 --- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -1,4 +1,5 @@ CONFIG_MPTCP=y +CONFIG_IPV6=y CONFIG_MPTCP_IPV6=y CONFIG_INET_DIAG=m CONFIG_INET_MPTCP_DIAG=m -- 2.27.0
Re: [PATCH] mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
Hi Geert, On 21/10/2020 11:52, Geert Uytterhoeven wrote: Hi Matthieu, On Wed, Oct 21, 2020 at 11:47 AM Matthieu Baerts wrote: On 21/10/2020 11:43, Geert Uytterhoeven wrote: On Wed, Oct 21, 2020 at 5:56 AM Jakub Kicinski wrote: On Tue, 20 Oct 2020 11:26:34 +0200 Matthieu Baerts wrote: On 20/10/2020 09:38, Geert Uytterhoeven wrote: MPTCP_IPV6 selects IPV6, thus enabling an optional feature the user may not want to enable. Fix this by making MPTCP_IPV6 depend on IPV6, like is done for all other IPv6 features. Here again, the intension was to select IPv6 from MPTCP but I understand the issue: if we enable MPTCP, we will select IPV6 as well by default. Maybe not what we want on some embedded devices with very limited memory where IPV6 is already off. We should instead enable MPTCP_IPV6 only if IPV6=y. LGTM then! Reviewed-by: Matthieu Baerts Applied, thanks! My apologies, this fails for the CONFIG_IPV6=m and CONFIG_MPTCP=y case: Good point, MPTCP cannot be compiled as a module (like TCP). It should then depends on IPV6=y. I thought it would be the case. Do you want me to send a patch or do you already have one? I don't have a patch yet, so feel free to send one. Just did: https://lore.kernel.org/netdev/20201021105154.628257-1-matthieu.bae...@tessares.net/ Groetjes, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
[PATCH net] mptcp: depends on IPV6 but not as a module
Like TCP, MPTCP cannot be compiled as a module. Obviously, MPTCP IPv6' support also depends on CONFIG_IPV6. But not all functions from IPv6 code are exported. To simplify the code and reduce modifications outside MPTCP, it was decided from the beginning to support MPTCP with IPv6 only if CONFIG_IPV6 was built inlined. That's also why CONFIG_MPTCP_IPV6 was created. More modifications are needed to support CONFIG_IPV6=m. Even if it was not explicit, until recently, we were forcing CONFIG_IPV6 to be built-in because we had "select IPV6" in Kconfig. Now that we have "depends on IPV6", we have to explicitly set "IPV6=y" to force CONFIG_IPV6 not to be built as a module. In other words, we can now only have CONFIG_MPTCP_IPV6=y if CONFIG_IPV6=y. Note that the new dependency might hide the fact IPv6 is not supported in MPTCP even if we have CONFIG_IPV6=m. But selecting IPV6 like we did before was forcing it to be built-in while it was maybe not what the user wants. Reported-by: Geert Uytterhoeven Fixes: 010b430d5df5 ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it") Signed-off-by: Matthieu Baerts --- Notes: For more details about the issue we have when we try to compile CONFIG_IPV6=m and CONFIG_MPTCP_IPV6=y, please refer to: https://lore.kernel.org/netdev/CAMuHMdW=1LfE8UoGRVBvrvrintQMNKUdTe5PPQz=PN3=gJmw=q...@mail.gmail.com/ net/mptcp/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mptcp/Kconfig b/net/mptcp/Kconfig index 8936604b3bf9..a014149aa323 100644 --- a/net/mptcp/Kconfig +++ b/net/mptcp/Kconfig @@ -19,7 +19,7 @@ config INET_MPTCP_DIAG config MPTCP_IPV6 bool "MPTCP: IPv6 support for Multipath TCP" - depends on IPV6 + depends on IPV6=y default y config MPTCP_KUNIT_TESTS -- 2.27.0
Re: [PATCH] mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
Hi Geert, On 21/10/2020 11:43, Geert Uytterhoeven wrote: Hi Jakub, On Wed, Oct 21, 2020 at 5:56 AM Jakub Kicinski wrote: On Tue, 20 Oct 2020 11:26:34 +0200 Matthieu Baerts wrote: On 20/10/2020 09:38, Geert Uytterhoeven wrote: MPTCP_IPV6 selects IPV6, thus enabling an optional feature the user may not want to enable. Fix this by making MPTCP_IPV6 depend on IPV6, like is done for all other IPv6 features. Here again, the intension was to select IPv6 from MPTCP but I understand the issue: if we enable MPTCP, we will select IPV6 as well by default. Maybe not what we want on some embedded devices with very limited memory where IPV6 is already off. We should instead enable MPTCP_IPV6 only if IPV6=y. LGTM then! Reviewed-by: Matthieu Baerts Applied, thanks! My apologies, this fails for the CONFIG_IPV6=m and CONFIG_MPTCP=y case: Good point, MPTCP cannot be compiled as a module (like TCP). It should then depends on IPV6=y. I thought it would be the case. Do you want me to send a patch or do you already have one? Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP][PATCH net-next 0/2] init ahmac and port of mptcp_options_received
Hi Jakub, On 19/10/2020 22:40, Jakub Kicinski wrote: On Mon, 19 Oct 2020 18:27:55 +0200 Matthieu Baerts wrote: Hi Geliang, On 19/10/2020 12:23, Geliang Tang wrote: This patchset deals with initializations of mptcp_options_received's two fields, ahmac and port. Geliang Tang (2): mptcp: initialize mptcp_options_received's ahmac mptcp: move mptcp_options_received's port initialization Thank you for these two patches. They look good to me except one detail: these two patches are for -net and not net-next. I don't know if it is alright for Jakub to apply them to -net or if it is clearer to re-send them with an updated subject. If it is OK to apply them to -net without a re-submit, here is my: Reviewed-by: Matthieu Baerts Thanks, I can apply to net. Great, thank you! BTW, nice work with the maintenance of Net! More reasons for davem to take time recovering :) Also, if you don't mind and while I am here, I never know: is it OK for you the maintainers to send one Acked/Reviewed-by for a whole series -- but then this is not reflected on patchwork -- or should we send one tag for each patch? It's fine, we propagate those semi-manually, but it's not a problem. Hopefully patchwork will address this at some point :( Thank you for your reply, good to know! Then next time, I will send these tags per patch to save you some semi-manual operations :) Some preparation works have been done on patchwork side but the feature is not available yet: https://github.com/getpatchwork/patchwork/issues/113 Hopefully soon! Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH] mptcp: MPTCP_KUNIT_TESTS should depend on MPTCP instead of selecting it
Hi Geert, On 20/10/2020 09:40, Geert Uytterhoeven wrote: On Mon, Oct 19, 2020 at 10:38 PM Geert Uytterhoeven wrote: On Mon, Oct 19, 2020 at 5:47 PM Matthieu Baerts wrote: On 19/10/2020 13:32, Geert Uytterhoeven wrote: MPTCP_KUNIT_TESTS selects MPTCP, thus enabling an optional feature the user may not want to enable. Fix this by making the test depend on MPTCP instead. I think the initial intension was to select MPTCP to have an easy way to enable all KUnit tests. We imitated what was and is still done in fs/ext4/Kconfig. But it probably makes sense to depend on MPTCP instead of selecting it. So that's fine for me. But then please also send a patch to ext4 maintainer to do the same there. Thanks, good point. I didn't notice, as I did have ext4 enabled anyway. Will send a patch for ext4. Looks like ext4 and MPTCP where the only test modules selecting their dependencies. FTR, "[PATCH] ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it" https://lore.kernel.org/lkml/20201020073740.29081-1-ge...@linux-m68k.org/ Thank you for having sent this other patch and shared the link here! Groetjes, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH] mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
Hi Geert, Thank you for the patch! On 20/10/2020 09:38, Geert Uytterhoeven wrote: MPTCP_IPV6 selects IPV6, thus enabling an optional feature the user may not want to enable. Fix this by making MPTCP_IPV6 depend on IPV6, like is done for all other IPv6 features. Here again, the intension was to select IPv6 from MPTCP but I understand the issue: if we enable MPTCP, we will select IPV6 as well by default. Maybe not what we want on some embedded devices with very limited memory where IPV6 is already off. We should instead enable MPTCP_IPV6 only if IPV6=y. LGTM then! Reviewed-by: Matthieu Baerts Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [MPTCP][PATCH net-next 0/2] init ahmac and port of mptcp_options_received
Hi Geliang, On 19/10/2020 12:23, Geliang Tang wrote: This patchset deals with initializations of mptcp_options_received's two fields, ahmac and port. Geliang Tang (2): mptcp: initialize mptcp_options_received's ahmac mptcp: move mptcp_options_received's port initialization Thank you for these two patches. They look good to me except one detail: these two patches are for -net and not net-next. I don't know if it is alright for Jakub to apply them to -net or if it is clearer to re-send them with an updated subject. If it is OK to apply them to -net without a re-submit, here is my: Reviewed-by: Matthieu Baerts Also, if you don't mind and while I am here, I never know: is it OK for you the maintainers to send one Acked/Reviewed-by for a whole series -- but then this is not reflected on patchwork -- or should we send one tag for each patch? Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net
Re: [PATCH] mptcp: MPTCP_KUNIT_TESTS should depend on MPTCP instead of selecting it
Hi Geert, Thank you for the patch! On 19/10/2020 13:32, Geert Uytterhoeven wrote: MPTCP_KUNIT_TESTS selects MPTCP, thus enabling an optional feature the user may not want to enable. Fix this by making the test depend on MPTCP instead. I think the initial intension was to select MPTCP to have an easy way to enable all KUnit tests. We imitated what was and is still done in fs/ext4/Kconfig. But it probably makes sense to depend on MPTCP instead of selecting it. So that's fine for me. But then please also send a patch to ext4 maintainer to do the same there. Reviewed-by: Matthieu Baerts Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net