On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:
From: Geliang Tang <tanggeli...@kylinos.cn>
It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call kfunc to modify the fields of each subflow. In
kernel space, mptcp_for_each_subflow() helper is used for this:
  mptcp_for_each_subflow(msk, subflow)
          kfunc(subflow);
But in the MPTCP BPF program, this has not yet been implemented. As
Martin suggested recently, this conn_list walking + modify-by-kfunc
usage fits the bpf_iter use case.
So this patch adds a new bpf_iter type named "mptcp_subflow" to do
this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
_destroy(). These helpers are registered in the MPTCP common kfunc
set. bpf_for_each() can then be used with mptcp_subflow in a BPF
program like this:
  bpf_for_each(mptcp_subflow, subflow, msk)
          kfunc(subflow);
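
For readers less familiar with the new()/next()/destroy() contract that
bpf_for_each() expands to, here is a minimal userspace sketch of the
pattern. It is not the kernel implementation: 'struct subflow', 'struct
msk', the subflow_iter_*() functions, mark_scheduled() and mark_all() are
all hypothetical stand-ins, the list is a plain singly linked list rather
than conn_list, and the assert() only loosely models a lockdep-style
ownership check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel structures differ. */
struct subflow {
	int id;
	int scheduled;
	struct subflow *next;	/* simplified stand-in for conn_list linkage */
};

struct msk {
	struct subflow *conn_list;
	int owned;		/* models "the socket lock is held" */
};

/* Iterator state kept by the caller between new/next/destroy calls. */
struct subflow_iter {
	struct msk *msk;
	struct subflow *pos;
};

static int subflow_iter_new(struct subflow_iter *it, struct msk *msk)
{
	it->msk = NULL;
	it->pos = NULL;
	if (!msk)
		return -1;	/* next() will then return NULL right away */

	assert(msk->owned);	/* caller is expected to own the socket */
	it->msk = msk;
	it->pos = msk->conn_list;
	return 0;
}

static struct subflow *subflow_iter_next(struct subflow_iter *it)
{
	struct subflow *cur;

	if (!it->msk)
		return NULL;

	cur = it->pos;
	if (cur)
		it->pos = cur->next;	/* advance before handing out cur */
	return cur;
}

static void subflow_iter_destroy(struct subflow_iter *it)
{
	it->msk = NULL;
	it->pos = NULL;
}

/* Hypothetical "kfunc" that modifies one field of a subflow. */
static void mark_scheduled(struct subflow *sf)
{
	sf->scheduled = 1;
}

/* What a for-each loop over the iterator boils down to. */
static int mark_all(struct msk *msk)
{
	struct subflow_iter it;
	struct subflow *sf;
	int n = 0;

	subflow_iter_new(&it, msk);
	while ((sf = subflow_iter_next(&it)) != NULL) {
		mark_scheduled(sf);
		n++;
	}
	subflow_iter_destroy(&it);
	return n;
}
```

In this sketch destroy() is always called, even when new() failed, so the
loop body never has to deal with a half-initialized iterator.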
Suggested-by: Martin KaFai Lau <martin....@kernel.org>
Signed-off-by: Geliang Tang <tanggeli...@kylinos.cn>
Reviewed-by: Mat Martineau <martin...@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matt...@kernel.org>
---
Notes:
 - v2:
   - Add BUILD_BUG_ON() checks, similar to the ones done with other
     bpf_iter_(...) helpers.
   - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
     !spin_is_locked() (Martin).
 - v3:
   - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
   - Remove unneeded !msk check (Martin)
   - Remove locks checks, add msk_owned_by_me for lockdep (Martin)
   - The following note and 2 questions have been added below.
This new bpf_iter will be used by our future BPF packet schedulers and
path managers. To see how we are going to use them, please check our
export branch [1], especially these two commits:
- "bpf: Add mptcp packet scheduler struct_ops": introduce a new
struct_ops.
- "selftests/bpf: Add bpf_burst scheduler & test": new test showing
how the new struct_ops and bpf_iter are being used.
[1] https://github.com/multipath-tcp/mptcp_net-next/commits/export
@BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter
to be used with struct_ops, but only with the two new ones we are going
to introduce that are specific to MPTCP, and not with other struct_ops
(TCP CC, sched_ext, etc.). We are not sure how to do that. By chance, do
you have examples or docs you could point us to, so we can put this
restriction in place?
bpf_qdisc.c already does that. Take a look at bpf_qdisc_kfunc_filter().
It is in net-next and bpf-next/net.
Also, for one of the two future MPTCP struct_ops, not all callbacks
should be allowed to use this new bpf_iter, because they are called from
different contexts. How can we ensure such callbacks from a struct_ops
cannot call the mptcp_subflow bpf_iter, without adding new dedicated
checks in every callback to see whether some locks are held? We
understand that something similar is wanted for sched_ext, but we are
not sure whether that code is ready or whether it is going to be accepted.
Same. Take a look at "bpf_qdisc_kfunc_filter()".
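
To make the idea concrete, here is a rough userspace sketch of how a
per-program kfunc filter could cover both restrictions: reject the
iterator kfuncs when the program is not attached to the expected
struct_ops, and also when it implements a callback that is not on the
allow list. This is not the bpf_qdisc code. 'struct sched_ops', 'struct
prog', the KFUNC_* ids, sched_ops_id/other_ops_id and kfunc_filter() are
hypothetical names for illustration only; the in-kernel filter works on
'struct bpf_prog' and BTF ids instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct_ops with two callbacks run in different contexts. */
struct sched_ops {
	void (*get_send)(void);	/* assumed to run with the msk lock held */
	void (*release)(void);	/* assumed to run without it */
};

/* Hypothetical, reduced view of a loaded program. */
struct prog {
	const void *st_ops;	/* identity of the struct_ops it attaches to */
	size_t member_off;	/* offset of the callback it implements */
};

enum {
	KFUNC_ITER_NEW = 1,
	KFUNC_ITER_NEXT,
	KFUNC_ITER_DESTROY,
	KFUNC_OTHER,		/* some unrelated kfunc */
};

/* Only the addresses matter: they identify two distinct struct_ops. */
static const int sched_ops_id = 0;
static const int other_ops_id = 0;

static bool is_iter_kfunc(int kfunc_id)
{
	return kfunc_id >= KFUNC_ITER_NEW && kfunc_id <= KFUNC_ITER_DESTROY;
}

/* Return 0 to allow the call, -EACCES to reject the program. */
static int kfunc_filter(const struct prog *prog, int kfunc_id)
{
	/* kfuncs outside the restricted set are not this filter's business */
	if (!is_iter_kfunc(kfunc_id))
		return 0;

	/* restriction 1: only the MPTCP-specific struct_ops */
	if (prog->st_ops != &sched_ops_id)
		return -EACCES;

	/* restriction 2: only callbacks known to run in a safe context */
	if (prog->member_off != offsetof(struct sched_ops, get_send))
		return -EACCES;

	return 0;
}
```

The point of doing it this way is that the decision is made once, when
the program is checked, so no extra lock checks are needed at run time in
each callback.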