On 3/20/25 10:48 AM, Matthieu Baerts (NGI0) wrote:
From: Geliang Tang <tanggeli...@kylinos.cn>

It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call kfunc to modify the fields of each subflow. In
kernel space, mptcp_for_each_subflow() helper is used for this:

        mptcp_for_each_subflow(msk, subflow)
                kfunc(subflow);

But in the MPTCP BPF program, this has not yet been implemented. As
Martin suggested recently, this conn_list walking + modify-by-kfunc
usage fits the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow" to do
this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
_destroy(). It also registers these bpf_iter kfuncs in the mptcp
common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
in a BPF program like this:

        bpf_for_each(mptcp_subflow, subflow, msk)
                kfunc(subflow);

Suggested-by: Martin KaFai Lau <martin....@kernel.org>
Signed-off-by: Geliang Tang <tanggeli...@kylinos.cn>
Reviewed-by: Mat Martineau <martin...@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matt...@kernel.org>
---
Notes:
  - v2:
    - Add BUILD_BUG_ON() checks, similar to the ones done with other
      bpf_iter_(...) helpers.
    - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and
      !spin_is_locked() (Martin).
  - v3:
    - Switch parameter from 'struct mptcp_sock' to 'struct sock' (Martin)
    - Remove unneeded !msk check (Martin)
    - Remove locks checks, add msk_owned_by_me for lockdep (Martin)
    - The following note and 2 questions have been added below.

This new bpf_iter will be used by our future BPF packet schedulers and
path managers. To see how we are going to use them, please check our
export branch [1], especially these two commits:

  - "bpf: Add mptcp packet scheduler struct_ops": introduce a new
    struct_ops.
  - "selftests/bpf: Add bpf_burst scheduler & test": new test showing
    how the new struct_ops and bpf_iter are being used.

[1] https://github.com/multipath-tcp/mptcp_net-next/commits/export

@BPF maintainers: we would like to allow this new mptcp_subflow bpf_iter
to be used with struct_ops, but only with the two new ones we are going
to introduce that are specific to MPTCP, and not with other struct_ops
(TCP CC, sched_ext, etc.). We are not sure how to do that. By chance, do
you have examples or docs you could point us to, so we can put this
restriction in place, please?

bpf_qdisc.c already does that. Take a look at bpf_qdisc_kfunc_filter().

It is in net-next and bpf-next/net.


Also, for one of the two future MPTCP struct_ops, not all callbacks
should be allowed to use this new bpf_iter, because they are called from
different contexts. How can we ensure such callbacks from a struct_ops
cannot call mptcp_subflow bpf_iter without adding new dedicated checks
checking whether some locks are held for all callbacks? We understood
that they wanted to have something similar with sched_ext, but we are
not sure whether this code is ready or whether it is going to be accepted.

Same. Take a look at "bpf_qdisc_kfunc_filter()".
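
As a rough illustration of the idea only (this is not the code from
bpf_qdisc.c, and it is plain userspace C rather than kernel code), a
filter of this kind can reject a kfunc unless the calling program is
attached to the expected struct_ops and implements one of the allowed
callbacks. Every demo_* name below is hypothetical, including the
callback names and the assumption about which callback holds the lock:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a future MPTCP struct_ops (not a kernel type). */
struct demo_mptcp_sched_ops {
	int (*get_send)(void *msk);	/* assumed: called with the msk lock held */
	void (*init)(void *msk);	/* assumed: called from another context */
};

/* Hypothetical struct_ops types a program can be attached to. */
enum demo_ops_type { DEMO_OPS_MPTCP_SCHED, DEMO_OPS_TCP_CC };

/* Hypothetical kfunc ids; only the iterator one is restricted here. */
enum demo_kfunc_id { DEMO_KFUNC_SUBFLOW_ITER_NEW, DEMO_KFUNC_OTHER };

/* What this model of a filter knows about the program being checked. */
struct demo_prog {
	enum demo_ops_type ops_type;	/* struct_ops the program attaches to */
	size_t member_off;		/* offset of the callback it implements */
};

static bool demo_is_restricted_kfunc(enum demo_kfunc_id id)
{
	return id == DEMO_KFUNC_SUBFLOW_ITER_NEW;
}

/* Return 0 to allow the kfunc call, -EACCES to reject it. */
static int demo_kfunc_filter(const struct demo_prog *prog,
			     enum demo_kfunc_id id)
{
	assert(prog != NULL);

	/* kfuncs outside the restricted set are not filtered here. */
	if (!demo_is_restricted_kfunc(id))
		return 0;

	/* First restriction: only the MPTCP-specific struct_ops. */
	if (prog->ops_type != DEMO_OPS_MPTCP_SCHED)
		return -EACCES;

	/* Second restriction: only callbacks known to run in a safe context. */
	if (prog->member_off == offsetof(struct demo_mptcp_sched_ops, get_send))
		return 0;

	return -EACCES;
}
```

In this sketch, one filter covers both questions: the struct_ops type
check handles the first one, and the per-callback offset check handles
the second one, without adding lock checks to the kfunc itself.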

