On Fri, 2026-04-17 at 10:43 +0800, Jiayuan Chen wrote: > > On 4/16/26 7:23 PM, KaFai Wan wrote: > > A BPF_SOCK_OPS program can enable > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call > > bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB. > > > > In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach > > __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames(). > > > > From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call > > tcp_current_mss(), which calls tcp_established_options() and re-enters > > bpf_skops_hdr_opt_len(). > > > > BPF_SOCK_OPS_HDR_OPT_LEN_CB > > -> bpf_setsockopt(TCP_NODELAY) > > -> tcp_push_pending_frames() > > -> tcp_current_mss() > > -> tcp_established_options() > > -> bpf_skops_hdr_opt_len() > > -> BPF_SOCK_OPS_HDR_OPT_LEN_CB > > > > From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call > > tcp_write_xmit(), which calls tcp_transmit_skb(). That path recomputes > > header option length through tcp_established_options() and > > bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt(). > > > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > -> bpf_setsockopt(TCP_NODELAY) > > -> tcp_push_pending_frames() > > -> tcp_write_xmit() > > -> tcp_transmit_skb() > > -> tcp_established_options() > > -> bpf_skops_hdr_opt_len() > > -> bpf_skops_write_hdr_opt() > > -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > > > This leads to unbounded recursion and can overflow the kernel stack. > > > > Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt() > > when bpf_setsockopt() is called from > > BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB. > > > > Reported-by: Quan Sun <[email protected]> > > Reported-by: Yinhao Hu <[email protected]> > > Reported-by: Kaiyan Mei <[email protected]> > > Closes: > > https://lore.kernel.org/bpf/[email protected]/ > > Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt") > > Signed-off-by: KaFai Wan <[email protected]> > > --- > > net/core/filter.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index fcfcb72663ca..911ff04bca5a 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct > > bpf_sock_ops_kern *, bpf_sock, > > if (!is_locked_tcp_sock_ops(bpf_sock)) > > return -EOPNOTSUPP; > > > > + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB || > > + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) && > > + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == > > TCP_NODELAY) > > + return -EOPNOTSUPP; > > + > > return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen); > > } > > > > A simple comment is recommended: > > /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these > callbacks. */ > ok. Added in new version. > > Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB / > BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead. >
-- Thanks, KaFai

