On Fri, 2026-04-17 at 10:43 +0800, Jiayuan Chen wrote:
> 
> On 4/16/26 7:23 PM, KaFai Wan wrote:
> > A BPF_SOCK_OPS program can enable
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> > bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> > 
> > In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
> > __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().
> > 
> >  From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
> > tcp_current_mss(), which calls tcp_established_options() and re-enters
> > bpf_skops_hdr_opt_len().
> > 
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB
> >    -> bpf_setsockopt(TCP_NODELAY)
> >      -> tcp_push_pending_frames()
> >        -> tcp_current_mss()
> >          -> tcp_established_options()
> >            -> bpf_skops_hdr_opt_len()
> >              -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > 
> >  From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
> > tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
> > header option length through tcp_established_options() and
> > bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().
> > 
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> >    -> bpf_setsockopt(TCP_NODELAY)
> >      -> tcp_push_pending_frames()
> >        -> tcp_write_xmit()
> >          -> tcp_transmit_skb()
> >            -> tcp_established_options()
> >              -> bpf_skops_hdr_opt_len()
> >            -> bpf_skops_write_hdr_opt()
> >              -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> > 
> > This leads to unbounded recursion and can overflow the kernel stack.
> > 
> > Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
> > when bpf_setsockopt() is called from
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> > 
> > Reported-by: Quan Sun <[email protected]>
> > Reported-by: Yinhao Hu <[email protected]>
> > Reported-by: Kaiyan Mei <[email protected]>
> > Closes: 
> > https://lore.kernel.org/bpf/[email protected]/
> > Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> > Signed-off-by: KaFai Wan <[email protected]>
> > ---
> >   net/core/filter.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index fcfcb72663ca..911ff04bca5a 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct 
> > bpf_sock_ops_kern *, bpf_sock,
> >     if (!is_locked_tcp_sock_ops(bpf_sock))
> >             return -EOPNOTSUPP;
> >   
> > +   if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> > +        bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> > +       IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == 
> > TCP_NODELAY)
> > +           return -EOPNOTSUPP;
> > +
> >     return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> >   }
> >   
> 
> A simple comment is recommended:
> 
> /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these 
> callbacks. */
> 
ok. Added in new version.
> 
> Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB / 
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> 
> can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead.
> 

-- 
Thanks,
KaFai

Reply via email to