On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote: > On 10 May 2018 at 22:00, Martin KaFai Lau <ka...@fb.com> wrote: > > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: > >> This patch adds a new BPF helper function, sk_lookup() which allows BPF > >> programs to find out if there is a socket listening on this host, and > >> returns a socket pointer which the BPF program can then access to > >> determine, for instance, whether to forward or drop traffic. sk_lookup() > >> takes a reference on the socket, so when a BPF program makes use of this > >> function, it must subsequently pass the returned pointer into the newly > >> added sk_release() to return the reference. > >> > >> By way of example, the following pseudocode would filter inbound > >> connections at XDP if there is no corresponding service listening for > >> the traffic: > >> > >> struct bpf_sock_tuple tuple; > >> struct bpf_sock_ops *sk; > >> > >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet > >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); > >> if (!sk) { > >> // Couldn't find a socket listening for this traffic. Drop. > >> return TC_ACT_SHOT; > >> } > >> bpf_sk_release(sk, 0); > >> return TC_ACT_OK; > >> > >> Signed-off-by: Joe Stringer <j...@wand.net.nz> > >> --- > > ... > > >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto > >> bpf_skb_get_xfrm_state_proto = { > >> }; > >> #endif > >> > >> +struct sock * > >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { > > Would it be possible to have another version that > > returns a sk without taking its refcnt? > > It may have performance benefit. > > Not really. The sockets are not RCU-protected, and established sockets > may be torn down without notice. If we don't take a reference, there's > no guarantee that the socket will continue to exist for the duration > of running the BPF program. > > From what I follow, the comment below has a hidden implication which > is that sockets without SOCK_RCU_FREE, eg established sockets, may be > directly freed regardless of RCU. Right, SOCK_RCU_FREE sk is the one I am concern about. For example, TCP_LISTEN socket does not require taking a refcnt now. Doing a bpf_sk_lookup() may have a rather big impact on handling TCP syn flood. or the usual intention is to redirect instead of passing it up to the stack?
> > /* Sockets having SOCK_RCU_FREE will call this function after one RCU > * grace period. This is the case for UDP sockets and TCP listeners. > */ > static void __sk_destruct(struct rcu_head *head) > ... > > Therefore without the refcount, it won't be safe.