On Tue, Nov 17, 2020 at 06:40:22PM +0900, Kuniyuki Iwashima wrote:
> This patch makes it possible to select a new listener for socket migration
> by eBPF.
>
> The noteworthy point is that we select a listening socket in
> reuseport_detach_sock() and reuseport_select_sock(), but we do not have
> struct skb in the unhash path.
>
> Since we cannot pass skb to the eBPF program, we run only the
> BPF_PROG_TYPE_SK_REUSEPORT program by calling bpf_run_sk_reuseport() with
> skb NULL. So, some fields derived from skb are also NULL in the eBPF
> program.

More things need to be considered here when skb is NULL.
Some helpers are probably assuming skb is not NULL. Also, the sk_lookup
in filter.c is actually passing a NULL skb to avoid doing the reuseport
select.

>
> Moreover, we can cancel migration by returning SK_DROP. This feature is
> useful when listeners have different settings at the socket API level or
> when we want to free resources as soon as possible.
>
> Reviewed-by: Benjamin Herrenschmidt <b...@amazon.com>
> Signed-off-by: Kuniyuki Iwashima <kun...@amazon.co.jp>
> ---
>  net/core/filter.c          | 26 +++++++++++++++++++++-----
>  net/core/sock_reuseport.c  | 23 ++++++++++++++++++++---
>  net/ipv4/inet_hashtables.c |  2 +-
>  3 files changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 01e28f283962..ffc4591878b8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8914,6 +8914,22 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>  	SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF(S, NS, F, NF, \
>  		BPF_FIELD_SIZEOF(NS, NF), 0)
>
> +#define SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF_OR_NULL(S, NS, F, NF, SIZE, OFF) \
> +	do { \
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), si->dst_reg, \
> +				      si->src_reg, offsetof(S, F)); \
> +		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); \

Although it may not matter much, always doing this check is not ideal,
considering that the fast path always has an skb and only the slow path
(accept-queue migration) has a NULL skb. I think the req_sk usually has
the skb too, except for the timer one.

My first thought is to create a temporary skb, but that has its own
issues. Or it may actually belong to a new prog type. However, let's
keep exploring possible options (including a NULL skb).

> +		*insn++ = BPF_LDX_MEM( \
> +			SIZE, si->dst_reg, si->dst_reg, \
> +			bpf_target_off(NS, NF, sizeof_field(NS, NF), \
> +				       target_size) \
> +			+ OFF); \
> +	} while (0)
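
To make the extra check concrete, here is a rough plain-C model of what
the three instructions emitted by the macro appear to do. It is only a
sketch: fake_skb, fake_reuseport_kern and load_len_or_null() are made-up
names for illustration, not kernel code, and the real context layout is
different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real kernel layouts. */
struct fake_skb {
	uint32_t len;
};

struct fake_reuseport_kern {
	struct fake_skb *skb;	/* NULL on the migration (unhash) path */
};

/*
 * Plain-C model of the three emitted instructions:
 *   dst = ctx->skb;		BPF_LDX_MEM
 *   if (dst == 0) goto out;	BPF_JMP_IMM(BPF_JEQ, dst, 0, 1)
 *   dst = dst->len;		BPF_LDX_MEM
 * out:
 */
static uint64_t load_len_or_null(const struct fake_reuseport_kern *ctx)
{
	uint64_t dst;

	assert(ctx != NULL);
	dst = (uint64_t)(uintptr_t)ctx->skb;
	if (dst == 0)
		return dst;	/* nested load skipped, dst stays 0 */
	return ((const struct fake_skb *)(uintptr_t)dst)->len;
}
```

With a non-NULL skb the caller gets the field value; with a NULL skb it
gets 0. The compare-and-branch is paid in both cases, which is the
fast-path cost mentioned above.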