On Wed, May 15, 2019 at 8:11 AM Lorenz Bauer <l...@cloudflare.com> wrote: > > In the BPF-based TPROXY session with Joe Stringer [1], I mentioned > that the sk_lookup_* helpers currently return inconsistent results if > SK_REUSEPORT programs are in play. > > SK_REUSEPORT programs are a hook point in inet_lookup. They get access > to the full packet > that triggered the look up. To support this, inet_lookup gained a new > skb argument to provide such context. If skb is NULL, the SK_REUSEPORT > program is skipped and instead the socket is selected by its hash. > > The first problem is that not all callers to inet_lookup from BPF have > an skb, e.g. XDP. This means that a look up from XDP gives an > incorrect result. For now that is not a huge problem. However, once we > get sk_assign as proposed by Joe, we can end up circumventing > SK_REUSEPORT.
To clarify a bit, the reason this is a problem is that a straightforward implementation may just consider passing the skb context into the sk_lookup_*() and through to the inet_lookup() so that it would run the SK_REUSEPORT BPF program for socket selection on the skb when the packet-path BPF program performs the socket lookup. However, as this paragraph describes, the skb context is not always available. > At the conference, someone suggested using a similar approach to the > work done on the flow dissector by Stanislav: create a dedicated > context sk_reuseport which can either take an skb or a plain pointer. > Patch up load_bytes to deal with both. Pass the context to > inet_lookup. > > This is when we hit the second problem: using the skb or XDP context > directly is incorrect, because it assumes that the relevant protocol > headers are at the start of the buffer. In our use case, the correct > headers are at an offset since we're inspecting encapsulated packets. > > The best solution I've come up with is to steal 17 bits from the flags > argument to sk_lookup_*, 1 bit for BPF_F_HEADERS_AT_OFFSET, 16bit for > the offset itself. FYI there's also the upper 32 bits of the netns_id parameter, another option would be to steal 16 bits from there. > Thoughts? Internally with skbs, we use `skb_pull()` to manage header offsets, could we do something similar with `bpf_xdp_adjust_head()` prior to the call to `bpf_sk_lookup_*()`?