On 1/25/26 11:15 AM, Jakub Sitnicki wrote:
On Thu, Jan 22, 2026 at 12:21 PM -08, Martin KaFai Lau wrote:
On 1/13/26 4:33 AM, Jakub Sitnicki wrote:
Good point. I'm hoping we don't have to allocate from
skb_metadata_set(), which does sound prohibitively expensive. Instead
we'd allocate the extension together with the skb if we know upfront
that metadata will be used.
[ Sorry for being late. Have been catching up after holidays. ]
For the sk local storage (which was mentioned in other replies as making skb
metadata look more like sk local storage), there is a plan (Amery has been
looking into it) to allocate the storage together with sk for performance
reasons. This means allocating a larger 'struct sock'. The extra space will be at
the front of sk instead of the end of sk because of how the 'struct sock' is
embedded in tcp_sock/udp_sock/... If skb is going in the same direction, it
should be useful to have a similar scheme on: upfront allocation and then shared
by multiple BPF progs.
The current thinking is to build upon the existing bpf_sk_local_storage usage. A
boot param decides how much BPF space should be allocated for 'struct
sock'. When a bpf_sk_storage_map is created (with a new use_reserve flag), the
space will be allocated permanently from the head space of every sk for this
map. The read (from a BPF prog) will be at one stable offset before a sk. If
there is no more head space left, the map creation will fail. User can decide if
it wants to retry without the 'use_reserve' flag.
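
A rough userspace sketch of the head-space scheme described above, for
illustration only. It is not kernel code and not the planned implementation.
RESERVE_BYTES stands in for the boot param, 'struct obj' stands in for
'struct sock', and reserve_claim(), obj_alloc(), obj_free() and obj_slot() are
hypothetical names. The point it shows: each "map" claims a slice of head space
once, gets back a fixed distance in front of the object, and the claim fails
once the head space runs out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the boot param: head space per object. */
#define RESERVE_BYTES 32

/* Stand-in for 'struct sock'; the reserved space sits in front of it. */
struct obj {
	int id;
};

/* Bytes of head space already handed out to "maps". */
static size_t reserve_used;

/*
 * "Map creation": claim 'size' bytes of head space for good.
 * Returns the distance (in bytes) back from the object to the start of
 * the slot, or 0 if not enough head space is left.
 */
static size_t reserve_claim(size_t size)
{
	size = (size + 7) & ~(size_t)7;	/* keep slots 8-byte aligned */
	if (size == 0 || size > RESERVE_BYTES - reserve_used)
		return 0;
	reserve_used += size;
	return reserve_used;
}

/* Allocate the object together with its head space in one allocation. */
static struct obj *obj_alloc(void)
{
	char *base = calloc(1, RESERVE_BYTES + sizeof(struct obj));

	return base ? (struct obj *)(base + RESERVE_BYTES) : NULL;
}

static void obj_free(struct obj *o)
{
	if (o)
		free((char *)o - RESERVE_BYTES);
}

/* "Read from a prog": the slot lives at one stable offset before the object. */
static void *obj_slot(struct obj *o, size_t back)
{
	assert(back > 0 && back <= RESERVE_BYTES);
	return (char *)o - back;
}
```

A caller that gets 0 back from reserve_claim() would fall back to the regular
pointer-based storage, which is the "retry without the flag" case.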
Thanks for sharing the plans.
We will definitely be looking into ways of eliminating allocations in
the long run. With one allocation for skb_ext, one for
bpf_local_storage, and one for the actual map, it seems unlikely we will
be able to attach metadata this way to every packet. Which is something
we wanted for our "label packet once, use label everywhere" use case.
I'm not sure how much we can squeeze in together with the sk_buff.
Hopefully at least skb_ext plus a pointer to bpf_local_storage.
yeah, only a bpf_local_storage pointer is needed in skb (or in skb_ext).
It is the same for the bpf sk/task/... storage.
To be clear, for allocation in skb, I was thinking more about Paolo's
comment on "...increasing struct sk_buff size as an alternative to the
mptcp skb extension...".
I'm also hoping we can allocate memory for bpf_local_storage together
with the backing space for the map whose update triggers the skb
extension activation.
Allocate the actual storage at the end of bpf_local_storage? Hmm... off
the top of my head, I don't have a good idea how to do it without
trading off flexibility. If trading off flexibility, may as well
allocate fixed extra space at the sk (/skb) and get a performance
benefit (which would need to be measured).
Finally, bpf_local_storage itself has a pretty generous cache which
blows it up. Maybe the cache could be a flexible array, which could be
smaller for skb local storage.
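
To make the flexible-array idea concrete, a minimal userspace sketch. The
struct below is a simplified stand-in, not the real bpf_local_storage layout,
and storage_bytes() and storage_alloc() are hypothetical helpers. It only shows
how the number of cache slots could be picked per owner type at allocation
time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a storage header with a per-owner cache size. */
struct storage {
	unsigned int cache_size;	/* number of slots in cache[] */
	void *cache[];			/* flexible array member, sized at alloc */
};

/* Total allocation size for a storage with 'n' cache slots. */
static size_t storage_bytes(unsigned int n)
{
	return sizeof(struct storage) + (size_t)n * sizeof(void *);
}

/* Allocate a zeroed storage with 'n' cache slots. */
static struct storage *storage_alloc(unsigned int n)
{
	struct storage *s = calloc(1, storage_bytes(n));

	if (s)
		s->cache_size = n;
	return s;
}
```

With this layout, going from 16 slots to 2 saves 14 pointers per object, at the
cost of a bounds check against cache_size on lookup.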
For our usage, the cache has been slowly filling up, so we actually have
another side of the issue. Improvements to bpf_local_storage are always
welcome.
I am currently more interested in getting the extra memory/headroom
allocated for an sk. Eventually, the storage(s) that will be needed for
all (or most) sk will use the extra headroom of sk. The current
bpf_local_storage (pointer) in sk will be more for testing/ad-hoc
purposes or for performance-insensitive usage.
It is probably off topic now. It seems having extra tail space in a skb
is not in your current plan for the next respin.