On 13/01/2026 13.09, Paolo Abeni wrote:
> On 1/13/26 4:08 AM, Jakub Kicinski wrote:
>> On Sat, 10 Jan 2026 22:05:14 +0100 Jakub Sitnicki wrote:
>>> This series is split out of [1] following discussion with Jakub.
>>>
>>> To copy XDP metadata into an skb extension when skb_metadata_set() is
>>> called, we need to locate the metadata contents.
>> "When skb_metadata_set() is called"? I think that may cause perf
>> regressions unless we merge major optimizations at the same time?
>> Should we defer touching the drivers until we have a PoC and some
>> idea whether allocating the extension right away is manageable or
>> we are better off doing it via a kfunc in TC (after GRO)?
>>
>> To be clear, putting the metadata in an extension right away would
>> indeed be much cleaner, just not sure how much of the perf hit we
>> can optimize away..
> I agree it would be better to defer touching the drivers until we have
> proof there will be no significant regressions.
It would be a performance regression to do what the cover letter says:
"To copy XDP metadata into an skb extension when skb_metadata_set() is
called". The XDP to TC-ingress code path is a fast path IMHO.

*BUT* this patchset isn't doing that. To me it looks like a cleanup
patchset that simply makes it consistent when skb_metadata_set() is
called. Selling it as a prerequisite for doing the copy later seems
fishy.
> IIRC, at early MPTCP implementation time, Eric suggested increasing the
> struct sk_buff size as an alternative to the MPTCP skb extension,
> leaving the added trailing part uninitialized when the sk_buff is
> allocated.
>
> If skb extension usage becomes so ubiquitous that they are basically
> allocated for each packet, and the total skb extension size is kept
> under strict control and remains reasonable (assuming it is :), perhaps
> we could consider revisiting the above-mentioned approach?
I really like this idea, as using the uninitialized tail room in the SKB
memory area will make SKB extensions fast. Today SKBs are allocated from
a cache-aligned SLUB allocator cache, so the real object size is 256
bytes. On my system the actual sk_buff struct size is 232 bytes (already
leaving us 24 bytes). The area that gets zero-initialized is only 192
bytes (3 cache lines). My experience with the SLUB allocator is that
increasing the object size doesn't increase the allocation cost (below
PAGE_SIZE). So, the suggestion of simply allocating a larger sk_buff is
valid, as it doesn't cost more (if we don't touch those cache lines). We
could even make the size of this area a compile-time CONFIG option.
For Jakub this unfortunately challenges/breaks the design of keeping the
data_meta area valid deeper into the netstack. With all the challenges
around encapsulation/decapsulation, it seems hard/infeasible to maintain
this area in front of the packet data pointer deeper into the netstack.
Instead of blindly copying the XDP data_meta area into a single SKB
extension, what if we make it the responsibility of the TC-ingress BPF
hook to understand the data_meta format and, via (kfunc) helpers,
transfer/create the SKB extension(s) it deems relevant?
Would this be an acceptable approach that makes it easier to propagate
metadata deeper into the netstack?
--Jesper
p.s. For compact storage of SKB extensions in the SKB tail area, we
could revisit Arthur's "traits" (compact KV storage).