On Thu, Jun 04, 2026 at 02:30:34PM +0900, Harry Yoo wrote:
>
>
> On 6/3/26 3:31 AM, Pedro Falcato wrote:
> > SKB data area allocations (as done from alloc_skb()) use kmalloc().
> > These allocations can be variably sized and their contents can be more
> > or less controlled from userspace, which makes them useful for attackers
> > that want to overwrite a use-after-free'd object from the same kmalloc slab
> > (which often just requires the sizes to roughly match into the same kmalloc
> > bucket). [0] is an easy example of an exploit that uses netlink skb
> > allocation to target another similarly-sized accidentally freed object.
> >
> > While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
> > probabilistic. Use the existing kmem buckets API to further isolate these
> > allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.
> >
> > Link:
> > https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4207_lts_cos_mitigation_2/docs/exploit.md
> > [0]
> > Signed-off-by: Pedro Falcato <[email protected]>
> > ---
> >  net/core/skbuff.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 44a7f8401468..1f6c6b531ece 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -594,6 +594,8 @@ static void *kmalloc_pfmemalloc(size_t obj_size, gfp_t
> > flags, int node)
> >  	return kmalloc_node_track_caller(obj_size, flags, node);
> >  }
> >
> > +static kmem_buckets *skb_data_buckets __ro_after_init;
> > +
> >  /*
> >   * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
> >   * the caller if emergency pfmemalloc reserves are being used. If it is and
> > @@ -632,7 +634,7 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t
> > flags, int node,
> >  	 * Try a regular allocation, when that fails and we're not entitled
> >  	 * to the reserves, fail.
> >  	 */
> > -	obj = kmalloc_node_track_caller(obj_size,
> > +	obj = kmem_buckets_alloc_node_track_caller(skb_data_buckets, obj_size,
> >  					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> >  					node);
> >  	if (likely(obj))
>
> What about kmalloc_pfmemalloc()?

Good point, that looks free as well.

Sidenote: isolating kmem_cache_alloc for possibly-aliasing caches could also
be useful. skb allocation has net_hotdata.skb_small_head_cache. It doesn't
merge with anything for $raisins (odd size, plus I don't think usercopy caches
are getting merged?) but it feels too... accidental? Maybe passing something
like SLAB_NO_MERGE and making the size standard-looking would be nice. I have
a size of 704 bytes per object, and this probably causes some weird wastage
for each slab.

-- 
Pedro
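
The per-slab wastage mentioned above for 704-byte objects can be estimated
with a small userspace sketch. It is only a back-of-the-envelope calculation
under stated assumptions: 4 KiB pages, slabs of 2^order pages, and no
per-object metadata, alignment or debug padding. The names PAGE_SIZE_ASSUMED,
objs_per_slab(), slab_tail_waste() and print_waste() are hypothetical helpers
for this illustration, not kernel APIs, and the real layout chosen by the
allocator may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE_ASSUMED 4096u	/* assumption: 4 KiB pages */

/* Objects that fit in a slab of 2^order pages, ignoring any metadata. */
size_t objs_per_slab(unsigned int order, size_t obj_size)
{
	assert(obj_size > 0);
	return ((size_t)PAGE_SIZE_ASSUMED << order) / obj_size;
}

/* Bytes left unused at the tail of such a slab. */
size_t slab_tail_waste(unsigned int order, size_t obj_size)
{
	size_t slab_bytes = (size_t)PAGE_SIZE_ASSUMED << order;

	return slab_bytes - objs_per_slab(order, obj_size) * obj_size;
}

/* Print the estimate for slab orders 0..3 for one object size. */
void print_waste(size_t obj_size)
{
	for (unsigned int order = 0; order <= 3; order++)
		printf("order %u: %zu objs, %zu bytes unused\n", order,
		       objs_per_slab(order, obj_size),
		       slab_tail_waste(order, obj_size));
}
```

Calling print_waste(704) under these assumptions gives 5 objects and 576
unused bytes for a single-page slab (about 14%), 448 unused bytes at order 1
and 192 at order 2. Which order is actually used depends on the running
kernel; on SLUB systems the order and objs_per_slab files under
/sys/kernel/slab/<cache>/ show the real values.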

