On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > Thanks! This is an extremely subtle corner case. I appreciate the patch
> > > and explanation.
> > > 
> > > I did run tests on the blamed commit (which I still have), but catching
> > > a real issue in a meaningful way would have required a program which
> > > calls bpf_xdp_adjust_tail() with a very large offset.
> > > I'm noting that I see the WARN_ON() much more easily after your fix, but
> > > before, it was mostly inconsequential for practical cases.
> > > 
> > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is 256.
> > > First buffers also contain the skb_shared_info (320 bytes), while
> > > subsequent buffers don't.
> > 
> > I can't wrap my head around this series, hope you can tell me where I'm
> > going wrong. AFAICT enetc splits the page into two halves for small MTU.
> > 
> > So we have 
> > 
> >  |                 2k          |             2k              |
> >   ----------------------------- ----------------------------- 
> >  | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> >   ----------------------------- ----------------------------- 
> > 
> > If we attach the second chunk as a frag we'll have:
> >   offset = 2k + hroom
> >   size = data.len
> > But we use
> >   truesize / frag_size = 2k
> > so
> >   tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> >   tailroom = 2k - data.len - 2k
> >   tailroom = -data.len
> >   WARN(tailroom < 0) -> yes
> > 
> > The frag_size thing is unusable for any driver that doesn't hand out
> > full pages to frags?
> 
> This is an excellent question.
> 
> Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50% chance
> of working - the paged data has to be in the first half of the page,
> otherwise the tailroom calculations are not correct due to rxq->frag_size,
> and the WARN_ON() will trigger.
> 
> The reason why I didn't notice this during my testing is stupid. I was
> attaching the BPF program to the interface and then detaching it after
> each test, and each test was sending less than the RX ring size (2048)
> worth of packets. So all multi-buffer frames were using buffers which
> were fresh out of enetc_setup_rxbdr() -> ... -> enetc_new_page() (first
> halves) and never out of flipped pages (enetc_bulk_flip_buff()).
> 
> This seems to be a good reason to convert this driver to use page pool,
> which I can look into. I'm not sure that there's anything that can be
> done to make the rxq->frag_size mechanism compatible with the current
> buffer allocation scheme.

I was just about to send an answer.

Seems like my mistake here. I actually think adjusting the tail should work if
we set rxq->frag_size to PAGE_SIZE in enetc and to i40e_rx_pg_size() in i40e,
not to (PAGE_SIZE / 2) as I did at first, but in that case the name frag_size
is simply wrong. Glad Jakub pointed this out.
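
Concretely, re-running Jakub's second-half example with frag_size = PAGE_SIZE
instead of 2k (assuming nothing else about the enetc layout changes):

  tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag)
           = 4k - data.len - (2k + hroom)
           = 2k - hroom - data.len

which stays non-negative as long as the frag actually fits in its half of the
page, so the WARN_ON() should no longer trigger for the flipped-buffer case.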

ice and idpf are fine, since they use libeth for Rx buffers, so multi-buffer
packets always reside in 3K+ buffers. But for xsk_buff_pool it seems like all
drivers should have PAGE_SIZE as frag_size? I will let the discussion go on for
at least a day and then send v2 with the patches reordered and those sizes
corrected, and maybe add ZC fixes on top.
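
For completeness, a quick userspace sketch (illustrative only, not the kernel
code) that re-runs the three tailroom cases with the enetc numbers from the
thread; the 1500-byte frag length is arbitrary:

#include <stdio.h>

#define PAGE_SIZE	4096
#define HALF_PAGE	(PAGE_SIZE / 2)
#define HEADROOM	256		/* XDP_PACKET_HEADROOM */

/* Same formula as in bpf_xdp_frags_increase_tail(), computed signed here so
 * the negative case is visible:
 * tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
 */
static long tailroom(long frag_size, long frag_off, long frag_len)
{
	return frag_size - frag_len - frag_off;
}

int main(void)
{
	long len = 1500;	/* arbitrary frag length */

	/* frag in the first half, frag_size = 2k: 292, OK */
	printf("%ld\n", tailroom(HALF_PAGE, HEADROOM, len));
	/* frag in the second half, frag_size = 2k: -1756, WARN_ON() */
	printf("%ld\n", tailroom(HALF_PAGE, HALF_PAGE + HEADROOM, len));
	/* frag in the second half, frag_size = PAGE_SIZE: 292, OK again */
	printf("%ld\n", tailroom(PAGE_SIZE, HALF_PAGE + HEADROOM, len));

	return 0;
}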
