On Mon, Jul 14, 2025 at 04:35:26PM +0200, Alexander Lobakin wrote:
> From: Jacob Keller <[email protected]>
> Date: Thu, 10 Jul 2025 15:43:20 -0700
> 
> > 
> > 
> > On 7/7/2025 4:36 PM, Jacob Keller wrote:
> 
> [...]
> 
> > I got this to work with the following diff:
> > 
> > diff --git i/drivers/net/ethernet/intel/ice/ice_txrx.h w/drivers/net/ethernet/intel/ice/ice_txrx.h
> > index 42e74925b9df..6b72608a20ab 100644
> > --- i/drivers/net/ethernet/intel/ice/ice_txrx.h
> > +++ w/drivers/net/ethernet/intel/ice/ice_txrx.h
> > @@ -342,7 +342,6 @@ struct ice_rx_ring {
> >         struct ice_tx_ring *xdp_ring;
> >         struct ice_rx_ring *next;       /* pointer to next ring in q_vector */
> >         struct xsk_buff_pool *xsk_pool;
> > -       u32 nr_frags;
> >         u16 rx_buf_len;
> >         dma_addr_t dma;                 /* physical address of ring */
> >         u8 dcb_tc;                      /* Traffic class of ring */
> > diff --git i/drivers/net/ethernet/intel/ice/ice_txrx.c w/drivers/net/ethernet/intel/ice/ice_txrx.c
> > index 062291dac99c..403b5c54fd2a 100644
> > --- i/drivers/net/ethernet/intel/ice/ice_txrx.c
> > +++ w/drivers/net/ethernet/intel/ice/ice_txrx.c
> > @@ -831,8 +831,7 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
> > 
> >                 /* retrieve a buffer from the ring */
> >                 rx_buf = &rx_ring->rx_fqes[ntc];
> > -               if (!libeth_xdp_process_buff(xdp, rx_buf, size))
> > -                       break;
> > +               libeth_xdp_process_buff(xdp, rx_buf, size);
> > 
> >                 if (++ntc == cnt)
> >                         ntc = 0;
> > @@ -852,25 +851,18 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
> > 
> >                 xdp->data = NULL;
> >                 rx_ring->first_desc = ntc;
> > -               rx_ring->nr_frags = 0;
> >                 continue;
> >  construct_skb:
> >                 skb = xdp_build_skb_from_buff(&xdp->base);
> > +               xdp->data = NULL;
> > +               rx_ring->first_desc = ntc;
> > 
> >                 /* exit if we failed to retrieve a buffer */
> >                 if (!skb) {
> > -                       rx_ring->ring_stats->rx_stats.alloc_page_failed++;
> > -                       xdp_verdict = ICE_XDP_CONSUMED;
> > -                       xdp->data = NULL;
> > -                       rx_ring->first_desc = ntc;
> > -                       rx_ring->nr_frags = 0;
> > +                       rx_ring->ring_stats->rx_stats.alloc_buf_failed++;
> >                         break;
> >                 }
> > 
> > -               xdp->data = NULL;
> > -               rx_ring->first_desc = ntc;
> > -               rx_ring->nr_frags = 0;
> > -
> >                 stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
> >                 if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,
> >                                               stat_err_bits))) {
> 
> More or less. I'm taking over this series since Michał's on vacation;
> I'll double-check everything (against iavf and idpf as well).
> 
> Anyway, thanks for the fix.
> 
> > 
> > 
> > --->8---
> > 
> > The essential change is to not break out of the loop when
> > libeth_xdp_process_buff() returns false: we still need to advance the
> > ring in that case, and the usual reason it returns false is the
> > zero-length descriptor we sometimes get when using larger MTUs.
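> >
> > To make that concrete, here is a tiny stand-alone model of the loop
> > (not the driver code; every identifier in it is invented for the
> > illustration). It shows that a zero-length buffer must still consume
> > its descriptor, otherwise the next-to-clean index stalls:
> >
> > #include <stdbool.h>
> > #include <stdio.h>
> >
> > #define RING_SIZE 8
> >
> > /* Sizes "written back" per descriptor; 0 mimics the zero-length
> >  * descriptors seen with 9k MTU.
> >  */
> > static const unsigned int desc_size[RING_SIZE] = {
> > 	64, 0, 1500, 0, 9000, 64, 0, 512,
> > };
> >
> > /* Stands in for libeth_xdp_process_buff(): false when there's no data. */
> > static bool process_buff(unsigned int size)
> > {
> > 	return size != 0;
> > }
> >
> > int main(void)
> > {
> > 	unsigned int ntc = 0, cleaned = 0;
> >
> > 	while (cleaned < RING_SIZE) {
> > 		/* Don't break on failure: the descriptor is consumed
> > 		 * either way, so the index must advance either way.
> > 		 */
> > 		(void)process_buff(desc_size[ntc]);
> >
> > 		if (++ntc == RING_SIZE)
> > 			ntc = 0;	/* wrap, like the real ring */
> > 		cleaned++;
> > 	}
> >
> > 	printf("consumed %u descriptors, ntc wrapped to %u\n",
> > 	       cleaned, ntc);
> > 	return 0;
> > }
> >
> > With a break on the empty buffer, the loop would exit at the second
> > descriptor and never clean the rest of the ring.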
> > 
> > I also dropped some of the redundant state updates, re-ordered where
> > we reset xdp->data, and fixed the bug where the ring stats bumped
> > alloc_page_failed instead of alloc_buf_failed as they should have. I
> > think this could be cleaned up further, but it might be better to wait
> > until the XDP helpers are fully in use.
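> >
> > As a stand-alone sketch of that re-ordering (invented names; rx_state
> > below is not a real driver structure): hoisting the reset above the
> > error check means it runs exactly once on both paths, and only the
> > correct counter is bumped on failure:
> >
> > #include <stdbool.h>
> > #include <stddef.h>
> > #include <stdio.h>
> >
> > struct rx_state {
> > 	void *data;			/* plays the role of xdp->data */
> > 	unsigned int first_desc;	/* plays rx_ring->first_desc */
> > 	unsigned int alloc_buf_failed;	/* the counter the diff fixes */
> > };
> >
> > /* Stands in for xdp_build_skb_from_buff(); fails on request. */
> > static void *build_skb(bool fail)
> > {
> > 	static int skb;
> >
> > 	return fail ? NULL : &skb;
> > }
> >
> > static bool construct_skb(struct rx_state *rs, unsigned int ntc,
> > 			  bool fail)
> > {
> > 	void *skb = build_skb(fail);
> >
> > 	/* Reset the per-packet state unconditionally... */
> > 	rs->data = NULL;
> > 	rs->first_desc = ntc;
> >
> > 	/* ...then handle the failure, bumping the right counter. */
> > 	if (!skb) {
> > 		rs->alloc_buf_failed++;
> > 		return false;
> > 	}
> >
> > 	return true;
> > }
> >
> > int main(void)
> > {
> > 	struct rx_state rs = { 0 };
> >
> > 	construct_skb(&rs, 5, false);	/* success path */
> > 	construct_skb(&rs, 6, true);	/* failure path */
> > 	printf("first_desc=%u alloc_buf_failed=%u\n",
> > 	       rs.first_desc, rs.alloc_buf_failed);
> > 	return 0;
> > }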
> > 
> > Regardless, we need something like this to fix the issues with larger MTUs.
> 
> Thanks,
> Olek


Dear Jake and Olek,

Thanks for your support, detailed testing and fixes!

I successfully reproduced the crash while stress-testing the series
using:
 - MTU == 9k,
 - iperf3 (for UDP traffic),
 - a heavy HTTP workload running 20 threads and 100000 connections.

After applying the fixes for v2, I observed no issues.

Thanks,
Michal
