On Mon, Oct 06, 2025 at 06:20:53PM +0200, Alexander Lobakin wrote:
> Add second page_pool for header buffers to each Rx queue and ability
> to toggle the header split on/off using Ethtool (default to off to
> match the current behaviour).
> Unlike idpf, all HW backed up by ice doesn't require any W/As and
> correctly splits all types of packets as configured: after L4 headers
> for TCP/UDP/SCTP, after L3 headers for other IPv4/IPv6 frames, after
> the Ethernet header otherwise (in case of tunneling, same as above,
> but after innermost headers).
> This doesn't affect the XSk path as there are no benefits of having
> it there.
> 
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
> Applies on top of Tony's next-queue, depends on MichaƂ's Page Pool
> conversion series.
> 
> Sending for review and validation purposes.
> 
> Testing hints: traffic testing with and without header split enabled.
> The header split can be turned on/off using Ethtool:
> 
> sudo ethtool -G <iface> tcp-data-split on (or off)

Nice, I'm very pleased to see this feature in the pipeline for the ice driver.

...

> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c

...

> @@ -836,6 +858,20 @@ bool ice_alloc_rx_bufs(struct ice_rx_ring *rx_ring, unsigned int cleaned_count)
>                */
>               rx_desc->read.pkt_addr = cpu_to_le64(addr);
>  
> +             if (!hdr_fq.pp)
> +                     goto next;
> +
> +             addr = libeth_rx_alloc(&hdr_fq, ntu);
> +             if (addr == DMA_MAPPING_ERROR) {
> +                     rx_ring->ring_stats->rx_stats.alloc_page_failed++;
> +
> +                     libeth_rx_recycle_slow(fq.fqes[ntu].netmem);
> +                     break;
> +             }
> +
> +             rx_desc->read.hdr_addr = cpu_to_le64(addr);
> +
> +next:

Is performance the reason a goto is used here, rather than, say, putting
the conditional code inside an if block? Likewise in ice_clean_rx_irq?
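
For reference, the shape I had in mind is just the same hunk folded into
the conditional (untested, purely an illustrative rearrangement of the
code above, no behavioural change intended):

	if (hdr_fq.pp) {
		addr = libeth_rx_alloc(&hdr_fq, ntu);
		if (addr == DMA_MAPPING_ERROR) {
			rx_ring->ring_stats->rx_stats.alloc_page_failed++;

			libeth_rx_recycle_slow(fq.fqes[ntu].netmem);
			break;
		}

		rx_desc->read.hdr_addr = cpu_to_le64(addr);
	}

No strong preference either way, just curious whether the goto buys
anything measurable here.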

>               rx_desc++;
>               ntu++;
>               if (unlikely(ntu == rx_ring->count)) {
> @@ -933,14 +969,16 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
>               unsigned int size;
>               u16 stat_err_bits;
>               u16 vlan_tci;
> +             bool rxe;
>  
>               /* get the Rx desc from Rx ring based on 'next_to_clean' */
>               rx_desc = ICE_RX_DESC(rx_ring, ntc);
>  
> -             /* status_error_len will always be zero for unused descriptors
> -              * because it's cleared in cleanup, and overlaps with hdr_addr
> -              * which is always zero because packet split isn't used, if the
> -              * hardware wrote DD then it will be non-zero
> +             /*
> +              * The DD bit will always be zero for unused descriptors
> +              * because it's cleared in cleanup or when setting the DMA
> +              * address of the header buffer, which never uses the DD bit.
> +              * If the hardware wrote the descriptor, it will be non-zero.
>                */

The update to this comment feels like it could be a separate patch.
(I know, I often say something like that...)

>               stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_DD_S);
>               if (!ice_test_staterr(rx_desc->wb.status_error0, stat_err_bits))
> @@ -954,12 +992,27 @@ static int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
>  
>               ice_trace(clean_rx_irq, rx_ring, rx_desc);
>  
> +             stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_HBO_S) |
> +                             BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
> +             rxe = ice_test_staterr(rx_desc->wb.status_error0,
> +                                    stat_err_bits);
> +
> +             if (!rx_ring->hdr_pp)
> +                     goto payload;
> +
> +             size = le16_get_bits(rx_desc->wb.hdr_len_sph_flex_flags1,
> +                                  ICE_RX_FLEX_DESC_HDR_LEN_M);
> +             if (unlikely(rxe))
> +                     size = 0;
> +
> +             rx_buf = &rx_ring->hdr_fqes[ntc];
> +             libeth_xdp_process_buff(xdp, rx_buf, size);
> +             rx_buf->netmem = 0;
> +
> +payload:
>               size = le16_to_cpu(rx_desc->wb.pkt_len) &
>                       ICE_RX_FLX_DESC_PKT_LEN_M;
> -
> -             stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
> -             if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,
> -                                           stat_err_bits)))
> +             if (unlikely(rxe))
>                       size = 0;
>  
>               /* retrieve a buffer from the ring */
> -- 
> 2.51.0
> 
