On Thu, Sep 25, 2025 at 10:22 AM -07, Jacob Keller wrote:
> On 9/25/2025 2:56 AM, Jakub Sitnicki wrote:
>> On Thu, Sep 25, 2025 at 11:22 AM +02, Michal Kubiak wrote:
>>> This series modernizes the Rx path in the ice driver by removing legacy
>>> code and switching to the Page Pool API. The changes follow the same
>>> direction as previously done for the iavf driver, and aim to simplify
>>> buffer management, improve maintainability, and prepare for future
>>> infrastructure reuse.
>>>
>>> An important motivation for this work was addressing reports of poor
>>> performance in XDP_TX mode when IOMMU is enabled. The legacy Rx model
>>> incurred significant overhead due to per-frame DMA mapping, which
>>> limited throughput in virtualized environments. This series eliminates
>>> those bottlenecks by adopting Page Pool and bi-directional DMA mapping.
>>>
>>> The first patch removes the legacy Rx path, which relied on manual skb
>>> allocation and header copying. This path has become obsolete due to the
>>> availability of build_skb() and the increasing complexity of supporting
>>> features like XDP and multi-buffer.
>>>
>>> The second patch drops the page splitting and recycling logic. While
>>> once used to optimize memory usage, this logic introduced significant
>>> complexity and hotpath overhead. Removing it simplifies the Rx flow and
>>> sets the stage for Page Pool adoption.
>>>
>>> The final patch switches the driver to use the Page Pool and libeth
>>> APIs. It also updates the XDP implementation to use libeth_xdp helpers
>>> and optimizes XDP_TX by avoiding per-frame DMA mapping. This results in
>>> a significant performance improvement in virtualized environments with
>>> IOMMU enabled (over 5x gain in XDP_TX throughput). In other scenarios,
>>> performance remains on par with the previous implementation.
>>>
>>> This conversion also aligns with the broader effort to modularize and
>>> unify XDP support across Intel Ethernet drivers.
>>>
>>> Tested on various workloads including netperf and XDP modes (PASS, DROP,
>>> TX) with and without IOMMU. No regressions observed.
>>
>> Will we be able to have 256 B of XDP headroom after this conversion?
>>
>> Thanks,
>> -jkbs
>
> We should. The queues are configured through libeth, and set the .xdp
> field if XDP is enabled on that ring:
>
>> @@ -622,8 +589,14 @@ static unsigned int ice_get_frame_sz(struct ice_rx_ring *rx_ring)
>>   */
>>  static int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
>>  {
>> +	struct libeth_fq fq = {
>> +		.count = ring->count,
>> +		.nid = NUMA_NO_NODE,
>> +		.xdp = ice_is_xdp_ena_vsi(ring->vsi),
>> +		.buf_len = LIBIE_MAX_RX_BUF_LEN,
>> +	};
>
>
> If .xdp is set, then the libeth Rx configuration reserves
> LIBETH_XDP_HEADROOM, which is XDP_PACKET_HEADROOM aligned to
> NET_SKB_PAD, plus an extra NET_IP_ALIGN, which results in 258 bytes of
> headroom reserved.

That's great news. We've been observing a growing adoption of custom XDP
metadata ([1], [2]) at Cloudflare, so the current 192 B of headroom in
ice has been limiting.

[1] https://docs.ebpf.io/linux/helper-function/bpf_xdp_adjust_meta/
[2] https://docs.kernel.org/networking/xdp-rx-metadata.html#af-xdp
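
For reference, the headroom arithmetic quoted above (XDP_PACKET_HEADROOM
aligned to NET_SKB_PAD, plus NET_IP_ALIGN) can be sketched in plain
userspace C. This is only an illustration, not the libeth code: the
function names are hypothetical, and the inputs are passed as parameters
because NET_SKB_PAD and NET_IP_ALIGN depend on the architecture and
kernel configuration.

```c
#include <assert.h>

/*
 * Round x up to the next multiple of a.
 * Assumption: a is a non-zero power of two, as with the kernel's ALIGN().
 */
static unsigned int sketch_align_up(unsigned int x, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper mirroring the description in this thread:
 * headroom = ALIGN(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN.
 * The caller supplies all three values, since they are not fixed
 * across architectures.
 */
static unsigned int sketch_xdp_headroom(unsigned int xdp_packet_headroom,
					unsigned int net_skb_pad,
					unsigned int net_ip_align)
{
	return sketch_align_up(xdp_packet_headroom, net_skb_pad) + net_ip_align;
}
```

Assuming XDP_PACKET_HEADROOM = 256, NET_SKB_PAD = 64 and NET_IP_ALIGN = 2,
sketch_xdp_headroom(256, 64, 2) gives the 258 bytes mentioned above. With
NET_IP_ALIGN = 0 it gives 256. Either way that covers the 256 B I asked
about.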