On 7/4/2025 9:18 AM, Michal Kubiak wrote:
> This series modernizes the Rx path in the ice driver by removing legacy
> code and switching to the Page Pool API. The changes follow the same
> direction as previously done for the iavf driver, and aim to simplify
> buffer management, improve maintainability, and prepare for future
> infrastructure reuse.
> 
> An important motivation for this work was addressing reports of poor
> performance in XDP_TX mode when IOMMU is enabled. The legacy Rx model
> incurred significant overhead due to per-frame DMA mapping, which
> limited throughput in virtualized environments. This series eliminates
> those bottlenecks by adopting Page Pool and bi-directional DMA mapping.
> 
> The first patch removes the legacy Rx path, which relied on manual skb
> allocation and header copying. This path has become obsolete due to the
> availability of build_skb() and the increasing complexity of supporting
> features like XDP and multi-buffer.
> 
> The second patch drops the page splitting and recycling logic. While
> once used to optimize memory usage, this logic introduced significant
> complexity and hotpath overhead. Removing it simplifies the Rx flow and
> sets the stage for Page Pool adoption.
> 
> The final patch switches the driver to use the Page Pool and libeth
> APIs. It also updates the XDP implementation to use libeth_xdp helpers
> and optimizes XDP_TX by avoiding per-frame DMA mapping. This results in
> a significant performance improvement in virtualized environments with
> IOMMU enabled (over 5x gain in XDP_TX throughput). In other scenarios,
> performance remains on par with the previous implementation.
> 
> This conversion also aligns with the broader effort to modularize and
> unify XDP support across Intel Ethernet drivers.
> 
> Tested on various workloads including netperf and XDP modes (PASS, DROP,
> TX) with and without IOMMU. No regressions observed.
> 
> Last but not least, it is suspected that this series may also help
> mitigate the memory consumption issues recently reported in the driver.
> For further details, see:
> 
> https://lore.kernel.org/intel-wired-lan/cak8ffz4hy6gujnenz3wy9jaylzxgfpr7dnzxzgmyoe44car...@mail.gmail.com/
> 
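For context on the "bi-directional DMA mapping" mentioned in the cover letter: with the Page Pool API, pages can be mapped DMA_BIDIRECTIONAL once at allocation time, so XDP_TX can transmit straight out of an Rx buffer without a per-frame dma_map/unmap. A minimal sketch of such a pool setup (hypothetical helper and parameter names, not the actual ice patch):

```c
/* Sketch only: a Page Pool whose pages are DMA-mapped once,
 * bi-directionally, at allocation time. Hypothetical names;
 * see the actual patches for the real ice conversion.
 */
#include <net/page_pool/helpers.h>

static struct page_pool *rx_ring_create_pool(struct device *dev,
					     unsigned int ring_size)
{
	struct page_pool_params pp = {
		/* map pages in the pool and let it sync them for device */
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= ring_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		/* bi-directional, so XDP_TX needs no extra mapping */
		.dma_dir	= DMA_BIDIRECTIONAL,
		.offset		= 0,
		.max_len	= PAGE_SIZE,
	};

	return page_pool_create(&pp);	/* ERR_PTR() on failure */
}
```

This avoids the per-frame iommu_map/unmap round trips that dominate XDP_TX cost when the IOMMU is enabled, which is where the quoted 5x gain would come from.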

I tried to apply these and test them, but I ran into several issues :(

The iperf3 session starts with some traffic, but throughput very quickly
drops to zero:

> [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
> [  8]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
> [ 10]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
> [ 12]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
> [ 14]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
> [SUM]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
> [  8]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
> [ 10]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
> [ 12]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
> [ 14]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
> [SUM]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
> [  8]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
> [ 10]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
> [ 12]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
> [ 14]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
> [SUM]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
> [  8]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
> [ 10]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
> [ 12]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
> [ 14]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
> [SUM]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
> [  8]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
> [ 10]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
> [ 12]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
> [ 14]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
> [SUM]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
> [  8]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
> [ 10]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
> [ 12]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
> [ 14]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
> [SUM]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec

I eventually got a crash:


> jekeller-stp-glorfindel login: [  326.338776] ------------[ cut here ]------------
> [  326.343440] WARNING: CPU: 109 PID: 0 at include/net/page_pool/helpers.h:297 libeth_rx_recycle_slow+0x2f/0x4f [libeth]
> [  326.354082] Modules linked in: ice gnss libeth_xdp libeth cfg80211 rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nf_tables ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables qrtr intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sunrpc skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel spi_nor mtd kvm irqbypass iTCO_wdt rapl intel_pmc_bxt ipmi_ssif mei_me iTCO_vendor_support intel_cstate vfat fat i40e spi_intel_pci intel_uncore i2c_i801 pcspkr libie ioatdma mei libie_adminq lpc_ich i2c_smbus spi_intel intel_pch_thermal dca ipmi_si acpi_power_meter acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad fuse loop dm_multipath nfnetlink zram
> [  326.354222]  lz4hc_compress lz4_compress xfs qat_c62x intel_qat polyval_clmulni ghash_clmulni_intel sha512_ssse3 sha1_ssse3 ast crc8 i2c_algo_bit wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkcs8_key_parser tls
> [  326.462156] CPU: 109 UID: 0 PID: 0 Comm: swapper/109 Not tainted 6.16.0-rc4-ice-page-pool+ #25 PREEMPT(lazy)
> [  326.472075] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0017.110620230543 11/06/2023
> [  326.482519] RIP: 0010:libeth_rx_recycle_slow+0x2f/0x4f [libeth]
> [  326.488454] Code: 1f 44 00 00 48 89 f8 48 89 fe 48 83 e0 fe 48 8b 50 28 48 8b 78 10 48 ff ca 74 20 48 83 ca ff f0 48 0f c1 50 28 48 ff ca 79 07 <0f> 0b c3 cc cc cc cc 75 12 48 c7 40 28 01 00 00 00 31 c9 83 ca ff
> [  326.507232] RSP: 0018:ffffd2c4c814cd38 EFLAGS: 00010296
> [  326.512466] RAX: fffff58c342d0ec0 RBX: 0000000000000000 RCX: 00000000000000e3
> [  326.519608] RDX: ffffffffffffffff RSI: fffff58c342d0ec0 RDI: ffff8d596e024100
> [  326.527173] RBP: ffffd2c4c814cdf8 R08: ffffd2c4e6bd3960 R09: 0000000000000000
> [  326.534674] R10: 00000000fffffb54 R11: 000000000002cd86 R12: ffff8d49fde71cb0
> [  326.542159] R13: 00000000000001cb R14: ffff8d49acca5600 R15: ffffd2c4e6bd3960
> [  326.549627] FS:  0000000000000000(0000) GS:ffff8d59a3c9b000(0000) knlGS:0000000000000000
> [  326.558047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  326.564119] CR2: 00007f3eda90df78 CR3: 0000000caee56001 CR4: 00000000007726f0
> [  326.571574] PKRU: 55555554
> [  326.574595] Call Trace:
> [  326.577353]  <IRQ>
> [  326.579664]  ice_clean_rx_irq+0x431/0x520 [ice]
> [  326.584584]  ? iommu_dma_unmap_page+0x48/0x90
> [  326.589232]  ice_napi_poll+0xbe/0x2a0 [ice]
> [  326.593786]  __napi_poll+0x2e/0x1e0
> [  326.597567]  net_rx_action+0x336/0x420
> [  326.601608]  ? update_rq_clock_task+0x3f/0x1d0
> [  326.606344]  ? sched_clock+0x10/0x30
> [  326.610207]  handle_softirqs+0xed/0x340
> [  326.614316]  __irq_exit_rcu+0xcb/0xf0
> [  326.618241]  common_interrupt+0x85/0xa0
> [  326.622340]  </IRQ>
> [  326.624702]  <TASK>
> [  326.627053]  asm_common_interrupt+0x26/0x40
> [  326.631493] RIP: 0010:cpuidle_enter_state+0xcc/0x660
> [  326.636709] Code: 00 00 e8 67 40 ed fe e8 32 f0 ff ff 49 89 c4 0f 1f 44 00 00 31 ff e8 53 54 eb fe 45 84 ff 0f 85 02 02 00 00 fb 0f 1f 44 00 00 <85> ed 0f 88 d3 01 00 00 4c 63 f5 49 83 fe 0a 0f 83 9f 04 00 00 49
> [  326.655959] RSP: 0018:ffffd2c4c6aefe50 EFLAGS: 00000246
> [  326.661446] RAX: ffff8d59a3c9b000 RBX: ffff8d592decfe80 RCX: 0000000000000000
> [  326.668863] RDX: 0000004bfb4d51d2 RSI: 000000003351fed6 RDI: 0000000000000000
> [  326.676284] RBP: 0000000000000002 R08: ffffffbe2deca6d0 R09: ffff8d592deb0660
> [  326.683706] R10: 0000008df1fafa1d R11: 0000000000000000 R12: 0000004bfb4d51d2
> [  326.691133] R13: ffffffff89512ee0 R14: 0000000000000002 R15: 0000000000000000
> [  326.698560]  cpuidle_enter+0x31/0x50
> [  326.702387]  cpuidle_idle_call+0xf5/0x160
> [  326.706647]  do_idle+0x78/0xd0
> [  326.709937]  cpu_startup_entry+0x29/0x30
> [  326.714087]  start_secondary+0x126/0x170
> [  326.718241]  common_startup_64+0x13e/0x141
> [  326.722561]  </TASK>
> [  326.724960] ---[ end trace 0000000000000000 ]---

Something has clearly gone wrong with these patches applied :(


> Thanks,
> Michal
> 
> Michal Kubiak (3):
>   ice: remove legacy Rx and construct SKB
>   ice: drop page splitting and recycling
>   ice: switch to Page Pool
> 
>  drivers/net/ethernet/intel/Kconfig            |   1 +
>  drivers/net/ethernet/intel/ice/ice.h          |   3 +-
>  drivers/net/ethernet/intel/ice/ice_base.c     | 122 ++--
>  drivers/net/ethernet/intel/ice/ice_ethtool.c  |  22 +-
>  drivers/net/ethernet/intel/ice/ice_lib.c      |   1 -
>  drivers/net/ethernet/intel/ice/ice_main.c     |  21 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.c     | 645 +++---------------
>  drivers/net/ethernet/intel/ice/ice_txrx.h     |  37 +-
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  65 +-
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.h |   7 +-
>  drivers/net/ethernet/intel/ice/ice_virtchnl.c |   5 +-
>  drivers/net/ethernet/intel/ice/ice_xsk.c      | 120 +---
>  drivers/net/ethernet/intel/ice/ice_xsk.h      |   6 +-
>  13 files changed, 205 insertions(+), 850 deletions(-)
> 
