On 7/7/2025 4:32 PM, Jacob Keller wrote:
> 
> 
> On 7/4/2025 9:18 AM, Michal Kubiak wrote:
>> This series modernizes the Rx path in the ice driver by removing legacy
>> code and switching to the Page Pool API. The changes follow the same
>> direction as previously done for the iavf driver, and aim to simplify
>> buffer management, improve maintainability, and prepare for future
>> infrastructure reuse.
>>
>> An important motivation for this work was addressing reports of poor
>> performance in XDP_TX mode when IOMMU is enabled. The legacy Rx model
>> incurred significant overhead due to per-frame DMA mapping, which
>> limited throughput in virtualized environments. This series eliminates
>> those bottlenecks by adopting Page Pool and bi-directional DMA mapping.
>>
>> The first patch removes the legacy Rx path, which relied on manual skb
>> allocation and header copying. This path has become obsolete due to the
>> availability of build_skb() and the increasing complexity of supporting
>> features like XDP and multi-buffer.
>>
>> The second patch drops the page splitting and recycling logic. While
>> once used to optimize memory usage, this logic introduced significant
>> complexity and hotpath overhead. Removing it simplifies the Rx flow and
>> sets the stage for Page Pool adoption.
>>
>> The final patch switches the driver to use the Page Pool and libeth
>> APIs. It also updates the XDP implementation to use libeth_xdp helpers
>> and optimizes XDP_TX by avoiding per-frame DMA mapping. This results in
>> a significant performance improvement in virtualized environments with
>> IOMMU enabled (over 5x gain in XDP_TX throughput). In other scenarios,
>> performance remains on par with the previous implementation.
>>
>> This conversion also aligns with the broader effort to modularize and
>> unify XDP support across Intel Ethernet drivers.
>>
>> Tested on various workloads including netperf and XDP modes (PASS, DROP,
>> TX) with and without IOMMU. No regressions observed.
>>
>> Last but not least, it is suspected that this series may also help
>> mitigate the memory consumption issues recently reported in the driver.
>> For further details, see:
>>
>> https://lore.kernel.org/intel-wired-lan/cak8ffz4hy6gujnenz3wy9jaylzxgfpr7dnzxzgmyoe44car...@mail.gmail.com/
>>
> 
> I tried to apply these and test them, but I ran into several issues :(
> 
> The iperf3 session starts with some traffic and then very quickly dies
> to zero:
> 
>> [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
>> [  8]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 10]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 12]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 14]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
>> [SUM]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
>> [  8]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 10]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 12]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 14]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
>> [SUM]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
>> [  8]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 10]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 12]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 14]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
>> [SUM]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
>> [  8]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 10]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 12]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 14]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
>> [SUM]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
>> [  8]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 10]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 12]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
>> [ 14]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
>> [SUM]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
>> [  8]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
>> [ 10]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
>> [ 12]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
>> [ 14]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
>> [SUM]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec
> 
> I eventually got a crash:
> 
> 
>> jekeller-stp-glorfindel login: [  326.338776] ------------[ cut here ]------------
>> [  326.343440] WARNING: CPU: 109 PID: 0 at 
>> include/net/page_pool/helpers.h:297 libeth_rx_recycle_slow+0x2f/0x4f [libeth]
>> [  326.354082] Modules linked in: ice gnss libeth_xdp libeth cfg80211 rfkill 
>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ebtable_nat 
>> ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security 
>> iptable_nat nf_nat nf_conntrack
>> nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw 
>> iptable_security nf_tables ebtable_filter ebtables ip6table_filter 
>> ip6_tables iptable_filter ip_tables qrtr intel_rapl_msr intel_rapl_common 
>> intel_uncore_frequency intel_uncore_frequency_common sunrpc skx_edac 
>> skx_edac_common nfit libnvdimm x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm_intel spi_nor mtd kvm irqbypass iTCO_wdt 
>> rapl intel_pmc_bxt ipmi_ssif mei_me iTCO_vendor_support intel_cstate vfat 
>> fat i40e spi_intel_pci intel_uncore i2c_i801 pcspkr libie ioatdma mei 
>> libie_adminq lpc_ich i2c_smbus spi_intel intel_pch_thermal dca ipmi_si 
>> acpi_power_meter acpi_ipmi
>> ipmi_devintf ipmi_msghandler acpi_pad fuse loop dm_multipath nfnetlink zram
>> [  326.354222]  lz4hc_compress lz4_compress xfs qat_c62x intel_qat 
>> polyval_clmulni ghash_clmulni_intel sha512_ssse3 sha1_ssse3 ast crc8 
>> i2c_algo_bit wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkcs8_key_parser tls
>> [  326.462156] CPU: 109 UID: 0 PID: 0 Comm: swapper/109 Not tainted 
>> 6.16.0-rc4-ice-page-pool+ #25 PREEMPT(lazy)
>> [  326.472075] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS 
>> SE5C620.86B.02.01.0017.110620230543 11/06/2023
>> [  326.482519] RIP: 0010:libeth_rx_recycle_slow+0x2f/0x4f [libeth]
>> [  326.488454] Code: 1f 44 00 00 48 89 f8 48 89 fe 48 83 e0 fe 48 8b 50 28 
>> 48 8b 78 10 48 ff ca 74 20 48 83 ca ff f0 48 0f c1 50 28 48 ff ca 79 07 <0f> 
>> 0b c3 cc cc cc cc 75 12 48 c7 40 28 01 00 00 00 31 c9 83 ca ff
>> [  326.507232] RSP: 0018:ffffd2c4c814cd38 EFLAGS: 00010296
>> [  326.512466] RAX: fffff58c342d0ec0 RBX: 0000000000000000 RCX: 
>> 00000000000000e3
>> [  326.519608] RDX: ffffffffffffffff RSI: fffff58c342d0ec0 RDI: 
>> ffff8d596e024100
>> [  326.527173] RBP: ffffd2c4c814cdf8 R08: ffffd2c4e6bd3960 R09: 
>> 0000000000000000
>> [  326.534674] R10: 00000000fffffb54 R11: 000000000002cd86 R12: 
>> ffff8d49fde71cb0
>> [  326.542159] R13: 00000000000001cb R14: ffff8d49acca5600 R15: 
>> ffffd2c4e6bd3960
>> [  326.549627] FS:  0000000000000000(0000) GS:ffff8d59a3c9b000(0000) 
>> knlGS:0000000000000000
>> [  326.558047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  326.564119] CR2: 00007f3eda90df78 CR3: 0000000caee56001 CR4: 
>> 00000000007726f0
>> [  326.571574] PKRU: 55555554
>> [  326.574595] Call Trace:
>> [  326.577353]  <IRQ>
>> [  326.579664]  ice_clean_rx_irq+0x431/0x520 [ice]
>> [  326.584584]  ? iommu_dma_unmap_page+0x48/0x90
>> [  326.589232]  ice_napi_poll+0xbe/0x2a0 [ice]
>> [  326.593786]  __napi_poll+0x2e/0x1e0
>> [  326.597567]  net_rx_action+0x336/0x420
>> [  326.601608]  ? update_rq_clock_task+0x3f/0x1d0
>> [  326.606344]  ? sched_clock+0x10/0x30
>> [  326.610207]  handle_softirqs+0xed/0x340
>> [  326.614316]  __irq_exit_rcu+0xcb/0xf0
>> [  326.618241]  common_interrupt+0x85/0xa0
>> [  326.622340]  </IRQ>
>> [  326.624702]  <TASK>
>> [  326.627053]  asm_common_interrupt+0x26/0x40
>> [  326.631493] RIP: 0010:cpuidle_enter_state+0xcc/0x660
>> [  326.636709] Code: 00 00 e8 67 40 ed fe e8 32 f0 ff ff 49 89 c4 0f 1f 44 
>> 00 00 31 ff e8 53 54 eb fe 45 84 ff 0f 85 02 02 00 00 fb 0f 1f 44 00 00 <85> 
>> ed 0f 88 d3 01 00 00 4c 63 f5 49 83 fe 0a 0f 83 9f 04 00 00 49
>> [  326.655959] RSP: 0018:ffffd2c4c6aefe50 EFLAGS: 00000246
>> [  326.661446] RAX: ffff8d59a3c9b000 RBX: ffff8d592decfe80 RCX: 
>> 0000000000000000
>> [  326.668863] RDX: 0000004bfb4d51d2 RSI: 000000003351fed6 RDI: 
>> 0000000000000000
>> [  326.676284] RBP: 0000000000000002 R08: ffffffbe2deca6d0 R09: 
>> ffff8d592deb0660
>> [  326.683706] R10: 0000008df1fafa1d R11: 0000000000000000 R12: 
>> 0000004bfb4d51d2
>> [  326.691133] R13: ffffffff89512ee0 R14: 0000000000000002 R15: 
>> 0000000000000000
>> [  326.698560]  cpuidle_enter+0x31/0x50
>> [  326.702387]  cpuidle_idle_call+0xf5/0x160
>> [  326.706647]  do_idle+0x78/0xd0
>> [  326.709937]  cpu_startup_entry+0x29/0x30
>> [  326.714087]  start_secondary+0x126/0x170
>> [  326.718241]  common_startup_64+0x13e/0x141
>> [  326.722561]  </TASK>
>> [  326.724960] ---[ end trace 0000000000000000 ]---
> 
> Something has gone wrong with the patches applied :(
> 
> 
Changing MTU back to 1500 broke things further:

[  436.757872] page_pool_empty_ring() page_pool refcnt -4 violation
[  436.758147]  non-paged memory
[  436.763901] page_pool_empty_ring() page_pool refcnt -5 violation
[  436.766847] list_del corruption. prev->next should be
fffff58c32d39dc8, but was 0000000000000000. (prev=fffff58c332cb6c8)
[  436.766894] ------------[ cut here ]------------
[  436.772880] page_pool_empty_ring() page_pool refcnt -6 violation
[  436.783799] kernel BUG at lib/list_debug.c:62!
[  436.783805] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[  436.788430] page_pool_empty_ring() page_pool refcnt -7 violation
[  436.794426] CPU: 61 UID: 0 PID: 388 Comm: kworker/61:0 Tainted: G
   W           6.16.0-rc4-ice-page-pool+ #25 PREEMPT(lazy)
[  436.798880] page_pool_empty_ring() page_pool refcnt -8 violation
[  436.805139] Tainted: [W]=WARN
[  436.805140] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS
SE5C620.86B.02.01.0017.110620230543 11/06/2023
[  436.805142] Workqueue: events drm_fb_helper_damage_work
[  436.811149] page_pool_empty_ring() page_pool refcnt -9 violation
[  436.822785]
[  436.822786] RIP: 0010:__list_del_entry_valid_or_report+0xd8/0x110
[  436.828791] page_pool_empty_ring() page_pool refcnt -10 violation
[  436.831755] Code: 0b 48 89 7c 24 08 48 89 cf 48 89 0c 24 e8 30 8b a8
ff 48 8b 0c 24 48 8b 74 24 08 48 c7 c7 f8 40 80 88 48 8b 11 e8 78 59 5c
ff <0f> 0b 48 89 7c 24 08 48 89 d7 48 89 14 24 e8 05 8b a8 ff 48 8b 14
[  436.831757] RSP: 0018:ffffd2c4c7774d38 EFLAGS: 00010046
[  436.842181] page_pool_empty_ring() page_pool refcnt -11 violation
[  436.847396]
[  436.847397] RAX: 000000000000006d RBX: ffff8d494b6ccc40 RCX:
0000000000000027
[  436.847399] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff8d494b69e480
[  436.853408] page_pool_empty_ring() page_pool refcnt -12 violation
[  436.854903] RBP: 0000000000000001 R08: 0000000000000000 R09:
ffffd2c4c7774b20
[  436.854904] R10: ffff8d592c23ffa8 R11: 00000000ffff7fff R12:
fffff58c32d39dc0
[  436.854906] R13: 0000000000000022 R14: ffff8d494b6ccc40 R15:
0000000000cb4e77
[  436.860998] page_pool_empty_ring() page_pool refcnt -13 violation
[  436.867089] FS:  0000000000000000(0000) GS:ffff8d49c149b000(0000)
knlGS:0000000000000000
[  436.867091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  436.867093] CR2: 000056464518b448 CR3: 0000000161138004 CR4:
00000000007726f0
[  436.885836] page_pool_empty_ring() page_pool refcnt -14 violation
[  436.891050] PKRU: 55555554
[  436.891052] Call Trace:
[  436.891053]  <IRQ>
[  436.897148] page_pool_empty_ring() page_pool refcnt -15 violation
[  436.898634]  free_pcppages_bulk+0x140/0x2d0
[  436.905772] page_pool_empty_ring() page_pool refcnt -16 violation
[  436.912901]  free_frozen_page_commit+0x1d8/0x370
[  436.918997] page_pool_empty_ring() page_pool refcnt -17 violation
[  436.926126]  __free_frozen_pages+0x56c/0x810
[  436.933260] page_pool_empty_ring() page_pool refcnt -18 violation
[  436.940383]  kmem_cache_free+0x3dc/0x490
[  436.946479] page_pool_empty_ring() page_pool refcnt -19 violation
[  436.954561]  ? rcu_do_batch+0x1d4/0x810
[  436.960312] page_pool_empty_ring() page_pool refcnt -20 violation
[  436.967438]  rcu_do_batch+0x1d4/0x810
[  436.967442]  rcu_core+0x17d/0x350
[  436.973535] page_pool_empty_ring() page_pool refcnt -21 violation
[  436.976246]  ? sched_clock_cpu+0xb/0x30
[  436.978702] page_pool_empty_ring() page_pool refcnt -22 violation
[  436.980717]  ? irqtime_account_irq+0x3c/0xc0
[  436.980723]  handle_softirqs+0xed/0x340
[  436.986813] page_pool_empty_ring() page_pool refcnt -23 violation
[  436.990996]  __irq_exit_rcu+0xcb/0xf0
[  436.990999]  sysvec_apic_timer_interrupt+0x71/0x90
[  436.997092] page_pool_empty_ring() page_pool refcnt -24 violation
[  437.001706]  </IRQ>
[  437.001707]  <TASK>
[  437.001708]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  437.007805] page_pool_empty_ring() page_pool refcnt -25 violation
[  437.012073] RIP: 0010:memcpy_toio+0x71/0xb0
[  437.018168] page_pool_empty_ring() page_pool refcnt -26 violation
[  437.022093] Code: 5c c3 cc cc cc cc 48 85 db 74 f2 40 f6 c5 01 75 45
48 83 fb 01 76 06 40 f6 c5 02 75 25 48 89 d9 48 89 ef 4c 89 e6 48 c1 e9
02 <f3> a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 5b 5d 41 5c c3 cc cc
[  437.022095] RSP: 0018:ffffd2c4c7767ae0 EFLAGS: 00010206
[  437.028189] page_pool_empty_ring() page_pool refcnt -27 violation
[  437.032022] RAX: 0000000000000000 RBX: 0000000000001000 RCX:
00000000000003f0
[  437.032024] RDX: 0000000000001000 RSI: ffffd2c4cac7d040 RDI:
ffffd2c4ce080040
[  437.032025] RBP: ffffd2c4ce080000 R08: ffffd2c4c7767be0 R09:
ffff8d3e3c030758
[  437.038120] page_pool_empty_ring() page_pool refcnt -28 violation
[  437.041781] R10: 0000000000000000 R11: ffffd2c4cabfd000 R12:
ffffd2c4cac7d000
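
The log above interleaves two message streams: the repeated
page_pool_empty_ring() refcnt violations and the list_del corruption
oops, presumably printed from different contexts at the same time. For
anyone trying to read the trace, a minimal sketch of separating the two
is below. It assumes the dmesg output has been saved as plain text with
one message per line; the name split_log is hypothetical and not part of
any kernel tooling.

```python
import re

# Matches the page_pool warning lines that are mixed into the oops output.
REFCNT_RE = re.compile(
    r"page_pool_empty_ring\(\) page_pool refcnt (-?\d+) violation"
)


def split_log(lines):
    """Separate refcnt-violation lines from the rest of the log.

    Returns (refcnts, rest): the reported refcnt values in order of
    appearance, and the remaining lines with those messages removed.
    """
    refcnts, rest = [], []
    for line in lines:
        match = REFCNT_RE.search(line)
        if match:
            refcnts.append(int(match.group(1)))
        else:
            rest.append(line)
    return refcnts, rest
```

Calling split_log(open("dmesg.txt").read().splitlines()) on a saved copy
of the log (dmesg.txt being a placeholder file name) gives the sequence
of reported refcnt values and a trace that is easier to follow.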
