On 7/7/2025 4:32 PM, Jacob Keller wrote:
>
>
> On 7/4/2025 9:18 AM, Michal Kubiak wrote:
>> This series modernizes the Rx path in the ice driver by removing legacy
>> code and switching to the Page Pool API. The changes follow the same
>> direction as previously done for the iavf driver, and aim to simplify
>> buffer management, improve maintainability, and prepare for future
>> infrastructure reuse.
>>
>> An important motivation for this work was addressing reports of poor
>> performance in XDP_TX mode when IOMMU is enabled. The legacy Rx model
>> incurred significant overhead due to per-frame DMA mapping, which
>> limited throughput in virtualized environments. This series eliminates
>> those bottlenecks by adopting Page Pool and bi-directional DMA mapping.
>>
>> The first patch removes the legacy Rx path, which relied on manual skb
>> allocation and header copying. This path has become obsolete due to the
>> availability of build_skb() and the increasing complexity of supporting
>> features like XDP and multi-buffer.
>>
>> The second patch drops the page splitting and recycling logic. While
>> once used to optimize memory usage, this logic introduced significant
>> complexity and hotpath overhead. Removing it simplifies the Rx flow and
>> sets the stage for Page Pool adoption.
>>
>> The final patch switches the driver to use the Page Pool and libeth
>> APIs. It also updates the XDP implementation to use libeth_xdp helpers
>> and optimizes XDP_TX by avoiding per-frame DMA mapping. This results in
>> a significant performance improvement in virtualized environments with
>> IOMMU enabled (over 5x gain in XDP_TX throughput). In other scenarios,
>> performance remains on par with the previous implementation.
>>
>> This conversion also aligns with the broader effort to modularize and
>> unify XDP support across Intel Ethernet drivers.
>>
>> Tested on various workloads including netperf and XDP modes (PASS, DROP,
>> TX) with and without IOMMU. No regressions observed.
>>
>> Last but not least, it is suspected that this series may also help
>> mitigate the memory consumption issues recently reported in the driver.
>> For further details, see:
>>
>> https://lore.kernel.org/intel-wired-lan/cak8ffz4hy6gujnenz3wy9jaylzxgfpr7dnzxzgmyoe44car...@mail.gmail.com/
>>
>
> I tried to apply these and test them, but I ran into several issues :(
>
> The iperf3 session starts with some traffic and then very quickly dies
> to zero:
>
>> [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 8] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 10] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 12] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 14] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
>> [SUM] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 8] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 10] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 12] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 14] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
>> [SUM] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 8] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 10] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 12] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 14] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
>> [SUM] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 8] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 10] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 12] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 14] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
>> [SUM] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 8] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 10] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 12] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 14] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
>> [SUM] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 8] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 10] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 12] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
>> [ 14] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
>> [SUM] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec
>
> I eventually got a crash:
>
>
>> jekeller-stp-glorfindel login: [ 326.338776] ------------[ cut here ]------------
>> [ 326.343440] WARNING: CPU: 109 PID: 0 at include/net/page_pool/helpers.h:297 libeth_rx_recycle_slow+0x2f/0x4f [libeth]
>> [ 326.354082] Modules linked in: ice gnss libeth_xdp libeth cfg80211 rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nf_tables ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables qrtr intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common sunrpc skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel spi_nor mtd kvm irqbypass iTCO_wdt rapl intel_pmc_bxt ipmi_ssif mei_me iTCO_vendor_support intel_cstate vfat fat i40e spi_intel_pci intel_uncore i2c_i801 pcspkr libie ioatdma mei libie_adminq lpc_ich i2c_smbus spi_intel intel_pch_thermal dca ipmi_si acpi_power_meter acpi_ipmi ipmi_devintf ipmi_msghandler acpi_pad fuse loop dm_multipath nfnetlink zram
>> [ 326.354222] lz4hc_compress lz4_compress xfs qat_c62x intel_qat polyval_clmulni ghash_clmulni_intel sha512_ssse3 sha1_ssse3 ast crc8 i2c_algo_bit wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkcs8_key_parser tls
>> [ 326.462156] CPU: 109 UID: 0 PID: 0 Comm: swapper/109 Not tainted 6.16.0-rc4-ice-page-pool+ #25 PREEMPT(lazy)
>> [ 326.472075] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0017.110620230543 11/06/2023
>> [ 326.482519] RIP: 0010:libeth_rx_recycle_slow+0x2f/0x4f [libeth]
>> [ 326.488454] Code: 1f 44 00 00 48 89 f8 48 89 fe 48 83 e0 fe 48 8b 50 28 48 8b 78 10 48 ff ca 74 20 48 83 ca ff f0 48 0f c1 50 28 48 ff ca 79 07 <0f> 0b c3 cc cc cc cc 75 12 48 c7 40 28 01 00 00 00 31 c9 83 ca ff
>> [ 326.507232] RSP: 0018:ffffd2c4c814cd38 EFLAGS: 00010296
>> [ 326.512466] RAX: fffff58c342d0ec0 RBX: 0000000000000000 RCX: 00000000000000e3
>> [ 326.519608] RDX: ffffffffffffffff RSI: fffff58c342d0ec0 RDI: ffff8d596e024100
>> [ 326.527173] RBP: ffffd2c4c814cdf8 R08: ffffd2c4e6bd3960 R09: 0000000000000000
>> [ 326.534674] R10: 00000000fffffb54 R11: 000000000002cd86 R12: ffff8d49fde71cb0
>> [ 326.542159] R13: 00000000000001cb R14: ffff8d49acca5600 R15: ffffd2c4e6bd3960
>> [ 326.549627] FS: 0000000000000000(0000) GS:ffff8d59a3c9b000(0000) knlGS:0000000000000000
>> [ 326.558047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 326.564119] CR2: 00007f3eda90df78 CR3: 0000000caee56001 CR4: 00000000007726f0
>> [ 326.571574] PKRU: 55555554
>> [ 326.574595] Call Trace:
>> [ 326.577353] <IRQ>
>> [ 326.579664] ice_clean_rx_irq+0x431/0x520 [ice]
>> [ 326.584584] ? iommu_dma_unmap_page+0x48/0x90
>> [ 326.589232] ice_napi_poll+0xbe/0x2a0 [ice]
>> [ 326.593786] __napi_poll+0x2e/0x1e0
>> [ 326.597567] net_rx_action+0x336/0x420
>> [ 326.601608] ? update_rq_clock_task+0x3f/0x1d0
>> [ 326.606344] ? sched_clock+0x10/0x30
>> [ 326.610207] handle_softirqs+0xed/0x340
>> [ 326.614316] __irq_exit_rcu+0xcb/0xf0
>> [ 326.618241] common_interrupt+0x85/0xa0
>> [ 326.622340] </IRQ>
>> [ 326.624702] <TASK>
>> [ 326.627053] asm_common_interrupt+0x26/0x40
>> [ 326.631493] RIP: 0010:cpuidle_enter_state+0xcc/0x660
>> [ 326.636709] Code: 00 00 e8 67 40 ed fe e8 32 f0 ff ff 49 89 c4 0f 1f 44 00 00 31 ff e8 53 54 eb fe 45 84 ff 0f 85 02 02 00 00 fb 0f 1f 44 00 00 <85> ed 0f 88 d3 01 00 00 4c 63 f5 49 83 fe 0a 0f 83 9f 04 00 00 49
>> [ 326.655959] RSP: 0018:ffffd2c4c6aefe50 EFLAGS: 00000246
>> [ 326.661446] RAX: ffff8d59a3c9b000 RBX: ffff8d592decfe80 RCX: 0000000000000000
>> [ 326.668863] RDX: 0000004bfb4d51d2 RSI: 000000003351fed6 RDI: 0000000000000000
>> [ 326.676284] RBP: 0000000000000002 R08: ffffffbe2deca6d0 R09: ffff8d592deb0660
>> [ 326.683706] R10: 0000008df1fafa1d R11: 0000000000000000 R12: 0000004bfb4d51d2
>> [ 326.691133] R13: ffffffff89512ee0 R14: 0000000000000002 R15: 0000000000000000
>> [ 326.698560] cpuidle_enter+0x31/0x50
>> [ 326.702387] cpuidle_idle_call+0xf5/0x160
>> [ 326.706647] do_idle+0x78/0xd0
>> [ 326.709937] cpu_startup_entry+0x29/0x30
>> [ 326.714087] start_secondary+0x126/0x170
>> [ 326.718241] common_startup_64+0x13e/0x141
>> [ 326.722561] </TASK>
>> [ 326.724960] ---[ end trace 0000000000000000 ]---
>
> Something has gone wrong with the patches applied :(
>
>

Changing MTU back to 1500 broke things further:
[ 436.757872] page_pool_empty_ring() page_pool refcnt -4 violation
[ 436.758147] non-paged memory
[ 436.763901] page_pool_empty_ring() page_pool refcnt -5 violation
[ 436.766847] list_del corruption. prev->next should be fffff58c32d39dc8, but was 0000000000000000. (prev=fffff58c332cb6c8)
[ 436.766894] ------------[ cut here ]------------
[ 436.772880] page_pool_empty_ring() page_pool refcnt -6 violation
[ 436.783799] kernel BUG at lib/list_debug.c:62!
[ 436.783805] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[ 436.788430] page_pool_empty_ring() page_pool refcnt -7 violation
[ 436.794426] CPU: 61 UID: 0 PID: 388 Comm: kworker/61:0 Tainted: G W 6.16.0-rc4-ice-page-pool+ #25 PREEMPT(lazy)
[ 436.798880] page_pool_empty_ring() page_pool refcnt -8 violation
[ 436.805139] Tainted: [W]=WARN
[ 436.805140] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0017.110620230543 11/06/2023
[ 436.805142] Workqueue: events drm_fb_helper_damage_work
[ 436.811149] page_pool_empty_ring() page_pool refcnt -9 violation
[ 436.822785]
[ 436.822786] RIP: 0010:__list_del_entry_valid_or_report+0xd8/0x110
[ 436.828791] page_pool_empty_ring() page_pool refcnt -10 violation
[ 436.831755] Code: 0b 48 89 7c 24 08 48 89 cf 48 89 0c 24 e8 30 8b a8 ff 48 8b 0c 24 48 8b 74 24 08 48 c7 c7 f8 40 80 88 48 8b 11 e8 78 59 5c ff <0f> 0b 48 89 7c 24 08 48 89 d7 48 89 14 24 e8 05 8b a8 ff 48 8b 14
[ 436.831757] RSP: 0018:ffffd2c4c7774d38 EFLAGS: 00010046
[ 436.842181] page_pool_empty_ring() page_pool refcnt -11 violation
[ 436.847396]
[ 436.847397] RAX: 000000000000006d RBX: ffff8d494b6ccc40 RCX: 0000000000000027
[ 436.847399] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8d494b69e480
[ 436.853408] page_pool_empty_ring() page_pool refcnt -12 violation
[ 436.854903] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffd2c4c7774b20
[ 436.854904] R10: ffff8d592c23ffa8 R11: 00000000ffff7fff R12: fffff58c32d39dc0
[ 436.854906] R13: 0000000000000022 R14: ffff8d494b6ccc40 R15: 0000000000cb4e77
[ 436.860998] page_pool_empty_ring() page_pool refcnt -13 violation
[ 436.867089] FS: 0000000000000000(0000) GS:ffff8d49c149b000(0000) knlGS:0000000000000000
[ 436.867091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 436.867093] CR2: 000056464518b448 CR3: 0000000161138004 CR4: 00000000007726f0
[ 436.885836] page_pool_empty_ring() page_pool refcnt -14 violation
[ 436.891050] PKRU: 55555554
[ 436.891052] Call Trace:
[ 436.891053] <IRQ>
[ 436.897148] page_pool_empty_ring() page_pool refcnt -15 violation
[ 436.898634] free_pcppages_bulk+0x140/0x2d0
[ 436.905772] page_pool_empty_ring() page_pool refcnt -16 violation
[ 436.912901] free_frozen_page_commit+0x1d8/0x370
[ 436.918997] page_pool_empty_ring() page_pool refcnt -17 violation
[ 436.926126] __free_frozen_pages+0x56c/0x810
[ 436.933260] page_pool_empty_ring() page_pool refcnt -18 violation
[ 436.940383] kmem_cache_free+0x3dc/0x490
[ 436.946479] page_pool_empty_ring() page_pool refcnt -19 violation
[ 436.954561] ? rcu_do_batch+0x1d4/0x810
[ 436.960312] page_pool_empty_ring() page_pool refcnt -20 violation
[ 436.967438] rcu_do_batch+0x1d4/0x810
[ 436.967442] rcu_core+0x17d/0x350
[ 436.973535] page_pool_empty_ring() page_pool refcnt -21 violation
[ 436.976246] ? sched_clock_cpu+0xb/0x30
[ 436.978702] page_pool_empty_ring() page_pool refcnt -22 violation
[ 436.980717] ? irqtime_account_irq+0x3c/0xc0
[ 436.980723] handle_softirqs+0xed/0x340
[ 436.986813] page_pool_empty_ring() page_pool refcnt -23 violation
[ 436.990996] __irq_exit_rcu+0xcb/0xf0
[ 436.990999] sysvec_apic_timer_interrupt+0x71/0x90
[ 436.997092] page_pool_empty_ring() page_pool refcnt -24 violation
[ 437.001706] </IRQ>
[ 437.001707] <TASK>
[ 437.001708] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 437.007805] page_pool_empty_ring() page_pool refcnt -25 violation
[ 437.012073] RIP: 0010:memcpy_toio+0x71/0xb0
[ 437.018168] page_pool_empty_ring() page_pool refcnt -26 violation
[ 437.022093] Code: 5c c3 cc cc cc cc 48 85 db 74 f2 40 f6 c5 01 75 45 48 83 fb 01 76 06 40 f6 c5 02 75 25 48 89 d9 48 89 ef 4c 89 e6 48 c1 e9 02 <f3> a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 5b 5d 41 5c c3 cc cc
[ 437.022095] RSP: 0018:ffffd2c4c7767ae0 EFLAGS: 00010206
[ 437.028189] page_pool_empty_ring() page_pool refcnt -27 violation
[ 437.032022] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 00000000000003f0
[ 437.032024] RDX: 0000000000001000 RSI: ffffd2c4cac7d040 RDI: ffffd2c4ce080040
[ 437.032025] RBP: ffffd2c4ce080000 R08: ffffd2c4c7767be0 R09: ffff8d3e3c030758
[ 437.038120] page_pool_empty_ring() page_pool refcnt -28 violation
[ 437.041781] R10: 0000000000000000 R11: ffffd2c4cabfd000 R12: ffffd2c4cac7d000
