> -----Original Message-----
> From: Intel-wired-lan <[email protected]> On Behalf Of Jacob 
> Keller
> Sent: 12 July 2025 05:54
> To: Fijalkowski, Maciej <[email protected]>; Kitszel, Przemyslaw 
> <[email protected]>; Intel Wired LAN 
> <[email protected]>; Lobakin, Aleksander 
> <[email protected]>
> Cc: Joe Damato <[email protected]>; Nguyen, Anthony L 
> <[email protected]>; [email protected]; Christoph Petrausch 
> <[email protected]>; Jaroslav Pulchart 
> <[email protected]>; Keller, Jacob E <[email protected]>
> Subject: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix Rx page leak on 
> multi-buffer frames
>
> The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each 
> buffer in the current frame. This function was introduced as part of handling 
> multi-buffer XDP support in the ice driver.
>
> It works by iterating over the buffers from first_desc up to 1 plus the total 
> number of fragments in the frame, cached from before the XDP program was 
> executed.
>
> If the hardware posts a descriptor with a size of 0, the logic used in
> ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added as 
> fragments in ice_add_xdp_frag. Since the buffer isn't counted as a fragment, 
> we do not iterate over it in ice_put_rx_mbuf(), and thus we don't call 
> ice_put_rx_buf().
>
> Because we don't call ice_put_rx_buf(), we don't attempt to re-use the page 
> or free it. This leaves a stale page in the ring, as we don't increment 
> next_to_alloc.
>
> The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented 
> properly, and that it always points to a buffer with a NULL page. Since this 
> function doesn't check, it will happily recycle a page over the top of the 
> next_to_alloc buffer, losing track of the old page.
>
> Note that this leak only occurs for multi-buffer frames. The
> ice_put_rx_mbuf() function always handles at least one buffer, so a 
> single-buffer frame will always get handled correctly. It is not clear 
> precisely why the hardware hands us descriptors with a size of 0 sometimes, 
> but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
>
> To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on 
> all buffers between first_desc and next_to_clean. Borrow the logic of a 
> similar function in i40e used for this same purpose. Use the same logic also 
> in ice_get_pgcnts().
>
> Instead of iterating over just the number of fragments, use a loop which 
> iterates until the current index reaches to the next_to_clean element just 
> past the current frame. Check the current number of fragments (post XDP 
> program). For all buffers up 1 more than the number of fragments, > we'll 
> update the pagecnt_bias. For any buffers past this, pagecnt_bias is left 
> as-is. This ensures that fragments released by the XDP program, as well as 
> any buffers with zero-size won't have their pagecnt_bias updated incorrectly. 
> Unlike i40e, the ice_put_rx_mbuf() function does call
> ice_put_rx_buf() on the last buffer of the frame indicating end of packet.
>
> The xdp_xmit value only needs to be updated if an XDP program is run, and 
> only once per packet. Drop the xdp_xmit pointer argument from 
> ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function 
> directly. This avoids needing to pass the argument and avoids an extra > 
> bit-wise OR for each buffer in the frame.
>
> Move the increment of the ntc local variable to ensure its updated *before* 
> all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic 
> requires the index of the element just after the current frame.
>
> This has the advantage that we also no longer need to track or cache the 
> number of fragments in the rx_ring, which saves a few bytes in the ring.
>
> Cc: Christoph Petrausch <[email protected]>
> Reported-by: Jaroslav Pulchart <[email protected]>
> Closes: 
> https://lore.kernel.org/netdev/cak8ffz4hy6gujnenz3wy9jaylzxgfpr7dnzxzgmyoe44car...@mail.gmail.com/
> Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current 
> frame")
> Signed-off-by: Jacob Keller <[email protected]>
> ---
> I've tested this in a setup with MTU 9000, using a combination of iperf3 and 
> wrk generated traffic.
>
> I tested this in a couple of ways. First, I check memory allocations using
> /proc/allocinfo:
>
> awk '/ice_alloc_mapped_page/ { printf("%s %s\n", $1, $2) }' /proc/allocinfo | 
> numfmt --to=iec
>
> Second, I ported some stats from i40e written by Joe Damato to track the page 
> allocation and busy counts. I consistently saw that the allocate stat 
> increased without the busy or waive stats increasing. I also added a stat to 
> track directly when we overwrote a page pointer that was non-NULL in 
> ice_reuse_rx_page(), and saw it increment consistently.
>
> With this fix, all of these indicators are fixed. I've tested both 1500 byte 
> and 9000 byte MTU and no longer see the leak. With the counters I was able to 
> immediately see a leak within a few minutes of iperf3, so I am confident that 
> I've resolved the leak with this fix.
>
> I've now also tested with xdp-bench and confirm that XDP_TX and XDP_REDIR 
> work properly with the fix for updating xdp_xmit.
> ---
> Changes in v2:
> - Fix XDP Tx/Redirect (Thanks Maciej!)
> - Link to v1: 
> https://lore.kernel.org/r/[email protected]
> ---
> drivers/net/ethernet/intel/ice/ice_txrx.h |  1 -  
> drivers/net/ethernet/intel/ice/ice_txrx.c | 81 +++++++++++++------------------
> 2 files changed, 33 insertions(+), 49 deletions(-)
>

Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)

Reply via email to