On Sat, Mar 14, 2026 at 12:50:53PM -0700, Jakub Kicinski wrote:
> On Tue, 10 Mar 2026 21:00:49 -0700 Dipayaan Roy wrote:
> > On certain systems configured with 4K PAGE_SIZE, utilizing page_pool
> > fragments for RX buffers results in a significant throughput regression.
> > Profiling reveals that this regression correlates with high overhead in the
> > fragment allocation and reference counting paths on these specific
> > platforms, rendering the multi-buffer-per-page strategy counterproductive.
> 
> Can you say more ? We could technically take two references on the page
> right away if MTU is small and avoid some of the cost.

There is a 15-20% shortfall in achieving line rate for MANA (180+ Gbps)
on a particular ARM64 SKU. The issue is specific to this processor SKU and
is not seen on other ARM64 SKUs (e.g., GB200) or on x86 SKUs. Critically,
the regression only manifests beyond 16 TCP connections, which strongly
indicates it only appears under high contention and traffic.

  no. of     | rx buf backed       | rx buf backed
 connections | with page fragments | with full page
-------------+---------------------+---------------
           4 |         139 Gbps    |     138 Gbps
           8 |         140 Gbps    |     162 Gbps
          16 |         186 Gbps    |     186 Gbps
          32 |         136 Gbps    |     183 Gbps
          48 |         159 Gbps    |     185 Gbps
          64 |         165 Gbps    |     184 Gbps
         128 |         170 Gbps    |     180 Gbps
 
The HW team is still working to root-cause (RCA) this hardware behaviour.

Regarding "We could technically take two references on the page right
away", are you suggesting moving the page reference counting logic into
the driver instead of relying on the page pool?
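
If I understand the suggestion, for small MTUs the driver could fragment
the page up front instead of going through the per-allocation frag path.
A rough, untested sketch against the page_pool helper API (helpers from
include/net/page_pool/helpers.h; the mana_* names are illustrative):

	/* Untested sketch: pre-split one page into two RX buffers when the
	 * MTU fits twice per 4K page.  page_pool_fragment_page() takes @nr
	 * references up front, so the hot path only does one plain put per
	 * buffer instead of per-frag alloc/refcount work.
	 */
	static int mana_refill_rx_small_mtu(struct mana_rxq *rxq)
	{
		struct page *page = page_pool_dev_alloc_pages(rxq->page_pool);

		if (!page)
			return -ENOMEM;

		/* Take both references in one shot, not two frag allocs */
		page_pool_fragment_page(page, 2);

		mana_post_rx_buf(rxq, page, 0);             /* first half  */
		mana_post_rx_buf(rxq, page, PAGE_SIZE / 2); /* second half */
		return 0;
	}

Is that roughly the shape you had in mind, or did you mean something done
inside the page pool itself?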

> 
> The driver doesn't seem to set skb->truesize accordingly after this
> change. So you're lying to the stack about how much memory each packet
> consumes. This is a blocker for the change.
> 
ACK. I will send out a separate patch with a Fixes: tag to correct the
skb->truesize accounting.

> > To mitigate this, bypass the page_pool fragment path and force a single RX
> > packet per page allocation when all the following conditions are met:
> >   1. The system is configured with a 4K PAGE_SIZE.
> >   2. A processor-specific quirk is detected via SMBIOS Type 4 data.
> 
> I don't think we want the kernel to be in the business of carrying
> matching on platform names and providing optimal config by default.
> This sort of logic needs to live in user space or the hypervisor 
> (which can then pass a single bit to the driver to enable the behavior)
> 
As per our internal discussion, the hypervisor cannot provide the CPU
version info (in VMs as well as in bare-metal offerings).

On handling it from the user side, are you suggesting introducing a new
ethtool private flag and shipping udev rules that set it, so the driver
switches to full-page RX buffers? Given the wide number of distros we
support, this might be harder to maintain/backport.
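
For concreteness, that user-space approach would look something like the
fragment below (the "rx-full-page" private-flag name is purely
illustrative; MANA does not currently expose such a flag):

```shell
# Hypothetical udev rule, e.g. /etc/udev/rules.d/99-mana-rxbuf.rules
# "rx-full-page" is an illustrative private-flag name, not an existing one.
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="mana", \
    RUN+="/usr/sbin/ethtool --set-priv-flags $name rx-full-page on"
```

Every distro (and cloud image) we support would need to carry this rule,
which is the maintenance concern above.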

Also, the DMI parsing design was influenced by other net wireless
drivers, such as drivers/net/wireless/ath/ath10k/core.c. If this approach
is not acceptable for the MANA driver, then we will have to take an
alternate route based on the discussion right above.
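
For reference, the shape we had in mind is an untested sketch along the
lines of ath10k_core_check_smbios(), using dmi_walk() to find the SMBIOS
Type 4 (Processor Information) record; the quirk-matching details and the
mana_* names are illustrative:

	/* Untested sketch: walk SMBIOS records and flag the affected SKU. */
	static void mana_dmi_check_cpu(const struct dmi_header *hdr, void *data)
	{
		bool *quirk = data;

		if (hdr->type != 4)	/* SMBIOS Type 4: Processor Information */
			return;

		/* Decode the processor-version string here and compare it
		 * against the affected SKU; details elided in this sketch. */
		*quirk = true;		/* set only when the SKU matches */
	}

	static bool mana_cpu_has_rx_frag_quirk(void)
	{
		bool quirk = false;

		dmi_walk(mana_dmi_check_cpu, &quirk);
		return quirk;
	}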

> > This approach restores expected line-rate performance by ensuring
> > predictable RX refill behavior on affected hardware.
> > 
> > There is no behavioral change for systems using larger page sizes
> > (16K/64K), or platforms where this processor-specific quirk do not
> > apply.
> -- 
> pw-bot: cr

Thank you for your comments, Jakub, and for pointing out the
skb->truesize issue. I am sending out a separate patch to fix it.

Regards
Dipayaan Roy

