On Tue, Aug 26, 2025 at 10:35:30AM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 26/08/2025 01.00, Jacob Keller wrote:
> > XDP_DROP performance has been tested for this version, thanks to work
> > from Michal Kubiak. The results are quite promising, with 3 versions
> > being compared:
> > 
> >  * baseline net-next tree
> >  * v1 applied
> >  * v2 applied
> > 
> > Michal said:
> > 
> >   I ran the XDP_DROP performance comparison tests on my setup in the
> >   way I usually do. I didn't have the pktgen configured on my link
> >   partner, but I used 6 instances of the xdpsock running in Tx-only
> >   mode to generate high-bandwidth traffic. Also, I tried to replicate
> >   the conditions according to Jesper's description, making sure that
> >   all the traffic is directed to a single Rx queue and one CPU is
> >   100% loaded.
> > 
> 
> Thank you for replicating the test setup. Using xdpsock as a traffic
> generator is fine, as long as we make sure that the generator TX speed
> exceeds the Device Under Test RX XDP_DROP speed. It is also important
> for the test that packets hit a single RX queue and we verify one CPU
> is 100% loaded, as you describe.
> 
> As a reminder, the pktgen kernel module comes with ready-to-use sample
> shell-scripts[1].
> 
> [1] https://elixir.bootlin.com/linux/v6.16.3/source/samples/pktgen
> 
Thank you! I am aware of that and also use those scripts. The xdpsock
solution was just the quickest option at that moment, so I decided not
to change my link partner setup (since I had already successfully
reproduced the performance drop from v1).

> > The performance hit from v1 is replicated, and shown to be gone in
> > v2, with our results showing even an increase compared to baseline
> > instead of a drop. I've included the relative packet per second
> > deltas compared against a baseline test with neither v1 nor v2.
> > 
> 
> Thanks for also replicating the performance hit from v1 as I did in [2].
> 
> To Michal: What CPU did you use?
>  - I used CPU: AMD EPYC 9684X (with SRSO=IBPB)

In my test I used: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz

> One of the reasons that I saw a larger percentage drop is that this
> CPU doesn't have DDIO/DCA, which delivers the packet to L3 cache (and
> an L2 cache-miss will obviously take less time than a full main memory
> cache-miss). (Details: Newer AMD CPUs will get something called PCIe
> TLP Processing Hints (TPH), which resembles DDIO).
> 
> Point is that I see some opportunities in the driver to move some of
> the prefetches earlier. But we want to make sure it benefits both CPU
> types, and I can test on the AMD platform. (This CPU is a large part
> of our fleet, so it makes sense for us to optimize this).
> 
> > baseline to v1, no-touch:
> >   -8,387,677 packets per second (17%) decrease.
> > 
> > baseline to v2, no-touch:
> >   +4,057,000 packets per second (8%) increase!
> > 
> > baseline to v1, read data:
> >   -411,709 packets per second (1%) decrease.
> > 
> > baseline to v2, read data:
> >   +4,331,857 packets per second (11%) increase!
> 
> Thanks for providing these numbers.
> I would also like to know the throughput PPS numbers before and
> after, as this allows me to calculate the nanosec difference. Using
> percentages is usually useful, but it can be misleading when dealing
> with XDP_DROP speeds, because a small nanosec change will get
> "magnified" too much.
> 

I was usually told to share the percentage data, because absolute
numbers may depend on various circumstances. However, I understand your
point regarding XDP_DROP; in such a case the absolute numbers are
justified.
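
For reference, converting the averaged packets/s figures into a
per-packet cost is just a reciprocal. Below is a minimal standalone
sketch of that conversion (the three rates are the "no touch" averages
from the xdp-bench results further down; the helper name is only
illustrative):

#include <stdio.h>

/*
 * Convert an averaged packets-per-second rate into nanoseconds spent
 * per packet, then print the delta against the baseline. The rates
 * are the "no touch" averages from the xdp-bench summaries below.
 */
static double ns_per_pkt(double pps)
{
	return 1e9 / pps;
}

int main(void)
{
	double base = ns_per_pkt(46951873);	/* net-next (main)    */
	double v1   = ns_per_pkt(38564196);	/* net-next (main+v1) */
	double v2   = ns_per_pkt(51008873);	/* net-next (main+v2) */

	printf("baseline: %6.2f ns/pkt\n", base);			/* ~21.3 ns */
	printf("v1:       %6.2f ns/pkt (%+.2f ns)\n", v1, v1 - base);	/* ~+4.6 ns */
	printf("v2:       %6.2f ns/pkt (%+.2f ns)\n", v2, v2 - base);	/* ~-1.7 ns */

	return 0;
}

In those terms, the 17% drop from v1 corresponds to roughly 4.6 ns of
extra per-packet cost, and the v2 gain to roughly 1.7 ns saved per
packet.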
Please see my raw results (from xdp-bench summary) below:

net-next (main) (drop, no touch)
  Duration          : 105.7s
  Packets received  : 4,960,778,583
  Average packets/s : 46,951,873
  Rx dropped        : 4,960,778,583

net-next (main) (drop, read data)
  Duration          : 94.5s
  Packets received  : 3,524,346,352
  Average packets/s : 37,295,056
  Rx dropped        : 3,524,346,352

net-next (main+v1) (drop, no touch)
  Duration          : 122.5s
  Packets received  : 4,722,510,839
  Average packets/s : 38,564,196
  Rx dropped        : 4,722,510,839

net-next (main+v1) (drop, read data)
  Duration          : 115.7s
  Packets received  : 4,265,991,147
  Average packets/s : 36,883,347
  Rx dropped        : 4,265,991,147

net-next (main+v2) (drop, no touch)
  Duration          : 130.6s
  Packets received  : 6,664,104,907
  Average packets/s : 51,008,873
  Rx dropped        : 6,664,104,907

net-next (main+v2) (drop, read data)
  Duration          : 143.6s
  Packets received  : 5,975,991,044
  Average packets/s : 41,626,913
  Rx dropped        : 5,975,991,044

Thanks,
Michal

> > ---
> > Changes in v2:
> > - Only access shared info for fragmented frames
> > - Link to v1: https://lore.kernel.org/netdev/[email protected]/
> 
> [2] https://lore.kernel.org/netdev/[email protected]/
> 
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.h |  1 -
> >  drivers/net/ethernet/intel/ice/ice_txrx.c | 80 +++++++++++++------------------
> >  2 files changed, 34 insertions(+), 47 deletions(-)
> 
> Acked-by: Jesper Dangaard Brouer <[email protected]>
