On Tue, Aug 26, 2025 at 10:35:30AM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 26/08/2025 01.00, Jacob Keller wrote:
> > XDP_DROP performance has been tested for this version, thanks to work
> > from Michal Kubiak. The results are quite promising, with 3 versions
> > being compared:
> > 
> >  * baseline net-next tree
> >  * v1 applied
> >  * v2 applied
> > 
> > Michal said:
> > 
> >   I ran the XDP_DROP performance comparison tests on my setup in the
> >   way I usually do. I didn't have the pktgen configured on my link
> >   partner, but I used 6 instances of the xdpsock running in Tx-only
> >   mode to generate high-bandwidth traffic. Also, I tried to replicate
> >   the conditions according to Jesper's description, making sure that
> >   all the traffic is directed to a single Rx queue and one CPU is
> >   100% loaded.
> > 
> 
> Thank you for replicating the test setup. Using xdpsock as a traffic
> generator is fine, as long as we make sure that the generator TX speed
> exceeds the Device Under Test RX XDP_DROP speed. It is also important
> for the test that packets hit a single RX queue and we verify one CPU
> is 100% loaded, as you describe.
> 
> As a reminder, the pktgen kernel module comes with ready-to-use sample
> shell-scripts[1].
> 
> [1] https://elixir.bootlin.com/linux/v6.16.3/source/samples/pktgen
> 
Thank you! I am aware of that and also use those scripts. The xdpsock
solution was just the quickest option at that moment, so I decided not
to change my link partner setup (since I had already successfully
reproduced the performance drop from v1).

> > The performance hit from v1 is replicated, and shown to be gone in
> > v2, with our results showing even an increase compared to baseline
> > instead of a drop. I've included the relative packet per second
> > deltas compared against a baseline test with neither v1 nor v2.
> > 
> 
> Thanks for also replicating the performance hit from v1 as I did in [2].
> 
> To Michal: What CPU did you use?
>  - I used CPU: AMD EPYC 9684X (with SRSO=IBPB)

In my test I used: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz

> One of the reasons that I saw a larger percentage drop is that this
> CPU doesn't have DDIO/DCA, which delivers the packet to L3 cache (and
> an L2 cache-miss will obviously take less time than a full main memory
> cache-miss). (Details: Newer AMD CPUs will get something called PCIe
> TLP Processing Hints (TPH), which resembles DDIO).
> 
> Point is that I see some opportunities in the driver to move some of
> the prefetches earlier. But we want to make sure it benefits both CPU
> types, and I can test on the AMD platform. (This CPU is a large part
> of our fleet, so it makes sense for us to optimize this).
> 
> > baseline to v1, no-touch:
> >   -8,387,677 packets per second (17%) decrease.
> > 
> > baseline to v2, no-touch:
> >   +4,057,000 packets per second (8%) increase!
> > 
> > baseline to v1, read data:
> >   -411,709 packets per second (1%) decrease.
> > 
> > baseline to v2, read data:
> >   +4,331,857 packets per second (11%) increase!
> 
> Thanks for providing these numbers.
> I would also like to know the throughput PPS numbers before and
> after, as this allows me to calculate the nanosec difference. Using
> percentages is usually useful, but it can be misleading when dealing
> with XDP_DROP speeds, because a small nanosec change will get
> "magnified" too much.
> 

I was usually told to share the percentage data, because absolute
numbers may depend on various circumstances. However, I understand your
point regarding XDP_DROP; in such a case the absolute numbers are
justified.
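
For reference, converting the averaged packets/s figures into a
per-packet cost is just a reciprocal. Below is a minimal standalone
sketch of that conversion (the three rates are the "no touch" averages
from the xdp-bench results further down; the helper name is only
illustrative):

#include <stdio.h>

/*
 * Convert an averaged packets-per-second rate into nanoseconds spent
 * per packet, then print the delta against the baseline. The rates
 * are the "no touch" averages from the xdp-bench summaries below.
 */
static double ns_per_pkt(double pps)
{
	return 1e9 / pps;
}

int main(void)
{
	double base = ns_per_pkt(46951873);	/* net-next (main)    */
	double v1   = ns_per_pkt(38564196);	/* net-next (main+v1) */
	double v2   = ns_per_pkt(51008873);	/* net-next (main+v2) */

	printf("baseline: %6.2f ns/pkt\n", base);			/* ~21.3 ns */
	printf("v1:       %6.2f ns/pkt (%+.2f ns)\n", v1, v1 - base);	/* ~+4.6 ns */
	printf("v2:       %6.2f ns/pkt (%+.2f ns)\n", v2, v2 - base);	/* ~-1.7 ns */

	return 0;
}

In those terms, the 17% drop from v1 corresponds to roughly 4.6 ns of
extra per-packet cost, and the v2 gain to roughly 1.7 ns saved per
packet.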
Please see my raw results (from xdp-bench summary) below:

net-next (main) (drop, no touch)
  Duration          : 105.7s
  Packets received  : 4,960,778,583
  Average packets/s : 46,951,873
  Rx dropped        : 4,960,778,583

net-next (main) (drop, read data)
  Duration          : 94.5s
  Packets received  : 3,524,346,352
  Average packets/s : 37,295,056
  Rx dropped        : 3,524,346,352

net-next (main+v1) (drop, no touch)
  Duration          : 122.5s
  Packets received  : 4,722,510,839
  Average packets/s : 38,564,196
  Rx dropped        : 4,722,510,839

net-next (main+v1) (drop, read data)
  Duration          : 115.7s
  Packets received  : 4,265,991,147
  Average packets/s : 36,883,347
  Rx dropped        : 4,265,991,147

net-next (main+v2) (drop, no touch)
  Duration          : 130.6s
  Packets received  : 6,664,104,907
  Average packets/s : 51,008,873
  Rx dropped        : 6,664,104,907

net-next (main+v2) (drop, read data)
  Duration          : 143.6s
  Packets received  : 5,975,991,044
  Average packets/s : 41,626,913
  Rx dropped        : 5,975,991,044

Thanks,
Michal

> > ---
> > Changes in v2:
> > - Only access shared info for fragmented frames
> > - Link to v1: https://lore.kernel.org/netdev/[email protected]/
> 
> [2] https://lore.kernel.org/netdev/[email protected]/
> 
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.h |  1 -
> >  drivers/net/ethernet/intel/ice/ice_txrx.c | 80 +++++++++++++------------------
> >  2 files changed, 34 insertions(+), 47 deletions(-)
> 
> Acked-by: Jesper Dangaard Brouer <[email protected]>
