On Mon, Jun 16, 2025 at 2:29 AM Jesper Dangaard Brouer <h...@kernel.org> wrote:
>
> On 15/06/2025 22.59, Mina Almasry wrote:
> > From: Jesper Dangaard Brouer <h...@kernel.org>
> >
> > We frequently consult with Jesper's out-of-tree page_pool benchmark to
> > evaluate page_pool changes.
> >
> > Import the benchmark into the upstream linux kernel tree so that (a)
> > we're all running the same version, (b) pave the way for shared
> > improvements, and (c) maybe one day integrate it with nipa, if possible.
> >
> > Import bench_page_pool_simple from commit 35b1716d0c30 ("Add
> > page_bench06_walk_all"), from this repository:
> > https://github.com/netoptimizer/prototype-kernel.git
> >
> > Changes done during upstreaming:
> > - Fix checkpatch issues.
> > - Remove the tasklet logic not needed.
> > - Move under tools/testing
> > - Create ksft for the benchmark.
> > - Changed slightly how the benchmark gets built. Out of tree, time_bench
> >   is built as an independent .ko. Here it is included in
> >   bench_page_pool.ko
> >
> > Steps to run:
> >
> > ```
> > mkdir -p /tmp/run-pp-bench
> > make -C ./tools/testing/selftests/net/bench
> > make -C ./tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench
> > rsync --delete -avz --progress /tmp/run-pp-bench mina@$SERVER:~/
> > ssh mina@$SERVER << EOF
> >   cd ~/run-pp-bench && sudo ./test_bench_page_pool.sh
> > EOF
> > ```
> >
> > Output:
> >
> > ```
> > (benchmark dmesg logs)
> >
>
> Something is off with benchmark numbers compared to the OOT version.
>
I assume you're comparing my results (my kernel config + my hardware +
upstream benchmark) with your results (your kernel config + your
hardware + OOT version). The problem may be in OOT vs upstream, but it
may just be different code/config/hardware.

> Adding my numbers below, they were run on my testlab with:
>  - CPU E5-1650 v4 @ 3.60GHz
>  - kernel: net.git v6.15-12438-gd9816ec74e6d
>
> > Fast path results:
> > no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.368 ns
> >
>
> Fast-path on your CPU is faster (22 cycles(tsc) 6.128 ns) than my CPU.
> What CPU is this?

My test setup is a Gcloud A3 VM (so virtualized). The CPU is:

cat /proc/cpuinfo
...
model name      : Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz

>
> Type:no-softirq-page_pool01 Per elem: 22 cycles(tsc) 6.128 ns (step:0)
>   - (measurement period time:0.061282924 sec time_interval:61282924)
>   - (invoke count:10000000 tsc_interval:220619745)
>
> > ptr_ring results:
> > no-softirq-page_pool02 Per elem: 527 cycles(tsc) 195.187 ns
>
> I'm surprised that ptr_ring benchmark is very slow, compared to my
> result (below) 60 cycles(tsc) 16.853 ns.
>
> Type:no-softirq-page_pool02 Per elem: 60 cycles(tsc) 16.853 ns (step:0)
>   - (measurement period time:0.168535760 sec time_interval:168535760)
>   - (invoke count:10000000 tsc_interval:606734160)
>
> Maybe your kernel is compiled with some CONFIG debug thing that makes it
> slower?
>

Yeah, I actually just checked and I have CONFIG_DEBUG_NET on in my
build, and a lot of other debug configs are turned on.

Let me investigate here. Maybe trimming the debug configs and
double-checking my tree for debug logs I added would point to the
difference.

I could also try to put both the OOT version and upstream version in my
tree and do a proper A/B comparison that way.

If you do get a chance to run this upstream version from your exact tree
and config, that would be a good A/B comparison as well.
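For anyone comparing these numbers by hand: the quoted "Per elem" figures
look consistent with simply dividing the reported intervals by the invoke
count and truncating. The sketch below is only my reading of the quoted
output, not the benchmark's actual code, and the helper names
(per_elem_cycles, per_elem_ns, tsc_mhz) are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: recompute the per-element figures from the
# intervals printed in the quoted dmesg lines. Integer math only.

# per_elem_cycles INVOKE_COUNT TSC_INTERVAL -> whole cycles per element
per_elem_cycles() {
    echo $(( $2 / $1 ))
}

# per_elem_ns INVOKE_COUNT TIME_INTERVAL_NS -> ns per element, 3 decimals,
# truncated rather than rounded
per_elem_ns() {
    printf '%d.%03d\n' $(( $2 / $1 )) $(( ($2 % $1) * 1000 / $1 ))
}

# tsc_mhz TSC_INTERVAL TIME_INTERVAL_NS -> implied TSC frequency in MHz
tsc_mhz() {
    echo $(( $1 * 1000 / $2 ))
}

# Values copied from the quoted no-softirq-page_pool01 result
per_elem_cycles 10000000 220619745   # 22
per_elem_ns     10000000 61282924    # 6.128
tsc_mhz         220619745 61282924   # 3600

# Values copied from the quoted no-softirq-page_pool02 result
per_elem_cycles 10000000 606734160   # 60
per_elem_ns     10000000 168535760   # 16.853
```

The implied 3600 MHz lines up with the E5-1650 v4 @ 3.60GHz testlab CPU.
Since the TSC rate differs between machines, the ns column is probably
the safer one to compare across the two setups.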
> You can troubleshoot like this:
>  - select the `no-softirq-page_pool02` test via run_flags=$((2#100)).
>
>  # perf record -g modprobe bench_page_pool_simple run_flags=$((2#100)) loops=$((100*10**6))
>  # perf report --no-children
>

Thanks, will do.

--
Thanks,
Mina
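Side note on the module parameters in the perf command: both use bash
arithmetic expansion, where BASE#DIGITS is a literal in the given base
and ** is exponentiation. A minimal sketch of what they expand to is
below. The flags_from_bits helper is hypothetical and not part of the
benchmark, and which bit selects which test is taken only from the
suggestion quoted above.

```shell
#!/usr/bin/env bash
# Requires bash: the 2#100 base notation and ** are not POSIX sh.

run_flags=$((2#100))     # binary 100 -> 4, i.e. only bit 2 is set
loops=$((100*10**6))     # 100 * 10^6 -> 100000000 iterations

# Hypothetical helper: build a run_flags mask from a list of bit numbers.
flags_from_bits() {
    local mask=0 bit
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    echo "$mask"
}

echo "run_flags=$run_flags loops=$loops"
flags_from_bits 2        # same mask as $((2#100))
```

Passing the decimal value directly (run_flags=4) should be equivalent,
since the shell expands the expression before modprobe sees it.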