Re: [PATCH net-next v4] page_pool: import Jesper's page_pool benchmark

Mina Almasry Mon, 16 Jun 2025 14:36:03 -0700

On Mon, Jun 16, 2025 at 2:29 AM Jesper Dangaard Brouer <h...@kernel.org> wrote:
> On 15/06/2025 22.59, Mina Almasry wrote:
> > From: Jesper Dangaard Brouer <h...@kernel.org>
> >
> > We frequently consult with Jesper's out-of-tree page_pool benchmark to
> > evaluate page_pool changes.
> >
> > Import the benchmark into the upstream linux kernel tree so that (a)
> > we're all running the same version, (b) we pave the way for shared
> > improvements, and (c) we can maybe one day integrate it with nipa, if
> > possible.
> >
> > Import bench_page_pool_simple from commit 35b1716d0c30 ("Add
> > page_bench06_walk_all"), from this repository:
> > https://github.com/netoptimizer/prototype-kernel.git
> >
> > Changes done during upstreaming:
> > - Fix checkpatch issues.
> > - Remove the unneeded tasklet logic.
> > - Move under tools/testing.
> > - Create ksft for the benchmark.
> > - Change slightly how the benchmark gets built. Out of tree, time_bench
> >    is built as an independent .ko. Here it is included in
> >    bench_page_pool.ko.
> >
> > Steps to run:
> >
> > ```
> > mkdir -p /tmp/run-pp-bench
> > make -C ./tools/testing/selftests/net/bench
> > make -C ./tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench
> > rsync --delete -avz --progress /tmp/run-pp-bench mina@$SERVER:~/
> > ssh mina@$SERVER << EOF
> >    cd ~/run-pp-bench && sudo ./test_bench_page_pool.sh
> > EOF
> > ```
> >
> > Output:
> >
> > ```
> > (benchmark dmesg logs)
> >
>
> Something is off with benchmark numbers compared to the OOT version.
>


I assume you're comparing my results (my kernel config + my hardware +
upstream benchmark) with your results (your kernel config + your
hardware + OOT version). The problem may be OOT vs upstream, but it
may just be different code/config/hardware.

> Adding my numbers below, they were run on my testlab with:
>   - CPU E5-1650 v4 @ 3.60GHz
>   - kernel: net.git v6.15-12438-gd9816ec74e6d
>
> > Fast path results:
> > no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.368 ns
> >
>
> Fast-path on your CPU is faster than on my CPU (22 cycles(tsc) 6.128 ns).
> What CPU is this?

My test setup is a Gcloud A3 VM (so virtualized). The CPU is:

cat /proc/cpuinfo
...
model name      : Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz

>
> Type:no-softirq-page_pool01 Per elem: 22 cycles(tsc) 6.128 ns (step:0)
>   - (measurement period time:0.061282924 sec time_interval:61282924)
>   - (invoke count:10000000 tsc_interval:220619745)
>
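
As a sanity check for anyone reading along, the "Per elem" figures in these
logs appear to be the raw intervals divided by the invoke count. A minimal
sketch of that arithmetic, using only the numbers quoted in this thread
(per_elem is a hypothetical helper of mine, not part of the benchmark, and
the truncation to three decimals is an assumption that happens to match the
printed values):

```shell
# Reproduce the "Per elem" figures from the raw intervals in the quoted logs.
# per_elem is a hypothetical helper, not part of the benchmark module.
per_elem() {
    # $1 = total interval (ns or TSC ticks), $2 = invoke count
    # Integer math only; prints the quotient with three truncated decimals.
    printf '%d.%03d\n' $(( $1 / $2 )) $(( ($1 % $2) * 1000 / $2 ))
}

per_elem 61282924 10000000     # ns per element, no-softirq-page_pool01
per_elem 220619745 10000000    # TSC cycles per element, no-softirq-page_pool01
per_elem 168535760 10000000    # ns per element, no-softirq-page_pool02
```

The log lines report the cycle counts as whole numbers (22 and 60).
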
> > ptr_ring results:
> > no-softirq-page_pool02 Per elem: 527 cycles(tsc) 195.187 ns
>
> I'm surprised that the ptr_ring benchmark is so slow compared to my
> result (below): 60 cycles(tsc) 16.853 ns.
>
> Type:no-softirq-page_pool02 Per elem: 60 cycles(tsc) 16.853 ns (step:0)
>   - (measurement period time:0.168535760 sec time_interval:168535760)
>   - (invoke count:10000000 tsc_interval:606734160)
>
> Maybe your kernel is compiled with some CONFIG debug thing that makes it
> slower?
>

Yeah, I actually just checked and I have CONFIG_DEBUG_NET on in my
build, and a lot of other debug configs are turned on.

Let me investigate here. Maybe trimming the debug configs and double
checking my tree for debug logs I added would point to the difference.
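
A rough sketch of how I might list the debug options enabled in a build,
assuming the config is available as a plain-text file. The helper name and
the option pattern are my own; the pattern is only a heuristic and will miss
debug features whose names do not match it:

```shell
# List debug-flavoured options that are enabled (=y) in a kernel config file.
# list_debug_configs is a hypothetical helper; the pattern is a heuristic,
# not an exhaustive list of options that can slow a kernel down.
list_debug_configs() {
    # $1 = path to a plain-text kernel config
    grep -E '^CONFIG_(DEBUG_[A-Z0-9_]+|KASAN|LOCKDEP|PROVE_LOCKING)=y' "$1"
}

# Usage (the path is an assumption about the local build tree):
#   list_debug_configs .config
```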

I could also try to put both the OOT version and upstream version in
my tree and do a proper A/B comparison that way.

If you do get a chance to run this upstream version from your exact tree
and config, that would be a good A/B comparison as well.

> You can troubleshoot like this:
>   - select the `no-softirq-page_pool02` test via run_flags=$((2#100)).
>
>   # perf record -g modprobe bench_page_pool_simple run_flags=$((2#100)) loops=$((100*10**6))
>   # perf report --no-children
>

Thanks, will do.
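
For reference, the two parameter values in that command are bash arithmetic:
`2#100` is base-2 notation and `**` is exponentiation. A small sketch of the
same values in portable sh (bit_flag is a hypothetical helper of mine; which
test each bit selects is taken from the instructions above for bit 2 only):

```shell
# POSIX-sh equivalents of the bash-only expressions in the quoted command.
bit_flag() {
    # $1 = bit position; bash's $((2#100)) is binary 100, i.e. bit 2 set
    echo $(( 1 << $1 ))
}

bit_flag 2                  # prints 4, same value as bash $((2#100))
echo $(( 100 * 1000000 ))   # prints 100000000, same as bash $((100*10**6))
```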

-- 
Thanks,
Mina
