On Thu, Jan 15, 2026 at 10:48:29AM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 12, 2026 at 1:50 PM Jiri Olsa <[email protected]> wrote:
> >
> > Adding support to call bpf_get_stackid helper from trigger programs,
> > so far added for kprobe multi.
> >
> > Adding the --stacktrace/-g option to enable it.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/testing/selftests/bpf/bench.c | 4 ++++
> > tools/testing/selftests/bpf/bench.h | 1 +
> > .../selftests/bpf/benchs/bench_trigger.c | 1 +
> > .../selftests/bpf/progs/trigger_bench.c | 18 ++++++++++++++++++
> > 4 files changed, 24 insertions(+)
> >
>
> This now actually becomes a stack trace benchmark :) But I don't mind,
> I think it would be good to be able to benchmark this. But I think we
> should then implement it for all different tracing programs (tp,
> raw_tp, fentry/fexit/fmod_ret) for consistency and so we can compare
> and contrast?...

fyi I updated the bench for all program types and got some stats

current fix WITHOUT stacktrace:
usermode-count     : 810.652 ± 1.036M/s
kernel-count       : 336.645 ± 2.812M/s
syscall-count      :  27.798 ± 0.063M/s
fentry             :  67.677 ± 0.291M/s
fexit              :  49.970 ± 0.214M/s
fmodret            :  52.860 ± 0.237M/s
rawtp              :  65.196 ± 0.224M/s
tp                 :  34.120 ± 0.042M/s
kprobe             :  25.157 ± 0.019M/s
kprobe-multi       :  33.223 ± 0.205M/s
kprobe-multi-all   :   4.739 ± 0.003M/s
kretprobe          :  10.904 ± 0.020M/s
kretprobe-multi    :  15.996 ± 0.023M/s
kretprobe-multi-all:   2.559 ± 0.092M/s

current fix WITH stacktrace:
usermode-count     : 782.529 ± 5.866M/s
kernel-count       : 341.116 ± 2.247M/s
syscall-count      :  27.481 ± 0.267M/s
fentry             :   2.397 ± 0.026M/s
fexit              :   2.472 ± 0.008M/s
fmodret            :   2.475 ± 0.014M/s
rawtp              :   2.593 ± 0.031M/s
tp                 :   2.641 ± 0.020M/s
kprobe             :   3.848 ± 0.014M/s
kprobe-multi       :   4.188 ± 0.025M/s
kprobe-multi-all   :   0.261 ± 0.026M/s
kretprobe          :   3.782 ± 0.011M/s
kretprobe-multi    :   4.157 ± 0.023M/s
kretprobe-multi-all:   0.177 ± 0.000M/s

with similar fix for fentry/fexit/raw_tp/tp WITH stacktrace:
usermode-count     : 792.613 ± 1.322M/s
kernel-count       : 337.725 ± 2.422M/s
syscall-count      :  27.363 ± 0.030M/s
fentry             :  14.911 ± 0.083M/s
fexit              :  13.749 ± 0.060M/s
fmodret            :  13.987 ± 0.049M/s
rawtp              :  13.760 ± 0.042M/s
tp                 :   7.060 ± 0.026M/s
kprobe             :   3.920 ± 0.012M/s
kprobe-multi       :   4.186 ± 0.030M/s
kprobe-multi-all   :   0.281 ± 0.006M/s
kretprobe          :   3.782 ± 0.005M/s
kretprobe-multi    :   4.030 ± 0.014M/s
kretprobe-multi-all:   0.178 ± 0.000M/s
so cutting the extra initial unwind gets some speedup, as expected
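
To put a number on that speedup, here is a small sketch that divides the "similar fix" WITH stacktrace throughput by the "current fix" WITH stacktrace throughput for the program types that changed. The values are copied from the two runs above; the `speedup` helper and the dict names are only illustrative and are not part of the bench tool.

```python
# Throughput in M/s, copied from the two WITH-stacktrace runs above.
# "before" = current fix, "after" = similar fix for fentry/fexit/raw_tp/tp.
before = {"fentry": 2.397, "fexit": 2.472, "fmodret": 2.475,
          "rawtp": 2.593, "tp": 2.641}
after = {"fentry": 14.911, "fexit": 13.749, "fmodret": 13.987,
         "rawtp": 13.760, "tp": 7.060}


def speedup(old, new):
    """Return the new/old throughput ratio (illustrative helper)."""
    return new / old


# Print one ratio per program type.
for name in before:
    print(f"{name:<8}: {speedup(before[name], after[name]):.2f}x")
```

On these numbers the ratio comes out at roughly 5.3x to 6.2x for fentry/fexit/fmodret/rawtp and about 2.7x for tp. The rawtp figure should be read with care while its callstack is still wrong.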
I'm getting a wrong callstack for rawtp programs, so I need to find out why,
but the rest of the tracing programs (fentry/fexit/...) work ok

jirka