On Mon, 26 Jan 2026 22:18:33 +0100
Jiri Olsa <[email protected]> wrote:
> Mahe reported a missing function in the stack trace taken on top of a
> kprobe multi program. The missing function is the very first one in the
> stack trace, the one that the bpf program is attached to.
>
> # bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
> Attaching 1 probe...
>
> do_syscall_64+134
> entry_SYSCALL_64_after_hwframe+118
>
> ('*' is used for kprobe_multi attachment)
>
> The reason is that the previous change (the Fixes commit) fixed the
> stack unwind for tracepoints, but removed the attached function's
> address from the stack trace on top of kprobe multi programs, which I
> also overlooked in the related test (see the following patch).
>
> Tracepoint and kprobe_multi programs have different stack setups, but
> share the same unwind path. I think it's better to keep the previous
> change, which fixed the tracepoint unwind, and instead change the
> kprobe multi unwind as explained below.
>
> The bpf program stack unwind calls perf_callchain_kernel for the
> kernel portion, which follows one of two unwind paths based on the
> X86_EFLAGS_FIXED bit in pt_regs.flags.
>
> When the bit is set, we unwind the stack represented by the pt_regs
> argument; otherwise we unwind the currently executing stack up to the
> 'first_frame' boundary.
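>
> For reference, a simplified sketch of that branch, following the
> current shape of perf_callchain_kernel in arch/x86/events/core.c
> (guest handling and other details trimmed):
>
>   static inline int perf_hw_regs(struct pt_regs *regs)
>   {
>           /* software-crafted regs have flags == 0, so the bit is clear */
>           return regs->flags & X86_EFLAGS_FIXED;
>   }
>
>   void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
>                              struct pt_regs *regs)
>   {
>           struct unwind_state state;
>           unsigned long addr;
>
>           if (perf_hw_regs(regs)) {
>                   /* unwind the stack described by @regs, from regs->ip */
>                   if (perf_callchain_store(entry, regs->ip))
>                           return;
>                   unwind_start(&state, current, regs, NULL);
>           } else {
>                   /* unwind the current stack, skipping up to regs->sp */
>                   unwind_start(&state, current, NULL, (void *)regs->sp);
>           }
>
>           for (; !unwind_done(&state); unwind_next_frame(&state)) {
>                   addr = unwind_get_return_address(&state);
>                   if (!addr || perf_callchain_store(entry, addr))
>                           return;
>           }
>   }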
>
> The 'first_frame' value is taken from regs.rsp, but the ftrace_caller
> and ftrace_regs_caller (ftrace trampoline) functions set regs.rsp to
> the previous stack frame, so we skip the attached function's entry.
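>
> Roughly, the skip-ahead in the ORC unwinder's __unwind_start() looks
> like this (the frame-pointer unwinder does the equivalent check on
> state->bp):
>
>   /*
>    * Skip ahead to the user-specified starting frame.  With
>    * first_frame = regs.rsp pointing at the parent's frame, the
>    * attached function's own frame is popped and never reported.
>    */
>   while (!unwind_done(state) &&
>          (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
>           state->sp <= (unsigned long)first_frame))
>           unwind_next_frame(state);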
>
> If we switch the kprobe_multi unwind to use the X86_EFLAGS_FIXED bit,
> we set the start of the unwind to the attached function's address. As
> another benefit, we also cut the extra unwind cycles needed to reach
> the 'first_frame' boundary.
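>
> Given the one-line diffstat below, the change amounts to marking the
> regs that ftrace fills in for perf as fixed.  A sketch of the idea
> (not necessarily the exact hunk; see the diff below):
>
>   /*
>    * arch_ftrace_fill_perf_regs() builds the pt_regs handed to
>    * perf_callchain_kernel().  Setting X86_EFLAGS_FIXED in flags makes
>    * perf_hw_regs() true, so the unwind starts from regs.ip (the
>    * attached function) instead of skipping to the regs.rsp boundary.
>    */
>   #define arch_ftrace_fill_perf_regs(fregs, _regs) do {           \
>           (_regs)->ip = arch_ftrace_regs(fregs)->regs.ip;         \
>           (_regs)->sp = arch_ftrace_regs(fregs)->regs.sp;         \
>           (_regs)->cs = __KERNEL_CS;                              \
>           (_regs)->flags = X86_EFLAGS_FIXED;                      \
>   } while (0)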
>
> The speedup can be measured with the trigger bench for kprobe_multi
> programs, using its stacktrace support.
>
> - trigger bench with stacktrace on current code:
>
> kprobe-multi : 0.810 ± 0.001M/s
> kretprobe-multi: 0.808 ± 0.001M/s
>
> - and with the fix:
>
> kprobe-multi : 1.264 ± 0.001M/s
> kretprobe-multi: 1.401 ± 0.002M/s
>
> With the fix, the entry probe stacktrace:
>
> # bpftrace -e 'kprobe:__x64_sys_newuname* { print(kstack)}'
> Attaching 1 probe...
>
> __x64_sys_newuname+9
> do_syscall_64+134
> entry_SYSCALL_64_after_hwframe+118
>
> The return probe skips the attached function, because it's no longer
> on the stack at the point of the unwind; this matches how the standard
> kretprobe works.
>
> # bpftrace -e 'kretprobe:__x64_sys_newuname* { print(kstack)}'
> Attaching 1 probe...
>
> do_syscall_64+134
> entry_SYSCALL_64_after_hwframe+118
>
> Fixes: 6d08340d1e35 ("Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"")
> Reported-by: Mahe Tardy <[email protected]>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> arch/x86/include/asm/ftrace.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index b08c95872eed..c56e1e63b893 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
Acked-by: Steven Rostedt (Google) <[email protected]>
(it passed all my tests too)
-- Steve