On 4/29/2026 5:43 PM, Steven Rostedt wrote:
> From: Steven Rostedt <[email protected]>
> 
> [
>    This is an RFC that adds a system call for dynamic linkers to use to
>    tell the kernel where the sframe sections are when it loads dynamic
>    libraries.
> 
>    It is built on top of Jens's sframe implementation for v3:
> 
>       https://lore.kernel.org/linux-trace-kernel/[email protected]/
> 
>    I have a repo with that code that this applies on top of here:
> 
>       git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git sframe/core
> 
>    The name of the system call is "stacktrace_setup", but I'm not attached
>    to this name. If anyone can think of a better name I'm happy to take
>    suggestions.
> 
>    This patch is just to get the conversation going and the final result
>    may be much different. I tested this with the attached program which is a
>    major hack. I built glibc with sframe v3 support and I used readelf to
>    find the sframe size and location of glibc.
> 
>    readelf -e /work/usr/lib/libc.so.6 | grep sframe
>      [19] .sframe           GNU_SFRAME       00000000001d3fc0  001d3fc0
> 
>    Then I wrote a program that takes the above location and size of the
>    .sframe section in libc as parameters, scans /proc/self/maps to find
>    where it loaded libc and then calls this new system call with a pointer
>    to the location of the sframe along with its size, as well as where the
>    libc text is located.
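
For reference, the address computation described above can be modelled in a few lines of user-space C.  This is only a sketch of my reading of the description, not the attached test program: struct map_entry, parse_maps_line() and section_runtime_addr() are names made up for illustration, and it assumes that the mapping at file offset 0 is the load base and that the library is linked at address 0, as is typical for a shared object.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of /proc/<pid>/maps (hypothetical helper type). */
struct map_entry {
	unsigned long start;
	unsigned long end;
	unsigned long offset;
	char perms[5];
	char path[256];
};

/* Parse "start-end perms offset dev inode [path]"; returns 0 on success. */
int parse_maps_line(const char *line, struct map_entry *e)
{
	int n;

	e->path[0] = '\0';
	n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255s",
		   &e->start, &e->end, e->perms, &e->offset, e->path);
	/* The path is optional: anonymous mappings have none. */
	return n >= 4 ? 0 : -1;
}

/*
 * Runtime address of a section, given the mapping that covers file
 * offset 0 and the section address as printed by readelf.
 */
unsigned long section_runtime_addr(const struct map_entry *base,
				   unsigned long sh_addr)
{
	/* Only meaningful for the mapping that starts at file offset 0. */
	assert(base->offset == 0);
	return base->start + sh_addr;
}
```

Fed the first libc line from /proc/self/maps and the 0x1d3fc0 address from the readelf output, section_runtime_addr() would give the addr_start value to pass to the system call.
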
> 
>    It then spins for 2 seconds, calls the system call again to remove the
>    sframe section it loaded, and spins for another 2 seconds.
> 
>    I ran perf record --call-graph fp,defer on the program and looked for
>    the do_spin() function.
> 
>    With sframe loaded:
> 
> sframe-test    1350  1396.333593:     202366 cpu/cycles/P: 
>             7fdf0ec38a44 [unknown] ([vdso])
>             5621a6b97243 get_time+0x19 (/work/c/sframe-test)
>             5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
>             5621a6b975cd main+0xd4 (/work/c/sframe-test)
>             7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
>             7fdf0ea26d05 __libc_start_main@@GLIBC_2.34+0x85 (/work/usr/lib/libc.so.6)
>             5621a6b97131 _start+0x21 (/work/c/sframe-test)
> 
>    After it unloads the sframe:
> 
> sframe-test    1350  1400.332902:     657582 cpu/cycles/P: 
>             7fdf0ec38a5e [unknown] ([vdso])
>             5621a6b97243 get_time+0x19 (/work/c/sframe-test)
>             5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
>             5621a6b97602 main+0x109 (/work/c/sframe-test)
>             7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
> 
>    As you can see, with the sframe loaded, it was able to walk further up
>    the libc library.
> 
>    Again, this is just an RFC, but I would like to get agreement on the
>    system call so that we can then update the dynamic linker to do this
>    instead of using my hack ;-)
> ]
> 
> Add a system call that can be used by dynamic linkers to tell the kernel
> where the sframe section is in memory for libraries it loads.
> 
> The system call stacktrace_setup takes 5 parameters:
> 
>   op - the type of operation to perform
>   addr_start - The virtual address of the sframe section
>   addr_length - The length of the sframe section
>   text_start - the text section the sframe represents
>   text_length - the length of the text section
> 
> The current op values are:
> 
>   STACKTRACE_REGISTER_SFRAME - This registers the sframe
>   STACKTRACE_UNREGISTER_SFRAME - This removes the sframe
> 
> Signed-off-by: Steven Rostedt <[email protected]>

LGTM.  Some comments/questions below.

> diff --git a/include/uapi/linux/stacktrace.h b/include/uapi/linux/stacktrace.h

> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_STACKTRACE_H
> +#define _UAPI_LINUX_STACKTRACE_H
> +
> +enum stacktrace_setup_types {
> +     STACKTRACE_REGISTER_SFRAME      = 1,
> +     STACKTRACE_UNREGISTER_SFRAME    = 2,
> +};
> +
> +#endif /* _UAPI_LINUX_STACKTRACE_H */

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

Having the syscall live in kernel/unwind/sframe.c means it is only
available if HAVE_UNWIND_USER_SFRAME is selected, because only then is
sframe.o built and linked into the kernel.  That makes sense as long as
the syscall only implements sframe-specific functionality.  I suppose it
could be moved elsewhere if non-sframe use cases arise in the future?

Would Dylan need to guard it when introducing HAVE_UNWIND_KERNEL_SFRAME?
Provided an unimplemented syscall fails with -ENOSYS (e.g. when
HAVE_UNWIND_USER_SFRAME is not enabled), it should not matter: the dummy
implementations of sframe_add_section() and sframe_remove_section() in
linux/sframe.h also return -ENOSYS, so the user-observable behavior
would be the same.  Do you agree?
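
To make the comparison concrete, here is a small user-space model of the syscall body on top of -ENOSYS stubs.  It is a sketch, not kernel code: the stub bodies are my assumption of what the dummy versions in linux/sframe.h look like, and model_stacktrace_setup() and model_self_check() are made-up names.

```c
#include <assert.h>
#include <errno.h>

/* Op values copied from the proposed include/uapi/linux/stacktrace.h. */
enum stacktrace_setup_types {
	STACKTRACE_REGISTER_SFRAME	= 1,
	STACKTRACE_UNREGISTER_SFRAME	= 2,
};

/*
 * Assumption: these mirror the dummy versions in linux/sframe.h that are
 * used when HAVE_UNWIND_USER_SFRAME is not enabled.
 */
static inline int sframe_add_section(unsigned long sframe_start,
				     unsigned long sframe_end,
				     unsigned long text_start,
				     unsigned long text_end)
{
	return -ENOSYS;
}

static inline int sframe_remove_section(unsigned long sframe_addr)
{
	return -ENOSYS;
}

/* User-space model of the syscall body from the patch. */
long model_stacktrace_setup(int op, unsigned long addr_start,
			    unsigned long addr_length,
			    unsigned long text_start,
			    unsigned long text_length)
{
	switch (op) {
	case STACKTRACE_REGISTER_SFRAME:
		return sframe_add_section(addr_start, addr_start + addr_length,
					  text_start, text_start + text_length);
	case STACKTRACE_UNREGISTER_SFRAME:
		return sframe_remove_section(addr_start);
	}
	return -EINVAL;
}

/* What user space would observe with the stubs in place. */
void model_self_check(void)
{
	assert(model_stacktrace_setup(STACKTRACE_REGISTER_SFRAME,
				      0x1000, 0x100, 0x2000, 0x400) == -ENOSYS);
	assert(model_stacktrace_setup(STACKTRACE_UNREGISTER_SFRAME,
				      0x1000, 0, 0, 0) == -ENOSYS);
	/* An unknown op never reaches the stubs. */
	assert(model_stacktrace_setup(0, 0, 0, 0, 0) == -EINVAL);
}
```

In this model both defined ops return -ENOSYS, the same as a missing syscall.  The one visible difference is an unknown op, which yields -EINVAL here but would yield -ENOSYS if the syscall did not exist at all.
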

> @@ -12,8 +12,10 @@
>  #include <linux/mm.h>
>  #include <linux/string_helpers.h>
>  #include <linux/sframe.h>
> +#include <linux/syscalls.h>
>  #include <asm/unwind_user_sframe.h>
>  #include <linux/unwind_user_types.h>
> +#include <uapi/linux/stacktrace.h>
>  
>  #include "sframe.h"
>  #include "sframe_debug.h"
> @@ -838,3 +840,38 @@ void sframe_free_mm(struct mm_struct *mm)
>  
>       mtree_destroy(&mm->sframe_mt);
>  }
> +
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of the text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> +             unsigned long, addr_length, unsigned long, text_start,
> +             unsigned long, text_length)

Would it make sense to keep the parameters generic from the start,
similar to how it is done in prctl()?  Or can this be changed later, if
the need arises?

SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, arg2,
                unsigned long, arg3, unsigned long, arg4, unsigned long, arg5)
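
A rough user-space sketch of how such a generic variant could dispatch per op.  Everything prefixed demo_ is hypothetical, including the single-slot registry that stands in for the real sframe code.  Rejecting non-zero unused arguments, as prctl() does for some of its options, is one possible choice and not something the patch does.

```c
#include <assert.h>
#include <errno.h>

/* Op values copied from the proposed include/uapi/linux/stacktrace.h. */
enum stacktrace_setup_types {
	STACKTRACE_REGISTER_SFRAME	= 1,
	STACKTRACE_UNREGISTER_SFRAME	= 2,
};

/* Hypothetical single-slot registry standing in for the real sframe code. */
static unsigned long demo_registered_sframe;

static int demo_add_section(unsigned long sframe_start,
			    unsigned long sframe_end,
			    unsigned long text_start,
			    unsigned long text_end)
{
	if (sframe_start >= sframe_end || text_start >= text_end)
		return -EINVAL;
	demo_registered_sframe = sframe_start;
	return 0;
}

static int demo_remove_section(unsigned long sframe_start)
{
	if (!demo_registered_sframe || demo_registered_sframe != sframe_start)
		return -EINVAL;
	demo_registered_sframe = 0;
	return 0;
}

/* Generic arguments; each op decides what arg2..arg5 mean. */
long demo_stacktrace_setup(int op, unsigned long arg2, unsigned long arg3,
			   unsigned long arg4, unsigned long arg5)
{
	switch (op) {
	case STACKTRACE_REGISTER_SFRAME:
		/* arg2/arg3: sframe start and length; arg4/arg5: text start and length. */
		return demo_add_section(arg2, arg2 + arg3, arg4, arg4 + arg5);
	case STACKTRACE_UNREGISTER_SFRAME:
		/* Only arg2 is used; reject stray values so they can gain a meaning later. */
		if (arg3 || arg4 || arg5)
			return -EINVAL;
		return demo_remove_section(arg2);
	}
	return -EINVAL;
}

/* Expected behavior of the model. */
void demo_self_check(void)
{
	assert(demo_stacktrace_setup(STACKTRACE_REGISTER_SFRAME,
				     0x1000, 0x100, 0x2000, 0x400) == 0);
	assert(demo_stacktrace_setup(STACKTRACE_UNREGISTER_SFRAME,
				     0x1000, 0x100, 0, 0) == -EINVAL);
	assert(demo_stacktrace_setup(STACKTRACE_UNREGISTER_SFRAME,
				     0x1000, 0, 0, 0) == 0);
}
```

The upside of the generic form is that a future op (e.g. for JIT code) could reinterpret arg2..arg5 without the parameter names becoming misleading.
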

> +{
> +     switch (op) {
> +     case STACKTRACE_REGISTER_SFRAME:
> +             return sframe_add_section(addr_start, addr_start + addr_length,
> +                                       text_start, text_start+text_length);

Nit: spaces around '+':
                                          text_start, text_start + text_length);

> +     case STACKTRACE_UNREGISTER_SFRAME:
> +             return sframe_remove_section(addr_start);
> +     }
> +     return -EINVAL;
> +}

Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
[email protected] / [email protected]

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: 
Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: 
Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/

