On Thu, Mar 20, 2025 at 10:15:58AM -0700, Song Liu wrote:
> With proper exception boundary detection, it is possible to implement
> arch_stack_walk_reliable without sframe.
> 
> Note that arch_stack_walk_reliable does not guarantee getting a reliable
> stack trace in all scenarios. Instead, it can reliably detect when the
> stack trace is not reliable, which is enough to provide reliable
> livepatching.
> 
> Signed-off-by: Song Liu <s...@kernel.org>
> ---
>  arch/arm64/Kconfig             |  2 +-
>  arch/arm64/kernel/stacktrace.c | 66 +++++++++++++++++++++++++---------
>  2 files changed, 51 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 701d980ea921..31d5e1ee6089 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -276,6 +276,7 @@ config ARM64
>          select HAVE_SOFTIRQ_ON_OWN_STACK
>          select USER_STACKTRACE_SUPPORT
>          select VDSO_GETRANDOM
> +        select HAVE_RELIABLE_STACKTRACE
>          help
>            ARM 64-bit (AArch64) Linux support.
> 
> @@ -2500,4 +2501,3 @@ endmenu # "CPU Power Management"
>  source "drivers/acpi/Kconfig"
> 
>  source "arch/arm64/kvm/Kconfig"
> -
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 1d9d51d7627f..7e07911d8694 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -56,6 +56,7 @@ struct kunwind_state {
>          enum kunwind_source source;
>          union unwind_flags flags;
>          struct pt_regs *regs;
> +        bool end_on_unreliable;
>  };
> 
>  static __always_inline void
> @@ -230,8 +231,26 @@ kunwind_next_frame_record(struct kunwind_state *state)
>          new_fp = READ_ONCE(record->fp);
>          new_pc = READ_ONCE(record->lr);
> 
> -        if (!new_fp && !new_pc)
> -                return kunwind_next_frame_record_meta(state);
> +        if (!new_fp && !new_pc) {
> +                int ret;
> +
> +                ret = kunwind_next_frame_record_meta(state);
> +                if (ret < 0) {
> +                        /*
> +                         * This covers two different conditions:
> +                         * 1. ret == -ENOENT, unwinding is done.
> +                         * 2. ret == -EINVAL, unwinding hit an error.
> +                         */
> +                        return ret;
> +                }
> +                /*
> +                 * We stepped across an exception boundary. The stack is
> +                 * now unreliable.
> +                 */
> +                if (state->end_on_unreliable)
> +                        return -EINVAL;
> +                return 0;
> +        }
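To make the contract above concrete: a caller is expected to treat any
non-zero return from arch_stack_walk_reliable() as "do not trust this
trace". Something along these lines (a hypothetical consumer purely for
illustration -- record_pc and its cookie are made up, not actual
livepatch code):

| /* Hypothetical consume_entry callback, for illustration only. */
| static bool record_pc(void *cookie, unsigned long pc)
| {
|         /* ... stash pc in a buffer hanging off cookie ... */
|         return true;    /* true means "keep unwinding" */
| }
|
|         /*
|          * 0 means the trace completed and every entry can be trusted;
|          * -EINVAL means we crossed an exception boundary (or hit some
|          * other problem), so e.g. livepatch must not patch this task.
|          */
|         err = arch_stack_walk_reliable(record_pc, cookie, task);

With that in mind: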
My original expectation for this was that we'd propagate the errors, and
then all the reliability logic would live in a consume_entry wrapper like
we have for BPF, e.g.

| static __always_inline bool
| arch_reliable_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
| {
|         struct kunwind_consume_entry_data *data = cookie;
|
|         /*
|          * When unwinding across an exception boundary, the PC will be
|          * reliable, but we do not know whether the FP is live, and so we
|          * cannot perform the *next* unwind reliably.
|          *
|          * Give up as soon as we hit an exception boundary.
|          */
|         if (state->source == KUNWIND_SOURCE_REGS_PC)
|                 return false;
|
|         return data->consume_entry(data->cookie, state->common.pc);
| }
|
| noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
|                                               void *cookie,
|                                               struct task_struct *task)
| {
|         int ret;
|         struct kunwind_consume_entry_data data = {
|                 .consume_entry = consume_entry,
|                 .cookie = cookie,
|         };
|
|         ret = kunwind_stack_walk(arch_reliable_kunwind_consume_entry, &data,
|                                  task, NULL);
|         return ret == -ENOENT ? 0 : ret;
| }

... and then in future we can add anything specific to reliable stacktrace
there.

That aside, this generally looks good to me. The only thing I note is that
we're lacking a check on the return value of kretprobe_find_ret_addr(); we
should return -EINVAL when that is NULL, though that should never happen
in normal operation.

I've pushed an arm64/stacktrace-updates branch [1] with fixups for those
as two separate commits atop this one. If that looks good to you, I
suggest we post that as a series and ask Will and Catalin to take it
as-is.

I'll look at the actual patching bits now.

Mark.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/ arm64/stacktrace-updates
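P.S. For clarity, the kretprobe fixup mentioned above amounts to failing
the unwind when kretprobe_find_ret_addr() cannot recover the original
return address. Roughly (an untested sketch against
kunwind_recover_return_address(); the real commit is in the branch above,
and field names may differ slightly):

|         orig_pc = kretprobe_find_ret_addr(state->common.task,
|                                           (void *)state->common.fp,
|                                           &state->kr_cur);
|         /*
|          * NULL means we could not recover the original return address;
|          * fail the unwind rather than report a bogus entry.
|          */
|         if (!orig_pc)
|                 return -EINVAL;
|         state->common.pc = orig_pc;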