On Sun, Aug 14, 2016 at 01:10:42AM -0700, Andy Lutomirski wrote: > On Fri, Aug 12, 2016 at 7:29 AM, Josh Poimboeuf <jpoim...@redhat.com> wrote: > > With frame pointers, when a task is interrupted, its stack is no longer > > completely reliable because the function could have been interrupted > > before it had a chance to save the previous frame pointer on the stack. > > So the caller of the interrupted function could get skipped by a stack > > trace. > > > > This is problematic for live patching, which needs to know whether a > > stack trace of a sleeping task can be relied upon. There's currently no > > way to detect if a sleeping task was interrupted by a page fault > > exception or preemption before it went to sleep. > > > > Another issue is that when dumping the stack of an interrupted task, the > > unwinder has no way of knowing where the saved pt_regs registers are, so > > it can't print them. > > > > This solves those issues by encoding the pt_regs pointer in the frame > > pointer on entry from an interrupt or an exception. > > > > This patch also updates the unwinder to be able to decode it, because > > otherwise the unwinder would be broken by this change. > > > > Note that this causes a change in the behavior of the unwinder: each > > instance of a pt_regs on the stack is now considered a "frame". So > > callers of unwind_get_return_address() will now get an occasional > > 'regs->ip' address that would have previously been skipped over. > > Acked-by: Andy Lutomirski <l...@kernel.org> > > with minor optional nitpicks below. > > > > > Suggested-by: Andy Lutomirski <l...@amacapital.net> > > Signed-off-by: Josh Poimboeuf <jpoim...@redhat.com> > > --- > > arch/x86/entry/calling.h | 21 +++++++++++ > > arch/x86/entry/entry_32.S | 40 ++++++++++++++++++--- > > arch/x86/entry/entry_64.S | 10 ++++-- > > arch/x86/include/asm/unwind.h | 18 ++++++++-- > > arch/x86/kernel/unwind_frame.c | 82 > > +++++++++++++++++++++++++++++++++++++----- > > 5 files changed, 153 insertions(+), 18 deletions(-) > > > > diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h > > index 9a9e588..ab799a3 100644 > > --- a/arch/x86/entry/calling.h > > +++ b/arch/x86/entry/calling.h > > @@ -201,6 +201,27 @@ For 32-bit we have the following conventions - kernel > > is built with > > .byte 0xf1 > > .endm > > > > + /* > > + * This is a sneaky trick to help the unwinder find pt_regs on the > > + * stack. The frame pointer is replaced with an encoded pointer to > > + * pt_regs. The encoding is just a clearing of the highest-order > > bit, > > + * which makes it an invalid address and is also a signal to the > > + * unwinder that it's a pt_regs pointer in disguise. > > + * > > + * NOTE: This macro must be used *after* SAVE_EXTRA_REGS because it > > + * corrupts the original rbp. > > + */ > > +.macro ENCODE_FRAME_POINTER ptregs_offset=0 > > +#ifdef CONFIG_FRAME_POINTER > > + .if \ptregs_offset > > + leaq \ptregs_offset(%rsp), %rbp > > + .else > > + mov %rsp, %rbp > > + .endif > > + btr $63, %rbp > > +#endif > > +.endm > > + > > #endif /* CONFIG_X86_64 */ > > > > /* > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > > index 4396278..4006fa3 100644 > > --- a/arch/x86/entry/entry_32.S > > +++ b/arch/x86/entry/entry_32.S > > @@ -174,6 +174,23 @@ > > SET_KERNEL_GS %edx > > .endm > > > > +/* > > + * This is a sneaky trick to help the unwinder find pt_regs on the > > + * stack. The frame pointer is replaced with an encoded pointer to > > + * pt_regs. The encoding is just a clearing of the highest-order bit, > > + * which makes it an invalid address and is also a signal to the > > + * unwinder that it's a pt_regs pointer in disguise. > > + * > > + * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the > > + * original rbp. > > + */ > > +.macro ENCODE_FRAME_POINTER > > +#ifdef CONFIG_FRAME_POINTER > > + mov %esp, %ebp > > + btr $31, %ebp > > +#endif > > +.endm > > + > > .macro RESTORE_INT_REGS > > popl %ebx > > popl %ecx > > @@ -205,10 +222,16 @@ > > .endm > > > > ENTRY(ret_from_fork) > > + call 1f > > pushl $ret_from_fork is the same length and slightly less strange. > OTOH it forces a relocation, and this function doesn't return, so > there shouldn't be any performance issue, so this may save a byte or > two in the compressed image. > > > +1: push $0 > > This could maybe use a comment.
Oops. This ret_from_fork bit was meant for a separate patch. I think the problem with "pushl $ret_from_fork" is that ret_from_fork+0x0 is not a valid call return address. printk_stack_address() will show it as the end of the previous function in the file. Anyway, this definitely needs a comment and should be split out to a separate patch. -- Josh