Re: stack frame unwindind KASAN errors
On Tue, Mar 07, 2017 at 10:04:11AM +0800, Daniel J Blueman wrote: > On 7 March 2017 at 00:40, Josh Poimboeuf wrote: > > On Mon, Mar 06, 2017 at 02:52:01PM +0800, Daniel J Blueman wrote: > >> Thanks Josh! > >> > >> With this patch, the KASAN warning still occurs, but at > >> unwind_get_return_address+0x1d3/0x130 instead; the rest of the trace > >> is identical. > >> > >> (gdb) list *(unwind_get_return_address+0x1d3) > >> 0x8112bca3 is in unwind_get_return_address > >> (./include/linux/compiler.h:243). > >> 238}) > >> 239 > >> 240static __always_inline > >> 241void __read_once_size(const volatile void *p, void *res, int size) > >> 242{ > >> 243__READ_ONCE_SIZE; > > > > Looking deeper, I have an idea about what's going on: > > > > https://quora.org/dmesg.txt > > > > Each of the warnings seems to show an interrupt happening during an EFI > > call. I'm guessing EFI modified the frame pointer, at least > > temporarily, which confused the unwinder :-( > > > > Would it be possible for you to test again with 4.10? It has some > > additional unwinder output which should hopefully confirm my suspicions. > > Very good; I don't see the KASAN warnings with 4.10 in the same environment. To be honest, that's not the result I was expecting :-) It would be nice to know what fixed it. Any chance you could bisect? -- Josh
Re: stack frame unwindind KASAN errors
On 7 March 2017 at 00:40, Josh Poimboeuf wrote: > On Mon, Mar 06, 2017 at 02:52:01PM +0800, Daniel J Blueman wrote: >> Thanks Josh! >> >> With this patch, the KASAN warning still occurs, but at >> unwind_get_return_address+0x1d3/0x130 instead; the rest of the trace >> is identical. >> >> (gdb) list *(unwind_get_return_address+0x1d3) >> 0x8112bca3 is in unwind_get_return_address >> (./include/linux/compiler.h:243). >> 238}) >> 239 >> 240static __always_inline >> 241void __read_once_size(const volatile void *p, void *res, int size) >> 242{ >> 243__READ_ONCE_SIZE; > > Looking deeper, I have an idea about what's going on: > > https://quora.org/dmesg.txt > > Each of the warnings seems to show an interrupt happening during an EFI > call. I'm guessing EFI modified the frame pointer, at least > temporarily, which confused the unwinder :-( > > Would it be possible for you to test again with 4.10? It has some > additional unwinder output which should hopefully confirm my suspicions. Very good; I don't see the KASAN warnings with 4.10 in the same environment. Thanks, Daniel -- Daniel J Blueman
Re: stack frame unwindind KASAN errors
On Mon, Mar 06, 2017 at 02:52:01PM +0800, Daniel J Blueman wrote: > Thanks Josh! > > With this patch, the KASAN warning still occurs, but at > unwind_get_return_address+0x1d3/0x130 instead; the rest of the trace > is identical. > > (gdb) list *(unwind_get_return_address+0x1d3) > 0x8112bca3 is in unwind_get_return_address > (./include/linux/compiler.h:243). > 238}) > 239 > 240static __always_inline > 241void __read_once_size(const volatile void *p, void *res, int size) > 242{ > 243__READ_ONCE_SIZE; Looking deeper, I have an idea about what's going on: https://quora.org/dmesg.txt Each of the warnings seems to show an interrupt happening during an EFI call. I'm guessing EFI modified the frame pointer, at least temporarily, which confused the unwinder :-( Would it be possible for you to test again with 4.10? It has some additional unwinder output which should hopefully confirm my suspicions. -- Josh
Re: stack frame unwindind KASAN errors
On 27 February 2017 at 23:47, Josh Poimboeuf wrote: > On Mon, Feb 27, 2017 at 12:49:59PM +0800, Daniel J Blueman wrote: >> On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding >> errors reported [2,3]. >> >> This seems to occur at half of boots. >> >> Let me know for further debug info/patch testing and thanks, >> Daniel >> >> [1] https://quora.org/config >> [2] https://quora.org/dmesg.txt > > Hi Daniel, > > Can you try the following patch? It's a backport of the following > upstream commit: > > 09ae68dd0a8d ("x86/unwind: Disable KASAN checks for non-current tasks") > > If it fixes it then I'll submit it for 4.9 stable. > > --- > > From: Josh Poimboeuf > Subject: [PATCH] x86/unwind: Disable KASAN checks for non-current tasks > > There are a handful of callers to save_stack_trace_tsk() and > show_stack() which try to unwind the stack of a task other than current. > In such cases, it's remotely possible that the task is running on one > CPU while the unwinder is reading its stack from another CPU, causing > the unwinder to see stack corruption. > > These cases seem to be mostly harmless. The unwinder has checks which > prevent it from following bad pointers beyond the bounds of the stack. > So it's not really a bug as long as the caller understands that > unwinding another task will not always succeed. > > In such cases, it's possible that the unwinder may read a KASAN-poisoned > region of the stack. Account for that by using READ_ONCE_NOCHECK() when > reading the stack of another task. > > Use READ_ONCE() when reading the stack of the current task, since KASAN > warnings can still be useful for finding bugs in that case. > > Reported-by: Dmitry Vyukov > Signed-off-by: Josh Poimboeuf > Cc: Andy Lutomirski > Cc: Andy Lutomirski > Cc: Borislav Petkov > Cc: Brian Gerst > Cc: Dave Jones > Cc: Denys Vlasenko > Cc: H. Peter Anvin > Cc: Linus Torvalds > Cc: Miroslav Benes > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Link: > http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoim...@redhat.com > Signed-off-by: Ingo Molnar > --- > arch/x86/include/asm/stacktrace.h | 5 - > arch/x86/kernel/unwind_frame.c| 20 ++-- > 2 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/stacktrace.h > b/arch/x86/include/asm/stacktrace.h > index 37f2e0b..4141ead 100644 > --- a/arch/x86/include/asm/stacktrace.h > +++ b/arch/x86/include/asm/stacktrace.h > @@ -55,13 +55,16 @@ extern int kstack_depth_to_print; > static inline unsigned long * > get_frame_pointer(struct task_struct *task, struct pt_regs *regs) > { > + struct inactive_task_frame *frame; > + > if (regs) > return (unsigned long *)regs->bp; > > if (task == current) > return __builtin_frame_address(0); > > - return (unsigned long *)((struct inactive_task_frame > *)task->thread.sp)->bp; > + frame = (struct inactive_task_frame *)task->thread.sp; > + return (unsigned long *)READ_ONCE_NOCHECK(frame->bp); > } > #else > static inline unsigned long * > diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c > index a2456d4..caff129 100644 > --- a/arch/x86/kernel/unwind_frame.c > +++ b/arch/x86/kernel/unwind_frame.c > @@ -6,6 +6,21 @@ > > #define FRAME_HEADER_SIZE (sizeof(long) * 2) > > +/* > + * This disables KASAN checking when reading a value from another task's > stack, > + * since the other task could be running on another CPU and could have > poisoned > + * the stack in the meantime. > + */ > +#define READ_ONCE_TASK_STACK(task, x) \ > +({ \ > + unsigned long val; \ > + if (task == current)\ > + val = READ_ONCE(x); \ > + else\ > + val = READ_ONCE_NOCHECK(x); \ > + val;\ > +}) > + > unsigned long unwind_get_return_address(struct unwind_state *state) > { > unsigned long addr; > @@ -14,7 +29,8 @@ unsigned long unwind_get_return_address(struct unwind_state > *state) > if (unwind_done(state)) > return 0; > > - addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p, > + addr = READ_ONCE_TASK_STACK(state->task, *addr_p); > + addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, addr, > addr_p); > > return __kernel_text_address(addr) ? addr : 0; > @@ -48,7 +64,7 @@ bool unwind_next_frame(struct unwind_state *state) > if (unwind_done(state)) > return false; > > - next_bp = (unsigned long *)*state->bp; > + next_bp = (unsigned long *)READ_ONCE_TASK_STACK(state->task, > *state->bp); > > /* make sure the next fra
Re: stack frame unwindind KASAN errors
On Mon, Feb 27, 2017 at 12:49:59PM +0800, Daniel J Blueman wrote: > On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding > errors reported [2,3]. > > This seems to occur at half of boots. > > Let me know for further debug info/patch testing and thanks, > Daniel > > [1] https://quora.org/config > [2] https://quora.org/dmesg.txt Hi Daniel, Can you try the following patch? It's a backport of the following upstream commit: 09ae68dd0a8d ("x86/unwind: Disable KASAN checks for non-current tasks") If it fixes it then I'll submit it for 4.9 stable. --- From: Josh Poimboeuf Subject: [PATCH] x86/unwind: Disable KASAN checks for non-current tasks There are a handful of callers to save_stack_trace_tsk() and show_stack() which try to unwind the stack of a task other than current. In such cases, it's remotely possible that the task is running on one CPU while the unwinder is reading its stack from another CPU, causing the unwinder to see stack corruption. These cases seem to be mostly harmless. The unwinder has checks which prevent it from following bad pointers beyond the bounds of the stack. So it's not really a bug as long as the caller understands that unwinding another task will not always succeed. In such cases, it's possible that the unwinder may read a KASAN-poisoned region of the stack. Account for that by using READ_ONCE_NOCHECK() when reading the stack of another task. Use READ_ONCE() when reading the stack of the current task, since KASAN warnings can still be useful for finding bugs in that case. Reported-by: Dmitry Vyukov Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Jones Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Miroslav Benes Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoim...@redhat.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/stacktrace.h | 5 - arch/x86/kernel/unwind_frame.c| 20 ++-- 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h index 37f2e0b..4141ead 100644 --- a/arch/x86/include/asm/stacktrace.h +++ b/arch/x86/include/asm/stacktrace.h @@ -55,13 +55,16 @@ extern int kstack_depth_to_print; static inline unsigned long * get_frame_pointer(struct task_struct *task, struct pt_regs *regs) { + struct inactive_task_frame *frame; + if (regs) return (unsigned long *)regs->bp; if (task == current) return __builtin_frame_address(0); - return (unsigned long *)((struct inactive_task_frame *)task->thread.sp)->bp; + frame = (struct inactive_task_frame *)task->thread.sp; + return (unsigned long *)READ_ONCE_NOCHECK(frame->bp); } #else static inline unsigned long * diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c index a2456d4..caff129 100644 --- a/arch/x86/kernel/unwind_frame.c +++ b/arch/x86/kernel/unwind_frame.c @@ -6,6 +6,21 @@ #define FRAME_HEADER_SIZE (sizeof(long) * 2) +/* + * This disables KASAN checking when reading a value from another task's stack, + * since the other task could be running on another CPU and could have poisoned + * the stack in the meantime. + */ +#define READ_ONCE_TASK_STACK(task, x) \ +({ \ + unsigned long val; \ + if (task == current)\ + val = READ_ONCE(x); \ + else\ + val = READ_ONCE_NOCHECK(x); \ + val;\ +}) + unsigned long unwind_get_return_address(struct unwind_state *state) { unsigned long addr; @@ -14,7 +29,8 @@ unsigned long unwind_get_return_address(struct unwind_state *state) if (unwind_done(state)) return 0; - addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p, + addr = READ_ONCE_TASK_STACK(state->task, *addr_p); + addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, addr, addr_p); return __kernel_text_address(addr) ? addr : 0; @@ -48,7 +64,7 @@ bool unwind_next_frame(struct unwind_state *state) if (unwind_done(state)) return false; - next_bp = (unsigned long *)*state->bp; + next_bp = (unsigned long *)READ_ONCE_TASK_STACK(state->task, *state->bp); /* make sure the next frame's data is accessible */ if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE)) -- 2.7.4
stack frame unwindind KASAN errors
On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding errors reported [2,3]. This seems to occur at half of boots. Let me know for further debug info/patch testing and thanks, Daniel [1] https://quora.org/config [2] https://quora.org/dmesg.txt -- [3] BUG: KASAN: stack-out-of-bounds in unwind_get_return_address+0x11d/0x130 at addr 881034eafa08 Read of size 8 by task systemd/1 page:ea0040d3abc0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x2f8000() page dumped because: kasan: bad access detected CPU: 20 PID: 1 Comm: systemd Not tainted 4.9.13-debug+ #3 Hardware name: Supermicro Super Server/X10DRL-i, BIOS 2.0a 08/25/2016 881c2f607a60 b0cdb541 881c2f607af8 881034eafa08 881c2f607ae8 b064dd17 881034ea4f70 0024 0286 881034ea4fe2 Call Trace: [] dump_stack+0x85/0xc4 [] kasan_report_error+0x4d7/0x500 [] __asan_report_load8_noabort+0x61/0x70 [] ? unwind_get_return_address+0x11d/0x130 [] unwind_get_return_address+0x11d/0x130 [] ? unwind_next_frame+0x97/0xf0 [] __save_stack_trace+0x92/0x100 [] ? file_free_rcu+0x46/0x60 [] save_stack_trace+0x1b/0x20 [] save_stack+0x46/0xd0 [] ? save_stack_trace+0x1b/0x20 [] ? save_stack+0x46/0xd0 [] ? kasan_slab_free+0x71/0xb0 [] ? kmem_cache_free+0xc4/0x350 [] ? file_free_rcu+0x46/0x60 [] ? rcu_process_callbacks+0x9d2/0x1220 [] ? __do_softirq+0x286/0x87d [] ? irq_exit+0x160/0x190 [] ? smp_apic_timer_interrupt+0x80/0xa0 [] ? apic_timer_interrupt+0x8c/0xa0 [] ? debug_check_no_locks_freed+0x290/0x290 [] ? debug_object_deactivate+0xf8/0x320 [] ? _raw_spin_unlock_irqrestore+0x5f/0x80 [] ? trace_hardirqs_on_caller+0x19e/0x580 [] ? _raw_spin_unlock_irqrestore+0x4a/0x80 [] ? mark_held_locks+0xc8/0x120 [] ? kmem_cache_free+0xaf/0x350 [] ? file_free_rcu+0x46/0x60 [] kasan_slab_free+0x71/0xb0 [] kmem_cache_free+0xc4/0x350 [] file_free_rcu+0x46/0x60 [] rcu_process_callbacks+0x9d2/0x1220 [] ? rcu_process_callbacks+0x97d/0x1220 [] ? get_max_files+0x20/0x20 [] __do_softirq+0x286/0x87d [] irq_exit+0x160/0x190 [] smp_apic_timer_interrupt+0x80/0xa0 [] apic_timer_interrupt+0x8c/0xa0 [] ? save_stack+0x46/0xd0 [] ? debug_check_no_locks_freed+0x290/0x290 [] ? mark_held_locks+0xc8/0x120 [] ? efi_call+0x58/0x90 [] ? virt_efi_get_variable+0x9c/0x150 [] ? efivar_entry_size+0xa4/0x110 [] ? efivarfs_callback+0x30f/0x4e7 [] ? efivarfs_evict_inode+0x10/0x10 [] mark_held_locks+0xc8/0x120 [] ? _raw_spin_unlock_irqrestore+0x5f/0x80 [] ? _raw_spin_unlock_irqrestore+0x4a/0x80 [] ? efivar_init+0x512/0x750 [] ? efivarfs_evict_inode+0x10/0x10 [] ? efivar_entry_iter+0x140/0x140 [] ? debug_lockdep_rcu_enabled+0x77/0x90 [] ? d_instantiate+0x6f/0x80 [] ? _raw_spin_unlock+0x31/0x50 [] ? _raw_spin_unlock+0x31/0x50 [] ? d_instantiate+0x6f/0x80 [] ? efivarfs_mount+0x20/0x20 [] ? efivarfs_fill_super+0x1ea/0x290 [] ? efivarfs_mount+0x20/0x20 [] ? mount_single+0xcc/0x130 [] ? efivarfs_mount+0x18/0x20 [] ? mount_fs+0x81/0x2c0 [] ? alloc_vfsmnt+0x451/0x720 [] ? vfs_kern_mount+0x6b/0x370 [] ? do_mount+0x355/0x2af0 [] ? debug_lockdep_rcu_enabled+0x77/0x90 [] ? copy_mount_string+0x20/0x20 [] ? __might_fault+0xf6/0x1b0 [] ? __check_object_size+0x1b4/0x3fe [] ? memdup_user+0x6b/0xa0 [] ? SyS_mount+0x95/0xe0 [] ? entry_SYSCALL_64_fastpath+0x23/0xc6 Memory state around the buggy address: 881034eaf900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 881034eaf980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 >881034eafa00: f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f3 f3 ^ 881034eafa80: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 881034eafb00: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 Disabling lock debugging due to kernel taint -- Daniel J Blueman