Re: stack frame unwindind KASAN errors

2017-03-07 Thread Josh Poimboeuf
On Tue, Mar 07, 2017 at 10:04:11AM +0800, Daniel J Blueman wrote:
> On 7 March 2017 at 00:40, Josh Poimboeuf  wrote:
> > On Mon, Mar 06, 2017 at 02:52:01PM +0800, Daniel J Blueman wrote:
> >> Thanks Josh!
> >>
> >> With this patch, the KASAN warning still occurs, but at
> >> unwind_get_return_address+0x1d3/0x130 instead; the rest of the trace
> >> is identical.
> >>
> >> (gdb) list *(unwind_get_return_address+0x1d3)
> >> 0x8112bca3 is in unwind_get_return_address
> >> (./include/linux/compiler.h:243).
> >> 238})
> >> 239
> >> 240static __always_inline
> >> 241void __read_once_size(const volatile void *p, void *res, int size)
> >> 242{
> >> 243__READ_ONCE_SIZE;
> >
> > Looking deeper, I have an idea about what's going on:
> >
> >   https://quora.org/dmesg.txt
> >
> > Each of the warnings seems to show an interrupt happening during an EFI
> > call.  I'm guessing EFI modified the frame pointer, at least
> > temporarily, which confused the unwinder :-(
> >
> > Would it be possible for you to test again with 4.10?  It has some
> > additional unwinder output which should hopefully confirm my suspicions.
> 
> Very good; I don't see the KASAN warnings with 4.10 in the same environment.

To be honest, that's not the result I was expecting :-)

It would be nice to know what fixed it.  Any chance you could bisect?

-- 
Josh


Re: stack frame unwindind KASAN errors

2017-03-06 Thread Daniel J Blueman
On 7 March 2017 at 00:40, Josh Poimboeuf  wrote:
> On Mon, Mar 06, 2017 at 02:52:01PM +0800, Daniel J Blueman wrote:
>> Thanks Josh!
>>
>> With this patch, the KASAN warning still occurs, but at
>> unwind_get_return_address+0x1d3/0x130 instead; the rest of the trace
>> is identical.
>>
>> (gdb) list *(unwind_get_return_address+0x1d3)
>> 0x8112bca3 is in unwind_get_return_address
>> (./include/linux/compiler.h:243).
>> 238})
>> 239
>> 240static __always_inline
>> 241void __read_once_size(const volatile void *p, void *res, int size)
>> 242{
>> 243__READ_ONCE_SIZE;
>
> Looking deeper, I have an idea about what's going on:
>
>   https://quora.org/dmesg.txt
>
> Each of the warnings seems to show an interrupt happening during an EFI
> call.  I'm guessing EFI modified the frame pointer, at least
> temporarily, which confused the unwinder :-(
>
> Would it be possible for you to test again with 4.10?  It has some
> additional unwinder output which should hopefully confirm my suspicions.

Very good; I don't see the KASAN warnings with 4.10 in the same environment.

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: stack frame unwindind KASAN errors

2017-03-06 Thread Josh Poimboeuf
On Mon, Mar 06, 2017 at 02:52:01PM +0800, Daniel J Blueman wrote:
> Thanks Josh!
> 
> With this patch, the KASAN warning still occurs, but at
> unwind_get_return_address+0x1d3/0x130 instead; the rest of the trace
> is identical.
> 
> (gdb) list *(unwind_get_return_address+0x1d3)
> 0x8112bca3 is in unwind_get_return_address
> (./include/linux/compiler.h:243).
> 238})
> 239
> 240static __always_inline
> 241void __read_once_size(const volatile void *p, void *res, int size)
> 242{
> 243__READ_ONCE_SIZE;

Looking deeper, I have an idea about what's going on:

  https://quora.org/dmesg.txt

Each of the warnings seems to show an interrupt happening during an EFI
call.  I'm guessing EFI modified the frame pointer, at least
temporarily, which confused the unwinder :-(

Would it be possible for you to test again with 4.10?  It has some
additional unwinder output which should hopefully confirm my suspicions.

-- 
Josh


Re: stack frame unwindind KASAN errors

2017-03-05 Thread Daniel J Blueman
On 27 February 2017 at 23:47, Josh Poimboeuf  wrote:
> On Mon, Feb 27, 2017 at 12:49:59PM +0800, Daniel J Blueman wrote:
>> On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding
>> errors reported [2,3].
>>
>> This seems to occur at half of boots.
>>
>> Let me know for further debug info/patch testing and thanks,
>>   Daniel
>>
>> [1] https://quora.org/config
>> [2] https://quora.org/dmesg.txt
>
> Hi Daniel,
>
> Can you try the following patch?  It's a backport of the following
> upstream commit:
>
>   09ae68dd0a8d ("x86/unwind: Disable KASAN checks for non-current tasks")
>
> If it fixes it then I'll submit it for 4.9 stable.
>
> ---
>
> From: Josh Poimboeuf 
> Subject: [PATCH] x86/unwind: Disable KASAN checks for non-current tasks
>
> There are a handful of callers to save_stack_trace_tsk() and
> show_stack() which try to unwind the stack of a task other than current.
> In such cases, it's remotely possible that the task is running on one
> CPU while the unwinder is reading its stack from another CPU, causing
> the unwinder to see stack corruption.
>
> These cases seem to be mostly harmless.  The unwinder has checks which
> prevent it from following bad pointers beyond the bounds of the stack.
> So it's not really a bug as long as the caller understands that
> unwinding another task will not always succeed.
>
> In such cases, it's possible that the unwinder may read a KASAN-poisoned
> region of the stack.  Account for that by using READ_ONCE_NOCHECK() when
> reading the stack of another task.
>
> Use READ_ONCE() when reading the stack of the current task, since KASAN
> warnings can still be useful for finding bugs in that case.
>
> Reported-by: Dmitry Vyukov 
> Signed-off-by: Josh Poimboeuf 
> Cc: Andy Lutomirski 
> Cc: Andy Lutomirski 
> Cc: Borislav Petkov 
> Cc: Brian Gerst 
> Cc: Dave Jones 
> Cc: Denys Vlasenko 
> Cc: H. Peter Anvin 
> Cc: Linus Torvalds 
> Cc: Miroslav Benes 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Link: 
> http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoim...@redhat.com
> Signed-off-by: Ingo Molnar 
> ---
>  arch/x86/include/asm/stacktrace.h |  5 -
>  arch/x86/kernel/unwind_frame.c| 20 ++--
>  2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/stacktrace.h 
> b/arch/x86/include/asm/stacktrace.h
> index 37f2e0b..4141ead 100644
> --- a/arch/x86/include/asm/stacktrace.h
> +++ b/arch/x86/include/asm/stacktrace.h
> @@ -55,13 +55,16 @@ extern int kstack_depth_to_print;
>  static inline unsigned long *
>  get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
>  {
> +   struct inactive_task_frame *frame;
> +
> if (regs)
> return (unsigned long *)regs->bp;
>
> if (task == current)
> return __builtin_frame_address(0);
>
> -   return (unsigned long *)((struct inactive_task_frame 
> *)task->thread.sp)->bp;
> +   frame = (struct inactive_task_frame *)task->thread.sp;
> +   return (unsigned long *)READ_ONCE_NOCHECK(frame->bp);
>  }
>  #else
>  static inline unsigned long *
> diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
> index a2456d4..caff129 100644
> --- a/arch/x86/kernel/unwind_frame.c
> +++ b/arch/x86/kernel/unwind_frame.c
> @@ -6,6 +6,21 @@
>
>  #define FRAME_HEADER_SIZE (sizeof(long) * 2)
>
> +/*
> + * This disables KASAN checking when reading a value from another task's 
> stack,
> + * since the other task could be running on another CPU and could have 
> poisoned
> + * the stack in the meantime.
> + */
> +#define READ_ONCE_TASK_STACK(task, x)  \
> +({ \
> +   unsigned long val;  \
> +   if (task == current)\
> +   val = READ_ONCE(x); \
> +   else\
> +   val = READ_ONCE_NOCHECK(x); \
> +   val;\
> +})
> +
>  unsigned long unwind_get_return_address(struct unwind_state *state)
>  {
> unsigned long addr;
> @@ -14,7 +29,8 @@ unsigned long unwind_get_return_address(struct unwind_state 
> *state)
> if (unwind_done(state))
> return 0;
>
> -   addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
> +   addr = READ_ONCE_TASK_STACK(state->task, *addr_p);
> +   addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, addr,
>  addr_p);
>
> return __kernel_text_address(addr) ? addr : 0;
> @@ -48,7 +64,7 @@ bool unwind_next_frame(struct unwind_state *state)
> if (unwind_done(state))
> return false;
>
> -   next_bp = (unsigned long *)*state->bp;
> +   next_bp = (unsigned long *)READ_ONCE_TASK_STACK(state->task, 
> *state->bp);
>
> /* make sure the next fra

Re: stack frame unwindind KASAN errors

2017-02-27 Thread Josh Poimboeuf
On Mon, Feb 27, 2017 at 12:49:59PM +0800, Daniel J Blueman wrote:
> On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding
> errors reported [2,3].
> 
> This seems to occur at half of boots.
> 
> Let me know for further debug info/patch testing and thanks,
>   Daniel
> 
> [1] https://quora.org/config
> [2] https://quora.org/dmesg.txt

Hi Daniel,

Can you try the following patch?  It's a backport of the following
upstream commit:

  09ae68dd0a8d ("x86/unwind: Disable KASAN checks for non-current tasks")

If it fixes it then I'll submit it for 4.9 stable.

---

From: Josh Poimboeuf 
Subject: [PATCH] x86/unwind: Disable KASAN checks for non-current tasks

There are a handful of callers to save_stack_trace_tsk() and
show_stack() which try to unwind the stack of a task other than current.
In such cases, it's remotely possible that the task is running on one
CPU while the unwinder is reading its stack from another CPU, causing
the unwinder to see stack corruption.

These cases seem to be mostly harmless.  The unwinder has checks which
prevent it from following bad pointers beyond the bounds of the stack.
So it's not really a bug as long as the caller understands that
unwinding another task will not always succeed.

In such cases, it's possible that the unwinder may read a KASAN-poisoned
region of the stack.  Account for that by using READ_ONCE_NOCHECK() when
reading the stack of another task.

Use READ_ONCE() when reading the stack of the current task, since KASAN
warnings can still be useful for finding bugs in that case.

Reported-by: Dmitry Vyukov 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Jones 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Miroslav Benes 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/stacktrace.h |  5 -
 arch/x86/kernel/unwind_frame.c| 20 ++--
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h 
b/arch/x86/include/asm/stacktrace.h
index 37f2e0b..4141ead 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -55,13 +55,16 @@ extern int kstack_depth_to_print;
 static inline unsigned long *
 get_frame_pointer(struct task_struct *task, struct pt_regs *regs)
 {
+   struct inactive_task_frame *frame;
+
if (regs)
return (unsigned long *)regs->bp;
 
if (task == current)
return __builtin_frame_address(0);
 
-   return (unsigned long *)((struct inactive_task_frame 
*)task->thread.sp)->bp;
+   frame = (struct inactive_task_frame *)task->thread.sp;
+   return (unsigned long *)READ_ONCE_NOCHECK(frame->bp);
 }
 #else
 static inline unsigned long *
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index a2456d4..caff129 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -6,6 +6,21 @@
 
 #define FRAME_HEADER_SIZE (sizeof(long) * 2)
 
+/*
+ * This disables KASAN checking when reading a value from another task's stack,
+ * since the other task could be running on another CPU and could have poisoned
+ * the stack in the meantime.
+ */
+#define READ_ONCE_TASK_STACK(task, x)  \
+({ \
+   unsigned long val;  \
+   if (task == current)\
+   val = READ_ONCE(x); \
+   else\
+   val = READ_ONCE_NOCHECK(x); \
+   val;\
+})
+
 unsigned long unwind_get_return_address(struct unwind_state *state)
 {
unsigned long addr;
@@ -14,7 +29,8 @@ unsigned long unwind_get_return_address(struct unwind_state 
*state)
if (unwind_done(state))
return 0;
 
-   addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, *addr_p,
+   addr = READ_ONCE_TASK_STACK(state->task, *addr_p);
+   addr = ftrace_graph_ret_addr(state->task, &state->graph_idx, addr,
 addr_p);
 
return __kernel_text_address(addr) ? addr : 0;
@@ -48,7 +64,7 @@ bool unwind_next_frame(struct unwind_state *state)
if (unwind_done(state))
return false;
 
-   next_bp = (unsigned long *)*state->bp;
+   next_bp = (unsigned long *)READ_ONCE_TASK_STACK(state->task, 
*state->bp);
 
/* make sure the next frame's data is accessible */
if (!update_stack_state(state, next_bp, FRAME_HEADER_SIZE))
-- 
2.7.4



stack frame unwindind KASAN errors

2017-02-26 Thread Daniel J Blueman
On 4.9.13 with KASAN enabled [1], we see a number of stack unwinding
errors reported [2,3].

This seems to occur at half of boots.

Let me know for further debug info/patch testing and thanks,
  Daniel

[1] https://quora.org/config
[2] https://quora.org/dmesg.txt

-- [3]

BUG: KASAN: stack-out-of-bounds in
unwind_get_return_address+0x11d/0x130 at addr 881034eafa08
Read of size 8 by task systemd/1
page:ea0040d3abc0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x2f8000()
page dumped because: kasan: bad access detected
CPU: 20 PID: 1 Comm: systemd Not tainted 4.9.13-debug+ #3
Hardware name: Supermicro Super Server/X10DRL-i, BIOS 2.0a 08/25/2016
 881c2f607a60 b0cdb541 881c2f607af8 881034eafa08
 881c2f607ae8 b064dd17 881034ea4f70 0024
  0286 881034ea4fe2 
Call Trace:
 
 [] dump_stack+0x85/0xc4
 [] kasan_report_error+0x4d7/0x500
 [] __asan_report_load8_noabort+0x61/0x70
 [] ? unwind_get_return_address+0x11d/0x130
 [] unwind_get_return_address+0x11d/0x130
 [] ? unwind_next_frame+0x97/0xf0
 [] __save_stack_trace+0x92/0x100
 [] ? file_free_rcu+0x46/0x60
 [] save_stack_trace+0x1b/0x20
 [] save_stack+0x46/0xd0
 [] ? save_stack_trace+0x1b/0x20
 [] ? save_stack+0x46/0xd0
 [] ? kasan_slab_free+0x71/0xb0
 [] ? kmem_cache_free+0xc4/0x350
 [] ? file_free_rcu+0x46/0x60
 [] ? rcu_process_callbacks+0x9d2/0x1220
 [] ? __do_softirq+0x286/0x87d
 [] ? irq_exit+0x160/0x190
 [] ? smp_apic_timer_interrupt+0x80/0xa0
 [] ? apic_timer_interrupt+0x8c/0xa0
 [] ? debug_check_no_locks_freed+0x290/0x290
 [] ? debug_object_deactivate+0xf8/0x320
 [] ? _raw_spin_unlock_irqrestore+0x5f/0x80
 [] ? trace_hardirqs_on_caller+0x19e/0x580
 [] ? _raw_spin_unlock_irqrestore+0x4a/0x80
 [] ? mark_held_locks+0xc8/0x120
 [] ? kmem_cache_free+0xaf/0x350
 [] ? file_free_rcu+0x46/0x60
 [] kasan_slab_free+0x71/0xb0
 [] kmem_cache_free+0xc4/0x350
 [] file_free_rcu+0x46/0x60
 [] rcu_process_callbacks+0x9d2/0x1220
 [] ? rcu_process_callbacks+0x97d/0x1220
 [] ? get_max_files+0x20/0x20
 [] __do_softirq+0x286/0x87d
 [] irq_exit+0x160/0x190
 [] smp_apic_timer_interrupt+0x80/0xa0
 [] apic_timer_interrupt+0x8c/0xa0
 
 [] ? save_stack+0x46/0xd0
 [] ? debug_check_no_locks_freed+0x290/0x290
 [] ? mark_held_locks+0xc8/0x120
 [] ? efi_call+0x58/0x90
 [] ? virt_efi_get_variable+0x9c/0x150
 [] ? efivar_entry_size+0xa4/0x110
 [] ? efivarfs_callback+0x30f/0x4e7
 [] ? efivarfs_evict_inode+0x10/0x10
 [] mark_held_locks+0xc8/0x120
 [] ? _raw_spin_unlock_irqrestore+0x5f/0x80
 [] ? _raw_spin_unlock_irqrestore+0x4a/0x80
 [] ? efivar_init+0x512/0x750
 [] ? efivarfs_evict_inode+0x10/0x10
 [] ? efivar_entry_iter+0x140/0x140
 [] ? debug_lockdep_rcu_enabled+0x77/0x90
 [] ? d_instantiate+0x6f/0x80
 [] ? _raw_spin_unlock+0x31/0x50
 [] ? _raw_spin_unlock+0x31/0x50
 [] ? d_instantiate+0x6f/0x80
 [] ? efivarfs_mount+0x20/0x20
 [] ? efivarfs_fill_super+0x1ea/0x290
 [] ? efivarfs_mount+0x20/0x20
 [] ? mount_single+0xcc/0x130
 [] ? efivarfs_mount+0x18/0x20
 [] ? mount_fs+0x81/0x2c0
 [] ? alloc_vfsmnt+0x451/0x720
 [] ? vfs_kern_mount+0x6b/0x370
 [] ? do_mount+0x355/0x2af0
 [] ? debug_lockdep_rcu_enabled+0x77/0x90
 [] ? copy_mount_string+0x20/0x20
 [] ? __might_fault+0xf6/0x1b0
 [] ? __check_object_size+0x1b4/0x3fe
 [] ? memdup_user+0x6b/0xa0
 [] ? SyS_mount+0x95/0xe0
 [] ? entry_SYSCALL_64_fastpath+0x23/0xc6
Memory state around the buggy address:
 881034eaf900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 881034eaf980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
>881034eafa00: f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f4 f4 f3 f3
   ^
 881034eafa80: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 881034eafb00: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
Disabling lock debugging due to kernel taint
-- 
Daniel J Blueman