Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-20 Thread Eiichi Tsukata


On 2019/07/20 21:49, Andy Lutomirski wrote:
> On Fri, Jul 19, 2019 at 8:59 PM Eiichi Tsukata  wrote:
>>
...
>>
>> 
>>
>> debug() // dr6: 0x4ff0, user_mode: 1
>>   TRACE_IRQS_OFF
>> arch_stack_user_walk()
>>   debug()  // dr6: 0x4ff1 == 0x4ff0 | 0x0ff1 ... (*)
>> do_debug()
>>   WARN_ON_ONCE
>>   do_debug() // dr6: 0x0ff0(cleared in the above do_debug())
> 
> The dr6 register will indeed be cleared like this, but the dr6
> variable should still be 0x4ff0.

I should have use DR6 to mean it is a register, not variable.
"dr6" was ambiguous.

> 
>>
...
>>
>> Note: printk() in do_debug() can cause infinite loop(printk() ->
>> irq_disable() -> do_debug() -> printk() ...), so printk_deferred()
>> was preferable.
>>
> 
> Shouldn't that be fixed with my patches?  It should only be able to
> recurse two deep: do_debug() from user mode can indeed trip
> breakpoints, but the next do_debug() will clear DR7 in paranoid_entry.
> 

Sorry, I missed that. Now I confirmed your patches fixed the loop.

Thanks

Eiichi 




Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-20 Thread Andy Lutomirski
On Fri, Jul 19, 2019 at 8:59 PM Eiichi Tsukata  wrote:
>
>
> On 2019/07/19 5:27, Andy Lutomirski wrote:
> > Hi all-
> >
> > I suspect that a bunch of the bugs you're all finding boil down to:
> >
> >  - Nested debug exceptions could corrupt the outer exception's DR6.
> >  - Nested debug exceptions in which *both* exceptions came from the
> > kernel were probably all kinds of buggy
> >  - Data breakpoints in bad places in the kernel were bad news
> >
> > Could you give this not-quite-finished series a try?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/
> >
>
> Though I'm still trying to find out other cases(other areas which could
> be buggy if we set hw breakpoints), as far as I tested, there is
> no problem so far.
>
> If I understand correctly, the call trace and the dr6 value will be:
>
> 
>
> debug() // dr6: 0x4ff0, user_mode: 1
>   TRACE_IRQS_OFF
> arch_stack_user_walk()
>   debug()  // dr6: 0x4ff1 == 0x4ff0 | 0x0ff1 ... (*)
> do_debug()
>   WARN_ON_ONCE
>   do_debug() // dr6: 0x0ff0(cleared in the above do_debug())

The dr6 register will indeed be cleared like this, but the dr6
variable should still be 0x4ff0.

>
> (*) :
> >   * The Intel SDM says:
> >   *
> >   *   Certain debug exceptions may clear bits 0-3. The remaining
> >   *   contents of the DR6 register are never cleared by the
> >   *   processor. To avoid confusion in identifying debug
> >   *   exceptions, debug handlers should clear the register before
> >   *   returning to the interrupted task.
>
> 
>
> Note: printk() in do_debug() can cause infinite loop(printk() ->
> irq_disable() -> do_debug() -> printk() ...), so printk_deferred()
> was preferable.
>

Shouldn't that be fixed with my patches?  It should only be able to
recurse two deep: do_debug() from user mode can indeed trip
breakpoints, but the next do_debug() will clear DR7 in paranoid_entry.


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-19 Thread Eiichi Tsukata


On 2019/07/19 5:27, Andy Lutomirski wrote:
> Hi all-
> 
> I suspect that a bunch of the bugs you're all finding boil down to:
> 
>  - Nested debug exceptions could corrupt the outer exception's DR6.
>  - Nested debug exceptions in which *both* exceptions came from the
> kernel were probably all kinds of buggy
>  - Data breakpoints in bad places in the kernel were bad news
> 
> Could you give this not-quite-finished series a try?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/
> 

Though I'm still trying to find out other cases(other areas which could
be buggy if we set hw breakpoints), as far as I tested, there is
no problem so far.

If I understand correctly, the call trace and the dr6 value will be: 



debug() // dr6: 0x4ff0, user_mode: 1
  TRACE_IRQS_OFF
arch_stack_user_walk()
  debug()  // dr6: 0x4ff1 == 0x4ff0 | 0x0ff1 ... (*)
do_debug()
  WARN_ON_ONCE
  do_debug() // dr6: 0x0ff0(cleared in the above do_debug())

(*) :
>   * The Intel SDM says:
>   *
>   *   Certain debug exceptions may clear bits 0-3. The remaining
>   *   contents of the DR6 register are never cleared by the
>   *   processor. To avoid confusion in identifying debug
>   *   exceptions, debug handlers should clear the register before
>   *   returning to the interrupted task.



Note: printk() in do_debug() can cause infinite loop(printk() -> 
irq_disable() -> do_debug() -> printk() ...), so printk_deferred()
was preferable.

Thanks

Eiichi


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-18 Thread Andy Lutomirski
Hi all-

I suspect that a bunch of the bugs you're all finding boil down to:

 - Nested debug exceptions could corrupt the outer exception's DR6.
 - Nested debug exceptions in which *both* exceptions came from the
kernel were probably all kinds of buggy
 - Data breakpoints in bad places in the kernel were bad news

Could you give this not-quite-finished series a try?

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-17 Thread Eiichi Tsukata



On 2019/07/17 6:51, Vegard Nossum wrote:
> 
...
> 
> Got a different one:
> 
> WARNING: CPU: 0 PID: 2150 at arch/x86/kernel/traps.c:791 do_debug+0xfe/0x240
> CPU: 0 PID: 2150 Comm: init Not tainted 5.2.0+ #124
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> RIP: 0010:do_debug+0xfe/0x240
...


Hello Vegard

I found a way to reproduce #DB WARNING by setting hardware watchpoint to
the address arch_stack_walk_user() will touch.


[Steps to Reproduce #DB WARNING]

poc.s:

```
.global _start

.text
_start:
# exit(0)
mov $60, %rax
xor %rdi, %rdi
syscall
```

build:

  # gcc -g -c poc.s; ld -o poc poc.o

setup ftrace:

  # echo 1 > options/userstacktrace
  # echo 1 > events/preemptirq/irq_disable/enable

exec gdb:(set hardware watch point to $rbp)

  [18:28:48 root@vm loops]# gdb ./poc
  GNU gdb (GDB) Fedora 8.3-6.fc30
  Copyright (C) 2019 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  .
  Find the GDB manual and other documentation resources online at:
  .

  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from ./poc...
  (gdb) l
  1   .global _start
  2
  3   .text
  4   _start:
  5   # exit(0)
  6   mov $60, %rax
  7   xor %rdi, %rdi
  8   syscall
  (gdb) b 6
  Breakpoint 1 at 0x401000: file poc.s, line 6.
  (gdb) start
  Function "main" not defined.
  Make breakpoint pending on future shared library load? (y or [n]) n
  Starting program: /root/tmp/loops/poc

  Breakpoint 1, _start () at poc.s:6
  6   mov $60, %rax
  (gdb) set $rbp = $rsp
  (gdb) p $rbp
  $1 = (void *) 0x7fffe4b0
  (gdb) rwatch *0x7fffe4b0
  Hardware read watchpoint 2: *0x7fffe4b0
  (gdb) c
  Continuing.
  [Inferior 1 (process 2744) exited normally]

dmesg:

[  564.646159][ T2744] WARNING: CPU: 0 PID: 2744 at arch/x86/kernel/traps.c:791 
do_debug+0x220/0x490
[  564.648581][ T2744] Modules linked in:
[  564.649530][ T2744] CPU: 0 PID: 2744 Comm: poc Tainted: GW 
5.2.0+ #77
[  564.651121][ T2744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.12.0-2.fc30 04/01/2014
[  564.653569][ T2744] RIP: 0010:do_debug+0x220/0x490
[  564.654847][ T2744] Code: 00 48 8b 95 60 ff ff ff 48 b8 00 00 00 00 00 fc ff 
df 48 c1 ea 03 80 3c 02 00 0f 85 03 02 00 00 41 f6 87 88 00 00 00 03 75 60 <0f> 
0b 4c 89 f2 49 81 e5 ff bf ff ff 48 b8 00 00 00 00 00 fc ff df
[  564.659905][ T2744] RSP: :fe014e98 EFLAGS: 00010046
[  564.661500][ T2744] RAX: dc00 RBX: 1fc029d8 RCX: 
11100f81c2d3
[  564.663531][ T2744] RDX: 1fc029fc RSI:  RDI: 
85c19f00
[  564.665553][ T2744] RBP: fe014f48 R08: fe014fe8 R09: 
88807c0e08a0
[  564.667637][ T2744] R10: 0001 R11: 11100d1042ba R12: 
88807c0e
[  564.669700][ T2744] R13: 4001 R14: 88807c0e1698 R15: 
fe014f58
[  564.671768][ T2744] FS:  () GS:88806880() 
knlGS:
[  564.674032][ T2744] CS:  0010 DS:  ES:  CR0: 80050033
[  564.675752][ T2744] CR2: 0001 CR3: 5fe08002 CR4: 
00160ef0
[  564.677570][ T2744] DR0: 7fffe4b0 DR1:  DR2: 

[  564.679686][ T2744] DR3:  DR6: 0ff0 DR7: 
000f0602
[  564.681788][ T2744] Call Trace:
[  564.682700][ T2744]  <#DB>
[  564.683492][ T2744]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  564.684954][ T2744]  ? do_int3+0x1f0/0x1f0
[  564.686074][ T2744]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  564.687512][ T2744]  debug+0x2d/0x70
[  564.688456][ T2744] RIP: 0010:arch_stack_walk_user+0x7d/0xf2
[  564.689899][ T2744] Code: 00 0f 85 8d 00 00 00 49 8b 87 d8 16 00 00 48 83 e8 
10 49 39 c6 77 32 41 83 87 e8 15 00 00 01 0f 1f 00 0f ae e8 31 c0 49 8b 0e <85> 
c0 75 6d 49 8b 76 08 0f 1f 00 85 c0 74 1f 65 48 8b 04 25 00 ef
[  564.694763][ T2744] RSP: :888061fb7c48 EFLAGS: 0046
[  564.696316][ T2744] RAX:  RBX: 88807c0e RCX: 
0001
[  564.698342][ T2744] RDX: 11100ba08e93 RSI: 00401009 RDI: 
888061fb7cbc
[  564.700323][ T2744] RBP: 888061fb7c80 R08: 11100ba08e93 R09: 
88805d04749c
[  564.702337][ T2744] R10: ed100ba08e9b R11: 88805d0474db R12: 
888061fb7cb0
[  564.704359][ T2744] 

Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-17 Thread Vegard Nossum

On 7/17/19 10:07 AM, Peter Zijlstra wrote:

On Tue, Jul 16, 2019 at 09:33:50PM +0200, Vegard Nossum wrote:

[ cut here ]
General protection fault in user access. Non-canonical address?
WARNING: CPU: 0 PID: 5039 at arch/x86/mm/extable.c:126
ex_handler_uaccess+0x5d/0x70

[...]



   https://lkml.kernel.org/r/57754f11-2c65-a2c8-2f6d-bfab0d2f8...@etsukata.com

Does something like the below help?

diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
index c8d0f05721a1..80ad4ccb7025 100644
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -226,12 +226,16 @@ unsigned int stack_trace_save_user(unsigned long *store, 
unsigned int size)
.store  = store,
.size   = size,
};
+   mm_segment_t fs;
  
  	/* Trace user stack if not a kernel thread */

if (current->flags & PF_KTHREAD)
return 0;
  
+	fs = get_fs();

+   set_fs(USER_DS);
arch_stack_walk_user(consume_entry, , task_pt_regs(current));
+   set_fs(fs);
return c.len;
  }
  #endif



Yes.


Vegard


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-17 Thread Peter Zijlstra
On Tue, Jul 16, 2019 at 09:33:50PM +0200, Vegard Nossum wrote:
> [ cut here ]
> General protection fault in user access. Non-canonical address?
> WARNING: CPU: 0 PID: 5039 at arch/x86/mm/extable.c:126
> ex_handler_uaccess+0x5d/0x70
> CPU: 0 PID: 5039 Comm: init Not tainted 5.2.0+ #124
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> RIP: 0010:ex_handler_uaccess+0x5d/0x70
> Code: 5d 41 5c c3 e8 c4 8e 0e 00 80 3d e5 74 1e 01 00 75 d3 e8 b6 8e 0e 00
> 48 c7 c7 10 a7 fb 81 c6 05 d0 74 1e 01 01 e8 d1 43 01 00 <0f> 0b eb b7 0f 1f
> 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
> RSP: :fe00fc48 EFLAGS: 00010086
> RAX:  RBX: 81c07dac RCX: 811a887c
> RDX:  RSI: 8289f05f RDI: 0093
> RBP: fe00fcb8 R08: 0036fe0f15d3 R09: 003f
> R10:  R11:  R12: 000d
> R13: 000d R14:  R15: 
> FS:  563ab8c0() GS:88803ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 1ff7 CR3: 3c804002 CR4: 003606f0
> DR0: 40209100 DR1: 402091a1 DR2: 
> DR3:  DR6: 0ff1 DR7: 000b062a
> Call Trace:
>  <#DB>
>  fixup_exception+0x50/0x6a
>  do_general_protection+0x40/0x160
>  general_protection+0x2d/0x40
> RIP: 0010:arch_stack_walk_user+0x71/0x100
> Code: 00 48 83 e8 10 49 39 c4 77 45 4c 8b 04 24 4c 89 e3 4d 89 fd 4c 89 fd
> 41 83 87 98 0a 00 00 01 0f 01 cb 0f ae e8 31 c0 4c 89 e2 <4c> 8b 33 4d 89 f4
> 85 c0 75 7a 48 8b 73 08 0f 01 ca 85 c0 74 1f 65
> RSP: :fe00fd68 EFLAGS: 00050046
> RAX:  RBX: 854163717acc2789 RCX: 811ca27b
> RDX: 854163717acc2789 RSI: 40209102 RDI: fe00fdb8
> RBP: 88803d55d040 R08: c9000520bf58 R09: 
> R10:  R11:  R12: 854163717acc2789
> R13: 88803d55d040 R14: 0093 R15: 88803d55d040
>  ? stack_trace_consume_entry+0x4b/0x80
>  ? arch_stack_walk_user+0x34/0x100
>  ? profile_setup.cold+0xc1/0xc1
>  stack_trace_save_user+0x71/0x9c
>  trace_buffer_unlock_commit_regs+0x1ae/0x270
>  trace_event_buffer_commit+0x90/0x240
>  trace_event_raw_event_preemptirq_template+0x9a/0x100
>  ? debug+0x16/0x70
>  ? perf_trace_preemptirq_template+0x120/0x120
>  ? trace_hardirqs_off_thunk+0x1a/0x1c
>  trace_hardirqs_off_caller+0xf4/0x150
>  trace_hardirqs_off_thunk+0x1a/0x1c
>  ? debug+0x11/0x70
>  debug+0x16/0x70
> RIP: 0010:copy_user_generic_unrolled+0xa0/0xc0
> Code: 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 4c 89 07 48
> 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a 06 <88> 07 48 ff c6 48
> ff c7 ff c9 75 f2 31 c0 0f 01 ca c3 0f 1f 40 00
> RSP: :c9000520be38 EFLAGS: 00040202
> RAX: 88803d55d09c RBX: 88803d55d040 RCX: 0001
> RDX: 0001 RSI: 40209102 RDI: c9000520be76
> RBP: 0001 R08: 0001 R09: 
> R10:  R11:  R12: 7000
> R13: 40209102 R14: c9000520be76 R15: 
>  
>  __probe_kernel_read+0x57/0x90
>  is_prefetch.isra.0+0xb5/0x210
>  ? tracer_hardirqs_on+0x53/0x1a0
>  __bad_area_nosemaphore+0x9e/0x220
>  __do_page_fault+0x483/0x630
>  ? async_page_fault+0x8/0x40
>  async_page_fault+0x36/0x40
> RIP: 0033:0x40209102
> Code: 00 00 49 bc 00 20 23 40 00 00 00 00 49 bd 00 00 d0 40 00 00 00 00 49
> be ff ff ff ff ff ff ff ff 49 bf 00 50 80 40 00 00 00 00 <9c> 48 81 0c 24 00
> 04 00 00 48 81 0c 24 00 00 04 00 9d ff 2c 25 00
> RSP: 002b:1fff EFLAGS: 00010217
> RAX:  RBX: 402090b0 RCX: 0001
> RDX: 0001 RSI:  RDI: 41ebb000
> RBP: 854163717acc2789 R08: 0001 R09: b1f39cc399a61ebb
> R10: 7ffeab175000 R11: 0360 R12: 40232000
> R13: 40d0 R14:  R15: 40805000
> ---[ end trace e5e49800ff5aa5ed ]---


  https://lkml.kernel.org/r/57754f11-2c65-a2c8-2f6d-bfab0d2f8...@etsukata.com

Does something like the below help?

diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
index c8d0f05721a1..80ad4ccb7025 100644
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -226,12 +226,16 @@ unsigned int stack_trace_save_user(unsigned long *store, 
unsigned int size)
.store  = store,
.size   = size,
};
+   mm_segment_t fs;
 
/* Trace user stack if not a kernel thread */
if (current->flags & PF_KTHREAD)
return 0;
 
+   fs = get_fs();
+   set_fs(USER_DS);
arch_stack_walk_user(consume_entry, , task_pt_regs(current));
+   set_fs(fs);
return c.len;
 }
 #endif


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-17 Thread Vegard Nossum



On 7/17/19 3:02 AM, Andy Lutomirski wrote:

On Tue, Jul 16, 2019 at 2:53 PM Vegard Nossum  wrote:



On 7/16/19 9:33 PM, Vegard Nossum wrote:


On 7/11/19 1:40 PM, Peter Zijlstra wrote:

Hi,

Here's the latest (and hopefully final) set of tracing vs CR2 patches.

They are basically the same as v2, with only minor edits and tags
collected
from the last review.

Please consider.



Hi,

I ran my own battery of tests on your patch set on top of
5ad18b2e60b75c7297a998dea702451d33a052ed and ran into this:



On a different thread, Peter and I decided that the last patch in this
series (the one that removes the _DEBUG stuff) is wrong.  Can you see
if these are reproducible with that patch removed?


Yes, without the last patch I still get this:

Run /init as init process
init[711]: segfault at 4000 ip 400a sp 4ff8 
error 7

[ cut here ]
General protection fault in user access. Non-canonical address?
WARNING: CPU: 0 PID: 711 at arch/x86/mm/extable.c:126 
ex_handler_uaccess+0x5d/0x70

CPU: 0 PID: 711 Comm: init Not tainted 5.2.0+ #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
init[716]: segfault at 4000 ip 400a sp 4ff8 
error 7

RIP: 0010:ex_handler_uaccess+0x5d/0x70
Code: 5d 41 5c c3 e8 c4 8e 0e 00 80 3d e5 74 1e 01 00 75 d3 e8 b6 8e 0e 
00 48 c7 c7 10 a7 fb 81 c6 05 d0 74 1e 01 01 e8 d1 43 01 00 <0f> 0b eb 
b7 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89

RSP: :c965fa18 EFLAGS: 00010086
RAX:  RBX: 81c07dac RCX: 811a887c
init[714]: segfault at 4000 ip 400a sp 4ff8 
error 7

RDX:  RSI: 8289f05f RDI: 0093
RBP: c965fa88 R08: 2e80b265 R09: 003f
init[718]: segfault at 4000 ip 400a sp 4ff8 
error 7

R10:  R11:  R12: 000d
R13: 000d R14:  R15: 
FS:  006ce880() GS:88803ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 3fe0 CR3: 3d2f6004 CR4: 003606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
Code: Bad RIP value.
 fixup_exception+0x50/0x6a
 do_general_protection+0x40/0x160
 general_protection+0x2d/0x40
RIP: 0010:arch_stack_walk_user+0x71/0x100
Code: 00 48 83 e8 10 49 39 c4 77 45 4c 8b 04 24 4c 89 e3 4d 89 fd 4c 89 
fd 41 83 87 98 0a 00 00 01 0f 01 cb 0f ae e8 31 c0 4c 89 e2 <4c> 8b 33 
4d 89 f4 85 c0 75 7a 48 8b 73 08 0f 01 ca 85 c0 74 1f 65

[...]

This is my reproducer (as init):

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

struct child_data {
  (*code)();
};

child_fn(void *arg)
{
  child_data *data = arg;
  mprotect(data->code, PAGE_SIZE, PROT_EXEC);
  data->code();
}

int main()
{
  mkdir("/sys", 7);
  mount("nodev", "/sys", "sysfs", 0, "");
  mount("nodev", "/sys/kernel/tracing", "tracefs", 0, "");

  int tracing_options_userstacktrace = 
open("/sys/kernel/tracing/options/userstacktrace", O_RDWR);

  write(tracing_options_userstacktrace, "1\n", 2);

  int tracing_events_preemptirq_irq_disable = 
open("/sys/kernel/tracing/events/preemptirq/irq_disable/enable", O_RDWR);

  write(tracing_events_preemptirq_irq_disable, "1\n", 2);

  void *code = mmap(0, PAGE_SIZE, PROT_WRITE, MAP_PRIVATE | 
MAP_ANONYMOUS | MAP_32BIT, 1, 0);

  {
unsigned char *output = code;

*output++ = 72;
*output++ = 189;
for (int i = 0; i < 8; ++i)
  *output++ = i;
  }

  void *child_stack = mmap(0, PAGE_SIZE, PROT_WRITE, MAP_PRIVATE | 
MAP_ANONYMOUS | MAP_32BIT, 1, 0);


  while (1) {
child_data data = { code };
clone(child_fn, child_stack, SIGCHLD, );
  }
}

Compiled with -static and booted with "norandmaps" (for some reason that
makes a difference), this is 100% reproducible for me, although the
reproducer is somewhat sensitive to small changes that I don't quite
understand.


Vegard


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-17 Thread Peter Zijlstra
On Tue, Jul 16, 2019 at 06:02:33PM -0700, Andy Lutomirski wrote:

> On a different thread, Peter and I decided that the last patch in this
> series (the one that removes the _DEBUG stuff) is wrong.  Can you see
> if these are reproducible with that patch removed?

Wrong is maybe the wrong word :-), premature maybe, we definitely want to
get there, but the #DB crud needs a wee bit of work first.


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-16 Thread Andy Lutomirski
On Tue, Jul 16, 2019 at 2:53 PM Vegard Nossum  wrote:
>
>
> On 7/16/19 9:33 PM, Vegard Nossum wrote:
> >
> > On 7/11/19 1:40 PM, Peter Zijlstra wrote:
> >> Hi,
> >>
> >> Here's the latest (and hopefully final) set of tracing vs CR2 patches.
> >>
> >> They are basically the same as v2, with only minor edits and tags
> >> collected
> >> from the last review.
> >>
> >> Please consider.
> >>
> >
> > Hi,
> >
> > I ran my own battery of tests on your patch set on top of
> > 5ad18b2e60b75c7297a998dea702451d33a052ed and ran into this:
> >

On a different thread, Peter and I decided that the last patch in this
series (the one that removes the _DEBUG stuff) is wrong.  Can you see
if these are reproducible with that patch removed?

--Andy


Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-16 Thread Vegard Nossum



On 7/16/19 9:33 PM, Vegard Nossum wrote:


On 7/11/19 1:40 PM, Peter Zijlstra wrote:

Hi,

Here's the latest (and hopefully final) set of tracing vs CR2 patches.

They are basically the same as v2, with only minor edits and tags 
collected

from the last review.

Please consider.



Hi,

I ran my own battery of tests on your patch set on top of 
5ad18b2e60b75c7297a998dea702451d33a052ed and ran into this:


[ cut here ]
General protection fault in user access. Non-canonical address?
WARNING: CPU: 0 PID: 5039 at arch/x86/mm/extable.c:126 
ex_handler_uaccess+0x5d/0x70


Got a different one:

WARNING: CPU: 0 PID: 2150 at arch/x86/kernel/traps.c:791 do_debug+0xfe/0x240
CPU: 0 PID: 2150 Comm: init Not tainted 5.2.0+ #124
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014

RIP: 0010:do_debug+0xfe/0x240
Code: 05 07 3d f3 7e f6 85 91 00 00 00 02 0f 85 d8 00 00 00 49 8b 84 24 
18 0b 00 00 f6 44 24 01 40 74 2f f6 85 88 00 00 00 03 75 26 <0f> 0b 80 
e4 bf 49 89 84 24 18 0b 00 00 f0 41 80 0c 24 10 48 81 a5

RSP: :fe00ff20 EFLAGS: 00010046
RAX: 4002 RBX:  RCX: 810e2f72
RDX:  RSI: 0003 RDI: 8201f090
RBP: fe00ff58 R08:  R09: 0005
R10:  R11:  R12: 88803e0df040
R13:  R14: 3d376001 R15: 
FS:  56dbc8c0() GS:88803ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 41f38010 CR3: 3d376001 CR4: 003606f0
DR0: 0001 DR1: 41a4f070 DR2: 7fff959ff000
DR3:  DR6: fffe0ff0 DR7: 03b3062a
Call Trace:
 <#DB>
 debug+0x2d/0x70
RIP: 0010:arch_stack_walk_user+0x74/0x100
Code: e8 10 49 39 c4 77 45 4c 8b 04 24 4c 89 e3 4d 89 fd 4c 89 fd 41 83 
87 98 0a 00 00 01 0f 01 cb 0f ae e8 31 c0 4c 89 e2 4c 8b 33 <4d> 89 f4 
85 c0 75 7a 48 8b 73 08 0f 01 ca 85 c0 74 1f 65 48 8b 04

RSP: :c900030dbd68 EFLAGS: 00040046
RAX:  RBX: 41a4f073 RCX: 811ca27b
RDX: 41a4f073 RSI: 41a4f0dd RDI: c900030dbdb8
RBP: 88803e0df040 R08: c900030dbf58 R09: 
R10:  R11:  R12: 41a4f073
R13: 88803e0df040 R14: 0041281000bf4800 R15: 88803e0df040
 ? stack_trace_consume_entry+0x4b/0x80
 
 ? profile_setup.cold+0xc1/0xc1
 stack_trace_save_user+0x71/0x9c
 trace_buffer_unlock_commit_regs+0x1ae/0x270
 trace_event_buffer_commit+0x90/0x240
 trace_event_raw_event_preemptirq_template+0x9a/0x100
 ? debug+0x49/0x70
 ? perf_trace_preemptirq_template+0x120/0x120
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 trace_hardirqs_off_caller+0xf4/0x150
 ? debug+0x44/0x70
 trace_hardirqs_off_thunk+0x1a/0x1c
 debug+0x49/0x70
RIP: 0033:0x41a4f0dd
Code: 47 11 b7 d2 36 45 6c 49 be 00 f0 9f 95 ff 7f 00 00 49 bf de a7 b3 
e8 d7 21 3c 15 9c 48 81 0c 24 00 01 00 00 9d b8 62 00 00 00 <8e> c0 0f 
05 66 8c c8 9c 48 81 24 24 ff fe ff ff 9d 48 89 04 25 40

RSP: 002b:40901ea0 EFLAGS: 0317
RAX: 0062 RBX: 41281000 RCX: 
RDX: 401c RSI: 41892000 RDI: 41281000
RBP: 41a4f073 R08: 0001 R09: 0001
R10: 917d7748 R11: 1000 R12: fdff
R13: 6c4536d2b71147a5 R14: 7fff959ff000 R15: 153c21d7e8b3a7de
---[ end trace 0cd51ba690f12b47 ]---

The warning is this:

if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
/*
 * Historical junk that used to handle SYSENTER 
single-stepping.
 * This should be unreachable now.  If we survive for a 
while
 * without anyone hitting this warning, we'll turn this 
into

 * an oops.
 */
tsk->thread.debugreg6 &= ~DR_STEP;
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs->flags &= ~X86_EFLAGS_TF;
}

Unfortunately DR6 from the register dump has already been cleared at the
top of do_debug() and the local variable dr6 is on the stack and not
loaded into any of the registers AFAICT.

From the userspace Code: line you can clearly see it setting EFLAGS_TF,
then it seems to be trapping on the next instruction:

  1b:   9c  pushfq
  1c:   48 81 0c 24 00 01 00orq$0x100,(%rsp)
  23:   00
  24:   9d  popfq
  25:   b8 62 00 00 00  mov$0x62,%eax
  2a:*  8e c0   mov%eax,%es <-- trapping 
instruction


You can see that DR1 points to 41a4f070, which is close to userspace RBP
(41a4f073), which is perhaps being accessed by stack_trace_save_user()
and causing the debug exception on a data breakpoint?

The Code: line from stack_trace_save_user() is:

  27:   4c 8b 33mov(%rbx),%r14
  2a:*  4d 89 f4   

Re: [PATCH v3 0/6] Tracing vs CR2

2019-07-16 Thread Vegard Nossum



On 7/11/19 1:40 PM, Peter Zijlstra wrote:

Hi,

Here's the latest (and hopefully final) set of tracing vs CR2 patches.

They are basically the same as v2, with only minor edits and tags collected
from the last review.

Please consider.



Hi,

I ran my own battery of tests on your patch set on top of 
5ad18b2e60b75c7297a998dea702451d33a052ed and ran into this:


[ cut here ]
General protection fault in user access. Non-canonical address?
WARNING: CPU: 0 PID: 5039 at arch/x86/mm/extable.c:126 
ex_handler_uaccess+0x5d/0x70

CPU: 0 PID: 5039 Comm: init Not tainted 5.2.0+ #124
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014

RIP: 0010:ex_handler_uaccess+0x5d/0x70
Code: 5d 41 5c c3 e8 c4 8e 0e 00 80 3d e5 74 1e 01 00 75 d3 e8 b6 8e 0e 
00 48 c7 c7 10 a7 fb 81 c6 05 d0 74 1e 01 01 e8 d1 43 01 00 <0f> 0b eb 
b7 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89

RSP: :fe00fc48 EFLAGS: 00010086
RAX:  RBX: 81c07dac RCX: 811a887c
RDX:  RSI: 8289f05f RDI: 0093
RBP: fe00fcb8 R08: 0036fe0f15d3 R09: 003f
R10:  R11:  R12: 000d
R13: 000d R14:  R15: 
FS:  563ab8c0() GS:88803ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 1ff7 CR3: 3c804002 CR4: 003606f0
DR0: 40209100 DR1: 402091a1 DR2: 
DR3:  DR6: 0ff1 DR7: 000b062a
Call Trace:
 <#DB>
 fixup_exception+0x50/0x6a
 do_general_protection+0x40/0x160
 general_protection+0x2d/0x40
RIP: 0010:arch_stack_walk_user+0x71/0x100
Code: 00 48 83 e8 10 49 39 c4 77 45 4c 8b 04 24 4c 89 e3 4d 89 fd 4c 89 
fd 41 83 87 98 0a 00 00 01 0f 01 cb 0f ae e8 31 c0 4c 89 e2 <4c> 8b 33 
4d 89 f4 85 c0 75 7a 48 8b 73 08 0f 01 ca 85 c0 74 1f 65

RSP: :fe00fd68 EFLAGS: 00050046
RAX:  RBX: 854163717acc2789 RCX: 811ca27b
RDX: 854163717acc2789 RSI: 40209102 RDI: fe00fdb8
RBP: 88803d55d040 R08: c9000520bf58 R09: 
R10:  R11:  R12: 854163717acc2789
R13: 88803d55d040 R14: 0093 R15: 88803d55d040
 ? stack_trace_consume_entry+0x4b/0x80
 ? arch_stack_walk_user+0x34/0x100
 ? profile_setup.cold+0xc1/0xc1
 stack_trace_save_user+0x71/0x9c
 trace_buffer_unlock_commit_regs+0x1ae/0x270
 trace_event_buffer_commit+0x90/0x240
 trace_event_raw_event_preemptirq_template+0x9a/0x100
 ? debug+0x16/0x70
 ? perf_trace_preemptirq_template+0x120/0x120
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 trace_hardirqs_off_caller+0xf4/0x150
 trace_hardirqs_off_thunk+0x1a/0x1c
 ? debug+0x11/0x70
 debug+0x16/0x70
RIP: 0010:copy_user_generic_unrolled+0xa0/0xc0
Code: 7f 40 ff c9 75 b6 89 d1 83 e2 07 c1 e9 03 74 12 4c 8b 06 4c 89 07 
48 8d 76 08 48 8d 7f 08 ff c9 75 ee 21 d2 74 10 89 d1 8a 06 <88> 07 48 
ff c6 48 ff c7 ff c9 75 f2 31 c0 0f 01 ca c3 0f 1f 40 00

RSP: :c9000520be38 EFLAGS: 00040202
RAX: 88803d55d09c RBX: 88803d55d040 RCX: 0001
RDX: 0001 RSI: 40209102 RDI: c9000520be76
RBP: 0001 R08: 0001 R09: 
R10:  R11:  R12: 7000
R13: 40209102 R14: c9000520be76 R15: 
 
 __probe_kernel_read+0x57/0x90
 is_prefetch.isra.0+0xb5/0x210
 ? tracer_hardirqs_on+0x53/0x1a0
 __bad_area_nosemaphore+0x9e/0x220
 __do_page_fault+0x483/0x630
 ? async_page_fault+0x8/0x40
 async_page_fault+0x36/0x40
RIP: 0033:0x40209102
Code: 00 00 49 bc 00 20 23 40 00 00 00 00 49 bd 00 00 d0 40 00 00 00 00 
49 be ff ff ff ff ff ff ff ff 49 bf 00 50 80 40 00 00 00 00 <9c> 48 81 
0c 24 00 04 00 00 48 81 0c 24 00 00 04 00 9d ff 2c 25 00

RSP: 002b:1fff EFLAGS: 00010217
RAX:  RBX: 402090b0 RCX: 0001
RDX: 0001 RSI:  RDI: 41ebb000
RBP: 854163717acc2789 R08: 0001 R09: b1f39cc399a61ebb
R10: 7ffeab175000 R11: 0360 R12: 40232000
R13: 40d0 R14:  R15: 40805000
---[ end trace e5e49800ff5aa5ed ]---
PANIC: double fault, error_code: 0x0
CPU: 0 PID: 5039 Comm: init Tainted: GW 5.2.0+ #124
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014

RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50
Code: 82 e8 74 2d f8 ff 48 89 9d 10 01 00 00 48 89 ee 5b 4c 89 e7 5d 41 
5c e9 8e 5d 12 00 5b b8 f4 ff ff ff 5d 41 5c c3 0f 1f 40 00 <65> 48 8b 
04 25 c0 6c 01 00 65 8b 15 78 ba df 7e 81 e2 00 01 1f 00

RSP: :fe00f008 EFLAGS: 00010093
RAX: 00016cc0 RBX: 81a01436 RCX: 81a00b97
RDX: 00016cc0 RSI: 81a01428 RDI: 81a01436
RBP: fe00f088 R08: 

[PATCH v3 0/6] Tracing vs CR2

2019-07-11 Thread Peter Zijlstra
Hi,

Here's the latest (and hopefully final) set of tracing vs CR2 patches.

They are basically the same as v2, with only minor edits and tags collected
from the last review.

Please consider.