El Thu, Jul 13, 2017 at 01:00:01PM -0500 Josh Poimboeuf ha dit:

> On Wed, Jul 12, 2017 at 04:22:13PM -0700, Matthias Kaehlcke wrote:
> > El Wed, Jul 12, 2017 at 05:36:30PM -0500 Josh Poimboeuf ha dit:
> > 
> > > On Wed, Jul 12, 2017 at 05:35:47PM -0500, Josh Poimboeuf wrote:
> > > > On Wed, Jul 12, 2017 at 03:20:40PM -0700, Matthias Kaehlcke wrote:
> > > > > > This is admittedly an awkward way of achieving this goal, but it's 
> > > > > > the
> > > > > > only way I know how to do it with GCC.
> > > > > > 
> > > > > > What extra instruction does clang add?
> > > > > 
> > > > > I was looking at the get_user() call in drm_mode_setcrtc(). The code
> > > > > generated by clang without the patch is:
> > > > > 
> > > > >                         if (get_user(out_id, &set_connectors_ptr[i])) 
> > > > > {
> > > > > ffffffff81386955:       4a 8d 04 bd 00 00 00    lea    
> > > > > 0x0(,%r15,4),%rax
> > > > > ffffffff8138695c:       00 
> > > > > ffffffff8138695d:       49 03 06                add    (%r14),%rax
> > > > > ffffffff81386960:       e8 2b a5 f0 ff          callq  
> > > > > ffffffff81290e90 <__get_user_4>
> > > > > 
> > > > > And with the patch:
> > > > > 
> > > > >                         if (get_user(out_id, &set_connectors_ptr[i])) 
> > > > > {
> > > > > ffffffff81386a56:       4a 8d 04 bd 00 00 00    lea    
> > > > > 0x0(,%r15,4),%rax
> > > > > ffffffff81386a5d:       00 
> > > > > ffffffff81386a5e:       49 03 06                add    (%r14),%rax
> > > > > ffffffff81386a61:       48 8b 64 24 28          mov    0x28(%rsp),%rsp
> > > > > ffffffff81386a66:       e8 15 a5 f0 ff          callq
> > > > > ffffffff81290f80 <__get_user_4>
> > > > 
> > > > Hm, that seems odd.  Can you sure the disassembly for the whole
> > > > function?
> > > 
> > > Er, share :-)
> > 
> > Sure, please find below the disassemblies with and without the
> > patch. The exact extra instruction differs from the one above, the
> > disassembly above is from a debug session with some 'random' kernel
> > version (bisect), the ones below from a v4.12ish kernel. At the bottom
> > you also find a log of a double faults observed with the patch.
> > 
> > If you are interested in building the kernel with clang yourself I can
> > provide instructions, it is fairly painless nowadays as long as you
> > have a recent version of clang (a somewhat older version should also
> > do for this issue with some extra kernel patches).
> 
> Here's the reason for the double fault.  First it puts zero on the stack
> at offset -0x58:
> 
> > ffffffff81367616:   31 c0                   xor    %eax,%eax
> > ffffffff81367618:   48 89 45 c8             mov    %rax,-0x38(%rbp)
> > ffffffff8136761c:   45 31 ff                xor    %r15d,%r15d
> > ffffffff8136761f:   48 89 45 a8             mov    %rax,-0x58(%rbp)
> 
> Then, later, it copies that zeroed word from the stack to RSP:
> 
> > ffffffff81367874:   48 8b 65 a8             mov    -0x58(%rbp),%rsp
> 
> Then it double faults because the call instruction tries to write RIP on
> the stack, but RSP is zero:
> 
> > ffffffff81367878:   e8 73 26 f1 ff          callq  ffffffff81279ef0 
> > <__get_user_4>
> 
> Then clang tries to put RSP's value on the stack, at the same stack slot
> where the original zero was stored (though it never reaches this point):
> 
> > ffffffff8136787d:   49 89 d4                mov    %rdx,%r12
> > ffffffff81367880:   48 89 65 a8             mov    %rsp,-0x58(%rbp)
> 
> The panic is consistent with the above.  RIP points to the call
> instruction, RSP is zero:
> 
> > [    3.798722] PANIC: double fault, error_code: 0x0
> > [    3.807387] CPU: 1 PID: 605 Comm: frecon Not tainted 
> > 4.12.0-00023-g711d82c128ff #107
> > [    3.816040] Hardware name: GOOGLE Squawks, BIOS 
> > Google_Squawks.5216.152.76 03/04/2016
> > [    3.824792] task: ffff880075b92f00 task.stack: ffffc90000d6c000
> > [    3.829599] EXT4-fs (mmcblk0p1): re-mounted. Opts: 
> > commit=600,data=ordered
> > [    3.839092] RIP: 0010:drm_mode_setcrtc+0x328/0x51f
> > [    3.844443] RSP: 0018:0000000000000000 EFLAGS: 00010206
> > [    3.850280] RAX: 0000559e707c4d60 RBX: 0000000000000000 RCX: 
> > 0000000000000008
> > [    3.858253] RDX: 0000000000000001 RSI: ffffc90000d6fcc8 RDI: 
> > ffffffff81367805
> > [    3.866225] RBP: ffffc90000d6fd90 R08: 00000000014000c0 R09: 
> > 0000000000000308
> > [    3.874199] R10: 0000000000000300 R11: 0000000000000556 R12: 
> > 0000000000000000
> > [    3.882163] R13: ffff880077a25000 R14: ffffc90000d6fdd0 R15: 
> > 0000000000000000
> > [    3.890136] FS:  00007fb2dbd62740(0000) GS:ffff88007ad00000(0000) 
> > knlGS:0000000000000000
> > [    3.899177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    3.905595] CR2: fffffffffffffff8 CR3: 000000007596b000 CR4: 
> > 00000000001006e0
> > [    3.913568] Call Trace:
> > [    3.916296] Code: b8 48 8d b5 38 ff ff ff 41 8b 56 08 39 d3 0f 83 85 00 
> > 00 00 4c 63 fb 4a 83 24 f8 00 4a 8d 04 bd 00 00 00 00 49 03 06 48 8b 65 a8 
> > <e8> 73 26 f1 ff 49 89 d4 48 89 65 a8 85 c0 0f 85 a0 00 00 00 ba 
> > [    3.937440] Kernel panic - not syncing: Machine halted.
> > [    3.943268] CPU: 1 PID: 605 Comm: frecon Not tainted 
> > 4.12.0-00023-g711d82c128ff #107
> > [    3.951921] Hardware name: GOOGLE Squawks, BIOS 
> > Google_Squawks.5216.152.76 03/04/2016
> > [    3.960671] Call Trace:
> > [    3.963398]  <#DF>
> > [    3.965637]  __dump_stack+0x19/0x1b
> > [    3.969531]  dump_stack+0x42/0x60
> 
> clang is obviously getting confused by the RSP output constraint.  I
> think it tries to take the constraint literally, since it takes RSP as
> an output from the inline asm and stores it on the stack.  However, that
> behavior doesn't really make sense for a "register" variable.  It also
> doesn't explain why it's zeroing the register out first.

Thanks for your analysis!

> What happens if you try the below patch instead of the revert?  Any
> chance the offending instruction goes away?
> 
> diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
> index 11433f9..beac907 100644
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -171,7 +171,7 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 
> 0ULL, 0UL))
>       might_fault();                                                  \
>       asm volatile("call __get_user_%P4"                              \
>                    : "=a" (__ret_gu), "=r" (__val_gu), "+r" (__sp)    \
> -                  : "0" (ptr), "i" (sizeof(*(ptr))));                \
> +                  : "0" (ptr), "i" (sizeof(*(ptr))), "r" (__sp));    \
>       (x) = (__force __typeof__(*(ptr))) __val_gu;                    \
>       __builtin_expect(__ret_gu, 0);                                  \
>  })

The generated code is basically the same, only that now the value from
the stack is stored in a register and written twice to RSP:

ffffffff813676ba:       31 c0                   xor    %eax,%eax
ffffffff813676bc:       48 89 45 c8             mov    %rax,-0x38(%rbp)
ffffffff813676c0:       45 31 ff                xor    %r15d,%r15d
ffffffff813676c3:       48 89 45 a8             mov    %rax,-0x58(%rbp)
...
ffffffff81367918:       48 8b 4d a8             mov    -0x58(%rbp),%rcx
ffffffff8136791c:       48 89 cc                mov    %rcx,%rsp
ffffffff8136791f:       48 89 cc                mov    %rcx,%rsp
ffffffff81367922:       e8 69 26 f1 ff          callq  ffffffff81279f90 
<__get_user_4>

Reply via email to