> On Apr 5, 2019, at 06:50, Andy Lutomirski <[email protected]> wrote:
> 
> 
> 
>> On Apr 5, 2019, at 2:35 AM, Thomas Gleixner <[email protected]> wrote:
>> 
>>> On Mon, 25 Mar 2019, Thomas Gleixner wrote:
>>>> On Fri, 15 Mar 2019, Chang S. Bae wrote:
>>>> ENTRY(paranoid_exit)
>>>>   UNWIND_HINT_REGS
>>>>   DISABLE_INTERRUPTS(CLBR_ANY)
>>>>   TRACE_IRQS_OFF_DEBUG
>>>> +    ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase",    "nop",\
>>>> +        X86_FEATURE_FSGSBASE
>>>> +    wrgsbase    %rbx
>>>> +    jmp    .Lparanoid_exit_no_swapgs;
>>> 
>>> Again. A few newlines would make it more readable.
>>> 
>>> This modifies the semantics of paranoid_entry and paranoid_exit. Looking at
>>> the usage sites there is the following code in the nmi maze:
>>> 
>>>   /*
>>>    * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
>>>    * as we should not be calling schedule in NMI context.
>>>    * Even with normal interrupts enabled. An NMI should not be
>>>    * setting NEED_RESCHED or anything that normal interrupts and
>>>    * exceptions might do.
>>>    */
>>>   call    paranoid_entry
>>>   UNWIND_HINT_REGS
>>> 
>>>   /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
>>>   movq    %rsp, %rdi
>>>   movq    $-1, %rsi
>>>   call    do_nmi
>>> 
>>>   /* Always restore stashed CR3 value (see paranoid_entry) */
>>>   RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
>>> 
>>>   testl    %ebx, %ebx            /* swapgs needed? */
>>>   jnz    nmi_restore
>>> nmi_swapgs:
>>>   SWAPGS_UNSAFE_STACK
>>> nmi_restore:
>>>   POP_REGS
>>> 
>>> I might be missing something, but how is that supposed to work when
>>> paranoid_entry uses FSGSBASE? I think it's broken, but if it's not then
>>> there is a big fat comment missing explaining why.
>> 
>> So this _is_ broken.
>> 
>>  On entry:
>> 
>>     rbx = rdgsbase()
>>     wrgsbase(KERNEL_GS)
>> 
>>  On exit:
>> 
>>     if (ebx == 0)
>>          swapgs
>> 
>> The resulting matrix:
>> 
>>  |  ENTRY GS    | RBX        | EXIT        | GS on IRET    | RESULT
>>  |        |        |        |        |
>> 1 |  KERNEL_GS    | KERNEL_GS    | EBX == 0    | USER_GS    | FAIL
>>  |        |        |        |        |
>> 2 |  KERNEL_GS    | KERNEL_GS    | EBX != 0    | KERNEL_GS    | ok
>>  |        |        |        |        |
>> 3 |  USER_GS    | USER_GS    | EBX == 0    | USER_GS    | ok
>>  |        |        |        |        |
>> 4 |  USER_GS    | USER_GS    | EBX != 0    | KERNEL_GS    | FAIL
>> 
>> 
>> #1 Just works by chance because it's unlikely that the lower 32bits of a
>>  per CPU kernel GS are all 0.
>> 
>>  But it's just a question of probability that this turns into a
>>  non-debuggable once per year crash (think KASLR).
>> 
>> #4 This can happen when the NMI hits the kernel in some other entry code
>>  _BEFORE_ or _AFTER_ swapgs.
>> 
>>  User space using GS addressing with GS[31:0] != 0 will crash and burn.
>> 
>> 
> 
> Hi all-
> 
> In a previous incarnation of these patches, I complained about the use of 
> SWAPGS in the paranoid path. Now I’m putting my maintainer foot down.  On a 
> non-FSGSBASE system, the paranoid path known, definitively, which GS is 
> where, so SWAPGS is annoying. With FSGSBASE, unless you start looking at the 
> RIP that you interrupted, you cannot know whether you have user or kernel 
> GSBASE live, since they can have literally the same value.  One of the 
> numerous versions of this patch compared the values and just said “well, it’s 
> harmless to SWAPGS if user code happens to use the same value as the kernel”. 
>  I complained that it was far too fragile.
> 
> So I’m putting my foot down. If you all want my ack, you’re going to save the 
> old GS, load the new one with WRGSBASE, and, on return, you’re going to 
> restore the old one with WRGSBASE. You will not use SWAPGS in the paranoid 
> path.
> 
> Obviously, for the non-paranoid path, it all keeps working exactly like it 
> does now.

Although I can see some other concerns with this, looks like it is still worth 
pursuing.

> 
> Furthermore, if you folks even want me to review this series, the ptrace 
> tests need to be in place.  On inspection of the current code (after the 
> debacle a few releases back), it appears the SETREGSET’s effect depends on 
> the current values in the registers — it does not actually seem to reliably 
> load the whole state. So my confidence will be greatly increased if your 
> series first adds a test that detects that bug (and fails!), then fixes the 
> bug in a tiny little patch, then adds FSGSBASE, and keeps the test working.
> 

I think I need to understand the issue. Appreciate if you can elaborate a 
little bit.

> —Andy

Reply via email to