On 01/04/2018 09:56 AM, Tim Chen wrote:
> If NMI runs when exiting kernel between IBRS_DISABLE and
> SWAPGS, the NMI would have turned on IBRS bit 0 and then it would have
> left enabled when exiting the NMI. IBRS bit 0 would then be left
> enabled in userland until the next enter kernel.
>
> That is a minor inefficiency only, but we can eliminate it by saving
> the MSR when entering the NMI in save_paranoid and restoring it when
> exiting the NMI.

Can I suggest an alternate description for the NMI case? This is
long-winded, but it should keep me from having to think through it yet
again. :)

"
The normal interrupt code uses the 'error_entry' path, which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace, and thus whether it has
to switch IBRS or leave it alone.

The NMI code is different. It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS. If we used the same approach as
the normal interrupt code, we might do the following:

        SYSENTER_entry
<-------------- NMI HERE
        IBRS=1
                do_something()
        IBRS=0
        SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1. This would leave it running
unprotected with IBRS=0, which is bad.

However, if we unconditionally set IBRS=1 in the NMI, we might get the
following case:

        SYSENTER_entry
        IBRS=1
                do_something()
        IBRS=0
<-------------- NMI HERE (set IBRS=1)
        SYSRET

and we would return to userspace with IBRS=1. Userspace would run
slowly until we entered and exited the kernel again. (This is the case
Tim is alluding to in the patch description.)

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim. This is what PTI does with CR3 and it works
beautifully.
"
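
For anyone who wants to check the three cases mechanically, here is a
toy userspace model in C. It is only a sketch, not kernel code: IBRS is
modeled as a plain bool, and every name in it (nmi_policy, nmi_point,
run_nmi, simulate_syscall) is made up for illustration. The 'saved'
local stands in for the scratch register (%r13).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three NMI strategies discussed above. */
enum nmi_policy {
	NMI_TRUST_CS,		/* kernel CS seen, so skip IBRS=1 */
	NMI_FORCE_SET,		/* always set IBRS=1, never restore */
	NMI_SAVE_RESTORE	/* save on entry, set IBRS=1, restore on exit */
};

/* Where the NMI lands in the syscall path. CS is "kernel" at both points. */
enum nmi_point {
	NMI_BEFORE_IBRS_SET,	/* after SYSENTER_entry, before IBRS=1 */
	NMI_AFTER_IBRS_CLEAR	/* after IBRS=0, before SYSRET */
};

struct outcome {
	bool ibrs_during_nmi;	/* IBRS while the NMI handler body ran */
	bool ibrs_at_sysret;	/* IBRS value userspace resumes with */
};

/* Run one NMI against the modeled IBRS bit; return IBRS as the handler saw it. */
static bool run_nmi(bool *ibrs, enum nmi_policy policy)
{
	bool saved = *ibrs;	/* stands in for %r13 */
	bool during;

	if (policy != NMI_TRUST_CS)
		*ibrs = true;
	during = *ibrs;		/* NMI handler body would run here */
	if (policy == NMI_SAVE_RESTORE)
		*ibrs = saved;
	return during;
}

/* Walk the SYSENTER_entry ... SYSRET sequence with one NMI injected. */
struct outcome simulate_syscall(enum nmi_policy policy, enum nmi_point where)
{
	struct outcome out = { false, false };
	bool ibrs = false;	/* userspace value at SYSENTER_entry */

	assert(where == NMI_BEFORE_IBRS_SET || where == NMI_AFTER_IBRS_CLEAR);

	if (where == NMI_BEFORE_IBRS_SET)
		out.ibrs_during_nmi = run_nmi(&ibrs, policy);
	ibrs = true;		/* IBRS=1 */
	/* do_something() */
	ibrs = false;		/* IBRS=0 */
	if (where == NMI_AFTER_IBRS_CLEAR)
		out.ibrs_during_nmi = run_nmi(&ibrs, policy);
	out.ibrs_at_sysret = ibrs;	/* SYSRET */
	return out;
}
```

In this model, NMI_TRUST_CS runs the handler with IBRS=0,
NMI_FORCE_SET at NMI_AFTER_IBRS_CLEAR returns to userspace with IBRS=1,
and NMI_SAVE_RESTORE runs the handler with IBRS=1 and returns to
userspace with IBRS=0 at both injection points.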