On Sun, Jan 24, 2021 at 5:13 PM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> The commit 929bacec21478("x86/entry/64: De-Xen-ify our NMI code") simplified
> the NMI code by changing paravirt code into native code and left a comment
> about "inspecting RIP instead".  But until now, "inspecting RIP instead"
> has not been made happened and this patch tries to complete it.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
>  arch/x86/entry/entry_64.S | 46 +++++++++++----------------------------
>  1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index cad08703c4ad..cb6b8a6c6652 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1268,32 +1268,12 @@ SYM_CODE_START(asm_exc_nmi)
>         je      nested_nmi
>
>         /*
> -        * Now test if the previous stack was an NMI stack.  This covers
> -        * the case where we interrupt an outer NMI after it clears
> -        * "NMI executing" but before IRET.  We need to be careful, though:
> -        * there is one case in which RSP could point to the NMI stack
> -        * despite there being no NMI active: naughty userspace controls
> -        * RSP at the very beginning of the SYSCALL targets.  We can
> -        * pull a fast one on naughty userspace, though: we program
> -        * SYSCALL to mask DF, so userspace cannot cause DF to be set
> -        * if it controls the kernel's RSP.  We set DF before we clear
> -        * "NMI executing".
> +        * Now test if we interrupt an outer NMI after it clears
> +        * "NMI executing" but before iret.

s/interrupt/interrupted

But let's make it a lot more clear:


Now test if we interrupted an outer NMI that just cleared "NMI
executing" and is about to IRET.  This is a single-instruction window.
This check does not handle the case in which we get a nested interrupt
(#MC, #VE, #VC, etc.) after clearing "NMI executing" but before the
outer NMI executes IRET.

> +       movq    $nmi_executing_cleared, %rdx
> +       cmpq    8(%rsp), %rdx
> +       jne     first_nmi

If we're okay with non-PIC code, then this is suboptimal -- you can
just compare directly.  But using PIC is polite, so that movq should
be a RIP-relative leaq.

>
>         /* This is a nested NMI. */
>
> @@ -1438,16 +1418,16 @@ nmi_restore:
>         addq    $6*8, %rsp
>
>         /*
> -        * Clear "NMI executing".  Set DF first so that we can easily
> -        * distinguish the remaining code between here and IRET from
> -        * the SYSCALL entry and exit paths.
> -        *
> -        * We arguably should just inspect RIP instead, but I (Andy) wrote
> -        * this code when I had the misapprehension that Xen PV supported
> -        * NMIs, and Xen PV would break that approach.
> +        * Clear "NMI executing".  It also leaves a window after it before
> +        * iret which should be also considered to be "NMI executing" albeit
> +        * with "NMI executing" variable being zero.  So we should also check
> +        * the RIP after it when checking "NMI executing".  See the code
> +        * before nested_nmi.  No code is allowed to be added to between
> +        * clearing "NMI executing" and iret unless we check a larger window
> +        * with a range of RIPs instead of currently a single-RIP window.

Let's simplify this comment:

Clear "NMI executing".  This leaves a window in which a nested NMI
could observe "NMI executing" cleared, and a nested NMI will detect
this by inspecting RIP.

>          */
> -       std
>         movq    $0, 5*8(%rsp)           /* clear "NMI executing" */
> +nmi_executing_cleared:
>

This should be local.  Let's call it .Lnmi_iret.  And add a comment:

.Lnmi_iret: /* must be immediately after clearing "NMI executing" */

>         /*
>          * iretq reads the "iret" frame and exits the NMI stack in a

Reply via email to