On Thu, Sep 7, 2017 at 2:34 AM, Juergen Gross <jgr...@suse.com> wrote:
> On 06/09/17 23:36, Andy Lutomirski wrote:
>> Xen PV is fundamentally incompatible with our fancy NMI code: it
>> doesn't use IST at all, and Xen entries clobber two stack slots
>> below the hardware frame.
>>
>> Drop Xen PV support from our NMI code entirely.
>>
>> XXX: Juergen: could you write and test the tiny patch needed to
>> make Xen PV have a xen_nmi entry that handles NMIs?  I don't know
>> how to test it.
>
> You mean something like the attached one?
Yes. Mind if I add it to my series?

>
> Seems to work at least for the "normal" case of an NMI coming in at
> a random point in time.
>
> Regarding testing: in case you have a Xen setup you can easily send
> an NMI to a domain from dom0:
>
> xl trigger <domain> nmi

Thanks!

>
>
> Juergen
>
>>
>> Cc: Juergen Gross <jgr...@suse.com>
>> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
>> Signed-off-by: Andy Lutomirski <l...@kernel.org>
>> ---
>>  arch/x86/entry/entry_64.S | 41 +++++++++++++++++------------------------
>>  1 file changed, 17 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a9e318f7cc9b..c81e05fb999e 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1171,21 +1171,12 @@ ENTRY(error_exit)
>>          jmp     retint_user
>>  END(error_exit)
>>
>> -/* Runs on exception stack */
>> +/*
>> + * Runs on exception stack.  Xen PV does not go through this path at all,
>> + * so we can use real assembly here.
>> + */
>>  ENTRY(nmi)
>>          /*
>> -         * Fix up the exception frame if we're on Xen.
>> -         * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
>> -         * one value to the stack on native, so it may clobber the rdx
>> -         * scratch slot, but it won't clobber any of the important
>> -         * slots past it.
>> -         *
>> -         * Xen is a different story, because the Xen frame itself overlaps
>> -         * the "NMI executing" variable.
>> -         */
>> -        PARAVIRT_ADJUST_EXCEPTION_FRAME
>> -
>> -        /*
>>           * We allow breakpoints in NMIs. If a breakpoint occurs, then
>>           * the iretq it performs will take us out of NMI context.
>>           * This means that we can have nested NMIs where the next
>> @@ -1240,7 +1231,7 @@ ENTRY(nmi)
>>           * stacks lest we corrupt the "NMI executing" variable.
>>           */
>>
>> -        SWAPGS_UNSAFE_STACK
>> +        swapgs
>>          cld
>>          movq    %rsp, %rdx
>>          movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>> @@ -1402,7 +1393,7 @@ nested_nmi_out:
>>          popq    %rdx
>>
>>          /* We are returning to kernel mode, so this cannot result in a fault. */
>> -        INTERRUPT_RETURN
>> +        iretq
>>
>> first_nmi:
>>          /* Restore rdx. */
>> @@ -1432,7 +1423,7 @@ first_nmi:
>>          pushfq                          /* RFLAGS */
>>          pushq   $__KERNEL_CS            /* CS */
>>          pushq   $1f                     /* RIP */
>> -        INTERRUPT_RETURN                /* continues at repeat_nmi below */
>> +        iretq                           /* continues at repeat_nmi below */
>> 1:
>> #endif
>>
>> @@ -1502,20 +1493,22 @@ nmi_restore:
>>          /*
>>           * Clear "NMI executing". Set DF first so that we can easily
>>           * distinguish the remaining code between here and IRET from
>> -         * the SYSCALL entry and exit paths. On a native kernel, we
>> -         * could just inspect RIP, but, on paravirt kernels,
>> -         * INTERRUPT_RETURN can translate into a jump into a
>> -         * hypercall page.
>> +         * the SYSCALL entry and exit paths.
>> +         *
>> +         * We arguably should just inspect RIP instead, but I (Andy) wrote
>> +         * this code when I had the misapprehension that Xen PV supported
>> +         * NMIs, and Xen PV would break that approach.
>>           */
>>          std
>>          movq    $0, 5*8(%rsp)           /* clear "NMI executing" */
>>
>>          /*
>> -         * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
>> -         * stack in a single instruction. We are returning to kernel
>> -         * mode, so this cannot result in a fault.
>> +         * iretq reads the "iret" frame and exits the NMI stack in a
>> +         * single instruction.  We are returning to kernel mode, so this
>> +         * cannot result in a fault.  Similarly, we don't need to worry
>> +         * about espfix64 on the way back to kernel mode.
>>           */
>> -        INTERRUPT_RETURN
>> +        iretq
>>  END(nmi)
>>
>>  ENTRY(ignore_sysret)
>>
>
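
For reference, the entry Andy is asking about can be quite small. A minimal
sketch of the shape it could take (this is an illustration only, not
Juergen's actual attachment; it assumes the kernel's existing idtentry
macro, and the stub name xennmi is hypothetical):

    /*
     * arch/x86/entry/entry_64.S (sketch): Xen PV delivers all events on
     * the guest kernel stack rather than via IST, so an NMI can be
     * routed through an ordinary non-IST idtentry that invokes the
     * common C handler do_nmi().
     */
    #ifdef CONFIG_XEN
    idtentry xennmi         do_nmi          has_error_code=0
    #endif

The Xen-specific code would then install this stub, rather than the native
nmi entry, for the NMI vector when it registers the guest's trap table
with the hypervisor.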
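
And a concrete version of Juergen's test recipe above (the domain name is
made up; it assumes a Linux domU whose kernel log you can read):

    # In dom0: inject an NMI into the guest.
    xl trigger domU-test nmi

    # In the guest: check that the NMI arrived and was handled.
    dmesg | grep -i nmi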