Re: [PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-20 Thread Thomas Gleixner
On Fri, 10 Nov 2017, Dave Hansen wrote:
> From: Dave Hansen 
> 
> This is largely code from Andy Lutomirski.  I fixed a few bugs
> in it, and added a few SWITCH_TO_* spots.
> 
> KAISER needs to switch to a different CR3 value when it enters
> the kernel and switch back when it exits.  This essentially
> needs to be done before leaving assembly code.
> 
> This is extra challenging because the switching context is
> tricky: the registers that can be clobbered can vary.  It is also
> hard to store things on the stack because there is an established
> ABI (ptregs) or the stack is entirely unsafe to use.

Changelog nitpicking starts here

> This patch establishes a set of macros that allow changing to

s/This patch establishes/Establish/

> the user and kernel CR3 values.
> 
> Interactions with SWAPGS: previous versions of the KAISER code
> relied on having per-cpu scratch space to save/restore a register
> that can be used for the CR3 MOV.  The %GS register is used to
> index into our per-cpu space, so SWAPGS *had* to be done before

s/our/the/

> the CR3 switch.  That scratch space is gone now, but the semantic
> that SWAPGS must be done before the CR3 MOV is retained.  This is
> good to keep because it is not that hard to do and it allows us

s/us//

> to do things like add per-cpu debugging information to help us
> figure out what goes wrong sometimes.

the part after 'information' is fairy tale mode and redundant. Debugging
information says it all, right?

> What this does in the NMI code is worth pointing out.  NMIs
> can interrupt *any* context and they can also be nested with
> NMIs interrupting other NMIs.  The comments below
> ".Lnmi_from_kernel" explain the format of the stack during this
> situation.  Changing the format of this stack is not a fun
> exercise: I tried.  Instead of storing the old CR3 value on the
> stack, this patch depend on the *regular* register save/restore
> mechanism and then uses %r14 to keep CR3 during the NMI.  It is
> callee-saved and will not be clobbered by the C NMI handlers that
> get called.

  The comments below ".Lnmi_from_kernel" explain the format of the stack
  during this situation. Changing this stack format is too complex and
  risky, so the following solution has been used:

  Instead of storing the old CR3 value on the stack, depend on the regular
  register save/restore mechanism and use %r14 to hold CR3 during the
  NMI. r14 is callee-saved and will not be clobbered by the C NMI handlers
  that get called.

End of nitpicking

> +.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
> + movq%cr3, %r\scratch_reg
> + movq%r\scratch_reg, \save_reg
> + /*
> +  * Is the switch bit zero?  This means the address is
> +  * up in real KAISER patches in a moment.

 * If the switch bit is zero, CR3 points at the kernel page tables
 * already.
Hmm?

>  /*
> @@ -1189,6 +1201,7 @@ ENTRY(paranoid_exit)
>   testl   %ebx, %ebx  /* swapgs needed? */
>   jnz .Lparanoid_exit_no_swapgs
>   TRACE_IRQS_IRETQ
> + RESTORE_CR3 %r14

You have the named macro arguments everywhere, just not here.

Other than that.

Reviewed-by: Thomas Gleixner 


[PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-10 Thread Dave Hansen

From: Dave Hansen 

This is largely code from Andy Lutomirski.  I fixed a few bugs
in it, and added a few SWITCH_TO_* spots.

KAISER needs to switch to a different CR3 value when it enters
the kernel and switch back when it exits.  This essentially
needs to be done before leaving assembly code.

This is extra challenging because the switching context is
tricky: the registers that can be clobbered can vary.  It is also
hard to store things on the stack because there is an established
ABI (ptregs) or the stack is entirely unsafe to use.

This patch establishes a set of macros that allow changing to
the user and kernel CR3 values.

Interactions with SWAPGS: previous versions of the KAISER code
relied on having per-cpu scratch space to save/restore a register
that can be used for the CR3 MOV.  The %GS register is used to
index into our per-cpu space, so SWAPGS *had* to be done before
the CR3 switch.  That scratch space is gone now, but the semantic
that SWAPGS must be done before the CR3 MOV is retained.  This is
good to keep because it is not that hard to do and it allows us
to do things like add per-cpu debugging information to help us
figure out what goes wrong sometimes.

What this does in the NMI code is worth pointing out.  NMIs
can interrupt *any* context and they can also be nested with
NMIs interrupting other NMIs.  The comments below
".Lnmi_from_kernel" explain the format of the stack during this
situation.  Changing the format of this stack is not a fun
exercise: I tried.  Instead of storing the old CR3 value on the
stack, this patch depends on the *regular* register save/restore
mechanism and then uses %r14 to keep CR3 during the NMI.  It is
callee-saved and will not be clobbered by the C NMI handlers that
get called.

Signed-off-by: Dave Hansen 
Cc: Moritz Lipp 
Cc: Daniel Gruss 
Cc: Michael Schwarz 
Cc: Richard Fellner 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Kees Cook 
Cc: Hugh Dickins 
Cc: x...@kernel.org
---

 b/arch/x86/entry/calling.h |   65 +
 b/arch/x86/entry/entry_64.S|   34 ---
 b/arch/x86/entry/entry_64_compat.S |8 
 3 files changed, 102 insertions(+), 5 deletions(-)

diff -puN arch/x86/entry/calling.h~kaiser-luto-base-cr3-work 
arch/x86/entry/calling.h
--- a/arch/x86/entry/calling.h~kaiser-luto-base-cr3-work2017-11-10 
11:22:07.191244954 -0800
+++ b/arch/x86/entry/calling.h  2017-11-10 11:22:07.198244954 -0800
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 
 /*
 
@@ -186,6 +187,70 @@ For 32-bit we have the following convent
 #endif
 .endm
 
+#ifdef CONFIG_KAISER
+
+/* KAISER PGDs are 8k.  We flip bit 12 to switch between the two halves: */
+#define KAISER_SWITCH_MASK (1<<PAGE_SHIFT)

[remainder of this hunk lost: an unescaped '<' in the archive swallowed the
diff text up to the next '>']

	pushq   $0	/* pt_regs->r14 = 0 */
pushq   $0  /* pt_regs->r15 = 0 */
 
+   SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
/*
 * User mode is traced as though IRQs are on, and SYSENTER
 * turned them off.
@@ -240,6 +245,7 @@ sysret32_from_system_call:
popq%rsi/* pt_regs->si */
popq%rdi/* pt_regs->di */
 
+   SWITCH_TO_USER_CR3 scratch_reg=%r8
 /*
  * USERGS_SYSRET32 does:
  *  GSBASE = user's GS base
@@ -324,6 +330,8 @@ ENTRY(entry_INT80_compat)
pushq   %r15/* pt_regs->r15 */
cld
 
+   SWITCH_TO_KERNEL_CR3 scratch_reg=%r11
+
movq%rsp, %rdi  /* pt_regs pointer */
callsync_regs
movq%rax, %rsp  /* switch stack */
diff -puN arch/x86/entry/entry_64.S~kaiser-luto-base-cr3-work 
arch/x86/entry/entry_64.S
--- a/arch/x86/entry/entry_64.S~kaiser-luto-base-cr3-work   2017-11-10 
11:22:07.194244954 -0800
+++ b/arch/x86/entry/entry_64.S 2017-11-10 11:22:07.199244954 -0800
@@ -147,8 +147,6 @@ ENTRY(entry_SYSCALL_64)
movq%rsp, PER_CPU_VAR(rsp_scratch)
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
-   TRACE_IRQS_OFF
-
/* Construct struct pt_regs on stack */
pushq   $__USER_DS  /* pt_regs->ss */
pushq   PER_CPU_VAR(rsp_scratch)/* pt_regs->sp */
@@ -169,6 +167,13 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
sub $(6*8), %rsp/* pt_regs->bp, bx, r12-15 not 
saved */
UNWIND_HINT_REGS extra=0
 
+   /* NB: right here, all regs except r11 are live. */
+
+   SWITCH_TO_KERNEL_CR3 scratch_reg=%r11
+
+   /* Must wait until we have the kernel CR3 to call C functions: */
+   TRACE_IRQS_OFF
+
   

Re: [PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-09 Thread Borislav Petkov
On Thu, Nov 09, 2017 at 07:34:52AM -0800, Dave Hansen wrote:
> On 11/09/2017 05:20 AM, Borislav Petkov wrote:
> > What branch is that one against?
> 
> It's against Andy's entry rework:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_consolidation

Ah, so this is what

" * Updated to be on top of Andy L's new entry code"

means.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-09 Thread Dave Hansen
On 11/09/2017 05:20 AM, Borislav Petkov wrote:
> What branch is that one against?

It's against Andy's entry rework:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_consolidation



Re: [PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-09 Thread Borislav Petkov
On Wed, Nov 08, 2017 at 11:46:54AM -0800, Dave Hansen wrote:
> From: Dave Hansen 
> 
> This is largely code from Andy Lutomirski.  I fixed a few bugs
> in it, and added a few SWITCH_TO_* spots.

...

> Signed-off-by: Dave Hansen 
> Cc: Moritz Lipp 
> Cc: Daniel Gruss 
> Cc: Michael Schwarz 
> Cc: Richard Fellner 
> Cc: Andy Lutomirski 
> Cc: Linus Torvalds 
> Cc: Kees Cook 
> Cc: Hugh Dickins 
> Cc: x...@kernel.org
> ---
> 
>  b/arch/x86/entry/calling.h |   65 
> +
>  b/arch/x86/entry/entry_64.S|   30 ++---
>  b/arch/x86/entry/entry_64_compat.S |8 
>  3 files changed, 98 insertions(+), 5 deletions(-)

What branch is that one against?

It doesn't apply cleanly against tip:x86/asm from today:

patching file arch/x86/entry/calling.h
Hunk #1 succeeded at 2 with fuzz 1 (offset 1 line).
Hunk #2 succeeded at 188 (offset 1 line).
patching file arch/x86/entry/entry_64_compat.S
Hunk #1 succeeded at 92 (offset 1 line).
Hunk #2 succeeded at 218 (offset 1 line).
Hunk #3 succeeded at 246 (offset 1 line).
Hunk #4 FAILED at 330.
1 out of 4 hunks FAILED -- saving rejects to file 
arch/x86/entry/entry_64_compat.S.rej
patching file arch/x86/entry/entry_64.S
Hunk #1 succeeded at 148 (offset 1 line).
Hunk #2 succeeded at 168 (offset 1 line).
Hunk #3 succeeded at 508 with fuzz 2 (offset 163 lines).
Hunk #4 FAILED at 685.
Hunk #5 succeeded at 1119 (offset -54 lines).
Hunk #6 succeeded at 1145 (offset -54 lines).
Hunk #7 succeeded at 1174 (offset -54 lines).
Hunk #8 succeeded at 1223 (offset -54 lines).
Hunk #9 succeeded at 1350 (offset -54 lines).
Hunk #10 succeeded at 1575 (offset -54 lines).
1 out of 10 hunks FAILED -- saving rejects to file arch/x86/entry/entry_64.S.rej

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH 05/30] x86, kaiser: prepare assembly for entry/exit CR3 switching

2017-11-08 Thread Dave Hansen

From: Dave Hansen 

This is largely code from Andy Lutomirski.  I fixed a few bugs
in it, and added a few SWITCH_TO_* spots.

KAISER needs to switch to a different CR3 value when it enters
the kernel and switch back when it exits.  This essentially
needs to be done before we leave assembly code.

This is extra challenging because the context in which we have to
make this switch is tricky: the registers we are allowed to
clobber can vary.  It's also hard to store things on the stack
because there are already things on it with an established ABI
(ptregs) or the stack is unsafe to use at all.

This patch establishes a set of macros that allow changing to
the user and kernel CR3 values.

Interactions with SWAPGS: previous versions of the KAISER code
relied on having per-cpu scratch space so we have a register
to clobber for our CR3 MOV.  The %GS register is what we use
to index into our per-cpu space, so SWAPGS *had* to be done
before the CR3 switch.  That scratch space is gone now, but we
still keep the semantic that SWAPGS must be done before the
CR3 MOV.  This is good to keep because it is not that hard to
do and it allows us to do things like add per-cpu debugging
information to help us figure out what goes wrong sometimes.

What this does in the NMI code is worth pointing out.  NMIs
can interrupt *any* context and they can also be nested with
NMIs interrupting other NMIs.  The comments below
".Lnmi_from_kernel" explain the format of the stack that we
have to deal with in this situation.  Changing the format of
this stack is not a fun exercise: I tried.  Instead of
storing the old CR3 value on the stack, we depend on the
*regular* register save/restore mechanism and then use %r14
to keep CR3 during the NMI.  It will not be clobbered by the
C NMI handlers that get called.

Signed-off-by: Dave Hansen 
Cc: Moritz Lipp 
Cc: Daniel Gruss 
Cc: Michael Schwarz 
Cc: Richard Fellner 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Kees Cook 
Cc: Hugh Dickins 
Cc: x...@kernel.org
---

 b/arch/x86/entry/calling.h |   65 +
 b/arch/x86/entry/entry_64.S|   30 ++---
 b/arch/x86/entry/entry_64_compat.S |8 
 3 files changed, 98 insertions(+), 5 deletions(-)

diff -puN arch/x86/entry/calling.h~kaiser-luto-base-cr3-work 
arch/x86/entry/calling.h
--- a/arch/x86/entry/calling.h~kaiser-luto-base-cr3-work2017-11-08 
10:45:28.091681398 -0800
+++ b/arch/x86/entry/calling.h  2017-11-08 10:45:28.098681398 -0800
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 
 /*
 
@@ -186,6 +187,70 @@ For 32-bit we have the following convent
 #endif
 .endm
 
+#ifdef CONFIG_KAISER
+
+/* KAISER PGDs are 8k.  We flip bit 12 to switch between the two halves: */
+#define KAISER_SWITCH_MASK (1<<PAGE_SHIFT)

[remainder of this hunk lost: an unescaped '<' in the archive swallowed the
diff text up to the next '>']

	pushq   $0	/* pt_regs->r14 = 0 */
pushq   $0  /* pt_regs->r15 = 0 */
 
+   SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+
/*
 * User mode is traced as though IRQs are on, and SYSENTER
 * turned them off.
@@ -240,6 +245,7 @@ sysret32_from_system_call:
popq%rsi/* pt_regs->si */
popq%rdi/* pt_regs->di */
 
+   SWITCH_TO_USER_CR3 scratch_reg=%r8
 /*
  * USERGS_SYSRET32 does:
  *  GSBASE = user's GS base
@@ -324,6 +330,8 @@ ENTRY(entry_INT80_compat)
pushq   %r15/* pt_regs->r15 */
cld
 
+   SWITCH_TO_KERNEL_CR3 scratch_reg=%r11
+
movq%rsp, %rdi  /* pt_regs pointer */
callsync_regs
movq%rax, %rsp  /* switch stack */
diff -puN arch/x86/entry/entry_64.S~kaiser-luto-base-cr3-work 
arch/x86/entry/entry_64.S
--- a/arch/x86/entry/entry_64.S~kaiser-luto-base-cr3-work   2017-11-08 
10:45:28.094681398 -0800
+++ b/arch/x86/entry/entry_64.S 2017-11-08 10:45:28.099681398 -0800
@@ -147,8 +147,6 @@ ENTRY(entry_SYSCALL_64)
movq%rsp, PER_CPU_VAR(rsp_scratch)
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
-   TRACE_IRQS_OFF
-
/* Construct struct pt_regs on stack */
pushq   $__USER_DS  /* pt_regs->ss */
pushq   PER_CPU_VAR(rsp_scratch)/* pt_regs->sp */
@@ -169,6 +167,13 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
sub $(6*8), %rsp/* pt_regs->bp, bx, r12-15 not 
saved */
UNWIND_HINT_REGS extra=0
 
+   /* NB: right here, all regs except r11 are live. */
+
+   SWITCH_TO_KERNEL_CR3 scratch_reg=%r11
+
+   /* Must wait until we have the kernel CR3 to call C functions: */
+   TRACE_IRQS_OFF