[tip: x86/seves] x86/sev-es: Optimize __sev_es_ist_enter() for better readability

2021-03-19 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 799de1baaf3509a54ff713efb768020f8defd709
Gitweb:
https://git.kernel.org/tip/799de1baaf3509a54ff713efb768020f8defd709
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:14 +01:00
Committer: Borislav Petkov 
CommitterDate: Fri, 19 Mar 2021 13:37:22 +01:00

x86/sev-es: Optimize __sev_es_ist_enter() for better readability

Reorganize the code and improve the comments to make the function more
readable and easier to understand.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210303141716.29223-4-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 225704e..26f5479 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -137,29 +137,41 @@ static __always_inline bool on_vc_stack(struct pt_regs *regs)
 }
 
 /*
- * This function handles the case when an NMI is raised in the #VC exception
- * handler entry code. In this case, the IST entry for #VC must be adjusted, so
- * that any subsequent #VC exception will not overwrite the stack contents of the
- * interrupted #VC handler.
+ * This function handles the case when an NMI is raised in the #VC
+ * exception handler entry code, before the #VC handler has switched off
+ * its IST stack. In this case, the IST entry for #VC must be adjusted,
+ * so that any nested #VC exception will not overwrite the stack
+ * contents of the interrupted #VC handler.
  *
  * The IST entry is adjusted unconditionally so that it can be also be
- * unconditionally adjusted back in sev_es_ist_exit(). Otherwise a nested
- * sev_es_ist_exit() call may adjust back the IST entry too early.
+ * unconditionally adjusted back in __sev_es_ist_exit(). Otherwise a
+ * nested sev_es_ist_exit() call may adjust back the IST entry too
+ * early.
+ *
+ * The __sev_es_ist_enter() and __sev_es_ist_exit() functions always run
+ * on the NMI IST stack, as they are only called from NMI handling code
+ * right now.
  */
 void noinstr __sev_es_ist_enter(struct pt_regs *regs)
 {
unsigned long old_ist, new_ist;
 
/* Read old IST entry */
-   old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
+   new_ist = old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
 
-   /* Make room on the IST stack */
+   /*
+* If NMI happened while on the #VC IST stack, set the new IST
+* value below regs->sp, so that the interrupted stack frame is
+* not overwritten by subsequent #VC exceptions.
+*/
if (on_vc_stack(regs))
-   new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);
-   else
-   new_ist = old_ist - sizeof(old_ist);
+   new_ist = regs->sp;
 
-   /* Store old IST entry */
+   /*
+* Reserve additional 8 bytes and store old IST value so this
+* adjustment can be unrolled in __sev_es_ist_exit().
+*/
+   new_ist -= sizeof(old_ist);
*(unsigned long *)new_ist = old_ist;
 
/* Set new IST entry */

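To make the adjust/unroll pairing easier to follow, here is a minimal C sketch
of the scheme the new comments describe (illustrative only, simplified from the
kernel code; vc_ist stands in for the per-CPU TSS IST slot):

static unsigned long vc_ist;	/* stand-in for cpu_tss_rw.x86_tss.ist[IST_INDEX_VC] */

static void ist_enter_sketch(unsigned long interrupted_sp, bool nmi_hit_vc_stack)
{
	unsigned long old_ist = vc_ist;
	unsigned long new_ist = nmi_hit_vc_stack ? interrupted_sp : old_ist;

	new_ist -= sizeof(old_ist);		/* reserve 8 bytes below the new top ... */
	*(unsigned long *)new_ist = old_ist;	/* ... and remember the old IST value there */
	vc_ist = new_ist;
}

static void ist_exit_sketch(void)
{
	/* Unconditionally unroll what ist_enter_sketch() did */
	vc_ist = *(unsigned long *)vc_ist;
}

Because the adjustment is always made, a nested exit can never restore the IST
slot too early.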

[tip: x86/seves] x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: e927e62d8e370ebfc0d702fec22bc752249ebcef
Gitweb:
https://git.kernel.org/tip/e927e62d8e370ebfc0d702fec22bc752249ebcef
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:04:12 +01:00

x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-7-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S | 28 +-
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index ebc4a29..c1e81a8 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -139,6 +139,26 @@ SYM_CODE_START(startup32_vc_handler)
jnz .Lfail
movl%edx, 0(%esp)   # Store result
 
+   /*
+* Sanity check CPUID results from the Hypervisor. See comment in
+* do_vc_no_ghcb() for more details on why this is necessary.
+*/
+
+   /* Fail if SEV leaf not available in CPUID[0x80000000].EAX */
+   cmpl   $0x80000000, %ebx
+   jne .Lcheck_sev
+   cmpl   $0x8000001f, 12(%esp)
+   jb  .Lfail
+   jmp .Ldone
+
+.Lcheck_sev:
+   /* Fail if SEV bit not set in CPUID[0x8000001f].EAX[1] */
+   cmpl   $0x8000001f, %ebx
+   jne .Ldone
+   btl $1, 12(%esp)
+   jnc .Lfail
+
+.Ldone:
popl%edx
popl%ecx
popl%ebx
@@ -152,6 +172,14 @@ SYM_CODE_START(startup32_vc_handler)
 
iret
 .Lfail:
+   /* Send terminate request to Hypervisor */
+   movl$0x100, %eax
+   xorl%edx, %edx
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall
+
+   /* If request fails, go to hlt loop */
hlt
jmp .Lfail
 SYM_CODE_END(startup32_vc_handler)

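The two assembly checks above map onto this small C predicate (a sketch with a
hypothetical helper name, added here only to make the logic explicit):

/* fn is the CPUID function that caused the #VC, eax the value the HV returned */
static bool cpuid_result_sane(u32 fn, u32 eax)
{
	/* The highest extended leaf must cover the SEV leaf 0x8000001f */
	if (fn == 0x80000000 && eax < 0x8000001f)
		return false;

	/* The SEV feature bit (EAX[1]) must be set in leaf 0x8000001f itself */
	if (fn == 0x8000001f && !(eax & BIT(1)))
		return false;

	return true;
}

Any failure ends in the terminate request added to .Lfail above.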

[tip: x86/seves] x86/boot/compressed/64: Add 32-bit boot #VC handler

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 1ccdbf748d862bc2ea106fa9f2300983c77860fe
Gitweb:
https://git.kernel.org/tip/1ccdbf748d862bc2ea106fa9f2300983c77860fe
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:03:43 +01:00

x86/boot/compressed/64: Add 32-bit boot #VC handler

Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-6-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S |  6 ++-
 arch/x86/boot/compressed/mem_encrypt.S | 96 -
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 2001c3b..ee448ae 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -857,6 +858,11 @@ SYM_FUNC_END(startup32_set_idt_entry)
 
 SYM_FUNC_START(startup32_load_idt)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* #VC handler */
+   lealrva(startup32_vc_handler)(%ebp), %eax
+   movl$X86_TRAP_VC, %edx
+   callstartup32_set_idt_entry
+
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index a6dea4e..ebc4a29 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -61,10 +61,104 @@ SYM_FUNC_START(get_sev_encryption_bit)
ret
 SYM_FUNC_END(get_sev_encryption_bit)
 
+/**
+ * sev_es_req_cpuid - Request a CPUID value from the Hypervisor using
+ *   the GHCB MSR protocol
+ *
+ * @%eax:  Register to request (0=EAX, 1=EBX, 2=ECX, 3=EDX)
+ * @%edx:  CPUID Function
+ *
+ * Returns 0 in %eax on success, non-zero on failure
+ * %edx returns CPUID value on success
+ */
+SYM_CODE_START_LOCAL(sev_es_req_cpuid)
+   shll$30, %eax
+   orl    $0x00000004, %eax
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall# VMGEXIT
+   rdmsr
+
+   /* Check response */
+   movl%eax, %ecx
+   andl   $0x3ffff000, %ecx   # Bits [12-29] MBZ
+   jnz 2f
+
+   /* Check return code */
+   andl$0xfff, %eax
+   cmpl$5, %eax
+   jne 2f
+
+   /* All good - return success */
+   xorl%eax, %eax
+1:
+   ret
+2:
+   movl$-1, %eax
+   jmp 1b
+SYM_CODE_END(sev_es_req_cpuid)
+
+SYM_CODE_START(startup32_vc_handler)
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Keep CPUID function in %ebx */
+   movl%eax, %ebx
+
+   /* Check if error-code == SVM_EXIT_CPUID */
+   cmpl$0x72, 16(%esp)
+   jne .Lfail
+
+   movl$0, %eax# Request CPUID[fn].EAX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 12(%esp)  # Store result
+
+   movl$1, %eax# Request CPUID[fn].EBX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 8(%esp)   # Store result
+
+   movl$2, %eax# Request CPUID[fn].ECX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 4(%esp)   # Store result
+
+   movl$3, %eax# Request CPUID[fn].EDX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 0(%esp)   # Store result
+
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+
+   /* Remove error code */
+   addl$4, %esp
+
+   /* Jump over CPUID instruction */
+   addl$2, (%esp)
+
+   iret
+.Lfail:
+   hlt
+   jmp .Lfail
+SYM_CODE_END(startup32_vc_handler)
+
.code64
 
 #include "../../kernel/sev_verify_cbit.S"
-
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp

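For readers unfamiliar with the GHCB MSR protocol used by sev_es_req_cpuid, the
request/response framing looks roughly like this (sketch only, helper names are
hypothetical; the register index sits in bits [31:30] of the low dword, the
protocol code in bits [11:0], and the CPUID function/result in the high dword):

static u64 ghcb_msr_cpuid_req(u32 reg_idx, u32 cpuid_fn)
{
	/* 0x004 is the CPUID-request code of the GHCB MSR protocol */
	return ((u64)cpuid_fn << 32) | ((u64)reg_idx << 30) | 0x004;
}

static int ghcb_msr_cpuid_resp(u64 msr_val, u32 *result)
{
	if ((msr_val & 0xfff) != 0x005)	/* must be a CPUID response */
		return -1;
	if (msr_val & 0x3ffff000)	/* bits [29:12] must be zero */
		return -1;

	*result = (u32)(msr_val >> 32);	/* requested register value */
	return 0;
}

This mirrors the encode/check sequence that the shll/orl/wrmsr and
rdmsr/andl/cmpl instructions above implement.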

[tip: x86/seves] x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: f15a0a732aefb46f999c2a8aa8f9f16e71cec5b2
Gitweb:
https://git.kernel.org/tip/f15a0a732aefb46f999c2a8aa8f9f16e71cec5b2
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:24 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:04:12 +01:00

x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-9-j...@8bytes.org
---
 arch/x86/boot/compressed/sev-es.c | 12 +++-
 arch/x86/kernel/sev-es-shared.c   | 10 +++---
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/sev-es.c 
b/arch/x86/boot/compressed/sev-es.c
index 27826c2..d904bd5 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -200,14 +200,8 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
}
 
 finish:
-   if (result == ES_OK) {
+   if (result == ES_OK)
vc_finish_insn(&ctxt);
-   } else if (result != ES_RETRY) {
-   /*
-* For now, just halt the machine. That makes debugging easier,
-* later we just call sev_es_terminate() here.
-*/
-   while (true)
-   asm volatile("hlt\n");
-   }
+   else if (result != ES_RETRY)
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 387b716..0aa9f13 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -206,12 +206,8 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
return;
 
 fail:
-   sev_es_wr_ghcb_msr(GHCB_SEV_TERMINATE);
-   VMGEXIT();
-
-   /* Shouldn't get here - if we do halt the machine */
-   while (true)
-   asm volatile("hlt\n");
+   /* Terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
 
 static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,

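For reference, the helper these hlt loops are folded into follows roughly this
pattern (a sketch based on the shared SEV-ES code, constant names as used
there; it may differ in detail from the upstream file):

static void __noreturn sev_es_terminate(unsigned int reason)
{
	u64 val = GHCB_SEV_TERMINATE;

	/* Encode the termination reason (reason-set 0) into the request */
	val |= GHCB_SEV_TERMINATE_REASON(0, reason);

	/* Ask the hypervisor to terminate the guest via the GHCB MSR protocol */
	sev_es_wr_ghcb_msr(val);
	VMGEXIT();

	/* Should never return - halt if the hypervisor ignores the request */
	while (true)
		asm volatile("hlt\n" : : : "memory");
}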

[tip: x86/seves] x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: fef81c86262879d4b1176ef51a834c15b805ebb9
Gitweb:
https://git.kernel.org/tip/fef81c86262879d4b1176ef51a834c15b805ebb9
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:23 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:04:12 +01:00

x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-8-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 83 +-
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index ee448ae..91ea0d5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -183,11 +183,21 @@ SYM_FUNC_START(startup_32)
 */
callget_sev_encryption_bit
xorl%edx, %edx
+#ifdef CONFIG_AMD_MEM_ENCRYPT
testl   %eax, %eax
jz  1f
subl$32, %eax   /* Encryption bit is always above bit 31 */
bts %eax, %edx  /* Set encryption mask for page tables */
+   /*
+* Mark SEV as active in sev_status so that startup32_check_sev_cbit()
+* will do a check. The sev_status memory will be fully initialized
+* with the contents of MSR_AMD_SEV_STATUS later in
+* set_sev_encryption_mask(). For now it is sufficient to know that SEV
+* is active.
+*/
+   movl$1, rva(sev_status)(%ebp)
 1:
+#endif
 
/* Initialize Page tables to 0 */
lealrva(pgtable)(%ebx), %edi
@@ -272,6 +282,9 @@ SYM_FUNC_START(startup_32)
movl%esi, %edx
 1:
 #endif
+   /* Check if the C-bit position is correct when SEV is active */
+   callstartup32_check_sev_cbit
+
pushl   $__KERNEL_CS
pushl   %eax
 
@@ -872,6 +885,76 @@ SYM_FUNC_START(startup32_load_idt)
 SYM_FUNC_END(startup32_load_idt)
 
 /*
+ * Check for the correct C-bit position when the startup_32 boot-path is used.
+ *
+ * The check makes use of the fact that all memory is encrypted when paging is
+ * disabled. The function creates 64 bits of random data using the RDRAND
+ * instruction. RDRAND is mandatory for SEV guests, so always available. If the
+ * hypervisor violates that the kernel will crash right here.
+ *
+ * The 64 bits of random data are stored to a memory location and at the same
+ * time kept in the %eax and %ebx registers. Since encryption is always active
+ * when paging is off the random data will be stored encrypted in main memory.
+ *
+ * Then paging is enabled. When the C-bit position is correct all memory is
+ * still mapped encrypted and comparing the register values with memory will
+ * succeed. An incorrect C-bit position will map all memory unencrypted, so that
+ * the compare will use the encrypted random data and fail.
+ */
+SYM_FUNC_START(startup32_check_sev_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Check for non-zero sev_status */
+   movlrva(sev_status)(%ebp), %eax
+   testl   %eax, %eax
+   jz  4f
+
+   /*
+* Get two 32-bit random values - Don't bail out if RDRAND fails
+* because it is better to prevent forward progress if no random value
+* can be gathered.
+*/
+1: rdrand  %eax
+   jnc 1b
+2: rdrand  %ebx
+   jnc 2b
+
+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx   /* Backup %cr0 in %edx */
+   movl$(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f
+
+   movl%edx, %cr0  /* Restore previous %cr0 */
+
+   jmp 4f
+
+3: /* Check failed - hlt the machine */
+   hlt
+   jmp 3b
+
+4:
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+#endif
+   ret
+SYM_FUNC_END(startup32_check_sev_cbit)
+
+/*
  * Stack and heap for uncompression
  */
.bss

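The comment block above describes the check well; as pseudocode it boils down
to the following (illustrative sketch only - the real check runs in 32-bit
assembly with paging disabled, and all helpers here are hypothetical):

static void check_sev_cbit_sketch(void)
{
	u64 random;

	if (!sev_status)			/* SEV not active - nothing to verify */
		return;

	random = rdrand_u64();			/* retried until RDRAND succeeds */
	sev_check_data = random;		/* written with paging off => stored encrypted */

	enable_paging_with_reported_cbit();
	if (sev_check_data != random)		/* wrong C-bit: reads bypass decryption */
		halt_forever();
	restore_cr0();				/* paging off again, continue booting */
}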

[tip: x86/seves] x86/sev: Do not require Hypervisor CPUID bit for SEV guests

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: eab696d8e8b9c9d600be6fad8dd8dfdfaca6ca7c
Gitweb:
https://git.kernel.org/tip/eab696d8e8b9c9d600be6fad8dd8dfdfaca6ca7c
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:18 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:40 +01:00

x86/sev: Do not require Hypervisor CPUID bit for SEV guests

A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.

Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.

This handles the following cases:

1) SEV(-ES) guest where CPUID intercept is disabled. The guest
   will still see leaf 0x8000001f and the SEV bit. It can
   retrieve the C-bit and boot normally.

2) Non-encrypted guests with intercepted CPUID will check
   the SEV_STATUS MSR and find it 0 and will try to enable SME.
   This will fail when the guest finds MSR_K8_SYSCFG to be zero,
   as it is emulated by KVM. But we can't rely on that, as there
   might be other hypervisors which return this MSR with bit
   23 set. The Hypervisor bit check will prevent that the guest
   tries to enable SME in this case.

3) Non-encrypted guests on SEV capable hosts with CPUID intercept
   disabled (by a malicious hypervisor) will try to boot into
   the SME path. This will fail, but it is also not considered
   a problem because non-encrypted guests have no protection
   against the hypervisor anyway.

 [ bp: s/non-SEV/non-encrypted/g ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Acked-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20210312123824.306-3-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S |  6 +
 arch/x86/kernel/sev-es-shared.c|  6 +
 arch/x86/mm/mem_encrypt_identity.c | 35 +
 3 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index aa56179..a6dea4e 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -23,12 +23,6 @@ SYM_FUNC_START(get_sev_encryption_bit)
push%ecx
push%edx
 
-   /* Check if running under a hypervisor */
-   movl$1, %eax
-   cpuid
-   bt  $31, %ecx   /* Check the hypervisor bit */
-   jnc .Lno_sev
-
movl   $0x80000000, %eax   /* CPUID to check the highest leaf */
cpuid
cmpl   $0x8000001f, %eax   /* See if 0x8000001f is available */
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index cdc04d0..387b716 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -186,7 +186,6 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 * make it accessible to the hypervisor.
 *
 * In particular, check for:
-*  - Hypervisor CPUID bit
 *  - Availability of CPUID leaf 0x8000001f
 *  - SEV CPUID bit.
 *
@@ -194,10 +193,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 * can't be checked here.
 */
 
-   if ((fn == 1 && !(regs->cx & BIT(31))))
-   /* Hypervisor bit */
-   goto fail;
-   else if (fn == 0x80000000 && (regs->ax < 0x8000001f))
+   if (fn == 0x80000000 && (regs->ax < 0x8000001f))
/* SEV leaf check */
goto fail;
else if ((fn == 0x8000001f && !(regs->ax & BIT(1))))
diff --git a/arch/x86/mm/mem_encrypt_identity.c 
b/arch/x86/mm/mem_encrypt_identity.c
index 6c5eb6f..a19374d 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -503,14 +503,10 @@ void __init sme_enable(struct boot_params *bp)
 
 #define AMD_SME_BIT BIT(0)
 #define AMD_SEV_BIT BIT(1)
-   /*
-* Set the feature mask (SME or SEV) based on whether we are
-* running under a hypervisor.
-*/
-   eax = 1;
-   ecx = 0;
-   native_cpuid(&eax, &ebx, &ecx, &edx);
-   feature_mask = (ecx & BIT(31)) ? AMD_SEV_BIT : AMD_SME_BIT;
+
+   /* Check the SEV MSR whether SEV or SME is enabled */
+   sev_status   = __rdmsr(MSR_AMD64_SEV);
+   feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
 
/*
 * Check for the SME/SEV feature:
@@ -530,19 +526,26 @@ void __init sme_enable(struct boot_params *bp)
 
/*

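The resulting detection logic can be summarized as follows (simplified sketch,
not the exact sme_enable() flow; hypervisor_bit_set() is a hypothetical
stand-in for the CPUID leaf 1, ECX[31] check described above):

static unsigned int pick_feature_mask(void)
{
	u64 sev_status = __rdmsr(MSR_AMD64_SEV);

	/* SEV/SEV-ES guest: the SEV status MSR is authoritative - case 1) above */
	if (sev_status & MSR_AMD64_SEV_ENABLED)
		return AMD_SEV_BIT;

	/*
	 * SME is a host feature. If the hypervisor bit is visible, this is
	 * a non-encrypted guest and must not try to enable SME - case 2)
	 * above.
	 */
	if (hypervisor_bit_set())
		return 0;

	return AMD_SME_BIT;
}
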
[tip: x86/seves] x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 0dd986f3a1e31bd827d2f1f52f07c8a82cd83143
Gitweb:
https://git.kernel.org/tip/0dd986f3a1e31bd827d2f1f52f07c8a82cd83143
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:52 +01:00

x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-7-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S | 28 +-
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index ebc4a29..c1e81a8 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -139,6 +139,26 @@ SYM_CODE_START(startup32_vc_handler)
jnz .Lfail
movl%edx, 0(%esp)   # Store result
 
+   /*
+* Sanity check CPUID results from the Hypervisor. See comment in
+* do_vc_no_ghcb() for more details on why this is necessary.
+*/
+
+   /* Fail if SEV leaf not available in CPUID[0x80000000].EAX */
+   cmpl   $0x80000000, %ebx
+   jne .Lcheck_sev
+   cmpl   $0x8000001f, 12(%esp)
+   jb  .Lfail
+   jmp .Ldone
+
+.Lcheck_sev:
+   /* Fail if SEV bit not set in CPUID[0x8000001f].EAX[1] */
+   cmpl   $0x8000001f, %ebx
+   jne .Ldone
+   btl $1, 12(%esp)
+   jnc .Lfail
+
+.Ldone:
popl%edx
popl%ecx
popl%ebx
@@ -152,6 +172,14 @@ SYM_CODE_START(startup32_vc_handler)
 
iret
 .Lfail:
+   /* Send terminate request to Hypervisor */
+   movl$0x100, %eax
+   xorl%edx, %edx
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall
+
+   /* If request fails, go to hlt loop */
hlt
jmp .Lfail
 SYM_CODE_END(startup32_vc_handler)


[tip: x86/seves] x86/boot/compressed/64: Cleanup exception handling before booting kernel

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: b099155e2df7dadf8b1ad9828158b89f5639f654
Gitweb:
https://git.kernel.org/tip/b099155e2df7dadf8b1ad9828158b89f5639f654
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:19 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:36 +01:00

x86/boot/compressed/64: Cleanup exception handling before booting kernel

Disable the exception handling before booting the kernel to make sure
any exceptions that happen during early kernel boot are not directed to
the pre-decompression code.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-2-j...@8bytes.org
---
 arch/x86/boot/compressed/idt_64.c | 14 ++
 arch/x86/boot/compressed/misc.c   |  7 ++-
 arch/x86/boot/compressed/misc.h   |  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/idt_64.c 
b/arch/x86/boot/compressed/idt_64.c
index 804a502..9b93567 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -52,3 +52,17 @@ void load_stage2_idt(void)
 
load_boot_idt(&boot_idt_desc);
 }
+
+void cleanup_exception_handling(void)
+{
+   /*
+* Flush GHCB from cache and map it encrypted again when running as
+* SEV-ES guest.
+*/
+   sev_es_shutdown_ghcb();
+
+   /* Set a null-idt, disabling #PF and #VC handling */
+   boot_idt_desc.size= 0;
+   boot_idt_desc.address = 0;
+   load_boot_idt(&boot_idt_desc);
+}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 267e7f9..cc9fd0e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -443,11 +443,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");
 
-   /*
-* Flush GHCB from cache and map it encrypted again when running as
-* SEV-ES guest.
-*/
-   sev_es_shutdown_ghcb();
+   /* Disable exception handling before booting the kernel */
+   cleanup_exception_handling();
 
return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 901ea5e..e5612f0 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -155,6 +155,12 @@ extern pteval_t __default_kernel_pte_mask;
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
 
+#ifdef CONFIG_X86_64
+void cleanup_exception_handling(void);
+#else
+static inline void cleanup_exception_handling(void) { }
+#endif
+
 /* IDT Entry Points */
 void boot_page_fault(void);
 void boot_stage1_vc(void);


[tip: x86/seves] x86/boot/compressed/64: Setup IDT in startup_32 boot path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 79419e13e8082cc15d174df979a363528e31f2e7
Gitweb:
https://git.kernel.org/tip/79419e13e8082cc15d174df979a363528e31f2e7
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:21 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:46 +01:00

x86/boot/compressed/64: Setup IDT in startup_32 boot path

This boot path needs exception handling when it is used with SEV-ES.
Setup an IDT and provide a helper function to write IDT entries for
use in 32-bit protected mode.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-5-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 72 +-
 1 file changed, 72 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index c59c80c..2001c3b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -116,6 +116,9 @@ SYM_FUNC_START(startup_32)
lretl
 1:
 
+   /* Setup Exception handling for SEV-ES */
+   callstartup32_load_idt
+
/* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
@@ -701,6 +704,19 @@ SYM_DATA_START(boot_idt)
.endr
 SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+SYM_DATA_START(boot32_idt_desc)
+   .word   boot32_idt_end - boot32_idt - 1
+   .long   0
+SYM_DATA_END(boot32_idt_desc)
+   .balign 8
+SYM_DATA_START(boot32_idt)
+   .rept 32
+   .quad 0
+   .endr
+SYM_DATA_END_LABEL(boot32_idt, SYM_L_GLOBAL, boot32_idt_end)
+#endif
+
 #ifdef CONFIG_EFI_STUB
 SYM_DATA(image_offset, .long 0)
 #endif
@@ -793,6 +809,62 @@ SYM_DATA_START_LOCAL(loaded_image_proto)
 SYM_DATA_END(loaded_image_proto)
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   __HEAD
+   .code32
+/*
+ * Write an IDT entry into boot32_idt
+ *
+ * Parameters:
+ *
+ * %eax:   Handler address
+ * %edx:   Vector number
+ *
+ * Physical offset is expected in %ebp
+ */
+SYM_FUNC_START(startup32_set_idt_entry)
+   push%ebx
+   push%ecx
+
+   /* IDT entry address to %ebx */
+   lealrva(boot32_idt)(%ebp), %ebx
+   shl $3, %edx
+   addl%edx, %ebx
+
+   /* Build IDT entry, lower 4 bytes */
+   movl%eax, %edx
+   andl   $0x0000ffff, %edx   # Target code segment offset [15:0]
+   movl$__KERNEL32_CS, %ecx# Target code segment selector
+   shl $16, %ecx
+   orl %ecx, %edx
+
+   /* Store lower 4 bytes to IDT */
+   movl%edx, (%ebx)
+
+   /* Build IDT entry, upper 4 bytes */
+   movl%eax, %edx
+   andl   $0xffff0000, %edx   # Target code segment offset [31:16]
+   orl    $0x00008e00, %edx   # Present, Type 32-bit Interrupt Gate
+
+   /* Store upper 4 bytes to IDT */
+   movl%edx, 4(%ebx)
+
+   pop %ecx
+   pop %ebx
+   ret
+SYM_FUNC_END(startup32_set_idt_entry)
+#endif
+
+SYM_FUNC_START(startup32_load_idt)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* Load IDT */
+   lealrva(boot32_idt)(%ebp), %eax
+   movl%eax, rva(boot32_idt_desc+2)(%ebp)
+   lidtrva(boot32_idt_desc)(%ebp)
+#endif
+   ret
+SYM_FUNC_END(startup32_load_idt)
+
 /*
  * Stack and heap for uncompression
  */


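The two dwords assembled by startup32_set_idt_entry() correspond to this
8-byte protected-mode gate layout (sketch for reference; the field names are
descriptive, not kernel identifiers):

struct idt_gate32 {
	u16 offset_low;		/* handler offset [15:0]  - low dword, bits 0-15   */
	u16 segment;		/* __KERNEL32_CS          - low dword, bits 16-31  */
	u16 flags;		/* 0x8e00: present, DPL 0, 32-bit interrupt gate   */
	u16 offset_high;	/* handler offset [31:16] - high dword, bits 16-31 */
} __attribute__((packed));
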
[tip: x86/seves] x86/boot/compressed/64: Add 32-bit boot #VC handler

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 9e373ba233b236a831d0d9578be095a4b7435abe
Gitweb:
https://git.kernel.org/tip/9e373ba233b236a831d0d9578be095a4b7435abe
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:48 +01:00

x86/boot/compressed/64: Add 32-bit boot #VC handler

Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-6-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S |  6 ++-
 arch/x86/boot/compressed/mem_encrypt.S | 96 -
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 2001c3b..ee448ae 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -857,6 +858,11 @@ SYM_FUNC_END(startup32_set_idt_entry)
 
 SYM_FUNC_START(startup32_load_idt)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* #VC handler */
+   lealrva(startup32_vc_handler)(%ebp), %eax
+   movl$X86_TRAP_VC, %edx
+   callstartup32_set_idt_entry
+
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index a6dea4e..ebc4a29 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -61,10 +61,104 @@ SYM_FUNC_START(get_sev_encryption_bit)
ret
 SYM_FUNC_END(get_sev_encryption_bit)
 
+/**
+ * sev_es_req_cpuid - Request a CPUID value from the Hypervisor using
+ *   the GHCB MSR protocol
+ *
+ * @%eax:  Register to request (0=EAX, 1=EBX, 2=ECX, 3=EDX)
+ * @%edx:  CPUID Function
+ *
+ * Returns 0 in %eax on success, non-zero on failure
+ * %edx returns CPUID value on success
+ */
+SYM_CODE_START_LOCAL(sev_es_req_cpuid)
+   shll$30, %eax
+   orl    $0x00000004, %eax
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall# VMGEXIT
+   rdmsr
+
+   /* Check response */
+   movl%eax, %ecx
+   andl   $0x3ffff000, %ecx   # Bits [12-29] MBZ
+   jnz 2f
+
+   /* Check return code */
+   andl$0xfff, %eax
+   cmpl$5, %eax
+   jne 2f
+
+   /* All good - return success */
+   xorl%eax, %eax
+1:
+   ret
+2:
+   movl$-1, %eax
+   jmp 1b
+SYM_CODE_END(sev_es_req_cpuid)
+
+SYM_CODE_START(startup32_vc_handler)
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Keep CPUID function in %ebx */
+   movl%eax, %ebx
+
+   /* Check if error-code == SVM_EXIT_CPUID */
+   cmpl$0x72, 16(%esp)
+   jne .Lfail
+
+   movl$0, %eax# Request CPUID[fn].EAX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 12(%esp)  # Store result
+
+   movl$1, %eax# Request CPUID[fn].EBX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 8(%esp)   # Store result
+
+   movl$2, %eax# Request CPUID[fn].ECX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 4(%esp)   # Store result
+
+   movl$3, %eax# Request CPUID[fn].EDX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 0(%esp)   # Store result
+
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+
+   /* Remove error code */
+   addl$4, %esp
+
+   /* Jump over CPUID instruction */
+   addl$2, (%esp)
+
+   iret
+.Lfail:
+   hlt
+   jmp .Lfail
+SYM_CODE_END(startup32_vc_handler)
+
.code64
 
 #include "../../kernel/sev_verify_cbit.S"
-
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp


[tip: x86/seves] x86/boot/compressed/64: Reload CS in startup_32

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 0c289ff81c24033777fab23019039f11e1449ba4
Gitweb:
https://git.kernel.org/tip/0c289ff81c24033777fab23019039f11e1449ba4
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:20 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:43 +01:00

x86/boot/compressed/64: Reload CS in startup_32

Exception handling in the startup_32 boot path requires the CS
selector to be correctly set up. Reload it from the current GDT.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-4-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S |  9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index e94874f..c59c80c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -107,9 +107,16 @@ SYM_FUNC_START(startup_32)
movl%eax, %gs
movl%eax, %ss
 
-/* setup a stack and make sure cpu supports long mode. */
+   /* Setup a stack and load CS from current GDT */
lealrva(boot_stack_end)(%ebp), %esp
 
+   pushl   $__KERNEL32_CS
+   lealrva(1f)(%ebp), %eax
+   pushl   %eax
+   lretl
+1:
+
+   /* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
jnz .Lno_longmode


[tip: x86/seves] x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 769eb023aa77062cf15c2a179fc8d13b43422c9b
Gitweb:
https://git.kernel.org/tip/769eb023aa77062cf15c2a179fc8d13b43422c9b
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:23 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:54 +01:00

x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-8-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 83 +-
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index ee448ae..91ea0d5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -183,11 +183,21 @@ SYM_FUNC_START(startup_32)
 */
callget_sev_encryption_bit
xorl%edx, %edx
+#ifdef CONFIG_AMD_MEM_ENCRYPT
testl   %eax, %eax
jz  1f
subl$32, %eax   /* Encryption bit is always above bit 31 */
bts %eax, %edx  /* Set encryption mask for page tables */
+   /*
+* Mark SEV as active in sev_status so that startup32_check_sev_cbit()
+* will do a check. The sev_status memory will be fully initialized
+* with the contents of MSR_AMD_SEV_STATUS later in
+* set_sev_encryption_mask(). For now it is sufficient to know that SEV
+* is active.
+*/
+   movl$1, rva(sev_status)(%ebp)
 1:
+#endif
 
/* Initialize Page tables to 0 */
lealrva(pgtable)(%ebx), %edi
@@ -272,6 +282,9 @@ SYM_FUNC_START(startup_32)
movl%esi, %edx
 1:
 #endif
+   /* Check if the C-bit position is correct when SEV is active */
+   callstartup32_check_sev_cbit
+
pushl   $__KERNEL_CS
pushl   %eax
 
@@ -872,6 +885,76 @@ SYM_FUNC_START(startup32_load_idt)
 SYM_FUNC_END(startup32_load_idt)
 
 /*
+ * Check for the correct C-bit position when the startup_32 boot-path is used.
+ *
+ * The check makes use of the fact that all memory is encrypted when paging is
+ * disabled. The function creates 64 bits of random data using the RDRAND
+ * instruction. RDRAND is mandatory for SEV guests, so always available. If the
+ * hypervisor violates that the kernel will crash right here.
+ *
+ * The 64 bits of random data are stored to a memory location and at the same
+ * time kept in the %eax and %ebx registers. Since encryption is always active
+ * when paging is off the random data will be stored encrypted in main memory.
+ *
+ * Then paging is enabled. When the C-bit position is correct all memory is
+ * still mapped encrypted and comparing the register values with memory will
+ * succeed. An incorrect C-bit position will map all memory unencrypted, so that
+ * the compare will use the encrypted random data and fail.
+ */
+SYM_FUNC_START(startup32_check_sev_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Check for non-zero sev_status */
+   movlrva(sev_status)(%ebp), %eax
+   testl   %eax, %eax
+   jz  4f
+
+   /*
+* Get two 32-bit random values - Don't bail out if RDRAND fails
+* because it is better to prevent forward progress if no random value
+* can be gathered.
+*/
+1: rdrand  %eax
+   jnc 1b
+2: rdrand  %ebx
+   jnc 2b
+
+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx   /* Backup %cr0 in %edx */
+   movl$(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f
+
+   movl%edx, %cr0  /* Restore previous %cr0 */
+
+   jmp 4f
+
+3: /* Check failed - hlt the machine */
+   hlt
+   jmp 3b
+
+4:
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+#endif
+   ret
+SYM_FUNC_END(startup32_check_sev_cbit)
+
+/*
  * Stack and heap for uncompression
  */
.bss


[tip: x86/seves] x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 4fbe2c3b9dd04f44608d710ad2ae83d7f1c04182
Gitweb:
https://git.kernel.org/tip/4fbe2c3b9dd04f44608d710ad2ae83d7f1c04182
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:24 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:59 +01:00

x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-9-j...@8bytes.org
---
 arch/x86/boot/compressed/sev-es.c | 12 +++-
 arch/x86/kernel/sev-es-shared.c   | 10 +++---
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/sev-es.c 
b/arch/x86/boot/compressed/sev-es.c
index 27826c2..d904bd5 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -200,14 +200,8 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
}
 
 finish:
-   if (result == ES_OK) {
+   if (result == ES_OK)
vc_finish_insn(&ctxt);
-   } else if (result != ES_RETRY) {
-   /*
-* For now, just halt the machine. That makes debugging easier,
-* later we just call sev_es_terminate() here.
-*/
-   while (true)
-   asm volatile("hlt\n");
-   }
+   else if (result != ES_RETRY)
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 387b716..0aa9f13 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -206,12 +206,8 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
return;
 
 fail:
-   sev_es_wr_ghcb_msr(GHCB_SEV_TERMINATE);
-   VMGEXIT();
-
-   /* Shouldn't get here - if we do halt the machine */
-   while (true)
-   asm volatile("hlt\n");
+   /* Terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
 
 static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,


[tip: x86/urgent] x86/sev-es: Introduce ip_within_syscall_gap() helper

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 78a81d88f60ba773cbe890205e1ee67f00502948
Gitweb:
https://git.kernel.org/tip/78a81d88f60ba773cbe890205e1ee67f00502948
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:12 +01:00
Committer: Borislav Petkov 
CommitterDate: Mon, 08 Mar 2021 14:22:17 +01:00

x86/sev-es: Introduce ip_within_syscall_gap() helper

Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.

Fixes: 315562c9af3d5 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-j...@8bytes.org
---
 arch/x86/entry/entry_64_compat.S |  2 ++
 arch/x86/include/asm/proto.h |  1 +
 arch/x86/include/asm/ptrace.h| 15 +++
 arch/x86/kernel/traps.c  |  3 +--
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 541fdaf..0051cf5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -210,6 +210,8 @@ SYM_CODE_START(entry_SYSCALL_compat)
/* Switch to the kernel stack */
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
+SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
+
/* Construct struct pt_regs on stack */
pushq   $__USER32_DS/* pt_regs->ss */
pushq   %r8 /* pt_regs->sp */
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 2c35f1c..b6a9d51 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -25,6 +25,7 @@ void __end_SYSENTER_singlestep_region(void);
 void entry_SYSENTER_compat(void);
 void __end_entry_SYSENTER_compat(void);
 void entry_SYSCALL_compat(void);
+void entry_SYSCALL_compat_safe_stack(void);
 void entry_INT80_compat(void);
 #ifdef CONFIG_XEN_PV
 void xen_entry_INT80_compat(void);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index d8324a2..409f661 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -94,6 +94,8 @@ struct pt_regs {
 #include 
 #endif
 
+#include 
+
 struct cpuinfo_x86;
 struct task_struct;
 
@@ -175,6 +177,19 @@ static inline bool any_64bit_mode(struct pt_regs *regs)
 #ifdef CONFIG_X86_64
 #define current_user_stack_pointer()   current_pt_regs()->sp
 #define compat_user_stack_pointer()current_pt_regs()->sp
+
+static inline bool ip_within_syscall_gap(struct pt_regs *regs)
+{
+   bool ret = (regs->ip >= (unsigned long)entry_SYSCALL_64 &&
+   regs->ip <  (unsigned long)entry_SYSCALL_64_safe_stack);
+
+#ifdef CONFIG_IA32_EMULATION
+   ret = ret || (regs->ip >= (unsigned long)entry_SYSCALL_compat &&
+ regs->ip <  (unsigned long)entry_SYSCALL_compat_safe_stack);
+#endif
+
+   return ret;
+}
 #endif
 
 static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7f5aec7..ac1874a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -694,8 +694,7 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
 * In the SYSCALL entry path the RSP value comes from user-space - don't
 * trust it and switch to the current kernel stack
 */
-   if (regs->ip >= (unsigned long)entry_SYSCALL_64 &&
-   regs->ip <  (unsigned long)entry_SYSCALL_64_safe_stack) {
+   if (ip_within_syscall_gap(regs)) {
sp = this_cpu_read(cpu_current_top_of_stack);
goto sync;
}


[tip: x86/urgent] x86/sev-es: Correctly track IRQ states in runtime #VC handler

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 62441a1fb53263bda349b6e5997c3cc5c120d89e
Gitweb:
https://git.kernel.org/tip/62441a1fb53263bda349b6e5997c3cc5c120d89e
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:15 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 12:33:46 +01:00

x86/sev-es: Correctly track IRQ states in runtime #VC handler

Call irqentry_nmi_enter()/irqentry_nmi_exit() in the #VC handler to
correctly track the IRQ state during its execution.

Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Reported-by: Andy Lutomirski 
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-5-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 301f20f..c3fd8fa 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -1258,13 +1258,12 @@ static __always_inline bool on_vc_fallback_stack(struct pt_regs *regs)
 DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 {
struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
+   irqentry_state_t irq_state;
struct ghcb_state state;
struct es_em_ctxt ctxt;
enum es_result result;
struct ghcb *ghcb;
 
-   lockdep_assert_irqs_disabled();
-
/*
 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
 */
@@ -1273,6 +1272,8 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
return;
}
 
+   irq_state = irqentry_nmi_enter(regs);
+   lockdep_assert_irqs_disabled();
instrumentation_begin();
 
/*
@@ -1335,6 +1336,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 
 out:
instrumentation_end();
+   irqentry_nmi_exit(regs, irq_state);
 
return;
 


[tip: x86/urgent] x86/sev-es: Use __copy_from_user_inatomic()

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: bffe30dd9f1f3b2608a87ac909a224d6be472485
Gitweb:
https://git.kernel.org/tip/bffe30dd9f1f3b2608a87ac909a224d6be472485
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:16 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 12:37:54 +01:00

x86/sev-es: Use __copy_from_user_inatomic()

The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().

Introduce a insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.

Fixes: 5e3427a7bc432 ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-j...@8bytes.org
---
 arch/x86/include/asm/insn-eval.h |  2 +-
 arch/x86/kernel/sev-es.c |  2 +-
 arch/x86/lib/insn-eval.c | 66 ---
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index a0f839a..98b4dae 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -23,6 +23,8 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE]);
+int insn_fetch_from_user_inatomic(struct pt_regs *regs,
+ unsigned char buf[MAX_INSN_SIZE]);
 bool insn_decode(struct insn *insn, struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE], int buf_size);
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index c3fd8fa..04a780a 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -258,7 +258,7 @@ static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
int res;
 
if (user_mode(ctxt->regs)) {
-   res = insn_fetch_from_user(ctxt->regs, buffer);
+   res = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
if (!res) {
ctxt->fi.vector = X86_TRAP_PF;
ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4229950..bb0b3fe 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1415,6 +1415,25 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
}
 }
 
+static unsigned long insn_get_effective_ip(struct pt_regs *regs)
+{
+   unsigned long seg_base = 0;
+
+   /*
+* If not in user-space long mode, a custom code segment could be in
+* use. This is true in protected mode (if the process defined a local
+* descriptor table), or virtual-8086 mode. In most of the cases
+* seg_base will be zero as in USER_CS.
+*/
+   if (!user_64bit_mode(regs)) {
+   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+   if (seg_base == -1L)
+   return 0;
+   }
+
+   return seg_base + regs->ip;
+}
+
 /**
  * insn_fetch_from_user() - Copy instruction bytes from user-space memory
  * @regs:  Structure with register values as seen when entering kernel mode
@@ -1431,24 +1450,43 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
  */
 int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 {
-   unsigned long seg_base = 0;
+   unsigned long ip;
int not_copied;
 
-   /*
-* If not in user-space long mode, a custom code segment could be in
-* use. This is true in protected mode (if the process defined a local
-* descriptor table), or virtual-8086 mode. In most of the cases
-* seg_base will be zero as in USER_CS.
-*/
-   if (!user_64bit_mode(regs)) {
-   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
-   if (seg_base == -1L)
-   return 0;
-   }
+   ip = insn_get_effective_ip(regs);
+   if (!ip)
+   return 0;
+
+   not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
 
+   return MAX_INSN_SIZE - not_copied;
+}
+
+/**
+ * insn_fetch_from_user_inatomic() - Copy instruction bytes from user-space memory
+ *   while in atomic code
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @buf:   Array to store the fetched instruction
+ *
+ * Gets the linear address of the instruction and copies the instruction bytes
+ * to the buf. This function must be used in atomic context.
+ *
+ * Returns:
+ *
+ * Number of i

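The hunk is cut off above; the new helper follows the same pattern as
insn_fetch_from_user(), roughly like this (a sketch only - it may differ in
detail from the final upstream function):

int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
{
	unsigned long ip;
	int not_copied;

	ip = insn_get_effective_ip(regs);
	if (!ip)
		return 0;

	/* Safe in atomic context: fails instead of sleeping to fault pages in */
	not_copied = __copy_from_user_inatomic(buf, (void __user *)ip, MAX_INSN_SIZE);

	return MAX_INSN_SIZE - not_copied;
}
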
[tip: x86/urgent] x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 545ac14c16b5dbd909d5a90ddf5b5a629a40fa94
Gitweb:
https://git.kernel.org/tip/545ac14c16b5dbd909d5a90ddf5b5a629a40fa94
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:13 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 12:26:26 +01:00

x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack

The code in the NMI handler to adjust the #VC handler IST stack is
needed in case an NMI hits when the #VC handler is still using its IST
stack.

But the check for this condition also needs to look if the regs->sp
value is trusted, meaning it was not set by user-space. Extend the check
to not use regs->sp when the NMI interrupted user-space code or the
SYSCALL gap.

Fixes: 315562c9af3d5 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: Andy Lutomirski 
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-3-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 84c1821..301f20f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -121,8 +121,18 @@ static void __init setup_vc_stacks(int cpu)
cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
 }
 
-static __always_inline bool on_vc_stack(unsigned long sp)
+static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
+   unsigned long sp = regs->sp;
+
+   /* User-mode RSP is not trusted */
+   if (user_mode(regs))
+   return false;
+
+   /* SYSCALL gap still has user-mode RSP */
+   if (ip_within_syscall_gap(regs))
+   return false;
+
return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < __this_cpu_ist_top_va(VC)));
 }
 
@@ -144,7 +154,7 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
 
/* Make room on the IST stack */
-   if (on_vc_stack(regs->sp))
+   if (on_vc_stack(regs))
new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);
else
new_ist = old_ist - sizeof(old_ist);


[tip: x86/seves] x86/head/64: Check SEV encryption before switching to kernel page-table

2020-10-29 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: c9f09539e16e281f92a27760fdfae71e8af036f6
Gitweb:
https://git.kernel.org/tip/c9f09539e16e281f92a27760fdfae71e8af036f6
Author:Joerg Roedel 
AuthorDate:Wed, 28 Oct 2020 17:46:58 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 29 Oct 2020 18:09:59 +01:00

x86/head/64: Check SEV encryption before switching to kernel page-table

When SEV is enabled, the kernel requests the C-bit position again from
the hypervisor to build its own page-table. Since the hypervisor is an
untrusted source, the C-bit position needs to be verified before the
kernel page-table is used.

Call sev_verify_cbit() before writing the CR3.

 [ bp: Massage. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20201028164659.27002-5-j...@8bytes.org
---
 arch/x86/kernel/head_64.S | 16 
 arch/x86/mm/mem_encrypt.c |  1 +
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 7eb2a1c..3c41773 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -161,6 +161,21 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 
/* Setup early boot stage 4-/5-level pagetables. */
addqphys_base(%rip), %rax
+
+   /*
+* For SEV guests: Verify that the C-bit is correct. A malicious
+* hypervisor could lie about the C-bit position to perform a ROP
+* attack on the guest by writing to the unencrypted stack and wait for
+* the next RET instruction.
+* %rsi carries pointer to realmode data and is callee-clobbered. Save
+* and restore it.
+*/
+   pushq   %rsi
+   movq%rax, %rdi
+   callsev_verify_cbit
+   popq%rsi
+
+   /* Switch to new page-table */
movq%rax, %cr3
 
/* Ensure I am executing from virtual addresses */
@@ -279,6 +294,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 SYM_CODE_END(secondary_startup_64)
 
 #include "verify_cpu.S"
+#include "sev_verify_cbit.S"
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index efbb3de..bc08337 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -39,6 +39,7 @@
  */
 u64 sme_me_mask __section(".data") = 0;
 u64 sev_status __section(".data") = 0;
+u64 sev_check_data __section(".data") = 0;
 EXPORT_SYMBOL(sme_me_mask);
 DEFINE_STATIC_KEY_FALSE(sev_enable_key);
 EXPORT_SYMBOL_GPL(sev_enable_key);
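
A C-level sketch of what the sev_verify_cbit() call added above has to establish (the real implementation is the assembly in sev_verify_cbit.S shown later in this series; get_random_long() stands in for the RDRAND loop, and global pages are assumed to be disabled around the CR3 writes, as the assembly does explicitly):

static u64 sev_check_data;	/* lives in encrypted .data */

static bool c_bit_looks_correct(unsigned long new_cr3)
{
	unsigned long old_cr3 = __native_read_cr3();
	u64 val = get_random_long();	/* stand-in for the RDRAND loop */
	bool ok;

	/* Store through the current, known-good mapping */
	sev_check_data = val;

	/* Read back through the page-table under test */
	native_write_cr3(new_cr3);
	ok = (sev_check_data == val);	/* wrong C-bit => read-back mismatch */
	native_write_cr3(old_cr3);

	return ok;
}

If the hypervisor lied about the C-bit position, the new page-table maps the location with the wrong encryption state, the read-back does not match, and the boot path can refuse to use it.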


[tip: x86/seves] x86/sev-es: Do not support MMIO to/from encrypted memory

2020-10-29 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 2411cd82112397bfb9d8f0f19cd46c3d71e0ce67
Gitweb:
https://git.kernel.org/tip/2411cd82112397bfb9d8f0f19cd46c3d71e0ce67
Author:Joerg Roedel 
AuthorDate:Wed, 28 Oct 2020 17:46:59 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 29 Oct 2020 19:27:42 +01:00

x86/sev-es: Do not support MMIO to/from encrypted memory

MMIO memory is usually not mapped encrypted, so there is no reason to
support emulated MMIO when it is mapped encrypted.

Prevent a possible hypervisor attack where a RAM page is mapped as
an MMIO page in the nested page-table, so that any guest access to it
will trigger a #VC exception and leak the data on that page to the
hypervisor via the GHCB (like with valid MMIO). On the read side this
attack would allow the HV to inject data into the guest.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20201028164659.27002-6-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 4a96726..0bd1a0f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -374,8 +374,8 @@ fault:
return ES_EXCEPTION;
 }
 
-static bool vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
-unsigned long vaddr, phys_addr_t *paddr)
+static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct 
es_em_ctxt *ctxt,
+  unsigned long vaddr, phys_addr_t 
*paddr)
 {
unsigned long va = (unsigned long)vaddr;
unsigned int level;
@@ -394,15 +394,19 @@ static bool vc_slow_virt_to_phys(struct ghcb *ghcb, 
struct es_em_ctxt *ctxt,
if (user_mode(ctxt->regs))
ctxt->fi.error_code |= X86_PF_USER;
 
-   return false;
+   return ES_EXCEPTION;
}
 
+   if (WARN_ON_ONCE(pte_val(*pte) & _PAGE_ENC))
+   /* Emulated MMIO to/from encrypted memory not supported */
+   return ES_UNSUPPORTED;
+
pa = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
pa |= va & ~page_level_mask(level);
 
*paddr = pa;
 
-   return true;
+   return ES_OK;
 }
 
 /* Include code shared with pre-decompression boot stage */
@@ -731,6 +735,7 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, struct 
es_em_ctxt *ctxt,
 {
u64 exit_code, exit_info_1, exit_info_2;
unsigned long ghcb_pa = __pa(ghcb);
+   enum es_result res;
phys_addr_t paddr;
void __user *ref;
 
@@ -740,11 +745,12 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, 
struct es_em_ctxt *ctxt,
 
exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE;
 
-   if (!vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr)) {
-   if (!read)
+   res = vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr);
+   if (res != ES_OK) {
+   if (res == ES_EXCEPTION && !read)
ctxt->fi.error_code |= X86_PF_WRITE;
 
-   return ES_EXCEPTION;
+   return res;
}
 
exit_info_1 = paddr;


[tip: x86/seves] x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler

2020-10-29 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: ed7b895f3efb5df184722f5a30f8164fcaffceb1
Gitweb:
https://git.kernel.org/tip/ed7b895f3efb5df184722f5a30f8164fcaffceb1
Author:Joerg Roedel 
AuthorDate:Wed, 28 Oct 2020 17:46:56 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 29 Oct 2020 13:48:49 +01:00

x86/boot/compressed/64: Sanity-check CPUID results in the early #VC handler

The early #VC handler which doesn't have a GHCB can only handle CPUID
exit codes. It is needed by the early boot code to handle #VC exceptions
raised in verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the early #VC handler to make sure the hypervisor
can not pretend that SEV is disabled.

 [ bp: Massage a bit. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20201028164659.27002-3-j...@8bytes.org
---
 arch/x86/kernel/sev-es-shared.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 5f83cca..7d04b35 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -178,6 +178,32 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned 
long exit_code)
goto fail;
regs->dx = val >> 32;
 
+   /*
+* This is a VC handler and the #VC is only raised when SEV-ES is
+* active, which means SEV must be active too. Do sanity checks on the
+* CPUID results to make sure the hypervisor does not trick the kernel
+* into the no-sev path. This could map sensitive data unencrypted and
+* make it accessible to the hypervisor.
+*
+* In particular, check for:
+*  - Hypervisor CPUID bit
+*  - Availability of CPUID leaf 0x8000001f
+*  - Availability of CPUID leaf 0x8000001f
+*  - SEV CPUID bit.
+*
+* The hypervisor might still report the wrong C-bit position, but this
+* can't be checked here.
+*/
+
+   if ((fn == 1 && !(regs->cx & BIT(31))))
+   /* Hypervisor bit */
+   goto fail;
+   else if (fn == 0x80000000 && (regs->ax < 0x8000001f))
+   /* SEV leaf check */
+   goto fail;
+   else if ((fn == 0x8000001f && !(regs->ax & BIT(1))))
+   /* SEV bit */
+   goto fail;
+
/* Skip over the CPUID two-byte opcode */
regs->ip += 2;
 


[tip: x86/seves] x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path

2020-10-29 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 86ce43f7dde81562f58b24b426cef068bd9f7595
Gitweb:
https://git.kernel.org/tip/86ce43f7dde81562f58b24b426cef068bd9f7595
Author:Joerg Roedel 
AuthorDate:Wed, 28 Oct 2020 17:46:57 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 29 Oct 2020 18:06:52 +01:00

x86/boot/compressed/64: Check SEV encryption in 64-bit boot-path

Check whether the hypervisor reported the correct C-bit when running as
an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

The check function is in a separate file:

  arch/x86/kernel/sev_verify_cbit.S

so that it can be re-used in the running kernel image.

 [ bp: Massage. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20201028164659.27002-4-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c |  1 +-
 arch/x86/boot/compressed/mem_encrypt.S  |  4 +-
 arch/x86/boot/compressed/misc.h |  2 +-
 arch/x86/kernel/sev_verify_cbit.S   | 89 -
 4 files changed, 96 insertions(+)
 create mode 100644 arch/x86/kernel/sev_verify_cbit.S

diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index a5e5db6..39b2ede 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -164,6 +164,7 @@ void initialize_identity_maps(void *rmode)
add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
 
/* Load the new page-table. */
+   sev_verify_cbit(top_level_pgt);
write_cr3(top_level_pgt);
 }
 
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index 3092ae1..aa56179 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -68,6 +68,9 @@ SYM_FUNC_START(get_sev_encryption_bit)
 SYM_FUNC_END(get_sev_encryption_bit)
 
.code64
+
+#include "../../kernel/sev_verify_cbit.S"
+
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp
@@ -111,4 +114,5 @@ SYM_FUNC_END(set_sev_encryption_mask)
.balign 8
 SYM_DATA(sme_me_mask,  .quad 0)
 SYM_DATA(sev_status,   .quad 0)
+SYM_DATA(sev_check_data,   .quad 0)
 #endif
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 6d31f1b..d9a631c 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -159,4 +159,6 @@ void boot_page_fault(void);
 void boot_stage1_vc(void);
 void boot_stage2_vc(void);
 
+unsigned long sev_verify_cbit(unsigned long cr3);
+
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/kernel/sev_verify_cbit.S 
b/arch/x86/kernel/sev_verify_cbit.S
new file mode 100644
index 000..ee04941
--- /dev/null
+++ b/arch/x86/kernel/sev_verify_cbit.S
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * sev_verify_cbit.S - Code for verification of the C-bit position reported
+ * by the Hypervisor when running with SEV enabled.
+ *
+ * Copyright (c) 2020  Joerg Roedel (jroe...@suse.de)
+ *
+ * sev_verify_cbit() is called before switching to a new long-mode page-table
+ * at boot.
+ *
+ * Verify that the C-bit position is correct by writing a random value to
+ * an encrypted memory location while on the current page-table. Then it
+ * switches to the new page-table to verify the memory content is still the
+ * same. After that it switches back to the current page-table and when the
+ * check succeeded it returns. If the check failed the code invalidates the
+ * stack pointer and goes into a hlt loop. The stack-pointer is invalidated to
+ * make sure no interrupt or exception can get the CPU out of the hlt loop.
+ *
+ * New page-table pointer is expected in %rdi (first parameter)
+ *
+ */
+SYM_FUNC_START(sev_verify_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* First check if a C-bit was detected */
+   movqsme_me_mask(%rip), %rsi
+   testq   %rsi, %rsi
+   jz  3f
+
+   /* sme_me_mask != 0 could mean SME or SEV - Check also for SEV */
+   movqsev_status(%rip), %rsi
+   testq   %rsi, %rsi
+   jz  3f
+
+   /* Save CR4 in %rsi */
+   movq%cr4, %rsi
+
+   /* Disable Global Pages */
+   movq%rsi, %rdx
+   andq$(~X86_CR4_PGE), %rdx
+   movq%rdx, %cr4
+
+   /*
+* Verified that running under SEV - now get a random value using
+* RDRAND. This instruction is mandatory when running as an SEV guest.
+*
+* Don't bail out of the loop if RDRAND returns errors. It is better to
+* prevent forward progress than to work with a non-random value here.
+*/
+1: rdrand  %rdx
+   jnc 1b
+
+   /* Store value to memory and keep it in %rdx */
+   movq%rdx, sev_check_data(%rip)
+
+   

[tip: x86/seves] x86/boot/compressed/64: Introduce sev_status

2020-10-29 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 3ad84246a4097010f3ae3d6944120c0be00e9e7a
Gitweb:
https://git.kernel.org/tip/3ad84246a4097010f3ae3d6944120c0be00e9e7a
Author:Joerg Roedel 
AuthorDate:Wed, 28 Oct 2020 17:46:55 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 29 Oct 2020 10:54:36 +01:00

x86/boot/compressed/64: Introduce sev_status

Introduce sev_status and initialize it together with sme_me_mask to have
an indicator which SEV features are enabled.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20201028164659.27002-2-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index dd07e7b..3092ae1 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -81,6 +81,19 @@ SYM_FUNC_START(set_sev_encryption_mask)
 
bts %rax, sme_me_mask(%rip) /* Create the encryption mask */
 
+   /*
+* Read MSR_AMD64_SEV again and store it to sev_status. Can't do this in
+* get_sev_encryption_bit() because this function is 32-bit code and
+* shared between 64-bit and 32-bit boot path.
+*/
+   movl$MSR_AMD64_SEV, %ecx/* Read the SEV MSR */
+   rdmsr
+
+   /* Store MSR value in sev_status */
+   shlq$32, %rdx
+   orq %rdx, %rax
+   movq%rax, sev_status(%rip)
+
 .Lno_sev_mask:
movq%rbp, %rsp  /* Restore original stack pointer */
 
@@ -96,5 +109,6 @@ SYM_FUNC_END(set_sev_encryption_mask)
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
.balign 8
-SYM_DATA(sme_me_mask, .quad 0)
+SYM_DATA(sme_me_mask,  .quad 0)
+SYM_DATA(sev_status,   .quad 0)
 #endif
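
For readers less fluent in assembly: the rdmsr/shlq/orq sequence above is equivalent to the following C (a sketch only; at this point of the boot the code has to stay in assembly):

static void cache_sev_status_sketch(void)
{
	unsigned int low, high;

	/* RDMSR returns the low half in EAX and the high half in EDX */
	asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (MSR_AMD64_SEV));
	sev_status = ((u64)high << 32) | low;
}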


[tip: x86/seves] x86/sev-es: Adjust #VC IST Stack on entering NMI handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 315562c9af3d583502b35c4b223a08d95ce69864
Gitweb:
https://git.kernel.org/tip/315562c9af3d583502b35c4b223a08d95ce69864
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:44 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/sev-es: Adjust #VC IST Stack on entering NMI handler

When an NMI hits in the #VC handler entry code before it has switched to
another stack, any subsequent #VC exception in the NMI code-path will
overwrite the interrupted #VC handler's stack.

Make sure this doesn't happen by explicitly adjusting the #VC IST entry
in the NMI handler for the time it can cause #VC exceptions.

 [ bp: Touchups, spelling fixes. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-44-j...@8bytes.org
---
 arch/x86/include/asm/sev-es.h | 19 -
 arch/x86/kernel/nmi.c |  9 ++-
 arch/x86/kernel/sev-es.c  | 53 ++-
 3 files changed, 81 insertions(+)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 9fbeeda..59176e8 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -78,4 +78,23 @@ extern void vc_no_ghcb(void);
 extern void vc_boot_ghcb(void);
 extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+extern struct static_key_false sev_es_enable_key;
+extern void __sev_es_ist_enter(struct pt_regs *regs);
+extern void __sev_es_ist_exit(void);
+static __always_inline void sev_es_ist_enter(struct pt_regs *regs)
+{
+   if (static_branch_unlikely(&sev_es_enable_key))
+   __sev_es_ist_enter(regs);
+}
+static __always_inline void sev_es_ist_exit(void)
+{
+   if (static_branch_unlikely(&sev_es_enable_key))
+   __sev_es_ist_exit();
+}
+#else
+static inline void sev_es_ist_enter(struct pt_regs *regs) { }
+static inline void sev_es_ist_exit(void) { }
+#endif
+
 #endif
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4fc9954..4c89c4d 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -488,6 +489,12 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
this_cpu_write(nmi_cr2, read_cr2());
 nmi_restart:
 
+   /*
+* Needs to happen before DR7 is accessed, because the hypervisor can
+* intercept DR7 reads/writes, turning those into #VC exceptions.
+*/
+   sev_es_ist_enter(regs);
+
this_cpu_write(nmi_dr7, local_db_save());
 
irq_state = idtentry_enter_nmi(regs);
@@ -501,6 +508,8 @@ nmi_restart:
 
local_db_restore(this_cpu_read(nmi_dr7));
 
+   sev_es_ist_exit();
+
if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
write_cr2(this_cpu_read(nmi_cr2));
if (this_cpu_dec_return(nmi_state))
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index fae8145..39ebb2d 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -51,6 +51,7 @@ struct sev_es_runtime_data {
 };
 
 static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
+DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
 
 static void __init setup_vc_stacks(int cpu)
 {
@@ -73,6 +74,55 @@ static void __init setup_vc_stacks(int cpu)
cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
 }
 
+static __always_inline bool on_vc_stack(unsigned long sp)
+{
+   return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < 
__this_cpu_ist_top_va(VC)));
+}
+
+/*
+ * This function handles the case when an NMI is raised in the #VC exception
+ * handler entry code. In this case, the IST entry for #VC must be adjusted, so
+ * that any subsequent #VC exception will not overwrite the stack contents of 
the
+ * interrupted #VC handler.
+ *
+ * The IST entry is adjusted unconditionally so that it can be also be
+ * unconditionally adjusted back in sev_es_ist_exit(). Otherwise a nested
+ * sev_es_ist_exit() call may adjust back the IST entry too early.
+ */
+void noinstr __sev_es_ist_enter(struct pt_regs *regs)
+{
+   unsigned long old_ist, new_ist;
+
+   /* Read old IST entry */
+   old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
+
+   /* Make room on the IST stack */
+   if (on_vc_stack(regs->sp))
+   new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);
+   else
+   new_ist = old_ist - sizeof(old_ist);
+
+   /* Store old IST entry */
+   *(unsigned long *)new_ist = old_ist;
+
+   /* Set new IST entry */
+   this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
+}
+
+void noinstr __sev_es_ist_exit(void)
+{
+   unsigned long ist;
+
+   /* Read IST entry */
+   ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
+
+   if (WARN_ON(ist == __this_cpu_ist_top_va(VC))
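
The hunk is cut off by the archive here. Conceptually, the exit side just unrolls the adjustment: it reads the old IST value that __sev_es_ist_enter() stored at the new stack slot and writes it back to the TSS. Roughly (a sketch, not the verbatim patch):

void noinstr __sev_es_ist_exit(void)
{
	unsigned long ist;

	/* Current (adjusted) IST entry */
	ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);

	if (WARN_ON(ist == __this_cpu_ist_top_va(VC)))
		return;

	/* The enter path saved the old value at *ist - restore it */
	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
}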

[tip: x86/seves] x86/sev-es: Handle MMIO String Instructions

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 0118b604c2c94c6e34982015cfa7891af4764786
Gitweb:
https://git.kernel.org/tip/0118b604c2c94c6e34982015cfa7891af4764786
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:51 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/sev-es: Handle MMIO String Instructions

Add handling for emulation of the MOVS instruction on MMIO regions, as
done by the memcpy_toio() and memcpy_fromio() functions.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-51-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 77 +++-
 1 file changed, 77 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index a170810..f724b75 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -594,6 +594,73 @@ static enum es_result vc_handle_mmio_twobyte_ops(struct 
ghcb *ghcb,
return ret;
 }
 
+/*
+ * The MOVS instruction has two memory operands, which raises the
+ * problem that it is not known whether the access to the source or the
+ * destination caused the #VC exception (and hence whether an MMIO read
+ * or write operation needs to be emulated).
+ *
+ * Instead of playing games with walking page-tables and trying to guess
+ * whether the source or destination is an MMIO range, split the move
+ * into two operations, a read and a write with only one memory operand.
+ * This will cause a nested #VC exception on the MMIO address which can
+ * then be handled.
+ *
+ * This implementation has the benefit that it also supports MOVS where
+ * source _and_ destination are MMIO regions.
+ *
+ * It will slow MOVS on MMIO down a lot, but in SEV-ES guests it is a
+ * rare operation. If it turns out to be a performance problem the split
+ * operations can be moved to memcpy_fromio() and memcpy_toio().
+ */
+static enum es_result vc_handle_mmio_movs(struct es_em_ctxt *ctxt,
+ unsigned int bytes)
+{
+   unsigned long ds_base, es_base;
+   unsigned char *src, *dst;
+   unsigned char buffer[8];
+   enum es_result ret;
+   bool rep;
+   int off;
+
+   ds_base = insn_get_seg_base(ctxt->regs, INAT_SEG_REG_DS);
+   es_base = insn_get_seg_base(ctxt->regs, INAT_SEG_REG_ES);
+
+   if (ds_base == -1L || es_base == -1L) {
+   ctxt->fi.vector = X86_TRAP_GP;
+   ctxt->fi.error_code = 0;
+   return ES_EXCEPTION;
+   }
+
+   src = ds_base + (unsigned char *)ctxt->regs->si;
+   dst = es_base + (unsigned char *)ctxt->regs->di;
+
+   ret = vc_read_mem(ctxt, src, buffer, bytes);
+   if (ret != ES_OK)
+   return ret;
+
+   ret = vc_write_mem(ctxt, dst, buffer, bytes);
+   if (ret != ES_OK)
+   return ret;
+
+   if (ctxt->regs->flags & X86_EFLAGS_DF)
+   off = -bytes;
+   else
+   off =  bytes;
+
+   ctxt->regs->si += off;
+   ctxt->regs->di += off;
+
+   rep = insn_has_rep_prefix(&ctxt->insn);
+   if (rep)
+   ctxt->regs->cx -= 1;
+
+   if (!rep || ctxt->regs->cx == 0)
+   return ES_OK;
+   else
+   return ES_RETRY;
+}
+
 static enum es_result vc_handle_mmio(struct ghcb *ghcb,
 struct es_em_ctxt *ctxt)
 {
@@ -655,6 +722,16 @@ static enum es_result vc_handle_mmio(struct ghcb *ghcb,
memcpy(reg_data, ghcb->shared_buffer, bytes);
break;
 
+   /* MOVS instruction */
+   case 0xa4:
+   bytes = 1;
+   fallthrough;
+   case 0xa5:
+   if (!bytes)
+   bytes = insn->opnd_bytes;
+
+   ret = vc_handle_mmio_movs(ctxt, bytes);
+   break;
/* Two-Byte Opcodes */
case 0x0f:
ret = vc_handle_mmio_twobyte_ops(ghcb, ctxt);
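
A note on the ES_RETRY value returned above: the #VC dispatcher (not part of this patch) leaves the instruction pointer untouched in that case, so the REP MOVS re-executes with the updated SI/DI/CX registers and raises a fresh #VC for the next element. A sketch of that caller-side handling (function names follow the surrounding sev-es code; the wrapper itself is illustrative):

static void vc_dispatch_mmio_sketch(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
{
	switch (vc_handle_mmio(ghcb, ctxt)) {
	case ES_OK:
		vc_finish_insn(ctxt);	/* advance RIP past the emulated instruction */
		break;
	case ES_RETRY:
		/* RIP left unchanged: the MOVS re-executes for the next element */
		break;
	default:
		/* hand the error to the #VC error paths */
		break;
	}
}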


[tip: x86/seves] x86/smpboot: Load TSS and getcpu GDT entry before loading IDT

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 520d030852b4c9babfce9a79d8b5320b6b5545e6
Gitweb:
https://git.kernel.org/tip/520d030852b4c9babfce9a79d8b5320b6b5545e6
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:08 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/smpboot: Load TSS and getcpu GDT entry before loading IDT

The IDT on 64-bit contains vectors which use paranoid_entry() and/or IST
stacks. To make these vectors work, the TSS and the getcpu GDT entry need
to be set up before the IDT is loaded.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-68-j...@8bytes.org
---
 arch/x86/include/asm/processor.h |  1 +
 arch/x86/kernel/cpu/common.c | 23 +++
 arch/x86/kernel/smpboot.c|  2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 97143d8..615dd44 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -696,6 +696,7 @@ extern void load_direct_gdt(int);
 extern void load_fixmap_gdt(int);
 extern void load_percpu_segment(int);
 extern void cpu_init(void);
+extern void cpu_init_exception_handling(void);
 extern void cr4_init(void);
 
 static inline unsigned long get_debugctlmsr(void)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 81fba4d..beffea2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1863,6 +1863,29 @@ static inline void tss_setup_io_bitmap(struct tss_struct 
*tss)
 }
 
 /*
+ * Setup everything needed to handle exceptions from the IDT, including the IST
+ * exceptions which use paranoid_entry().
+ */
+void cpu_init_exception_handling(void)
+{
+   struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
+   int cpu = raw_smp_processor_id();
+
+   /* paranoid_entry() gets the CPU number from the GDT */
+   setup_getcpu(cpu);
+
+   /* IST vectors need TSS to be set up. */
+   tss_setup_ist(tss);
+   tss_setup_io_bitmap(tss);
+   set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
+
+   load_TR_desc();
+
+   /* Finally load the IDT */
+   load_current_idt();
+}
+
+/*
  * cpu_init() initializes state that is per-CPU. Some data is already
  * initialized (naturally) in the bootstrap process, such as the GDT
  * and IDT. We reload them nevertheless, this function acts as a
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f5ef689..de776b2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -227,7 +227,7 @@ static void notrace start_secondary(void *unused)
load_cr3(swapper_pg_dir);
__flush_tlb_all();
 #endif
-   load_current_idt();
+   cpu_init_exception_handling();
cpu_init();
x86_cpuinit.early_percpu_clock_init();
preempt_disable();


[tip: x86/seves] x86/sev-es: Add SEV-ES Feature Detection

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: b57de6cd16395be1ebdaa9b489ffbf462bb585c4
Gitweb:
https://git.kernel.org/tip/b57de6cd16395be1ebdaa9b489ffbf462bb585c4
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:37 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 23:00:20 +02:00

x86/sev-es: Add SEV-ES Feature Detection

Add a sev_es_active() function for checking whether SEV-ES is enabled.
Also cache the value of MSR_AMD64_SEV at boot to speed up the feature
checking in the running code.

 [ bp: Remove "!!" in sev_active() too. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-37-j...@8bytes.org
---
 arch/x86/include/asm/mem_encrypt.h |  3 +++
 arch/x86/include/asm/msr-index.h   |  2 ++
 arch/x86/mm/mem_encrypt.c  |  9 -
 arch/x86/mm/mem_encrypt_identity.c |  3 +++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 5049f6c..4e72b73 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -19,6 +19,7 @@
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern u64 sme_me_mask;
+extern u64 sev_status;
 extern bool sev_enabled;
 
 void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
@@ -50,6 +51,7 @@ void __init mem_encrypt_init(void);
 
 bool sme_active(void);
 bool sev_active(void);
+bool sev_es_active(void);
 
 #define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
 
@@ -72,6 +74,7 @@ static inline void __init sme_enable(struct boot_params *bp) 
{ }
 
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
+static inline bool sev_es_active(void) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 
0; }
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index da34fdb..249a414 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -469,7 +469,9 @@
 #define MSR_AMD64_SEV_ES_GHCB  0xc0010130
 #define MSR_AMD64_SEV  0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT  0
+#define MSR_AMD64_SEV_ES_ENABLED_BIT   1
 #define MSR_AMD64_SEV_ENABLED  BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
+#define MSR_AMD64_SEV_ES_ENABLED   BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 
 #define MSR_AMD64_VIRT_SPEC_CTRL   0xc001011f
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 9f1177e..a38f556 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -38,6 +38,7 @@
  * section is later cleared.
  */
 u64 sme_me_mask __section(.data) = 0;
+u64 sev_status __section(.data) = 0;
 EXPORT_SYMBOL(sme_me_mask);
 DEFINE_STATIC_KEY_FALSE(sev_enable_key);
 EXPORT_SYMBOL_GPL(sev_enable_key);
@@ -347,7 +348,13 @@ bool sme_active(void)
 
 bool sev_active(void)
 {
-   return sme_me_mask && sev_enabled;
+   return sev_status & MSR_AMD64_SEV_ENABLED;
+}
+
+/* Needs to be called from non-instrumentable code */
+bool noinstr sev_es_active(void)
+{
+   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
 }
 
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
diff --git a/arch/x86/mm/mem_encrypt_identity.c 
b/arch/x86/mm/mem_encrypt_identity.c
index e2b0e2a..68d7537 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -540,6 +540,9 @@ void __init sme_enable(struct boot_params *bp)
if (!(msr & MSR_AMD64_SEV_ENABLED))
return;
 
+   /* Save SEV_STATUS to avoid reading MSR again */
+   sev_status = msr;
+
/* SEV state cannot be controlled by a command line option */
sme_me_mask = me_mask;
sev_enabled = true;
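
With the MSR value cached in sev_status, later patches can cheaply gate SEV-ES-only setup. A usage sketch (the gated body is illustrative, not part of this patch):

static void __init maybe_init_sev_es(void)
{
	if (!sev_es_active())
		return;

	/* SEV-ES specific initialization, e.g. allocating the #VC stacks
	 * and per-CPU GHCBs, would go here. */
}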


[tip: x86/seves] x86/head/64: Switch to initial stack earlier

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 3add38cb96a1ae7d152db69ab4329e809c2af2d4
Gitweb:
https://git.kernel.org/tip/3add38cb96a1ae7d152db69ab4329e809c2af2d4
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:33 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 21:44:01 +02:00

x86/head/64: Switch to initial stack earlier

Make sure there is a stack once the kernel runs from virtual addresses.
At this stage any secondary CPU which boots will have lost its stack
because the kernel switched to a new page-table which does not map the
real-mode stack anymore.

This is needed for handling early #VC exceptions caused by instructions
like CPUID.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-33-j...@8bytes.org
---
 arch/x86/kernel/head_64.S |  9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index f402087..83050c9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -192,6 +192,12 @@ SYM_CODE_START(secondary_startup_64)
movlinitial_gs+4(%rip),%edx
wrmsr
 
+   /*
+* Setup a boot time stack - Any secondary CPU will have lost its stack
+* by now because the cr3-switch above unmaps the real-mode stack
+*/
+   movq initial_stack(%rip), %rsp
+
/* Check if nx is implemented */
movl$0x80000001, %eax
cpuid
@@ -212,9 +218,6 @@ SYM_CODE_START(secondary_startup_64)
/* Make changes effective */
movq%rax, %cr0
 
-   /* Setup a boot time stack */
-   movq initial_stack(%rip), %rsp
-
/* zero EFLAGS after setting rsp */
pushq $0
popfq


[tip: x86/seves] x86/sev-es: Allocate and map an IST stack for #VC handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 02772fb9b68e6a72a5e17f994048df832fe2b15e
Gitweb:
https://git.kernel.org/tip/02772fb9b68e6a72a5e17f994048df832fe2b15e
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:43 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/sev-es: Allocate and map an IST stack for #VC handler

Allocate and map an IST stack and an additional fall-back stack for
the #VC handler.  The memory for the stacks is allocated only when
SEV-ES is active.

The #VC handler needs to use an IST stack because a #VC exception can be
raised from kernel space with unsafe stack, e.g. in the SYSCALL entry
path.

Since the #VC exception can be nested, the #VC handler switches back to
the interrupted stack when entered from kernel space. If switching back
is not possible, the fall-back stack is used.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-43-j...@8bytes.org
---
 arch/x86/include/asm/cpu_entry_area.h | 33 --
 arch/x86/include/asm/page_64_types.h  |  1 +-
 arch/x86/kernel/cpu/common.c  |  2 ++-
 arch/x86/kernel/dumpstack_64.c|  8 --
 arch/x86/kernel/sev-es.c  | 33 ++-
 5 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/cpu_entry_area.h 
b/arch/x86/include/asm/cpu_entry_area.h
index 8902fdb..3d52b09 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -11,25 +11,29 @@
 #ifdef CONFIG_X86_64
 
 /* Macro to enforce the same ordering and stack sizes */
-#define ESTACKS_MEMBERS(guardsize) \
-   charDF_stack_guard[guardsize];  \
-   charDF_stack[EXCEPTION_STKSZ];  \
-   charNMI_stack_guard[guardsize]; \
-   charNMI_stack[EXCEPTION_STKSZ]; \
-   charDB_stack_guard[guardsize];  \
-   charDB_stack[EXCEPTION_STKSZ];  \
-   charMCE_stack_guard[guardsize]; \
-   charMCE_stack[EXCEPTION_STKSZ]; \
-   charIST_top_guard[guardsize];   \
+#define ESTACKS_MEMBERS(guardsize, optional_stack_size)\
+   charDF_stack_guard[guardsize];  \
+   charDF_stack[EXCEPTION_STKSZ];  \
+   charNMI_stack_guard[guardsize]; \
+   charNMI_stack[EXCEPTION_STKSZ]; \
+   charDB_stack_guard[guardsize];  \
+   charDB_stack[EXCEPTION_STKSZ];  \
+   charMCE_stack_guard[guardsize]; \
+   charMCE_stack[EXCEPTION_STKSZ]; \
+   charVC_stack_guard[guardsize];  \
+   charVC_stack[optional_stack_size];  \
+   charVC2_stack_guard[guardsize]; \
+   charVC2_stack[optional_stack_size]; \
+   charIST_top_guard[guardsize];   \
 
 /* The exception stacks' physical storage. No guard pages required */
 struct exception_stacks {
-   ESTACKS_MEMBERS(0)
+   ESTACKS_MEMBERS(0, 0)
 };
 
 /* The effective cpu entry area mapping with guard pages. */
 struct cea_exception_stacks {
-   ESTACKS_MEMBERS(PAGE_SIZE)
+   ESTACKS_MEMBERS(PAGE_SIZE, EXCEPTION_STKSZ)
 };
 
 /*
@@ -40,6 +44,8 @@ enum exception_stack_ordering {
ESTACK_NMI,
ESTACK_DB,
ESTACK_MCE,
+   ESTACK_VC,
+   ESTACK_VC2,
N_EXCEPTION_STACKS
 };
 
@@ -139,4 +145,7 @@ static inline struct entry_stack *cpu_entry_stack(int cpu)
 #define __this_cpu_ist_top_va(name)\
CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)
 
+#define __this_cpu_ist_bottom_va(name) \
+   CEA_ESTACK_BOT(__this_cpu_read(cea_exception_stacks), name)
+
 #endif
diff --git a/arch/x86/include/asm/page_64_types.h 
b/arch/x86/include/asm/page_64_types.h
index 288b065..d0c6c10 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -28,6 +28,7 @@
 #defineIST_INDEX_NMI   1
 #defineIST_INDEX_DB2
 #defineIST_INDEX_MCE   3
+#defineIST_INDEX_VC4
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c5d6f17..81fba4d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1829,6 +1829,8 @@ static inline void tss_setup_ist(struct tss_struct *tss)
tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
+   /* Only mapped 
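
The cpu/common.c hunk is cut off here; judging by the commit message, it ends by pointing the new IST slot at the per-CPU #VC stack, which is only mapped when SEV-ES is active. The complete helper would then read roughly as follows (a sketch, not the verbatim patch):

static inline void tss_setup_ist(struct tss_struct *tss)
{
	/* Set up the per-CPU TSS IST stacks */
	tss->x86_tss.ist[IST_INDEX_DF]  = __this_cpu_ist_top_va(DF);
	tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
	tss->x86_tss.ist[IST_INDEX_DB]  = __this_cpu_ist_top_va(DB);
	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
	/* Only mapped when SEV-ES is active */
	tss->x86_tss.ist[IST_INDEX_VC]  = __this_cpu_ist_top_va(VC);
}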

[tip: x86/seves] x86/sev-es: Setup GHCB-based boot #VC handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 1aa9aa8ee517e0443b06e816a4fd2d15f2113615
Gitweb:
https://git.kernel.org/tip/1aa9aa8ee517e0443b06e816a4fd2d15f2113615
Author:Joerg Roedel 
AuthorDate:Tue, 08 Sep 2020 14:38:16 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:32:27 +02:00

x86/sev-es: Setup GHCB-based boot #VC handler

Add the infrastructure to handle #VC exceptions when the kernel runs on
virtual addresses and has mapped a GHCB. This handler will be used until
the runtime #VC handler takes over.

Since the handler runs very early, disable instrumentation for sev-es.c.

 [ bp: Make vc_ghcb_invalidate() __always_inline so that it can be
   inlined in noinstr functions like __sev_es_nmi_complete(). ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200908123816.gb3...@8bytes.org
---
 arch/x86/include/asm/realmode.h |   3 +-
 arch/x86/include/asm/segment.h  |   2 +-
 arch/x86/include/asm/sev-es.h   |   2 +-
 arch/x86/kernel/Makefile|   2 +-
 arch/x86/kernel/head64.c|   8 ++-
 arch/x86/kernel/head_64.S   |  36 ++-
 arch/x86/kernel/sev-es-shared.c |  14 ++--
 arch/x86/kernel/sev-es.c| 116 +++-
 arch/x86/mm/extable.c   |   1 +-
 9 files changed, 176 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index b35030e..96118fb 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -57,6 +57,9 @@ extern unsigned char real_mode_blob_end[];
 extern unsigned long initial_code;
 extern unsigned long initial_gs;
 extern unsigned long initial_stack;
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+extern unsigned long initial_vc_handler;
+#endif
 
 extern unsigned char real_mode_blob[];
 extern unsigned char real_mode_relocs[];
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 9646c30..4e8dec3 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -230,7 +230,7 @@
 #define NUM_EXCEPTION_VECTORS  32
 
 /* Bitmask of exception vectors which push an error code on the stack: */
-#define EXCEPTION_ERRCODE_MASK 0x00027d00
+#define EXCEPTION_ERRCODE_MASK 0x20027d00
 
 #define GDT_SIZE   (GDT_ENTRIES*8)
 #define GDT_ENTRY_TLS_ENTRIES  3
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 7175d43..9fbeeda 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -75,5 +75,7 @@ static inline u64 lower_bits(u64 val, unsigned int bits)
 
 /* Early IDT entry points for #VC handler */
 extern void vc_no_ghcb(void);
+extern void vc_boot_ghcb(void);
+extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
 #endif
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 23bc0f8..4a33272 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -20,6 +20,7 @@ CFLAGS_REMOVE_kvmclock.o = -pg
 CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
+CFLAGS_REMOVE_sev-es.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o   := n
@@ -27,6 +28,7 @@ KASAN_SANITIZE_dumpstack.o:= n
 KASAN_SANITIZE_dumpstack_$(BITS).o := n
 KASAN_SANITIZE_stacktrace.o:= n
 KASAN_SANITIZE_paravirt.o  := n
+KASAN_SANITIZE_sev-es.o:= n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fc55cc9..4199f25 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -406,6 +406,10 @@ void __init do_early_exception(struct pt_regs *regs, int 
trapnr)
early_make_pgtable(native_read_cr2()))
return;
 
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) &&
+   trapnr == X86_TRAP_VC && handle_vc_boot_ghcb(regs))
+   return;
+
early_fixup_exception(regs, trapnr);
 }
 
@@ -575,6 +579,10 @@ static void startup_64_load_idt(unsigned long physbase)
 /* This is used when running on kernel addresses */
 void early_setup_idt(void)
 {
+   /* VMM Communication Exception */
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+   set_bringup_idt_handler(bringup_idt_table, X86_TRAP_VC, 
vc_boot_ghcb);
+
bringup_idt_descr.address = (unsigned long)bringup_idt_table;
native_load_idt(&bringup_idt_descr);
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 6e68bca..1a71d0d 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -281,11 +281,47 @@ SYM_CODE_START(start_cpu0)
 SYM_CODE_END(start_cpu0)
 #endif
 
+#ifdef CONFIG_AM
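
A small aside on the segment.h hunk above: EXCEPTION_ERRCODE_MASK grows by exactly one bit, for vector 29, because the new #VC exception pushes an error code. A standalone check of that arithmetic (the X86_TRAP_VC value is the one this series defines):

#include <assert.h>

#define X86_TRAP_VC 29	/* VMM Communication Exception */

static_assert((0x20027d00u ^ 0x00027d00u) == (1u << X86_TRAP_VC),
	      "the mask update adds exactly the #VC error-code bit");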

[tip: x86/seves] x86/head/64: Load segment registers earlier

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 7b99819dfb60268cc1c75f83c949bc4a09221bea
Gitweb:
https://git.kernel.org/tip/7b99819dfb60268cc1c75f83c949bc4a09221bea
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:32 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 21:40:53 +02:00

x86/head/64: Load segment registers earlier

Make sure segments are properly set up before setting up an IDT and
doing anything that might cause a #VC exception. This is later needed
for early exception handling.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-32-j...@8bytes.org
---
 arch/x86/kernel/head_64.S | 52 +++---
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 03b03f2..f402087 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -166,6 +166,32 @@ SYM_CODE_START(secondary_startup_64)
 */
lgdtearly_gdt_descr(%rip)
 
+   /* set up data segments */
+   xorl %eax,%eax
+   movl %eax,%ds
+   movl %eax,%ss
+   movl %eax,%es
+
+   /*
+* We don't really need to load %fs or %gs, but load them anyway
+* to kill any stale realmode selectors.  This allows execution
+* under VT hardware.
+*/
+   movl %eax,%fs
+   movl %eax,%gs
+
+   /* Set up %gs.
+*
+* The base of %gs always points to fixed_percpu_data. If the
+* stack protector canary is enabled, it is located at %gs:40.
+* Note that, on SMP, the boot cpu uses init data section until
+* the per cpu areas are set up.
+*/
+   movl$MSR_GS_BASE,%ecx
+   movlinitial_gs(%rip),%eax
+   movlinitial_gs+4(%rip),%edx
+   wrmsr
+
/* Check if nx is implemented */
movl$0x80000001, %eax
cpuid
@@ -193,32 +219,6 @@ SYM_CODE_START(secondary_startup_64)
pushq $0
popfq
 
-   /* set up data segments */
-   xorl %eax,%eax
-   movl %eax,%ds
-   movl %eax,%ss
-   movl %eax,%es
-
-   /*
-* We don't really need to load %fs or %gs, but load them anyway
-* to kill any stale realmode selectors.  This allows execution
-* under VT hardware.
-*/
-   movl %eax,%fs
-   movl %eax,%gs
-
-   /* Set up %gs.
-*
-* The base of %gs always points to fixed_percpu_data. If the
-* stack protector canary is enabled, it is located at %gs:40.
-* Note that, on SMP, the boot cpu uses init data section until
-* the per cpu areas are set up.
-*/
-   movl$MSR_GS_BASE,%ecx
-   movlinitial_gs(%rip),%eax
-   movlinitial_gs+4(%rip),%edx
-   wrmsr
-
/* rsi is pointer to real mode structure with interesting info.
   pass it to C */
movq%rsi, %rdi


[tip: x86/seves] x86/idt: Make IDT init functions static inlines

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 097ee5b778b8970e1c2ed3ca1631b297d90acd61
Gitweb:
https://git.kernel.org/tip/097ee5b778b8970e1c2ed3ca1631b297d90acd61
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:35 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 22:44:43 +02:00

x86/idt: Make IDT init functions static inlines

Move these two functions from kernel/idt.c to include/asm/desc.h:

* init_idt_data()
* idt_init_desc()

These functions are needed to setup IDT entries very early and need to
be called from head64.c. To be usable this early, these functions need
to be compiled without instrumentation and the stack-protector feature.

These features need to be kept enabled for kernel/idt.c, so head64.c
must use its own versions.

 [ bp: Take Kees' suggested patch title and add his Rev-by. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-35-j...@8bytes.org
---
 arch/x86/include/asm/desc.h  | 27 +-
 arch/x86/include/asm/desc_defs.h |  7 ++-
 arch/x86/kernel/idt.c| 34 +---
 3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 1ced11d..476082a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -383,6 +383,33 @@ static inline void set_desc_limit(struct desc_struct 
*desc, unsigned long limit)
 
 void alloc_intr_gate(unsigned int n, const void *addr);
 
+static inline void init_idt_data(struct idt_data *data, unsigned int n,
+const void *addr)
+{
+   BUG_ON(n > 0xFF);
+
+   memset(data, 0, sizeof(*data));
+   data->vector= n;
+   data->addr  = addr;
+   data->segment   = __KERNEL_CS;
+   data->bits.type = GATE_INTERRUPT;
+   data->bits.p= 1;
+}
+
+static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
+{
+   unsigned long addr = (unsigned long) d->addr;
+
+   gate->offset_low= (u16) addr;
+   gate->segment   = (u16) d->segment;
+   gate->bits  = d->bits;
+   gate->offset_middle = (u16) (addr >> 16);
+#ifdef CONFIG_X86_64
+   gate->offset_high   = (u32) (addr >> 32);
+   gate->reserved  = 0;
+#endif
+}
+
 extern unsigned long system_vectors[];
 
 extern void load_current_idt(void);
diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
index 5621fb3..f7e7099 100644
--- a/arch/x86/include/asm/desc_defs.h
+++ b/arch/x86/include/asm/desc_defs.h
@@ -74,6 +74,13 @@ struct idt_bits {
p   : 1;
 } __attribute__((packed));
 
+struct idt_data {
+   unsigned intvector;
+   unsigned intsegment;
+   struct idt_bits bits;
+   const void  *addr;
+};
+
 struct gate_struct {
u16 offset_low;
u16 segment;
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 53946c1..4bb4e3d 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -11,13 +11,6 @@
 #include 
 #include 
 
-struct idt_data {
-   unsigned intvector;
-   unsigned intsegment;
-   struct idt_bits bits;
-   const void  *addr;
-};
-
 #define DPL0   0x0
 #define DPL3   0x3
 
@@ -178,20 +171,6 @@ bool idt_is_f00f_address(unsigned long address)
 }
 #endif
 
-static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
-{
-   unsigned long addr = (unsigned long) d->addr;
-
-   gate->offset_low= (u16) addr;
-   gate->segment   = (u16) d->segment;
-   gate->bits  = d->bits;
-   gate->offset_middle = (u16) (addr >> 16);
-#ifdef CONFIG_X86_64
-   gate->offset_high   = (u32) (addr >> 32);
-   gate->reserved  = 0;
-#endif
-}
-
 static __init void
 idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool 
sys)
 {
@@ -205,19 +184,6 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data 
*t, int size, bool sy
}
 }
 
-static void init_idt_data(struct idt_data *data, unsigned int n,
- const void *addr)
-{
-   BUG_ON(n > 0xFF);
-
-   memset(data, 0, sizeof(*data));
-   data->vector= n;
-   data->addr  = addr;
-   data->segment   = __KERNEL_CS;
-   data->bits.type = GATE_INTERRUPT;
-   data->bits.p= 1;
-}
-
 static __init void set_intr_gate(unsigned int n, const void *addr)
 {
struct idt_data data;
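
With the helpers available as static inlines, early boot code can build and install a gate without linking against kernel/idt.c. A sketch modeled on the set_bringup_idt_handler() helper that head64.c gains later in this series (the exact body there may differ):

static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
{
	struct idt_data data;
	gate_desc desc;

	init_idt_data(&data, n, handler);	/* vector n, __KERNEL_CS, interrupt gate */
	idt_init_desc(&desc, &data);		/* pack into the hardware descriptor format */
	native_write_idt_entry(idt, n, &desc);	/* install into the bringup IDT */
}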


[tip: x86/seves] x86/sev-es: Handle #DB Events

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: cb1ad3ecea959593400dfac4f027dbc005e62c39
Gitweb:
https://git.kernel.org/tip/cb1ad3ecea959593400dfac4f027dbc005e62c39
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:02 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/sev-es: Handle #DB Events

Handle #VC exceptions caused by #DB exceptions in the guest. Those
must be handled outside of instrumentation_begin()/end() so that the
handler will not be raised recursively.

Handle them by calling the kernel's debug exception handler.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-62-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 8867c48..79d5190 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -922,6 +922,14 @@ static enum es_result vc_handle_trap_ac(struct ghcb *ghcb,
return ES_EXCEPTION;
 }
 
+static __always_inline void vc_handle_trap_db(struct pt_regs *regs)
+{
+   if (user_mode(regs))
+   noist_exc_debug(regs);
+   else
+   exc_debug(regs);
+}
+
 static enum es_result vc_handle_exitcode(struct es_em_ctxt *ctxt,
 struct ghcb *ghcb,
 unsigned long exit_code)
@@ -1033,6 +1041,15 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
struct ghcb *ghcb;
 
lockdep_assert_irqs_disabled();
+
+   /*
+* Handle #DB before calling into !noinstr code to avoid recursive #DB.
+*/
+   if (error_code == SVM_EXIT_EXCP_BASE + X86_TRAP_DB) {
+   vc_handle_trap_db(regs);
+   return;
+   }
+
instrumentation_begin();
 
/*


[tip: x86/seves] x86/boot/compressed/64: Setup a GHCB-based VC Exception handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 597cfe48212a3f110ab0f918bf59791f453e65b7
Gitweb:
https://git.kernel.org/tip/597cfe48212a3f110ab0f918bf59791f453e65b7
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:24 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Setup a GHCB-based VC Exception handler

Install an exception handler for #VC exception that uses a GHCB. Also
add the infrastructure for handling different exit-codes by decoding
the instruction that caused the exception and error handling.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-24-j...@8bytes.org
---
 arch/x86/Kconfig   |   1 +-
 arch/x86/boot/compressed/Makefile  |   5 +-
 arch/x86/boot/compressed/idt_64.c  |   4 +-
 arch/x86/boot/compressed/idt_handlers_64.S |   3 +-
 arch/x86/boot/compressed/misc.c|   7 +-
 arch/x86/boot/compressed/misc.h|   7 +-
 arch/x86/boot/compressed/sev-es.c  | 111 ++-
 arch/x86/include/asm/sev-es.h  |  39 +-
 arch/x86/include/uapi/asm/svm.h|   1 +-
 arch/x86/kernel/sev-es-shared.c| 154 -
 10 files changed, 331 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac6..8289dd4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1521,6 +1521,7 @@ config AMD_MEM_ENCRYPT
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select INSTRUCTION_DECODER
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 38f4a52..c01236a 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -44,6 +44,11 @@ KBUILD_CFLAGS += $(call 
cc-option,-fmacro-prefix-map=$(srctree)/=)
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 KBUILD_CFLAGS += -D__DISABLE_EXPORTS
 
+# sev-es.c indirectly includes inat-table.h which is generated during
+# compilation and stored in $(objtree). Add the directory to the includes so
+# that the compiler finds it even with out-of-tree builds (make O=/some/path).
+CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/
+
 KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
 UBSAN_SANITIZE :=n
diff --git a/arch/x86/boot/compressed/idt_64.c 
b/arch/x86/boot/compressed/idt_64.c
index f3ca732..804a502 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -46,5 +46,9 @@ void load_stage2_idt(void)
 
set_idt_entry(X86_TRAP_PF, boot_page_fault);
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   set_idt_entry(X86_TRAP_VC, boot_stage2_vc);
+#endif
+
load_boot_idt(&boot_idt_desc);
 }
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S 
b/arch/x86/boot/compressed/idt_handlers_64.S
index 92eb4df..22890e1 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -72,5 +72,6 @@ SYM_FUNC_END(\name)
 EXCEPTION_HANDLER  boot_page_fault do_boot_page_fault error_code=1
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
-EXCEPTION_HANDLER  boot_stage1_vc do_vc_no_ghcb error_code=1
+EXCEPTION_HANDLER  boot_stage1_vc do_vc_no_ghcb error_code=1
+EXCEPTION_HANDLER  boot_stage2_vc do_boot_stage2_vc error_code=1
 #endif
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index e478e40..267e7f9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -442,6 +442,13 @@ asmlinkage __visible void *extract_kernel(void *rmode, 
memptr heap,
parse_elf(output);
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");
+
+   /*
+* Flush GHCB from cache and map it encrypted again when running as
+* SEV-ES guest.
+*/
+   sev_es_shutdown_ghcb();
+
return output;
 }
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 01c0fb3..9995c70 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -115,6 +115,12 @@ static inline void console_init(void)
 
 void set_sev_encryption_mask(void);
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void sev_es_shutdown_ghcb(void);
+#else
+static inline void sev_es_shutdown_ghcb(void) { }
+#endif
+
 /* acpi.c */
 #ifdef CONFIG_ACPI
 acpi_physical_address get_rsdp_addr(void);
@@ -144,5 +150,6 @@ extern struct desc_ptr boot_idt_desc;
 /* IDT Entry Points */
 void boot_page_fault(void);
 void boot_stage1_vc(void);
+void boot_stage2_vc(void);
 
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/boot/compressed/sev-es.c 
b/
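
The sev-es.c part of the diff is cut off here. The handler it installs follows the shape the commit message describes: decode the faulting instruction, dispatch on the exit code using the GHCB, then either step over the instruction or terminate the guest. A simplified sketch (GHCB setup and most exit codes omitted; names follow the shared sev-es helpers, so treat the details as illustrative):

void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
{
	struct es_em_ctxt ctxt;
	enum es_result result;

	result = vc_init_em_ctxt(&ctxt, regs, exit_code);	/* decode the instruction */

	if (result == ES_OK) {
		switch (exit_code) {
		case SVM_EXIT_CPUID:
			result = vc_handle_cpuid(boot_ghcb, &ctxt);
			break;
		case SVM_EXIT_IOIO:
			result = vc_handle_ioio(boot_ghcb, &ctxt);
			break;
		default:
			result = ES_UNSUPPORTED;
			break;
		}
	}

	if (result == ES_OK)
		vc_finish_insn(&ctxt);	/* advance RIP past the handled instruction */
	else if (result != ES_RETRY)
		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
}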

[tip: x86/seves] x86/boot/compressed/64: Add set_page_en/decrypted() helpers

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: c81d60029a1393183d2125fcb4b64831629b8864
Gitweb:
https://git.kernel.org/tip/c81d60029a1393183d2125fcb4b64831629b8864
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:23 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Add set_page_en/decrypted() helpers

The functions are needed to map the GHCB for SEV-ES guests. The GHCB
is used for communication with the hypervisor, so its content must not
be encrypted. After the GHCB is not needed anymore it must be mapped
encrypted again so that the running kernel image can safely re-use the
memory.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-23-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c | 133 +++-
 arch/x86/boot/compressed/misc.h |   2 +-
 2 files changed, 135 insertions(+)

diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index aa91beb..05742f6 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -24,6 +24,7 @@
 
 /* These actually do the work of building the kernel identity maps. */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -165,6 +166,138 @@ void finalize_identity_maps(void)
write_cr3(top_level_pgt);
 }
 
+static pte_t *split_large_pmd(struct x86_mapping_info *info,
+ pmd_t *pmdp, unsigned long __address)
+{
+   unsigned long page_flags;
+   unsigned long address;
+   pte_t *pte;
+   pmd_t pmd;
+   int i;
+
+   pte = (pte_t *)info->alloc_pgt_page(info->context);
+   if (!pte)
+   return NULL;
+
+   address = __address & PMD_MASK;
+   /* No large page - clear PSE flag */
+   page_flags  = info->page_flag & ~_PAGE_PSE;
+
+   /* Populate the PTEs */
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   set_pte(&pte[i], __pte(address | page_flags));
+   address += PAGE_SIZE;
+   }
+
+   /*
+* Ideally we need to clear the large PMD first and do a TLB
+* flush before we write the new PMD. But the 2M range of the
+* PMD might contain the code we execute and/or the stack
+* we are on, so we can't do that. But that should be safe here
+* because we are going from large to small mappings and we are
+* also the only user of the page-table, so there is no chance
+* of a TLB multihit.
+*/
+   pmd = __pmd((unsigned long)pte | info->kernpg_flag);
+   set_pmd(pmdp, pmd);
+   /* Flush TLB to establish the new PMD */
+   write_cr3(top_level_pgt);
+
+   return pte + pte_index(__address);
+}
+
+static void clflush_page(unsigned long address)
+{
+   unsigned int flush_size;
+   char *cl, *start, *end;
+
+   /*
+* Hardcode cl-size to 64 - CPUID can't be used here because that might
+* cause another #VC exception and the GHCB is not ready to use yet.
+*/
+   flush_size = 64;
+   start  = (char *)(address & PAGE_MASK);
+   end= start + PAGE_SIZE;
+
+   /*
+* First make sure there are no pending writes on the cache-lines to
+* flush.
+*/
+   asm volatile("mfence" : : : "memory");
+
+   for (cl = start; cl != end; cl += flush_size)
+   clflush(cl);
+}
+
+static int set_clr_page_flags(struct x86_mapping_info *info,
+ unsigned long address,
+ pteval_t set, pteval_t clr)
+{
+   pgd_t *pgdp = (pgd_t *)top_level_pgt;
+   p4d_t *p4dp;
+   pud_t *pudp;
+   pmd_t *pmdp;
+   pte_t *ptep, pte;
+
+   /*
+* First make sure there is a PMD mapping for 'address'.
+* It should already exist, but keep things generic.
+*
+* To map the page just read from it and fault it in if there is no
+* mapping yet. add_identity_map() can't be called here because that
+* would unconditionally map the address on PMD level, destroying any
+* PTE-level mappings that might already exist. Use assembly here so
+* the access won't be optimized away.
+*/
+   asm volatile("mov %[address], %%r9"
+:: [address] "g" (*(unsigned long *)address)
+: "r9", "memory");
+
+   /*
+* The page is mapped at least with PMD size - so skip checks and walk
+* directly to the PMD.
+*/
+   p4dp = p4d_offset(pgdp, address);
+   pudp = pud_offset(p4dp, address);
+   pmdp = pmd_offset(pudp, address);
+
+   if (pmd_large(*pmdp))
+   ptep = split_large_pmd(info, pmdp, address);
+   else
+   ptep = pte_offset_kernel(pmdp, address);
+
+   if (!ptep)
+   return -ENOMEM;
+
+
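
The diff is cut off here, but the intended use of the two helpers follows from the commit message: flip the GHCB page to shared (unencrypted) while the hypervisor needs to see it, and flip it back before the extracted kernel reuses the memory. A sketch (the GHCB variable and the error handling are illustrative):

static struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);

static bool ghcb_map_shared(void)
{
	/* The hypervisor can only read the GHCB if it is mapped unencrypted */
	if (set_page_decrypted((unsigned long)&boot_ghcb_page))
		return false;

	/* Contents are now visible to the HV - start from a clean slate */
	memset(&boot_ghcb_page, 0, sizeof(boot_ghcb_page));
	return true;
}

static void ghcb_unmap_shared(void)
{
	/* Re-encrypt so the decompressed kernel can safely reuse the page.
	 * A failure here is fatal in the real code. */
	if (set_page_encrypted((unsigned long)&boot_ghcb_page))
		error("Can't map GHCB page encrypted");
}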

[tip: x86/seves] x86/head/64: Install startup GDT

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 866b556efa1295934ed0bc20c2f208c93a873fb0
Gitweb:
https://git.kernel.org/tip/866b556efa1295934ed0bc20c2f208c93a873fb0
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:30 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 21:33:17 +02:00

x86/head/64: Install startup GDT

Handling exceptions during boot requires a working GDT. The kernel GDT
can't be used on the direct mapping, so load a startup GDT and set up
segments.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-30-j...@8bytes.org
---
 arch/x86/include/asm/setup.h |  1 +
 arch/x86/kernel/head64.c | 33 +
 arch/x86/kernel/head_64.S| 14 ++
 3 files changed, 48 insertions(+)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 84b645c..5c2fd05 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -48,6 +48,7 @@ extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 extern unsigned long __startup_64(unsigned long physaddr, struct boot_params 
*bp);
 extern unsigned long __startup_secondary_64(void);
+extern void startup_64_setup_env(unsigned long physbase);
 extern int early_make_pgtable(unsigned long address);
 
 #ifdef CONFIG_X86_INTEL_MID
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index cbb71c1..8c82be4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -61,6 +61,24 @@ unsigned long vmemmap_base __ro_after_init = 
__VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
 
+/*
+ * GDT used on the boot CPU before switching to virtual addresses.
+ */
+static struct desc_struct startup_gdt[GDT_ENTRIES] = {
+   [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+   [GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+   [GDT_ENTRY_KERNEL_DS]   = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
+};
+
+/*
+ * Address needs to be set at runtime because it references the startup_gdt
+ * while the kernel still uses a direct mapping.
+ */
+static struct desc_ptr startup_gdt_descr = {
+   .size = sizeof(startup_gdt),
+   .address = 0,
+};
+
 #define __head __section(.head.text)
 
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
@@ -489,3 +507,18 @@ void __init x86_64_start_reservations(char *real_mode_data)
 
start_kernel();
 }
+
+/*
+ * Setup boot CPU state needed before kernel switches to virtual addresses.
+ */
+void __head startup_64_setup_env(unsigned long physbase)
+{
+   /* Load GDT */
+   startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, 
physbase);
+   native_load_gdt(&startup_gdt_descr);
+
+   /* New GDT is live - reload data segment registers */
+   asm volatile("movl %%eax, %%ds\n"
+"movl %%eax, %%ss\n"
+"movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
+}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 16da4ac..2b2e916 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -73,6 +73,20 @@ SYM_CODE_START_NOALIGN(startup_64)
/* Set up the stack for verify_cpu(), similar to initial_stack below */
leaq(__end_init_task - SIZEOF_PTREGS)(%rip), %rsp
 
+   leaq_text(%rip), %rdi
+   pushq   %rsi
+   callstartup_64_setup_env
+   popq%rsi
+
+   /* Now switch to __KERNEL_CS so IRET works reliably */
+   pushq   $__KERNEL_CS
+   leaq.Lon_kernel_cs(%rip), %rax
+   pushq   %rax
+   lretq
+
+.Lon_kernel_cs:
+   UNWIND_HINT_EMPTY
+
/* Sanitize CPU configuration */
call verify_cpu
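
For reference, the flag words in startup_gdt above follow the usual segment
descriptor layout: the low byte is the access byte and the top nibble holds
the granularity/size bits, so 0xa09b describes a present, DPL-0, long-mode
code segment while 0xc09b and 0xc093 are the 32-bit code and data variants.
A small decoding sketch (illustration only, not part of the patch):

    #include <stdio.h>

    /* Decode the 16-bit value passed as GDT_ENTRY_INIT(flags, base, limit):
     * bits 0-7 form the access byte, bits 12-15 the AVL/L/D/G flags. */
    static void decode_gdt_flags(unsigned int flags)
    {
            unsigned int access = flags & 0xff;

            printf("0x%04x: type=0x%x s=%u dpl=%u p=%u avl=%u l=%u d/b=%u g=%u\n",
                   flags, access & 0xf, (access >> 4) & 1, (access >> 5) & 3,
                   (access >> 7) & 1, (flags >> 12) & 1, (flags >> 13) & 1,
                   (flags >> 14) & 1, (flags >> 15) & 1);
    }

    int main(void)
    {
            decode_gdt_flags(0xc09b);       /* 32-bit kernel code segment */
            decode_gdt_flags(0xa09b);       /* 64-bit kernel code segment (L=1) */
            decode_gdt_flags(0xc093);       /* kernel data segment */
            return 0;
    }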
 


[tip: x86/seves] x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: f6a9f8a45810d2914ea422ff39bfe2e0251c50f2
Gitweb:
https://git.kernel.org/tip/f6a9f8a45810d2914ea422ff39bfe2e0251c50f2
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:03 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES

Add two new paravirt callbacks to provide hypervisor-specific processor
state in the GHCB and to copy state from the hypervisor back to the
processor.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-63-j...@8bytes.org
---
 arch/x86/include/asm/x86_init.h | 16 +++-
 arch/x86/kernel/sev-es.c| 12 
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6807153..0304e29 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -4,8 +4,10 @@
 
 #include 
 
+struct ghcb;
 struct mpc_bus;
 struct mpc_cpu;
+struct pt_regs;
 struct mpc_table;
 struct cpuinfo_x86;
 
@@ -236,10 +238,22 @@ struct x86_legacy_features {
 /**
  * struct x86_hyper_runtime - x86 hypervisor specific runtime callbacks
  *
- * @pin_vcpu:  pin current vcpu to specified physical cpu (run rarely)
+ * @pin_vcpu:  pin current vcpu to specified physical
+ * cpu (run rarely)
+ * @sev_es_hcall_prepare:  Load additional hypervisor-specific
+ * state into the GHCB when doing a VMMCALL under
+ * SEV-ES. Called from the #VC exception handler.
+ * @sev_es_hcall_finish:   Copies state from the GHCB back into the
+ * processor (or pt_regs). Also runs checks on the
+ * state returned from the hypervisor after a
+ * VMMCALL under SEV-ES.  Needs to return 'false'
+ * if the checks fail.  Called from the #VC
+ * exception handler.
  */
 struct x86_hyper_runtime {
void (*pin_vcpu)(int cpu);
+   void (*sev_es_hcall_prepare)(struct ghcb *ghcb, struct pt_regs *regs);
+   bool (*sev_es_hcall_finish)(struct ghcb *ghcb, struct pt_regs *regs);
 };
 
 /**
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 79d5190..6d89df9 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -897,6 +897,9 @@ static enum es_result vc_handle_vmmcall(struct ghcb *ghcb,
ghcb_set_rax(ghcb, ctxt->regs->ax);
ghcb_set_cpl(ghcb, user_mode(ctxt->regs) ? 3 : 0);
 
+   if (x86_platform.hyper.sev_es_hcall_prepare)
+   x86_platform.hyper.sev_es_hcall_prepare(ghcb, ctxt->regs);
+
ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_VMMCALL, 0, 0);
if (ret != ES_OK)
return ret;
@@ -906,6 +909,15 @@ static enum es_result vc_handle_vmmcall(struct ghcb *ghcb,
 
ctxt->regs->ax = ghcb->save.rax;
 
+   /*
+* Call sev_es_hcall_finish() after regs->ax is already set.
+* This allows the hypervisor handler to overwrite it again if
+* necessary.
+*/
+   if (x86_platform.hyper.sev_es_hcall_finish &&
+   !x86_platform.hyper.sev_es_hcall_finish(ghcb, ctxt->regs))
+   return ES_VMM_ERROR;
+
return ES_OK;
 }
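
To illustrate how the two new callbacks are meant to be consumed, here is a
hedged sketch of a guest platform driver wiring them up. The my_hv_* names
are made up and the register choice is arbitrary; only the x86_platform.hyper
fields come from this patch, the ghcb_set_*()/ghcb_*_is_valid() accessors are
assumed from elsewhere in the series.

    /* Hypothetical example - not part of this patch. */
    static void my_hv_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
    {
            /* Expose the registers this hypervisor's VMMCALL ABI consumes. */
            ghcb_set_rbx(ghcb, regs->bx);
            ghcb_set_rcx(ghcb, regs->cx);
    }

    static bool my_hv_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
    {
            /* Reject the result if the hypervisor did not fill in RBX. */
            if (!ghcb_rbx_is_valid(ghcb))
                    return false;

            regs->bx = ghcb->save.rbx;
            return true;
    }

    static void __init my_hv_init_platform(void)
    {
            x86_platform.hyper.sev_es_hcall_prepare = my_hv_sev_es_hcall_prepare;
            x86_platform.hyper.sev_es_hcall_finish  = my_hv_sev_es_hcall_finish;
    }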
 


[tip: x86/seves] x86/boot/compressed/64: Call set_sev_encryption_mask() earlier

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: c2a0304a286f386e45cea3f4b0617f0813de67fd
Gitweb:
https://git.kernel.org/tip/c2a0304a286f386e45cea3f4b0617f0813de67fd
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:21 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Call set_sev_encryption_mask() earlier

Call set_sev_encryption_mask() while still on the stage 1 #VC handler,
because the stage 2 handler needs the kernel's own page tables to be
set up, and calling set_sev_encryption_mask() is a prerequisite for that.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-21-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S  |  9 -
 arch/x86/boot/compressed/ident_map_64.c |  3 ---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index fb6c039..42190c0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -543,9 +543,16 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
rep stosq
 
 /*
- * Load stage2 IDT and switch to our own page-table
+ * If running as an SEV guest, the encryption mask is required in the
+ * page-table setup code below. When the guest also has SEV-ES enabled
+ * set_sev_encryption_mask() will cause #VC exceptions, but the stage2
+ * handler can't map its GHCB because the page-table is not set up yet.
+ * So set up the encryption mask here while still on the stage1 #VC
+ * handler. Then load stage2 IDT and switch to the kernel's own
+ * page-table.
  */
pushq   %rsi
+   callset_sev_encryption_mask
callload_stage2_idt
callinitialize_identity_maps
popq%rsi
diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index 62e42c1..b4f2a5f 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -105,9 +105,6 @@ static void add_identity_map(unsigned long start, unsigned 
long end)
 /* Locates and clears a region for a new top level page table. */
 void initialize_identity_maps(void)
 {
-   /* If running as an SEV guest, the encryption mask is required. */
-   set_sev_encryption_mask();
-
/* Exclude the encryption mask from __PHYSICAL_MASK */
physical_mask &= ~sme_me_mask;
 


[tip: x86/seves] x86/boot/compressed/64: Check return value of kernel_ident_mapping_init()

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 4b3fdca64a7e8ad90c87cad1fbc6991471f48dc7
Gitweb:
https://git.kernel.org/tip/4b3fdca64a7e8ad90c87cad1fbc6991471f48dc7
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:22 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Check return value of kernel_ident_mapping_init()

The function can fail to create an identity mapping, check for that
and bail out if it happens.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-22-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index b4f2a5f..aa91beb 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -91,6 +91,8 @@ static struct x86_mapping_info mapping_info;
  */
 static void add_identity_map(unsigned long start, unsigned long end)
 {
+   int ret;
+
/* Align boundary to 2M. */
start = round_down(start, PMD_SIZE);
end = round_up(end, PMD_SIZE);
@@ -98,8 +100,9 @@ static void add_identity_map(unsigned long start, unsigned 
long end)
return;
 
/* Build the mapping. */
-   kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
- start, end);
+   ret = kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, 
start, end);
+   if (ret)
+   error("Error: kernel_ident_mapping_init() failed\n");
 }
 
 /* Locates and clears a region for a new top level page table. */


[tip: x86/seves] x86/sev-es: Wire up existing #VC exit-code handlers

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: d3529bb73f76d0ec8aafaca505226fa0971c1dc9
Gitweb:
https://git.kernel.org/tip/d3529bb73f76d0ec8aafaca505226fa0971c1dc9
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:48 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/sev-es: Wire up existing #VC exit-code handlers

Re-use the handlers for CPUID- and IOIO-caused #VC exceptions in the
early boot handler.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-48-j...@8bytes.org
---
 arch/x86/kernel/sev-es-shared.c | 7 +++
 arch/x86/kernel/sev-es.c| 6 ++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 53e3de0..491b557 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -325,8 +325,7 @@ static enum es_result vc_ioio_exitinfo(struct es_em_ctxt 
*ctxt, u64 *exitinfo)
return ES_OK;
 }
 
-static enum es_result __maybe_unused
-vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt 
*ctxt)
 {
struct pt_regs *regs = ctxt->regs;
u64 exit_info_1, exit_info_2;
@@ -434,8 +433,8 @@ vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
 }
 
-static enum es_result __maybe_unused vc_handle_cpuid(struct ghcb *ghcb,
-struct es_em_ctxt *ctxt)
+static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
+ struct es_em_ctxt *ctxt)
 {
struct pt_regs *regs = ctxt->regs;
u32 cr4 = native_read_cr4();
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 0d6b66e..b10a62a 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -441,6 +441,12 @@ static enum es_result vc_handle_exitcode(struct es_em_ctxt 
*ctxt,
enum es_result result;
 
switch (exit_code) {
+   case SVM_EXIT_CPUID:
+   result = vc_handle_cpuid(ghcb, ctxt);
+   break;
+   case SVM_EXIT_IOIO:
+   result = vc_handle_ioio(ghcb, ctxt);
+   break;
default:
/*
 * Unexpected #VC exception


[tip: x86/seves] x86/sev-es: Handle instruction fetches from user-space

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 5e3427a7bc432ed2e5de394ac30f160cc6c37a1f
Gitweb:
https://git.kernel.org/tip/5e3427a7bc432ed2e5de394ac30f160cc6c37a1f
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:49 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/sev-es: Handle instruction fetches from user-space

When a #VC exception is triggered by user-space, the instruction decoder
needs to read the instruction bytes from user addresses. Enhance
vc_decode_insn() to safely fetch kernel and user instructions.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-49-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index b10a62a..6c30dbc 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -232,17 +232,30 @@ static enum es_result vc_decode_insn(struct es_em_ctxt 
*ctxt)
enum es_result ret;
int res;
 
-   res = vc_fetch_insn_kernel(ctxt, buffer);
-   if (unlikely(res == -EFAULT)) {
-   ctxt->fi.vector = X86_TRAP_PF;
-   ctxt->fi.error_code = 0;
-   ctxt->fi.cr2= ctxt->regs->ip;
-   return ES_EXCEPTION;
+   if (user_mode(ctxt->regs)) {
+   res = insn_fetch_from_user(ctxt->regs, buffer);
+   if (!res) {
+   ctxt->fi.vector = X86_TRAP_PF;
+   ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
+   ctxt->fi.cr2= ctxt->regs->ip;
+   return ES_EXCEPTION;
+   }
+
+   if (!insn_decode(&ctxt->insn, ctxt->regs, buffer, res))
+   return ES_DECODE_FAILED;
+   } else {
+   res = vc_fetch_insn_kernel(ctxt, buffer);
+   if (res) {
+   ctxt->fi.vector = X86_TRAP_PF;
+   ctxt->fi.error_code = X86_PF_INSTR;
+   ctxt->fi.cr2= ctxt->regs->ip;
+   return ES_EXCEPTION;
+   }
+
+   insn_init(&ctxt->insn, buffer, MAX_INSN_SIZE - res, 1);
+   insn_get_length(&ctxt->insn);
}
 
-   insn_init(&ctxt->insn, buffer, MAX_INSN_SIZE - res, 1);
-   insn_get_length(&ctxt->insn);
-
ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;
 
return ret;
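
The pattern introduced here - fetch the user bytes first, then run the
decoder with the number of bytes actually copied - can be reused by other
callers as well. A hedged sketch built on the two helpers as they are
defined in this series (the wrapper name is made up):

    /* Sketch only - assumes the insn_fetch_from_user()/insn_decode()
     * prototypes introduced earlier in this series. */
    static bool decode_user_insn(struct pt_regs *regs, struct insn *insn)
    {
            unsigned char buf[MAX_INSN_SIZE];
            int nr_copied;

            nr_copied = insn_fetch_from_user(regs, buf);
            if (!nr_copied)
                    return false;   /* fetch faulted - let the caller raise #PF */

            /* Decode with the real buffer size, not MAX_INSN_SIZE. */
            return insn_decode(insn, regs, buf, nr_copied);
    }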


[tip: x86/seves] x86/insn: Add insn_has_rep_prefix() helper

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 5901781a11175a5e5ee91746ec8627f18d47eebd
Gitweb:
https://git.kernel.org/tip/5901781a11175a5e5ee91746ec8627f18d47eebd
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:12 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/insn: Add insn_has_rep_prefix() helper

Add a function to check whether an instruction has a REP prefix.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Masami Hiramatsu 
Link: https://lkml.kernel.org/r/20200907131613.12703-12-j...@8bytes.org
---
 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c | 24 
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index f748f57..a0f839a 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -15,6 +15,7 @@
 #define INSN_CODE_SEG_OPND_SZ(params) (params & 0xf)
 #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
 
+bool insn_has_rep_prefix(struct insn *insn);
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index f20942c..58f7fb9 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -54,6 +54,30 @@ static bool is_string_insn(struct insn *insn)
 }
 
 /**
+ * insn_has_rep_prefix() - Determine if instruction has a REP prefix
+ * @insn:  Instruction containing the prefix to inspect
+ *
+ * Returns:
+ *
+ * true if the instruction has a REP prefix, false if not.
+ */
+bool insn_has_rep_prefix(struct insn *insn)
+{
+   int i;
+
+   insn_get_prefixes(insn);
+
+   for (i = 0; i < insn->prefixes.nbytes; i++) {
+   insn_byte_t p = insn->prefixes.bytes[i];
+
+   if (p == 0xf2 || p == 0xf3)
+   return true;
+   }
+
+   return false;
+}
+
+/**
  * get_seg_reg_override_idx() - obtain segment register override index
  * @insn:  Valid instruction with segment override prefixes
  *
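
A short usage sketch for the new helper, e.g. checking a buffer that holds a
REP INSB instruction (0xf3 0x6c); the buffer contents and surrounding code
are illustrative only:

    /* Illustrative only. */
    static bool buffer_has_rep_insn(void)
    {
            unsigned char buf[MAX_INSN_SIZE] = { 0xf3, 0x6c }; /* REP INSB */
            struct insn insn;

            insn_init(&insn, buf, sizeof(buf), 1 /* 64-bit mode */);

            return insn_has_rep_prefix(&insn);
    }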


[tip: x86/seves] x86/boot/compressed/64: Don't pre-map memory in KASLR code

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 8570978ea030757839747aa9944ea576708be3d4
Gitweb:
https://git.kernel.org/tip/8570978ea030757839747aa9944ea576708be3d4
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:18 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Don't pre-map memory in KASLR code

With the page-fault handler in place, the identity mapping can be built
on-demand. So remove the code which manually creates the mappings and
unexport/remove the functions used for it.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-18-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c |  6 ++
 arch/x86/boot/compressed/kaslr.c| 24 +---
 arch/x86/boot/compressed/misc.h | 10 +--
 3 files changed, 3 insertions(+), 37 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index ecf9353..c63257b 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -87,11 +87,9 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) 
- 1;
 static struct x86_mapping_info mapping_info;
 
 /*
- * Adds the specified range to what will become the new identity mappings.
- * Once all ranges have been added, the new mapping is activated by calling
- * finalize_identity_maps() below.
+ * Adds the specified range to the identity mappings.
  */
-void add_identity_map(unsigned long start, unsigned long size)
+static void add_identity_map(unsigned long start, unsigned long size)
 {
unsigned long end = start + size;
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 8266286..b59547c 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -397,8 +397,6 @@ static void mem_avoid_init(unsigned long input, unsigned 
long input_size,
 */
mem_avoid[MEM_AVOID_ZO_RANGE].start = input;
mem_avoid[MEM_AVOID_ZO_RANGE].size = (output + init_size) - input;
-   add_identity_map(mem_avoid[MEM_AVOID_ZO_RANGE].start,
-mem_avoid[MEM_AVOID_ZO_RANGE].size);
 
/* Avoid initrd. */
initrd_start  = (u64)boot_params->ext_ramdisk_image << 32;
@@ -416,15 +414,11 @@ static void mem_avoid_init(unsigned long input, unsigned 
long input_size,
cmd_line_size = strnlen((char *)cmd_line, COMMAND_LINE_SIZE-1) 
+ 1;
mem_avoid[MEM_AVOID_CMDLINE].start = cmd_line;
mem_avoid[MEM_AVOID_CMDLINE].size = cmd_line_size;
-   add_identity_map(mem_avoid[MEM_AVOID_CMDLINE].start,
-mem_avoid[MEM_AVOID_CMDLINE].size);
}
 
/* Avoid boot parameters. */
mem_avoid[MEM_AVOID_BOOTPARAMS].start = (unsigned long)boot_params;
mem_avoid[MEM_AVOID_BOOTPARAMS].size = sizeof(*boot_params);
-   add_identity_map(mem_avoid[MEM_AVOID_BOOTPARAMS].start,
-mem_avoid[MEM_AVOID_BOOTPARAMS].size);
 
/* We don't need to set a mapping for setup_data. */
 
@@ -433,11 +427,6 @@ static void mem_avoid_init(unsigned long input, unsigned 
long input_size,
 
/* Enumerate the immovable memory regions */
num_immovable_mem = count_immovable_mem_regions();
-
-#ifdef CONFIG_X86_VERBOSE_BOOTUP
-   /* Make sure video RAM can be used. */
-   add_identity_map(0, PMD_SIZE);
-#endif
 }
 
 /*
@@ -884,19 +873,8 @@ void choose_random_location(unsigned long input,
warn("Physical KASLR disabled: no suitable memory region!");
} else {
/* Update the new physical address location. */
-   if (*output != random_addr) {
-   add_identity_map(random_addr, output_size);
+   if (*output != random_addr)
*output = random_addr;
-   }
-
-   /*
-* This loads the identity mapping page table.
-* This should only be done if a new physical address
-* is found for the kernel, otherwise we should keep
-* the old page table to make it be like the "nokaslr"
-* case.
-*/
-   finalize_identity_maps();
}
 
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index f0e1991..9840c82 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -98,17 +98,7 @@ static inline void choose_random_location(unsigned long 
input,
 #endif
 
 #ifdef CONFIG_X86_64
-void initialize_identity_maps(void);
-void add_identity_map(unsigned long start, unsigned long size);
-void finalize_identity_maps(void);
 extern unsigned char _pgtable[];
-#else
-static inline void initialize_identity_maps(v

[tip: x86/seves] x86/boot/compressed/64: Disable red-zone usage

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 6ba0efa46047936afa81460489cfd24bc95dd863
Gitweb:
https://git.kernel.org/tip/6ba0efa46047936afa81460489cfd24bc95dd863
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:13 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Disable red-zone usage

The x86-64 ABI defines a red-zone on the stack:

  The 128-byte area beyond the location pointed to by %rsp is considered
  to be reserved and shall not be modified by signal or interrupt
  handlers. Therefore, functions may use this area for temporary data
  that is not needed across function calls. In particular, leaf
  functions may use this area for their entire stack frame, rather than
  adjusting the stack pointer in the prologue and epilogue. This area is
  known as the red zone.

This is not compatible with exception handling, because the IRET frame
written by the hardware at the stack pointer and the functions to handle
the exception will overwrite the temporary variables of the interrupted
function, causing undefined behavior. So disable red-zones for the
pre-decompression boot code.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-13-j...@8bytes.org
---
 arch/x86/boot/compressed/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 3962f59..5343079 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -32,7 +32,7 @@ KBUILD_CFLAGS := -m$(BITS) -O2
 KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
 KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
 cflags-$(CONFIG_X86_32) := -march=i386
-cflags-$(CONFIG_X86_64) := -mcmodel=small
+cflags-$(CONFIG_X86_64) := -mcmodel=small -mno-red-zone
 KBUILD_CFLAGS += $(cflags-y)
 KBUILD_CFLAGS += -mno-mmx -mno-sse
 KBUILD_CFLAGS += -ffreestanding
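
The conflict described in the commit message above is easiest to see with a
leaf function. A hedged example - the exact code generation depends on the
compiler, this only sketches the typical behaviour:

    /* With the red zone enabled, an optimizing compiler may spill 'tmp' to
     * e.g. -8(%rsp) in a leaf function without ever adjusting %rsp.  If an
     * exception arrives at that point, the CPU pushes the IRET frame (SS,
     * RSP, RFLAGS, CS, RIP) starting at the current %rsp, i.e. right over
     * that spill slot.  Building with -mno-red-zone forces the compiler to
     * move %rsp before touching the stack, which avoids the clash. */
    unsigned long leaf(unsigned long a, unsigned long b)
    {
            volatile unsigned long tmp = a * b;     /* forced onto the stack */

            return tmp + a;
    }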


[tip: x86/seves] x86/boot/compressed/64: Change add_identity_map() to take start and end

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 21cf2372618ef167d8c4ae04880fb873b55b2daa
Gitweb:
https://git.kernel.org/tip/21cf2372618ef167d8c4ae04880fb873b55b2daa
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:19 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Change add_identity_map() to take start and end

Changing the function to take start and end as parameters instead of
start and size simplifies the callers which don't need to calculate the
size if they already have start and end.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-19-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index c63257b..62e42c1 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -89,10 +89,8 @@ static struct x86_mapping_info mapping_info;
 /*
  * Adds the specified range to the identity mappings.
  */
-static void add_identity_map(unsigned long start, unsigned long size)
+static void add_identity_map(unsigned long start, unsigned long end)
 {
-   unsigned long end = start + size;
-
/* Align boundary to 2M. */
start = round_down(start, PMD_SIZE);
end = round_up(end, PMD_SIZE);
@@ -107,8 +105,6 @@ static void add_identity_map(unsigned long start, unsigned 
long size)
 /* Locates and clears a region for a new top level page table. */
 void initialize_identity_maps(void)
 {
-   unsigned long start, size;
-
/* If running as an SEV guest, the encryption mask is required. */
set_sev_encryption_mask();
 
@@ -155,9 +151,7 @@ void initialize_identity_maps(void)
 * New page-table is set up - map the kernel image and load it
 * into cr3.
 */
-   start = (unsigned long)_head;
-   size  = _end - _head;
-   add_identity_map(start, size);
+   add_identity_map((unsigned long)_head, (unsigned long)_end);
write_cr3(top_level_pgt);
 }
 
@@ -189,7 +183,8 @@ static void do_pf_error(const char *msg, unsigned long 
error_code,
 
 void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-   unsigned long address = native_read_cr2();
+   unsigned long address = native_read_cr2() & PMD_MASK;
+   unsigned long end = address + PMD_SIZE;
 
/*
 * Check for unexpected error codes. Unexpected are:
@@ -204,5 +199,5 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long 
error_code)
 * Error code is sane - now identity map the 2M region around
 * the faulting address.
 */
-   add_identity_map(address & PMD_MASK, PMD_SIZE);
+   add_identity_map(address, end);
 }


[tip: x86/seves] x86/insn: Add insn_get_modrm_reg_off()

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 7af1bd822dd45a669fc178a35cc8183922333d56
Gitweb:
https://git.kernel.org/tip/7af1bd822dd45a669fc178a35cc8183922333d56
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:11 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

x86/insn: Add insn_get_modrm_reg_off()

Add a function to the instruction decoder which returns the pt_regs
offset of the register specified in the reg field of the modrm byte.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Acked-by: Masami Hiramatsu 
Link: https://lkml.kernel.org/r/20200907131613.12703-11-j...@8bytes.org
---
 arch/x86/include/asm/insn-eval.h |  1 +
 arch/x86/lib/insn-eval.c | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 392b4fe..f748f57 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -17,6 +17,7 @@
 
 void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
+int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 2323c85..f20942c 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -20,6 +20,7 @@
 
 enum reg_type {
REG_TYPE_RM = 0,
+   REG_TYPE_REG,
REG_TYPE_INDEX,
REG_TYPE_BASE,
 };
@@ -439,6 +440,13 @@ static int get_reg_offset(struct insn *insn, struct 
pt_regs *regs,
regno += 8;
break;
 
+   case REG_TYPE_REG:
+   regno = X86_MODRM_REG(insn->modrm.value);
+
+   if (X86_REX_R(insn->rex_prefix.value))
+   regno += 8;
+   break;
+
case REG_TYPE_INDEX:
regno = X86_SIB_INDEX(insn->sib.value);
if (X86_REX_X(insn->rex_prefix.value))
@@ -808,6 +816,21 @@ int insn_get_modrm_rm_off(struct insn *insn, struct 
pt_regs *regs)
 }
 
 /**
+ * insn_get_modrm_reg_off() - Obtain register in reg part of the ModRM byte
+ * @insn:  Instruction containing the ModRM byte
+ * @regs:  Register values as seen when entering kernel mode
+ *
+ * Returns:
+ *
+ * The register indicated by the reg part of the ModRM byte. The
+ * register is obtained as an offset from the base of pt_regs.
+ */
+int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs)
+{
+   return get_reg_offset(insn, regs, REG_TYPE_REG);
+}
+
+/**
  * get_seg_base_limit() - obtain base address and limit of a segment
  * @insn:  Instruction. Must be valid.
  * @regs:  Register values as seen when entering kernel mode
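
The returned value is a byte offset into struct pt_regs, so a caller
typically converts it into a pointer to the saved register value. A hedged
usage sketch (the wrapper name is made up):

    /* Illustrative only. */
    static unsigned long *insn_reg_ptr(struct insn *insn, struct pt_regs *regs)
    {
            int offset = insn_get_modrm_reg_off(insn, regs);

            if (offset < 0)
                    return NULL;

            /* pt_regs byte offset -> pointer to the saved register */
            return (unsigned long *)((unsigned long)regs + offset);
    }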


[tip: x86/seves] x86/boot/compressed/64: Add stage1 #VC handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 29dcc60f6a19fb0aaee97bd1ae2ed8a7dc6f0cfe
Gitweb:
https://git.kernel.org/tip/29dcc60f6a19fb0aaee97bd1ae2ed8a7dc6f0cfe
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:20 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Add stage1 #VC handler

Add the first handler for #VC exceptions. At stage 1 there is no GHCB
yet because the kernel might still be running on the EFI page table.

The stage 1 handler is limited to the MSR-based protocol to talk to the
hypervisor and can only support CPUID exit-codes, but that is enough to
get to stage 2.

 [ bp: Zap superfluous newlines after rd/wrmsr instruction mnemonics. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-20-j...@8bytes.org
---
 arch/x86/boot/compressed/Makefile  |  1 +-
 arch/x86/boot/compressed/idt_64.c  |  4 +-
 arch/x86/boot/compressed/idt_handlers_64.S |  4 +-
 arch/x86/boot/compressed/misc.h|  1 +-
 arch/x86/boot/compressed/sev-es.c  | 45 ++-
 arch/x86/include/asm/msr-index.h   |  1 +-
 arch/x86/include/asm/sev-es.h  | 37 -
 arch/x86/include/asm/trapnr.h  |  1 +-
 arch/x86/kernel/sev-es-shared.c| 66 +-
 9 files changed, 160 insertions(+)
 create mode 100644 arch/x86/boot/compressed/sev-es.c
 create mode 100644 arch/x86/include/asm/sev-es.h
 create mode 100644 arch/x86/kernel/sev-es-shared.c

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index e7f3eba..38f4a52 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -88,6 +88,7 @@ ifdef CONFIG_X86_64
vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
+   vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
diff --git a/arch/x86/boot/compressed/idt_64.c 
b/arch/x86/boot/compressed/idt_64.c
index 5f08309..f3ca732 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -32,6 +32,10 @@ void load_stage1_idt(void)
 {
boot_idt_desc.address = (unsigned long)boot_idt;
 
+
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+   set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
+
load_boot_idt(&boot_idt_desc);
 }
 
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S 
b/arch/x86/boot/compressed/idt_handlers_64.S
index b20e575..92eb4df 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -70,3 +70,7 @@ SYM_FUNC_END(\name)
.code64
 
 EXCEPTION_HANDLER  boot_page_fault do_boot_page_fault error_code=1
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+EXCEPTION_HANDLER  boot_stage1_vc do_vc_no_ghcb error_code=1
+#endif
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 9840c82..eaa8b45 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -141,5 +141,6 @@ extern struct desc_ptr boot_idt_desc;
 
 /* IDT Entry Points */
 void boot_page_fault(void);
+void boot_stage1_vc(void);
 
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/boot/compressed/sev-es.c 
b/arch/x86/boot/compressed/sev-es.c
new file mode 100644
index 000..99c3bcd
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Joerg Roedel 
+ */
+
+/*
+ * misc.h needs to be first because it knows how to include the other kernel
+ * headers in the pre-decompression code in a way that does not break
+ * compilation.
+ */
+#include "misc.h"
+
+#include 
+#include 
+#include 
+#include 
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+   unsigned long low, high;
+
+   asm volatile("rdmsr" : "=a" (low), "=d" (high) :
+   "c" (MSR_AMD64_SEV_ES_GHCB));
+
+   return ((high << 32) | low);
+}
+
+static inline void sev_es_wr_ghcb_msr(u64 val)
+{
+   u32 low, high;
+
+   low  = val & 0xffffffffUL;
+   high = val >> 32;
+
+   asm volatile("wrmsr" : : "c" (MSR_AMD64_SEV_ES_GHCB),
+   "a"(low), "d" (high) : "memory");
+}
+
+#undef __init
+#define __init
+
+/* Include code for early handlers */
+#include "../../kernel/sev-es-shared.c"
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4..da34fdb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -466,6 +466,7 @@
 #define MSR_AMD64_IBSBRTARGET  0xc001103b
 #define MSR_AMD64_IBSOPDATA4   0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX8 /* includes MSR_AMD64_IBSBRTARGET */
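
The MSR-based protocol mentioned in the commit message above boils down to
one MSR write, a VMGEXIT and one MSR read per CPUID register. A hedged
sketch of a single exchange - the EX_* constants below are defined locally
for illustration and are not the kernel's macro names; the encoding follows
the GHCB specification (0x004 = CPUID request, 0x005 = CPUID response,
bits 31:30 select EAX/EBX/ECX/EDX, bits 63:32 carry the CPUID function or
the returned register value):

    #define EX_CPUID_REQ(fn, reg)  (0x004UL | ((u64)(reg) << 30) | ((u64)(fn) << 32))
    #define EX_RESP_CODE(val)      ((val) & 0xfffUL)

    static bool ex_cpuid_read_reg(u32 fn, int reg, u32 *value)
    {
            u64 val;

            sev_es_wr_ghcb_msr(EX_CPUID_REQ(fn, reg));
            VMGEXIT();
            val = sev_es_rd_ghcb_msr();

            if (EX_RESP_CODE(val) != 0x005)
                    return false;

            *value = val >> 32;
            return true;
    }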

[tip: x86/seves] x86/boot/compressed/64: Add page-fault handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 8b0d3b3b41ab6f14f1ce6d4a6b1c5f60b825123f
Gitweb:
https://git.kernel.org/tip/8b0d3b3b41ab6f14f1ce6d4a6b1c5f60b825123f
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:16 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Add page-fault handler

Install a page-fault handler to add an identity mapping to addresses
not yet mapped. Also check whether the error code is sane.

This makes non-SEV-ES machines use the exception handling
infrastructure in the pre-decompression boot code too, making it less
likely to break in the future.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-16-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c| 39 +-
 arch/x86/boot/compressed/idt_64.c  |  2 +-
 arch/x86/boot/compressed/idt_handlers_64.S |  2 +-
 arch/x86/boot/compressed/misc.h|  6 +++-
 4 files changed, 49 insertions(+)

diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index d9932a1..e3d980a 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -19,10 +19,13 @@
 /* No PAGE_TABLE_ISOLATION support needed either: */
 #undef CONFIG_PAGE_TABLE_ISOLATION
 
+#include "error.h"
 #include "misc.h"
 
 /* These actually do the work of building the kernel identity maps. */
 #include 
+#include 
+#include 
 #include 
 /* Use the static base for this part of the boot process */
 #undef __PAGE_OFFSET
@@ -160,3 +163,39 @@ void finalize_identity_maps(void)
 {
write_cr3(top_level_pgt);
 }
+
+static void do_pf_error(const char *msg, unsigned long error_code,
+   unsigned long address, unsigned long ip)
+{
+   error_putstr(msg);
+
+   error_putstr("\nError Code: ");
+   error_puthex(error_code);
+   error_putstr("\nCR2: 0x");
+   error_puthex(address);
+   error_putstr("\nRIP relative to _head: 0x");
+   error_puthex(ip - (unsigned long)_head);
+   error_putstr("\n");
+
+   error("Stopping.\n");
+}
+
+void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+   unsigned long address = native_read_cr2();
+
+   /*
+* Check for unexpected error codes. Unexpected are:
+*  - Faults on present pages
+*  - User faults
+*  - Reserved bits set
+*/
+   if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
+   do_pf_error("Unexpected page-fault:", error_code, address, 
regs->ip);
+
+   /*
+* Error code is sane - now identity map the 2M region around
+* the faulting address.
+*/
+   add_identity_map(address & PMD_MASK, PMD_SIZE);
+}
diff --git a/arch/x86/boot/compressed/idt_64.c 
b/arch/x86/boot/compressed/idt_64.c
index 082cd6b..5f08309 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -40,5 +40,7 @@ void load_stage2_idt(void)
 {
boot_idt_desc.address = (unsigned long)boot_idt;
 
+   set_idt_entry(X86_TRAP_PF, boot_page_fault);
+
load_boot_idt(&boot_idt_desc);
 }
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S 
b/arch/x86/boot/compressed/idt_handlers_64.S
index 36dee2f..b20e575 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -68,3 +68,5 @@ SYM_FUNC_END(\name)
 
.text
.code64
+
+EXCEPTION_HANDLER  boot_page_fault do_boot_page_fault error_code=1
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 98b7a1d..f0e1991 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -37,6 +37,9 @@
 #define memptr unsigned
 #endif
 
+/* boot/compressed/vmlinux start and end markers */
+extern char _head[], _end[];
+
 /* misc.c */
 extern memptr free_mem_ptr;
 extern memptr free_mem_end_ptr;
@@ -146,4 +149,7 @@ extern pteval_t __default_kernel_pte_mask;
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
 
+/* IDT Entry Points */
+void boot_page_fault(void);
+
 #endif /* BOOT_COMPRESSED_MISC_H */
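
To make the "identity map the 2M region around the faulting address" step
concrete: with PMD_SIZE being 2 MiB, the handler simply rounds the faulting
address down to its enclosing 2 MiB boundary and maps that block. A small
worked example with an arbitrary address:

    /* PMD_SIZE == 0x200000 (2 MiB), PMD_MASK == ~0x1fffff */
    unsigned long cr2   = 0x01234567;       /* example faulting address */
    unsigned long start = cr2 & PMD_MASK;   /* 0x01200000 */
    unsigned long size  = PMD_SIZE;         /* covers up to 0x013fffff */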


[tip: x86/seves] x86/umip: Factor out instruction fetch

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 172b75e56b08846e6fb07a88e5685ce4e24f4620
Gitweb:
https://git.kernel.org/tip/172b75e56b08846e6fb07a88e5685ce4e24f4620
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:09 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

x86/umip: Factor out instruction fetch

Factor out the code to fetch the instruction from user-space to a helper
function.

No functional changes.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-9-j...@8bytes.org
---
 arch/x86/include/asm/insn-eval.h |  2 ++-
 arch/x86/kernel/umip.c   | 26 -
 arch/x86/lib/insn-eval.c | 38 +++-
 3 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 2b6ccf2..b8b9ef1 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -19,5 +19,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct 
pt_regs *regs);
 int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
 unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
+int insn_fetch_from_user(struct pt_regs *regs,
+unsigned char buf[MAX_INSN_SIZE]);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 2c304fd..ad135be 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -335,11 +335,11 @@ static void force_sig_info_umip_fault(void __user *addr, 
struct pt_regs *regs)
  */
 bool fixup_umip_exception(struct pt_regs *regs)
 {
-   int not_copied, nr_copied, reg_offset, dummy_data_size, umip_inst;
-   unsigned long seg_base = 0, *reg_addr;
+   int nr_copied, reg_offset, dummy_data_size, umip_inst;
/* 10 bytes is the maximum size of the result of UMIP instructions */
unsigned char dummy_data[10] = { 0 };
unsigned char buf[MAX_INSN_SIZE];
+   unsigned long *reg_addr;
void __user *uaddr;
struct insn insn;
int seg_defs;
@@ -347,26 +347,12 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!regs)
return false;
 
-   /*
-* If not in user-space long mode, a custom code segment could be in
-* use. This is true in protected mode (if the process defined a local
-* descriptor table), or virtual-8086 mode. In most of the cases
-* seg_base will be zero as in USER_CS.
-*/
-   if (!user_64bit_mode(regs))
-   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
-
-   if (seg_base == -1L)
-   return false;
-
-   not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
-   sizeof(buf));
-   nr_copied = sizeof(buf) - not_copied;
+   nr_copied = insn_fetch_from_user(regs, buf);
 
/*
-* The copy_from_user above could have failed if user code is protected
-* by a memory protection key. Give up on emulation in such a case.
-* Should we issue a page fault?
+* The insn_fetch_from_user above could have failed if user code
+* is protected by a memory protection key. Give up on emulation
+* in such a case.  Should we issue a page fault?
 */
if (!nr_copied)
return false;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 5e69603..947b7f1 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1367,3 +1367,41 @@ void __user *insn_get_addr_ref(struct insn *insn, struct 
pt_regs *regs)
return (void __user *)-1L;
}
 }
+
+/**
+ * insn_fetch_from_user() - Copy instruction bytes from user-space memory
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @buf:   Array to store the fetched instruction
+ *
+ * Gets the linear address of the instruction and copies the instruction bytes
+ * to the buf.
+ *
+ * Returns:
+ *
+ * Number of instruction bytes copied.
+ *
+ * 0 if nothing was copied.
+ */
+int insn_fetch_from_user(struct pt_regs *regs, unsigned char 
buf[MAX_INSN_SIZE])
+{
+   unsigned long seg_base = 0;
+   int not_copied;
+
+   /*
+* If not in user-space long mode, a custom code segment could be in
+* use. This is true in protected mode (if the process defined a local
+* descriptor table), or virtual-8086 mode. In most of the cases
+* seg_base will be zero as in USER_CS.
+*/
+   if (!user_64bit_mode(regs)) {
+   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+   if (seg_base == -1L)
+   return 0;
+   }
+
+
+   not_copied = copy_from_user(buf, (void __user *)(seg_bas

[tip: x86/seves] x86/boot/compressed/64: Add IDT Infrastructure

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 64e682638eb51070ba6044535b250aad43c5564e
Gitweb:
https://git.kernel.org/tip/64e682638eb51070ba6044535b250aad43c5564e
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:14 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Add IDT Infrastructure

Add code needed to set up an IDT in the early pre-decompression
boot-code. The IDT is loaded first in startup_64, which is after
EfiExitBootServices() has been called, and later reloaded when the
kernel image has been relocated to the end of the decompression area.

This allows setting up different IDT handlers before and after the
relocation.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-14-j...@8bytes.org
---
 arch/x86/boot/compressed/Makefile  |  1 +-
 arch/x86/boot/compressed/head_64.S | 25 +++-
 arch/x86/boot/compressed/idt_64.c  | 44 +-
 arch/x86/boot/compressed/idt_handlers_64.S | 70 +-
 arch/x86/boot/compressed/misc.h|  5 ++-
 arch/x86/include/asm/desc_defs.h   |  3 +-
 6 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/boot/compressed/idt_64.c
 create mode 100644 arch/x86/boot/compressed/idt_handlers_64.S

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 5343079..c661dc5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -85,6 +85,7 @@ vmlinux-objs-$(CONFIG_EARLY_PRINTK) += 
$(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
+   vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
 endif
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 97d37f0..c634ed8 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -410,6 +411,10 @@ SYM_CODE_START(startup_64)
 
 .Lon_kernel_cs:
 
+   pushq   %rsi
+   callload_stage1_idt
+   popq%rsi
+
/*
 * paging_prepare() sets up the trampoline and checks if we need to
 * enable 5-level paging.
@@ -538,6 +543,13 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
rep stosq
 
 /*
+ * Load stage2 IDT
+ */
+   pushq   %rsi
+   callload_stage2_idt
+   popq%rsi
+
+/*
  * Do the extraction, and jump to the new kernel..
  */
pushq   %rsi/* Save the real mode argument */
@@ -690,10 +702,21 @@ SYM_DATA_START_LOCAL(gdt)
.quad   0x0000000000000000  /* TS continued */
 SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
 
+SYM_DATA_START(boot_idt_desc)
+   .word   boot_idt_end - boot_idt - 1
+   .quad   0
+SYM_DATA_END(boot_idt_desc)
+   .balign 8
+SYM_DATA_START(boot_idt)
+   .rept   BOOT_IDT_ENTRIES
+   .quad   0
+   .quad   0
+   .endr
+SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
+
 #ifdef CONFIG_EFI_STUB
 SYM_DATA(image_offset, .long 0)
 #endif
-
 #ifdef CONFIG_EFI_MIXED
 SYM_DATA_LOCAL(efi32_boot_args, .long 0, 0, 0)
 SYM_DATA(efi_is64, .byte 1)
diff --git a/arch/x86/boot/compressed/idt_64.c 
b/arch/x86/boot/compressed/idt_64.c
new file mode 100644
index 000..082cd6b
--- /dev/null
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include "misc.h"
+
+static void set_idt_entry(int vector, void (*handler)(void))
+{
+   unsigned long address = (unsigned long)handler;
+   gate_desc entry;
+
+   memset(&entry, 0, sizeof(entry));
+
+   entry.offset_low= (u16)(address & 0xffff);
+   entry.segment   = __KERNEL_CS;
+   entry.bits.type = GATE_TRAP;
+   entry.bits.p= 1;
+   entry.offset_middle = (u16)((address >> 16) & 0xffff);
+   entry.offset_high   = (u32)(address >> 32);
+
+   memcpy(&boot_idt[vector], &entry, sizeof(entry));
+}
+
+/* Have this here so we don't need to include  */
+static void load_boot_idt(const struct desc_ptr *dtr)
+{
+   asm volatile("lidt %0"::"m" (*dtr));
+}
+
+/* Setup IDT before kernel jumping to  .Lrelocated */
+void load_stage1_idt(void)
+{
+   boot_idt_desc.address = (unsigned long)boot_idt;
+
+   load_boot_idt(&boot_idt_desc);
+}
+
+/* Setup IDT after kernel jumping to  .Lrelocated */
+void load_stage2_idt(void)
+{
+   boot_idt_desc.address = (unsigned long)boot_idt;
+
+   load_boot_idt(&boot_idt_desc);
+}
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S 
b/arch/x86/boot/compressed/idt_handlers_6

[tip: x86/seves] x86/umip: Factor out instruction decoding

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 172639d79977ca7b5ce6f84f6606262f4081718f
Gitweb:
https://git.kernel.org/tip/172639d79977ca7b5ce6f84f6606262f4081718f
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:10 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

x86/umip: Factor out instruction decoding

Factor out the code used to decode an instruction with the correct
address and operand sizes to a helper function.

No functional changes.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-10-j...@8bytes.org
---
 arch/x86/include/asm/insn-eval.h |  2 +-
 arch/x86/kernel/umip.c   | 23 +
 arch/x86/lib/insn-eval.c | 45 +++-
 3 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index b8b9ef1..392b4fe 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -21,5 +21,7 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int 
seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE]);
+bool insn_decode(struct insn *insn, struct pt_regs *regs,
+unsigned char buf[MAX_INSN_SIZE], int buf_size);
 
 #endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index ad135be..f6225bf 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -342,7 +342,6 @@ bool fixup_umip_exception(struct pt_regs *regs)
unsigned long *reg_addr;
void __user *uaddr;
struct insn insn;
-   int seg_defs;
 
if (!regs)
return false;
@@ -357,27 +356,7 @@ bool fixup_umip_exception(struct pt_regs *regs)
if (!nr_copied)
return false;
 
-   insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
-
-   /*
-* Override the default operand and address sizes with what is specified
-* in the code segment descriptor. The instruction decoder only sets
-* the address size it to either 4 or 8 address bytes and does nothing
-* for the operand bytes. This OK for most of the cases, but we could
-* have special cases where, for instance, a 16-bit code segment
-* descriptor is used.
-* If there is an address override prefix, the instruction decoder
-* correctly updates these values, even for 16-bit defaults.
-*/
-   seg_defs = insn_get_code_seg_params(regs);
-   if (seg_defs == -EINVAL)
-   return false;
-
-   insn.addr_bytes = INSN_CODE_SEG_ADDR_SZ(seg_defs);
-   insn.opnd_bytes = INSN_CODE_SEG_OPND_SZ(seg_defs);
-
-   insn_get_length(&insn);
-   if (nr_copied < insn.length)
+   if (!insn_decode(&insn, regs, buf, nr_copied))
return false;
 
umip_inst = identify_insn(&insn);
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 947b7f1..2323c85 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1405,3 +1405,48 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned 
char buf[MAX_INSN_SIZE])
 
return MAX_INSN_SIZE - not_copied;
 }
+
+/**
+ * insn_decode() - Decode an instruction
+ * @insn:  Structure to store decoded instruction
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @buf:   Buffer containing the instruction bytes
+ * @buf_size:   Number of instruction bytes available in buf
+ *
+ * Decodes the instruction provided in buf and stores the decoding results in
+ * insn. Also determines the correct address and operand sizes.
+ *
+ * Returns:
+ *
+ * True if instruction was decoded, False otherwise.
+ */
+bool insn_decode(struct insn *insn, struct pt_regs *regs,
+unsigned char buf[MAX_INSN_SIZE], int buf_size)
+{
+   int seg_defs;
+
+   insn_init(insn, buf, buf_size, user_64bit_mode(regs));
+
+   /*
+* Override the default operand and address sizes with what is specified
+* in the code segment descriptor. The instruction decoder only sets
+* the address size it to either 4 or 8 address bytes and does nothing
+* for the operand bytes. This OK for most of the cases, but we could
+* have special cases where, for instance, a 16-bit code segment
+* descriptor is used.
+* If there is an address override prefix, the instruction decoder
+* correctly updates these values, even for 16-bit defaults.
+*/
+   seg_defs = insn_get_code_seg_params(regs);
+   if (seg_defs == -EINVAL)
+   return false;
+
+   insn->addr_bytes = INSN_CODE_SEG_ADDR_SZ(seg_defs);
+   insn->opnd_bytes = INSN_CODE_SEG_OPND_SZ(seg_defs);
+
+   insn

[tip: x86/seves] x86/sev-es: Compile early handler code into kernel image

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: f980f9c31a923e9040dee0bc679a5f5b09e61f40
Gitweb:
https://git.kernel.org/tip/f980f9c31a923e9040dee0bc679a5f5b09e61f40
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:39 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 10:44:46 +02:00

x86/sev-es: Compile early handler code into kernel image

Set up sev-es.c and include the code from the pre-decompression stage
to also build it into the image of the running kernel. Temporarily add
__maybe_unused annotations to avoid build warnings until the functions
get used.

 [ bp: Use the non-tracing rd/wrmsr variants because:
   vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0x11f: \
   call to do_trace_write_msr() leaves .noinstr.text section
   as __sev_es_nmi_complete() is noinstr due to being called from the
   NMI handler exc_nmi(). ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-39-j...@8bytes.org
---
 arch/x86/kernel/Makefile|   1 +-
 arch/x86/kernel/sev-es-shared.c |  21 ++--
 arch/x86/kernel/sev-es.c| 163 +++-
 3 files changed, 175 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/kernel/sev-es.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261d..23bc0f8 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -145,6 +145,7 @@ obj-$(CONFIG_UNWINDER_ORC)  += unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)   += unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)   += unwind_guess.o
 
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += sev-es.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index a6b4191..1861927 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -9,7 +9,7 @@
  * and is included directly into both code-bases.
  */
 
-static void sev_es_terminate(unsigned int reason)
+static void __maybe_unused sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -27,7 +27,7 @@ static void sev_es_terminate(unsigned int reason)
asm volatile("hlt\n" : : : "memory");
 }
 
-static bool sev_es_negotiate_protocol(void)
+static bool __maybe_unused sev_es_negotiate_protocol(void)
 {
u64 val;
 
@@ -46,7 +46,7 @@ static bool sev_es_negotiate_protocol(void)
return true;
 }
 
-static void vc_ghcb_invalidate(struct ghcb *ghcb)
+static void __maybe_unused vc_ghcb_invalidate(struct ghcb *ghcb)
 {
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
@@ -58,9 +58,9 @@ static bool vc_decoding_needed(unsigned long exit_code)
 exit_code <= SVM_EXIT_LAST_EXCP);
 }
 
-static enum es_result vc_init_em_ctxt(struct es_em_ctxt *ctxt,
- struct pt_regs *regs,
- unsigned long exit_code)
+static enum es_result __maybe_unused vc_init_em_ctxt(struct es_em_ctxt *ctxt,
+struct pt_regs *regs,
+unsigned long exit_code)
 {
enum es_result ret = ES_OK;
 
@@ -73,7 +73,7 @@ static enum es_result vc_init_em_ctxt(struct es_em_ctxt *ctxt,
return ret;
 }
 
-static void vc_finish_insn(struct es_em_ctxt *ctxt)
+static void __maybe_unused vc_finish_insn(struct es_em_ctxt *ctxt)
 {
ctxt->regs->ip += ctxt->insn.length;
 }
@@ -325,7 +325,8 @@ static enum es_result vc_ioio_exitinfo(struct es_em_ctxt 
*ctxt, u64 *exitinfo)
return ES_OK;
 }
 
-static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt 
*ctxt)
+static enum es_result __maybe_unused
+vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
struct pt_regs *regs = ctxt->regs;
u64 exit_info_1, exit_info_2;
@@ -433,8 +434,8 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, 
struct es_em_ctxt *ctxt)
return ret;
 }
 
-static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt)
+static enum es_result __maybe_unused vc_handle_cpuid(struct ghcb *ghcb,
+struct es_em_ctxt *ctxt)
 {
struct pt_regs *regs = ctxt->regs;
u32 cr4 = native_read_cr4();
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
new file mode 100644
index 000..4b1d217
--- /dev/null
+++ b/arch/x86/kernel/sev-es.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2019 SUSE
+ *
+ * Author: Joerg Roedel 
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+   return __rdmsr(MSR_AMD64_SEV_E

[tip: x86/seves] x86/insn: Make inat-tables.c suitable for pre-decompression code

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 05a2ae7c033ee30f25fbed3ceed549a5cac398a9
Gitweb:
https://git.kernel.org/tip/05a2ae7c033ee30f25fbed3ceed549a5cac398a9
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:08 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

x86/insn: Make inat-tables.c suitable for pre-decompression code

The inat-tables.c file has some arrays in it that contain pointers to
other arrays. These pointers need to be relocated when the kernel
image is moved to a different location.

The pre-decompression boot-code has no support for applying ELF
relocations, so initialize these arrays at runtime in the
pre-decompression code to make sure all pointers are correctly
initialized.
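
To see why the runtime initialization helps, here is a small stand-alone
illustration (not from the patch): a pointer table initialized at build
time needs load-time relocations, while the same table filled in an init
function does not.

	#include <stdio.h>

	static const int table_a[] = { 1, 2, 3 };
	static const int table_b[] = { 4, 5, 6 };

	/* Pointer values baked into the image - requires relocations. */
	static const int *static_tables[2] = { table_a, table_b };

	/* Left zero-filled in the image, assigned at runtime instead. */
	static const int *runtime_tables[2];

	static void init_tables(void)
	{
		runtime_tables[0] = table_a;
		runtime_tables[1] = table_b;
	}

	int main(void)
	{
		init_tables();
		printf("%d %d\n", static_tables[1][0], runtime_tables[1][0]);
		return 0;
	}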

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Acked-by: Masami Hiramatsu 
Link: https://lkml.kernel.org/r/20200907131613.12703-8-j...@8bytes.org
---
 arch/x86/tools/gen-insn-attr-x86.awk   | 50 -
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 50 -
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b..af38469 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -362,6 +362,9 @@ function convert_operands(count,opnd,   i,j,imm,mod)
 END {
if (awkchecked != "")
exit 1
+
+   print "#ifndef __BOOT_COMPRESSED\n"
+
# print escape opcode map's array
print "/* Escape opcode map array */"
print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
@@ -388,6 +391,51 @@ END {
for (j = 0; j < max_lprefix; j++)
if (atable[i,j])
print " ["i"]["j"] = "atable[i,j]","
-   print "};"
+   print "};\n"
+
+   print "#else /* !__BOOT_COMPRESSED */\n"
+
+   print "/* Escape opcode map array */"
+   print "static const insn_attr_t *inat_escape_tables[INAT_ESC_MAX + 1]" \
+ "[INAT_LSTPFX_MAX + 1];"
+   print ""
+
+   print "/* Group opcode map array */"
+   print "static const insn_attr_t *inat_group_tables[INAT_GRP_MAX + 1]"\
+ "[INAT_LSTPFX_MAX + 1];"
+   print ""
+
+   print "/* AVX opcode map array */"
+   print "static const insn_attr_t *inat_avx_tables[X86_VEX_M_MAX + 1]"\
+ "[INAT_LSTPFX_MAX + 1];"
+   print ""
+
+   print "static void inat_init_tables(void)"
+   print "{"
+
+   # print escape opcode map's array
+   print "\t/* Print Escape opcode map array */"
+   for (i = 0; i < geid; i++)
+   for (j = 0; j < max_lprefix; j++)
+   if (etable[i,j])
+   print "\tinat_escape_tables["i"]["j"] = 
"etable[i,j]";"
+   print ""
+
+   # print group opcode map's array
+   print "\t/* Print Group opcode map array */"
+   for (i = 0; i < ggid; i++)
+   for (j = 0; j < max_lprefix; j++)
+   if (gtable[i,j])
+   print "\tinat_group_tables["i"]["j"] = 
"gtable[i,j]";"
+   print ""
+   # print AVX opcode map's array
+   print "\t/* Print AVX opcode map array */"
+   for (i = 0; i < gaid; i++)
+   for (j = 0; j < max_lprefix; j++)
+   if (atable[i,j])
+   print "\tinat_avx_tables["i"]["j"] = 
"atable[i,j]";"
+
+   print "}"
+   print "#endif"
 }
 
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b..af38469 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -362,6 +362,9 @@ function convert_operands(count,opnd,   i,j,imm,mod)
 END {
if (awkchecked != "")
exit 1
+
+   print "#ifndef __BOOT_COMPRESSED\n"
+
# print escape opcode map's array
print "/* Escape opcode map array */"
print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
@@ -388,6 +391,51 @@ END {
for (j = 0; j < max_lprefix; j++)
if (atable[i,j])
print " ["i"]["j"] = "atable[i,j]","
-   print "};"
+   print "};\n"
+
+   print "#else /* !__BOOT_COMPRESSED */\n"
+
+   print "/* Escape opcode map array */"
+   print "static const insn_attr_t *inat_escape_tables[INAT_ESC_MAX + 1]" \
+ "[INAT_LSTPFX_MAX + 1];"
+   print ""
+
+   print "/* Group opcode map array */"
+   print "static const insn_attr_t *inat_group_tables[INAT_GRP_MAX + 1]"\
+ "[INAT_LSTPFX_MAX + 1];"
+   print ""
+
+   print "/* AVX opcode map array */"
+   print "static const insn_attr_t *inat_avx_tables[X86_VEX_M_MAX + 1]"\
+ "

[tip: x86/seves] x86/traps: Move pf error codes to

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 05a2fdf3230306daee1def019b8f52cd06bd2e48
Gitweb:
https://git.kernel.org/tip/05a2fdf3230306daee1def019b8f52cd06bd2e48
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:07 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

x86/traps: Move pf error codes to 

Move the definition of the x86 page-fault error code bits to a new
header file asm/trap_pf.h. This makes it easier to include them into
pre-decompression boot code. No functional changes.
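
The bit definitions themselves do not change; purely as an illustration
(not part of the patch), a stand-alone snippet decoding an error code
with them:

	#include <stdio.h>

	enum x86_pf_error_code {
		X86_PF_PROT	= 1 << 0,
		X86_PF_WRITE	= 1 << 1,
		X86_PF_USER	= 1 << 2,
		X86_PF_RSVD	= 1 << 3,
		X86_PF_INSTR	= 1 << 4,
		X86_PF_PK	= 1 << 5,
	};

	static void decode_pf_error(unsigned long ec)
	{
		printf("%s, %s access, %s mode%s%s%s\n",
		       ec & X86_PF_PROT  ? "protection violation" : "page not present",
		       ec & X86_PF_WRITE ? "write" : "read",
		       ec & X86_PF_USER  ? "user" : "kernel",
		       ec & X86_PF_RSVD  ? ", reserved bit set" : "",
		       ec & X86_PF_INSTR ? ", instruction fetch" : "",
		       ec & X86_PF_PK    ? ", protection-keys block" : "");
	}

	int main(void)
	{
		/* e.g. a user-mode write hitting a present, write-protected page */
		decode_pf_error(X86_PF_PROT | X86_PF_WRITE | X86_PF_USER);
		return 0;
	}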

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-7-j...@8bytes.org
---
 arch/x86/include/asm/trap_pf.h | 24 
 arch/x86/include/asm/traps.h   | 19 +--
 2 files changed, 25 insertions(+), 18 deletions(-)
 create mode 100644 arch/x86/include/asm/trap_pf.h

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
new file mode 100644
index 000..305bc12
--- /dev/null
+++ b/arch/x86/include/asm/trap_pf.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_TRAP_PF_H
+#define _ASM_X86_TRAP_PF_H
+
+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==   0: no page found   1: protection fault
+ *   bit 1 ==   0: read access 1: write access
+ *   bit 2 ==   0: kernel-mode access  1: user-mode access
+ *   bit 3 ==  1: use of reserved bit detected
+ *   bit 4 ==  1: fault was an instruction fetch
+ *   bit 5 ==  1: protection keys block access
+ */
+enum x86_pf_error_code {
+   X86_PF_PROT =   1 << 0,
+   X86_PF_WRITE=   1 << 1,
+   X86_PF_USER =   1 << 2,
+   X86_PF_RSVD =   1 << 3,
+   X86_PF_INSTR=   1 << 4,
+   X86_PF_PK   =   1 << 5,
+};
+
+#endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 714b1a3..6a30835 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include/* TRAP_TRACE, ... */
+#include 
 
 #ifdef CONFIG_X86_64
 asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
@@ -41,22 +42,4 @@ void __noreturn handle_stack_overflow(const char *message,
  unsigned long fault_address);
 #endif
 
-/*
- * Page fault error code bits:
- *
- *   bit 0 ==   0: no page found   1: protection fault
- *   bit 1 ==   0: read access 1: write access
- *   bit 2 ==   0: kernel-mode access  1: user-mode access
- *   bit 3 ==  1: use of reserved bit detected
- *   bit 4 ==  1: fault was an instruction fetch
- *   bit 5 ==  1: protection keys block access
- */
-enum x86_pf_error_code {
-   X86_PF_PROT =   1 << 0,
-   X86_PF_WRITE=   1 << 1,
-   X86_PF_USER =   1 << 2,
-   X86_PF_RSVD =   1 << 3,
-   X86_PF_INSTR=   1 << 4,
-   X86_PF_PK   =   1 << 5,
-};
 #endif /* _ASM_X86_TRAPS_H */


[tip: x86/seves] x86/entry/64: Add entry code for #VC handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: a13644f3a53de4e95a7bce6459f834e832ea44c5
Gitweb:
https://git.kernel.org/tip/a13644f3a53de4e95a7bce6459f834e832ea44c5
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:46 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/entry/64: Add entry code for #VC handler

The #VC handler needs special entry code because:

1. It runs on an IST stack

2. It needs to be able to handle nested #VC exceptions

To make this work, the entry code is implemented to pretend it doesn't
use an IST stack. When entered from user-mode or the early SYSCALL entry
path, it switches to the task stack. If entered from kernel-mode, it tries
to switch back to the previous stack in the IRET frame.

The stack found in the IRET frame is validated first, and if it is not
safe to use it for the #VC handler, the code will switch to a
fall-back stack (the #VC2 IST stack). From there, it can cause nested
exceptions again.
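
The stack-selection policy can be summarized in C. This is only an
illustrative sketch of the decision the entry stub delegates to
vc_switch_off_ist(), not the actual implementation; the VC2 helper name
is an assumption.

	/* Illustrative only: pick the stack the #VC handler should run on. */
	static unsigned long vc_pick_stack(struct pt_regs *regs)
	{
		struct stack_info info;

		/*
		 * If the stack recorded in the IRET frame is a known, safe
		 * kernel stack, keep using it so that nested #VC exceptions
		 * do not overwrite the interrupted frame.
		 */
		if (get_stack_info_noinstr((unsigned long *)regs->sp, current, &info))
			return regs->sp;

		/* Otherwise switch to the dedicated #VC2 fall-back IST stack. */
		return (unsigned long)__this_cpu_ist_top_va(VC2);	/* name assumed */
	}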

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-46-j...@8bytes.org
---
 arch/x86/entry/entry_64.S   | 80 -
 arch/x86/include/asm/idtentry.h | 44 ++-
 arch/x86/include/asm/proto.h|  1 +-
 arch/x86/include/asm/traps.h|  1 +-
 arch/x86/kernel/traps.c | 45 ++-
 5 files changed, 171 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70dea93..15aa189 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -101,6 +101,8 @@ SYM_CODE_START(entry_SYSCALL_64)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
+SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
+
/* Construct struct pt_regs on stack */
pushq   $__USER_DS  /* pt_regs->ss */
pushq   PER_CPU_VAR(cpu_tss_rw + TSS_sp2)   /* pt_regs->sp */
@@ -446,6 +448,84 @@ _ASM_NOKPROBE(\asmsym)
 SYM_CODE_END(\asmsym)
 .endm
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/**
+ * idtentry_vc - Macro to generate entry stub for #VC
+ * @vector:Vector number
+ * @asmsym:ASM symbol for the entry point
+ * @cfunc: C function to be called
+ *
+ * The macro emits code to set up the kernel context for #VC. The #VC handler
+ * runs on an IST stack and needs to be able to cause nested #VC exceptions.
+ *
+ * To make this work the #VC entry code tries its best to pretend it doesn't use
+ * an IST stack by switching to the task stack if coming from user-space (which
+ * includes early SYSCALL entry path) or back to the stack in the IRET frame if
+ * entered from kernel-mode.
+ *
+ * If entered from kernel-mode the return stack is validated first, and if it is
+ * not safe to use (e.g. because it points to the entry stack) the #VC handler
+ * will switch to a fall-back stack (VC2) and call a special handler function.
+ *
+ * The macro is only used for one vector, but it is planned to be extended in
+ * the future for the #HV exception.
+ */
+.macro idtentry_vc vector asmsym cfunc
+SYM_CODE_START(\asmsym)
+   UNWIND_HINT_IRET_REGS
+   ASM_CLAC
+
+   /*
+* If the entry is from userspace, switch stacks and treat it as
+* a normal entry.
+*/
+   testb   $3, CS-ORIG_RAX(%rsp)
+   jnz .Lfrom_usermode_switch_stack_\@
+
+   /*
+* paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
+* EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
+*/
+   callparanoid_entry
+
+   UNWIND_HINT_REGS
+
+   /*
+* Switch off the IST stack to make it free for nested exceptions. The
+* vc_switch_off_ist() function will switch back to the interrupted
+* stack if it is safe to do so. If not it switches to the VC fall-back
+* stack.
+*/
+   movq%rsp, %rdi  /* pt_regs pointer */
+   callvc_switch_off_ist
+   movq%rax, %rsp  /* Switch to new stack */
+
+   UNWIND_HINT_REGS
+
+   /* Update pt_regs */
+   movqORIG_RAX(%rsp), %rsi/* get error code into 2nd argument*/
+   movq$-1, ORIG_RAX(%rsp) /* no syscall to restart */
+
+   movq%rsp, %rdi  /* pt_regs pointer */
+
+   call\cfunc
+
+   /*
+* No need to switch back to the IST stack. The current stack is either
+* identical to the stack in the IRET frame or the VC fall-back stack,
+* so it is definitly mapped even with PTI enabled.
+*/
+   jmp paranoid_exit
+
+   /* Switch to the regular task stack */
+.Lfrom_usermode_switch_stack_\@:
+   idtentry_body safe_stack_\cfunc, has_error_code=1
+
+_ASM_NOKPROBE(\asmsym)
+SYM_CODE_END(\asmsym)
+.endm
+#endif
+
 /*
  * Double fault entry. Straight paranoid. No 

[tip: x86/seves] x86/sev-es: Setup an early #VC handler

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 74d8d9d531b4cc945a9f75aa2fc21d99ca5a9fe3
Gitweb:
https://git.kernel.org/tip/74d8d9d531b4cc945a9f75aa2fc21d99ca5a9fe3
Author:Joerg Roedel 
AuthorDate:Tue, 08 Sep 2020 14:35:17 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 10:45:24 +02:00

x86/sev-es: Setup an early #VC handler

Set up an early handler for #VC exceptions. There is no GHCB mapped
yet, so just reuse the vc_no_ghcb() handler. It can only handle
CPUID exit-codes, but that should be enough to get the kernel through
verify_cpu() and __startup_64() until it runs on virtual addresses.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
[ boot failure Error: kernel_ident_mapping_init() failed. ]
Reported-by: kernel test robot 
Link: https://lkml.kernel.org/r/20200908123517.ga3...@8bytes.org
---
 arch/x86/include/asm/sev-es.h |  3 +++
 arch/x86/kernel/head64.c  | 25 -
 arch/x86/kernel/head_64.S | 30 ++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 6dc5244..7175d43 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -73,4 +73,7 @@ static inline u64 lower_bits(u64 val, unsigned int bits)
return (val & mask);
 }
 
+/* Early IDT entry points for #VC handler */
+extern void vc_no_ghcb(void);
+
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 4282dac..fc55cc9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Manage page tables very early on.
@@ -540,12 +541,34 @@ static struct desc_ptr bringup_idt_descr = {
.address= 0, /* Set at runtime */
 };
 
+static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
+{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   struct idt_data data;
+   gate_desc desc;
+
+   init_idt_data(&data, n, handler);
+   idt_init_desc(&desc, &data);
+   native_write_idt_entry(idt, n, &desc);
+#endif
+}
+
 /* This runs while still in the direct mapping */
 static void startup_64_load_idt(unsigned long physbase)
 {
struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
+   gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
+
+
+   if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+   void *handler;
+
+   /* VMM Communication Exception */
+   handler = fixup_pointer(vc_no_ghcb, physbase);
+   set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
+   }
 
-	desc->address = (unsigned long)fixup_pointer(bringup_idt_table, physbase);
+   desc->address = (unsigned long)idt;
native_load_idt(desc);
 }
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3b40ec4..6e68bca 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -348,6 +348,36 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(early_idt_handler_common)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/*
+ * VC Exception handler used during very early boot. The
+ * early_idt_handler_array can't be used because it returns via the
+ * paravirtualized INTERRUPT_RETURN and pv-ops don't work that early.
+ *
+ * This handler will end up in the .init.text section and not be
+ * available to boot secondary CPUs.
+ */
+SYM_CODE_START_NOALIGN(vc_no_ghcb)
+   UNWIND_HINT_IRET_REGS offset=8
+
+   /* Build pt_regs */
+   PUSH_AND_CLEAR_REGS
+
+   /* Call C handler */
+   movq%rsp, %rdi
+   movqORIG_RAX(%rsp), %rsi
+   calldo_vc_no_ghcb
+
+   /* Unwind pt_regs */
+   POP_REGS
+
+   /* Remove Error Code */
+   addq$8, %rsp
+
+   /* Pure iret required here - don't use INTERRUPT_RETURN */
+   iretq
+SYM_CODE_END(vc_no_ghcb)
+#endif
 
 #define SYM_DATA_START_PAGE_ALIGNED(name)  \
SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)


[tip: x86/seves] x86/head/64: Move early exception dispatch to C code

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 4b47cdbda6f1ad73b08dc7d497bac12b8f26ae0d
Gitweb:
https://git.kernel.org/tip/4b47cdbda6f1ad73b08dc7d497bac12b8f26ae0d
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:36 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 22:49:18 +02:00

x86/head/64: Move early exception dispatch to C code

Move the assembly coded dispatch between page-faults and all other
exceptions to C code to make it easier to maintain and extend.

Also change the return-type of early_make_pgtable() to bool and make it
static.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-36-j...@8bytes.org
---
 arch/x86/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/setup.h   |  4 +++-
 arch/x86/kernel/head64.c   | 19 +++
 arch/x86/kernel/head_64.S  | 11 +--
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b836138..7b8f212 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -28,7 +28,7 @@
 #include 
 
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
-int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
+bool __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
 void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm);
 void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 4b3ca5a..7d7a064 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -39,6 +39,8 @@ void vsmp_init(void);
 static inline void vsmp_init(void) { }
 #endif
 
+struct pt_regs;
+
 void setup_bios_corruption_check(void);
 void early_platform_quirks(void);
 
@@ -49,8 +51,8 @@ extern void i386_reserve_resources(void);
 extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
 extern unsigned long __startup_secondary_64(void);
 extern void startup_64_setup_env(unsigned long physbase);
-extern int early_make_pgtable(unsigned long address);
 extern void early_setup_idt(void);
+extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
 
 #ifdef CONFIG_X86_INTEL_MID
 extern void x86_intel_mid_early_setup(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7bfd5c2..4282dac 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -38,6 +38,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Manage page tables very early on.
@@ -317,7 +319,7 @@ static void __init reset_early_page_tables(void)
 }
 
 /* Create a new PMD entry */
-int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
+bool __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 {
unsigned long physaddr = address - __PAGE_OFFSET;
pgdval_t pgd, *pgd_p;
@@ -327,7 +329,7 @@ int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 
/* Invalid address or early pgt is done ?  */
if (physaddr >= MAXMEM || read_cr3_pa() != __pa_nodebug(early_top_pgt))
-   return -1;
+   return false;
 
 again:
pgd_p = &early_top_pgt[pgd_index(address)].pgd;
@@ -384,10 +386,10 @@ again:
}
pmd_p[pmd_index(address)] = pmd;
 
-   return 0;
+   return true;
 }
 
-int __init early_make_pgtable(unsigned long address)
+static bool __init early_make_pgtable(unsigned long address)
 {
unsigned long physaddr = address - __PAGE_OFFSET;
pmdval_t pmd;
@@ -397,6 +399,15 @@ int __init early_make_pgtable(unsigned long address)
return __early_make_pgtable(address, pmd);
 }
 
+void __init do_early_exception(struct pt_regs *regs, int trapnr)
+{
+   if (trapnr == X86_TRAP_PF &&
+   early_make_pgtable(native_read_cr2()))
+   return;
+
+   early_fixup_exception(regs, trapnr);
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not 
initialized 
yet. */
 static void __init clear_bss(void)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 1de09b5..3b40ec4 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -341,18 +341,9 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
pushq %r15  /* pt_regs->r15 */
UNWIND_HINT_REGS
 
-   cmpq $14,%rsi   /* Page fault? */
-   jnz 10f
-   GET_CR2_INTO(%rdi)  /* can clobber %rax if pv */
-   call early_make_pgtable
-   andl %eax,%eax
-   jz 20f  /* All good */
-
-10:
movq %rsp,%rdi  /* RDI = pt_regs; RSI is already trapnr */
-   call early_fixup_exception
+   call do_early_exception
 
-20:
decl early_recursion_flag(%rip)
jmp restore_regs_and_return_to_kernel
 SYM

[tip: x86/seves] KVM: SVM: Add GHCB Accessor functions

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 3702c2f4eed2188440f65ecdfc89165106fe565d
Gitweb:
https://git.kernel.org/tip/3702c2f4eed2188440f65ecdfc89165106fe565d
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:04 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

KVM: SVM: Add GHCB Accessor functions

Building a correct GHCB for the hypervisor requires setting valid bits
in the GHCB. Simplify that process by providing accessor functions which
set a value and update the valid bitmap in one step, plus a helper to
check the valid bitmap in KVM.
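
A stand-alone sketch of the generated accessor pattern (simplified: plain
bit arithmetic instead of the kernel's test_bit()/__set_bit(), and an
illustrative structure layout):

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Illustrative stand-in for the GHCB save area. */
	struct save_area {
		uint64_t rax;
		uint64_t rbx;
		uint64_t sw_exit_code;
		uint8_t  valid_bitmap[16];
	};

	/* Bitmap index of a field: its offset in 8-byte units. */
	#define BITMAP_IDX(field) (offsetof(struct save_area, field) / sizeof(uint64_t))

	#define DEFINE_ACCESSORS(field)						\
		static inline bool field##_is_valid(const struct save_area *s)	\
		{								\
			size_t i = BITMAP_IDX(field);				\
			return s->valid_bitmap[i / 8] & (1u << (i % 8));	\
		}								\
		static inline void set_##field(struct save_area *s, uint64_t v)\
		{								\
			size_t i = BITMAP_IDX(field);				\
			s->valid_bitmap[i / 8] |= 1u << (i % 8);		\
			s->field = v;						\
		}

	DEFINE_ACCESSORS(rax)
	DEFINE_ACCESSORS(rbx)

	int main(void)
	{
		struct save_area s = { 0 };

		set_rax(&s, 0x1234);	/* stores the value and marks it valid */
		printf("rax valid: %d, rbx valid: %d\n",
		       rax_is_valid(&s), rbx_is_valid(&s));
		return 0;
	}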

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-4-j...@8bytes.org
---
 arch/x86/include/asm/svm.h | 43 +-
 1 file changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index acac55d..06e5258 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -345,4 +345,47 @@ struct __attribute__ ((__packed__)) vmcb {
 
 #define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
 
+/* GHCB Accessor functions */
+
+#define GHCB_BITMAP_IDX(field)							\
+	(offsetof(struct vmcb_save_area, field) / sizeof(u64))
+
+#define DEFINE_GHCB_ACCESSORS(field)						\
+	static inline bool ghcb_##field##_is_valid(const struct ghcb *ghcb)	\
+	{									\
+		return test_bit(GHCB_BITMAP_IDX(field),				\
+				(unsigned long *)&ghcb->save.valid_bitmap);	\
+	}									\
+										\
+	static inline void ghcb_set_##field(struct ghcb *ghcb, u64 value)	\
+	{									\
+		__set_bit(GHCB_BITMAP_IDX(field),				\
+			  (unsigned long *)&ghcb->save.valid_bitmap);		\
+		ghcb->save.field = value;					\
+	}
+
+DEFINE_GHCB_ACCESSORS(cpl)
+DEFINE_GHCB_ACCESSORS(rip)
+DEFINE_GHCB_ACCESSORS(rsp)
+DEFINE_GHCB_ACCESSORS(rax)
+DEFINE_GHCB_ACCESSORS(rcx)
+DEFINE_GHCB_ACCESSORS(rdx)
+DEFINE_GHCB_ACCESSORS(rbx)
+DEFINE_GHCB_ACCESSORS(rbp)
+DEFINE_GHCB_ACCESSORS(rsi)
+DEFINE_GHCB_ACCESSORS(rdi)
+DEFINE_GHCB_ACCESSORS(r8)
+DEFINE_GHCB_ACCESSORS(r9)
+DEFINE_GHCB_ACCESSORS(r10)
+DEFINE_GHCB_ACCESSORS(r11)
+DEFINE_GHCB_ACCESSORS(r12)
+DEFINE_GHCB_ACCESSORS(r13)
+DEFINE_GHCB_ACCESSORS(r14)
+DEFINE_GHCB_ACCESSORS(r15)
+DEFINE_GHCB_ACCESSORS(sw_exit_code)
+DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
+DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
+DEFINE_GHCB_ACCESSORS(sw_scratch)
+DEFINE_GHCB_ACCESSORS(xcr0)
+
 #endif


[tip: x86/seves] x86/head/64: Install a CPU bringup IDT

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: f5963ba7a45fc6ff298a34976064354be437e1d8
Gitweb:
https://git.kernel.org/tip/f5963ba7a45fc6ff298a34976064354be437e1d8
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:34 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 22:18:38 +02:00

x86/head/64: Install a CPU bringup IDT

Add a separate bringup IDT for the CPU bringup code that will be used
until the kernel switches to the idt_table. There are two reasons for a
separate IDT:

1) When the idt_table is set up and the secondary CPUs are
   booted, it contains entries (e.g. IST entries) which
   require certain CPU state to be set up. This includes a
   working TSS (for IST), MSR_GS_BASE (for the stack protector) or
   CR4.FSGSBASE (for the paranoid_entry path). By using a
   dedicated IDT for early boot, this state need not be set
   up early.

2) The idt_table is static to idt.c, so any function
   using or modifying it must be in idt.c too. That means that all
   compiler driven instrumentation like tracing or KASAN is
   also active in this code. But during early CPU bringup the
   environment is not set up for this instrumentation to work
   correctly.

To avoid all of these hassles and make early exception handling robust,
use a dedicated bringup IDT.

The IDT is loaded two times, first on the boot CPU while the kernel is
still running on direct mapped addresses, and again later after the
switch to kernel addresses has happened. The second IDT load happens on
the boot and secondary CPUs.
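
The first of those two loads runs while the CPU is still on the identity
mapping, which is why the code goes through fixup_pointer() before
touching bringup_idt_table. As a hedged sketch of the idea behind such a
helper (the _text symbol and the exact arithmetic are assumptions, not
taken from this patch):

	/*
	 * Sketch: translate a link-time kernel address into the location the
	 * object actually occupies before the switch to kernel virtual
	 * addresses; physbase is where the kernel image was loaded.
	 */
	static void *fixup_pointer(void *ptr, unsigned long physbase)
	{
		return (void *)((unsigned long)ptr - (unsigned long)_text + physbase);
	}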

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-34-j...@8bytes.org
---
 arch/x86/include/asm/setup.h |  1 +-
 arch/x86/kernel/head64.c | 39 +++-
 arch/x86/kernel/head_64.S|  5 -
 3 files changed, 45 insertions(+)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 5c2fd05..4b3ca5a 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -50,6 +50,7 @@ extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp
 extern unsigned long __startup_secondary_64(void);
 extern void startup_64_setup_env(unsigned long physbase);
 extern int early_make_pgtable(unsigned long address);
+extern void early_setup_idt(void);
 
 #ifdef CONFIG_X86_INTEL_MID
 extern void x86_intel_mid_early_setup(void);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8c82be4..7bfd5c2 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -36,6 +36,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Manage page tables very early on.
@@ -509,6 +511,41 @@ void __init x86_64_start_reservations(char *real_mode_data)
 }
 
 /*
+ * Data structures and code used for IDT setup in head_64.S. The bringup-IDT is
+ * used until the idt_table takes over. On the boot CPU this happens in
+ * x86_64_start_kernel(), on secondary CPUs in start_secondary(). In both cases
+ * this happens in the functions called from head_64.S.
+ *
+ * The idt_table can't be used that early because all the code modifying it is
+ * in idt.c and can be instrumented by tracing or KASAN, which both don't work
+ * during early CPU bringup. Also the idt_table has the runtime vectors
+ * configured which require certain CPU state to be setup already (like TSS),
+ * which also hasn't happened yet in early CPU bringup.
+ */
+static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
+
+static struct desc_ptr bringup_idt_descr = {
+   .size   = (NUM_EXCEPTION_VECTORS * sizeof(gate_desc)) - 1,
+   .address= 0, /* Set at runtime */
+};
+
+/* This runs while still in the direct mapping */
+static void startup_64_load_idt(unsigned long physbase)
+{
+   struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
+
+	desc->address = (unsigned long)fixup_pointer(bringup_idt_table, physbase);
+   native_load_idt(desc);
+}
+
+/* This is used when running on kernel addresses */
+void early_setup_idt(void)
+{
+   bringup_idt_descr.address = (unsigned long)bringup_idt_table;
+   native_load_idt(&bringup_idt_descr);
+}
+
+/*
  * Setup boot CPU state needed before kernel switches to virtual addresses.
  */
 void __head startup_64_setup_env(unsigned long physbase)
@@ -521,4 +558,6 @@ void __head startup_64_setup_env(unsigned long physbase)
asm volatile("movl %%eax, %%ds\n"
 "movl %%eax, %%ss\n"
 "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
+
+   startup_64_load_idt(physbase);
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 83050c9..1de09b5 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -198,6 +198,11 @@ SYM_CODE_START(

[tip: x86/seves] x86/dumpstack/64: Add noinstr version of get_stack_info()

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 6b27edd74a5e9669120f7bd0ae1f475d124c1042
Gitweb:
https://git.kernel.org/tip/6b27edd74a5e9669120f7bd0ae1f475d124c1042
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:45 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:19 +02:00

x86/dumpstack/64: Add noinstr version of get_stack_info()

The get_stack_info() functionality is needed in the entry code for the
#VC exception handler. Provide a version of it in the .noinstr.text
section which can be called safely from there.
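
An illustrative usage sketch (not from the patch) of how noinstr entry
code can validate a stack pointer with the new helper:

	/* Illustrative only: is 'sp' on a stack the unwinder knows about? */
	static __always_inline bool sp_on_known_stack(unsigned long sp)
	{
		struct stack_info info;

		/* True only for the task, exception, IRQ or entry stack. */
		return get_stack_info_noinstr((unsigned long *)sp, current, &info);
	}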

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-45-j...@8bytes.org
---
 arch/x86/include/asm/stacktrace.h |  2 ++-
 arch/x86/kernel/dumpstack.c   |  7 +++---
 arch/x86/kernel/dumpstack_64.c| 38 +-
 arch/x86/mm/cpu_entry_area.c  |  3 +-
 4 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 5ae5a68..4960064 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -35,6 +35,8 @@ bool in_entry_stack(unsigned long *stack, struct stack_info *info);
 
 int get_stack_info(unsigned long *stack, struct task_struct *task,
   struct stack_info *info, unsigned long *visit_mask);
+bool get_stack_info_noinstr(unsigned long *stack, struct task_struct *task,
+   struct stack_info *info);
 
 const char *stack_type_name(enum stack_type type);
 
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 48ce445..74147f7 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -29,8 +29,8 @@ static int die_counter;
 
 static struct pt_regs exec_summary_regs;
 
-bool in_task_stack(unsigned long *stack, struct task_struct *task,
-  struct stack_info *info)
+bool noinstr in_task_stack(unsigned long *stack, struct task_struct *task,
+  struct stack_info *info)
 {
unsigned long *begin = task_stack_page(task);
unsigned long *end   = task_stack_page(task) + THREAD_SIZE;
@@ -46,7 +46,8 @@ bool in_task_stack(unsigned long *stack, struct task_struct *task,
return true;
 }
 
-bool in_entry_stack(unsigned long *stack, struct stack_info *info)
+/* Called from get_stack_info_noinstr - so must be noinstr too */
+bool noinstr in_entry_stack(unsigned long *stack, struct stack_info *info)
 {
struct entry_stack *ss = cpu_entry_stack(smp_processor_id());
 
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index c49cf59..1dd8513 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -85,7 +85,7 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] cacheline_aligned = {
EPAGERANGE(VC2),
 };
 
-static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
+static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
 {
unsigned long begin, end, stk = (unsigned long)stack;
const struct estack_pages *ep;
@@ -126,7 +126,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
return true;
 }
 
-static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
+static __always_inline bool in_irq_stack(unsigned long *stack, struct stack_info *info)
 {
 	unsigned long *end   = (unsigned long *)this_cpu_read(hardirq_stack_ptr);
unsigned long *begin = end - (IRQ_STACK_SIZE / sizeof(long));
@@ -151,32 +151,38 @@ static bool in_irq_stack(unsigned long *stack, struct stack_info *info)
return true;
 }
 
-int get_stack_info(unsigned long *stack, struct task_struct *task,
-  struct stack_info *info, unsigned long *visit_mask)
+bool noinstr get_stack_info_noinstr(unsigned long *stack, struct task_struct *task,
+   struct stack_info *info)
 {
-   if (!stack)
-   goto unknown;
-
-   task = task ? : current;
-
if (in_task_stack(stack, task, info))
-   goto recursion_check;
+   return true;
 
if (task != current)
-   goto unknown;
+   return false;
 
if (in_exception_stack(stack, info))
-   goto recursion_check;
+   return true;
 
if (in_irq_stack(stack, info))
-   goto recursion_check;
+   return true;
 
if (in_entry_stack(stack, info))
-   goto recursion_check;
+   return true;
+
+   return false;
+}
+
+int get_stack_info(unsigned long *stack, struct task_struct *task,
+  struct stack_info *info, unsigned long *visit_mask)
+{
+   task = task ? : current;
 
-   goto unknown;
+   if (!stack)
+   goto unknown;
+
+   if (!g

[tip: x86/seves] x86/sev-es: Print SEV-ES info into the kernel log

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: c685eb0c12b4d4816d22ee734e91f4005b152fcd
Gitweb:
https://git.kernel.org/tip/c685eb0c12b4d4816d22ee734e91f4005b152fcd
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:38 +02:00
Committer: Borislav Petkov 
CommitterDate: Tue, 08 Sep 2020 00:38:01 +02:00

x86/sev-es: Print SEV-ES info into the kernel log

Refactor the message printed to the kernel log which indicates whether
SEV or SME, etc. is active. This will scale better in the future when
more memory encryption features might be added. Also add SEV-ES to the
list of features.

 [ bp: Massage. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-38-j...@8bytes.org
---
 arch/x86/mm/mem_encrypt.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index a38f556..ebb7edc 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -407,6 +407,31 @@ void __init mem_encrypt_free_decrypted_mem(void)
free_init_pages("unused decrypted", vaddr, vaddr_end);
 }
 
+static void print_mem_encrypt_feature_info(void)
+{
+   pr_info("AMD Memory Encryption Features active:");
+
+   /* Secure Memory Encryption */
+   if (sme_active()) {
+   /*
+* SME is mutually exclusive with any of the SEV
+* features below.
+*/
+   pr_cont(" SME\n");
+   return;
+   }
+
+   /* Secure Encrypted Virtualization */
+   if (sev_active())
+   pr_cont(" SEV");
+
+   /* Encrypted Register State */
+   if (sev_es_active())
+   pr_cont(" SEV-ES");
+
+   pr_cont("\n");
+}
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void)
 {
@@ -422,8 +447,6 @@ void __init mem_encrypt_init(void)
if (sev_active())
static_branch_enable(&sev_enable_key);
 
-   pr_info("AMD %s active\n",
-   sev_active() ? "Secure Encrypted Virtualization (SEV)"
-: "Secure Memory Encryption (SME)");
+   print_mem_encrypt_feature_info();
 }
 


[tip: x86/seves] x86/head/64: Load GDT after switch to virtual addresses

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: e04b88336360e101329add0c05e5cb1cebae64fd
Gitweb:
https://git.kernel.org/tip/e04b88336360e101329add0c05e5cb1cebae64fd
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:31 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 21:35:54 +02:00

x86/head/64: Load GDT after switch to virtual addresses

Load the GDT right after switching to virtual addresses to make sure
there is a defined GDT for exception handling.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-31-j...@8bytes.org
---
 arch/x86/kernel/head_64.S | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 2b2e916..03b03f2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -158,6 +158,14 @@ SYM_CODE_START(secondary_startup_64)
 1:
UNWIND_HINT_EMPTY
 
+   /*
+* We must switch to a new descriptor in kernel space for the GDT
+* because soon the kernel won't have access anymore to the userspace
+* addresses where we're currently running on. We have to do that here
+* because in 32bit we couldn't load a 64bit linear address.
+*/
+   lgdtearly_gdt_descr(%rip)
+
/* Check if nx is implemented */
movl$0x8001, %eax
cpuid
@@ -185,14 +193,6 @@ SYM_CODE_START(secondary_startup_64)
pushq $0
popfq
 
-   /*
-* We must switch to a new descriptor in kernel space for the GDT
-* because soon the kernel won't have access anymore to the userspace
-* addresses where we're currently running on. We have to do that here
-* because in 32bit we couldn't load a 64bit linear address.
-*/
-   lgdtearly_gdt_descr(%rip)
-
/* set up data segments */
xorl %eax,%eax
movl %eax,%ds


[tip: x86/seves] x86/idt: Split idt_data setup out of set_intr_gate()

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 4bed2266cc6f9c3f6cd91378ea4fc76edde674cf
Gitweb:
https://git.kernel.org/tip/4bed2266cc6f9c3f6cd91378ea4fc76edde674cf
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:29 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 21:30:38 +02:00

x86/idt: Split idt_data setup out of set_intr_gate()

The code to set up idt_data is needed for early exception handling, but
set_intr_gate() can't be used that early because its code path contains
pv-ops which don't work at that point in boot.

Split out the idt_data initialization part from set_intr_gate() so
that it can be used separately.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-29-j...@8bytes.org
---
 arch/x86/kernel/idt.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 7ecf9ba..53946c1 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -205,18 +205,24 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
}
 }
 
+static void init_idt_data(struct idt_data *data, unsigned int n,
+ const void *addr)
+{
+   BUG_ON(n > 0xFF);
+
+   memset(data, 0, sizeof(*data));
+   data->vector= n;
+   data->addr  = addr;
+   data->segment   = __KERNEL_CS;
+   data->bits.type = GATE_INTERRUPT;
+   data->bits.p= 1;
+}
+
 static __init void set_intr_gate(unsigned int n, const void *addr)
 {
struct idt_data data;
 
-   BUG_ON(n > 0xFF);
-
-   memset(&data, 0, sizeof(data));
-   data.vector = n;
-   data.addr   = addr;
-   data.segment= __KERNEL_CS;
-   data.bits.type  = GATE_INTERRUPT;
-   data.bits.p = 1;
+   init_idt_data(&data, n, addr);
 
idt_setup_from_table(idt_table, &data, 1, false);
 }


[tip: x86/seves] x86/boot/compressed/64: Unmap GHCB page before booting the kernel

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 69add17a7c1992593a7cf775a66e0256ad4b3ef8
Gitweb:
https://git.kernel.org/tip/69add17a7c1992593a7cf775a66e0256ad4b3ef8
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:25 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:26 +02:00

x86/boot/compressed/64: Unmap GHCB page before booting the kernel

Force a page-fault on any further accesses to the GHCB page when they
shouldn't happen anymore. This will catch any bugs where a #VC exception
is raised even though none is expected anymore.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-25-j...@8bytes.org
---
 arch/x86/boot/compressed/ident_map_64.c | 17 +++--
 arch/x86/boot/compressed/misc.h |  6 ++
 arch/x86/boot/compressed/sev-es.c   | 14 ++
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 05742f6..063a60e 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -298,6 +298,11 @@ int set_page_encrypted(unsigned long address)
return set_clr_page_flags(&mapping_info, address, _PAGE_ENC, 0);
 }
 
+int set_page_non_present(unsigned long address)
+{
+   return set_clr_page_flags(&mapping_info, address, 0, _PAGE_PRESENT);
+}
+
 static void do_pf_error(const char *msg, unsigned long error_code,
unsigned long address, unsigned long ip)
 {
@@ -316,8 +321,14 @@ static void do_pf_error(const char *msg, unsigned long error_code,
 
 void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
-   unsigned long address = native_read_cr2() & PMD_MASK;
-   unsigned long end = address + PMD_SIZE;
+   unsigned long address = native_read_cr2();
+   unsigned long end;
+   bool ghcb_fault;
+
+   ghcb_fault = sev_es_check_ghcb_fault(address);
+
+   address   &= PMD_MASK;
+   end= address + PMD_SIZE;
 
/*
 * Check for unexpected error codes. Unexpected are:
@@ -327,6 +338,8 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 */
if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
do_pf_error("Unexpected page-fault:", error_code, address, 
regs->ip);
+   else if (ghcb_fault)
+   do_pf_error("Page-fault on GHCB page:", error_code, address, 
regs->ip);
 
/*
 * Error code is sane - now identity map the 2M region around
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 9995c70..c0e0ffe 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -100,6 +100,7 @@ static inline void choose_random_location(unsigned long input,
 #ifdef CONFIG_X86_64
 extern int set_page_decrypted(unsigned long address);
 extern int set_page_encrypted(unsigned long address);
+extern int set_page_non_present(unsigned long address);
 extern unsigned char _pgtable[];
 #endif
 
@@ -117,8 +118,13 @@ void set_sev_encryption_mask(void);
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 void sev_es_shutdown_ghcb(void);
+extern bool sev_es_check_ghcb_fault(unsigned long address);
 #else
 static inline void sev_es_shutdown_ghcb(void) { }
+static inline bool sev_es_check_ghcb_fault(unsigned long address)
+{
+   return false;
+}
 #endif
 
 /* acpi.c */
diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index fa62af7..1e1fab5 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -121,6 +121,20 @@ void sev_es_shutdown_ghcb(void)
 */
if (set_page_encrypted((unsigned long)&boot_ghcb_page))
error("Can't map GHCB page encrypted");
+
+   /*
+* GHCB page is mapped encrypted again and flushed from the cache.
+* Mark it non-present now to catch bugs when #VC exceptions trigger
+* after this point.
+*/
+   if (set_page_non_present((unsigned long)&boot_ghcb_page))
+   error("Can't unmap GHCB page");
+}
+
+bool sev_es_check_ghcb_fault(unsigned long address)
+{
+   /* Check whether the fault was on the GHCB page */
+   return ((address & PAGE_MASK) == (unsigned long)&boot_ghcb_page);
 }
 
 void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)


[tip: x86/seves] x86/fpu: Move xgetbv()/xsetbv() into a separate header

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 1b4fb8545f2b00f2844c4b7619d64d98440a477c
Gitweb:
https://git.kernel.org/tip/1b4fb8545f2b00f2844c4b7619d64d98440a477c
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:27 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:54:20 +02:00

x86/fpu: Move xgetbv()/xsetbv() into a separate header

The xgetbv() function is needed in the pre-decompression boot code,
but asm/fpu/internal.h can't be included there directly. Doing so
opens the door to include-hell due to various include-magic in
boot/compressed/misc.h.

Avoid that by moving xgetbv()/xsetbv() to a separate header file and
include it instead.
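
A short usage sketch (illustrative, not from the patch) of the moved
helper; note that XGETBV faults unless CR4.OSXSAVE is set, so a caller
has to check that first:

	#include <asm/fpu/xcr.h>
	#include <asm/special_insns.h>

	static u64 read_xcr0_if_enabled(void)
	{
		/* XGETBV raises #UD when CR4.OSXSAVE is clear. */
		if (!(native_read_cr4() & X86_CR4_OSXSAVE))
			return 0;

		return xgetbv(XCR_XFEATURE_ENABLED_MASK);
	}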

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-27-j...@8bytes.org
---
 arch/x86/include/asm/fpu/internal.h | 30 +-
 arch/x86/include/asm/fpu/xcr.h  | 34 -
 2 files changed, 35 insertions(+), 29 deletions(-)
 create mode 100644 arch/x86/include/asm/fpu/xcr.h

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 21a8b52..ceeba9f 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -585,33 +586,4 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
__write_pkru(pkru_val);
 }
 
-/*
- * MXCSR and XCR definitions:
- */
-
-static inline void ldmxcsr(u32 mxcsr)
-{
-   asm volatile("ldmxcsr %0" :: "m" (mxcsr));
-}
-
-extern unsigned int mxcsr_feature_mask;
-
-#define XCR_XFEATURE_ENABLED_MASK  0x
-
-static inline u64 xgetbv(u32 index)
-{
-   u32 eax, edx;
-
-   asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index));
-   return eax + ((u64)edx << 32);
-}
-
-static inline void xsetbv(u32 index, u64 value)
-{
-   u32 eax = value;
-   u32 edx = value >> 32;
-
-   asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
-}
-
 #endif /* _ASM_X86_FPU_INTERNAL_H */
diff --git a/arch/x86/include/asm/fpu/xcr.h b/arch/x86/include/asm/fpu/xcr.h
new file mode 100644
index 000..1c7ab8d
--- /dev/null
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FPU_XCR_H
+#define _ASM_X86_FPU_XCR_H
+
+/*
+ * MXCSR and XCR definitions:
+ */
+
+static inline void ldmxcsr(u32 mxcsr)
+{
+   asm volatile("ldmxcsr %0" :: "m" (mxcsr));
+}
+
+extern unsigned int mxcsr_feature_mask;
+
+#define XCR_XFEATURE_ENABLED_MASK  0x
+
+static inline u64 xgetbv(u32 index)
+{
+   u32 eax, edx;
+
+   asm volatile("xgetbv" : "=a" (eax), "=d" (edx) : "c" (index));
+   return eax + ((u64)edx << 32);
+}
+
+static inline void xsetbv(u32 index, u64 value)
+{
+   u32 eax = value;
+   u32 edx = value >> 32;
+
+   asm volatile("xsetbv" :: "a" (eax), "d" (edx), "c" (index));
+}
+
+#endif /* _ASM_X86_FPU_XCR_H */


[tip: x86/seves] KVM: SVM: nested: Don't allocate VMCB structures on stack

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 6ccbd29ade0d159ee1be398dc9defaae567c253d
Gitweb:
https://git.kernel.org/tip/6ccbd29ade0d159ee1be398dc9defaae567c253d
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:02 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:24 +02:00

KVM: SVM: nested: Don't allocate VMCB structures on stack

Do not allocate a vmcb_control_area and a vmcb_save_area on the stack,
as these structures will become larger with future extensions of
SVM and the stack frame of svm_set_nested_state() would thus grow too
large.
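
The conversion follows the usual kernel single-exit idiom: allocate, bail
out through one label, and free unconditionally (kfree(NULL) is a no-op).
A generic stand-alone illustration of that pattern (not the patch itself):

	#include <stdlib.h>
	#include <string.h>

	struct big_ctl  { char data[4096]; };
	struct big_save { char data[4096]; };

	static int consume(const void *src_ctl, const void *src_save)
	{
		struct big_ctl  *ctl;
		struct big_save *save;
		int ret = -1;

		ctl  = calloc(1, sizeof(*ctl));
		save = calloc(1, sizeof(*save));
		if (!ctl || !save)
			goto out_free;

		memcpy(ctl,  src_ctl,  sizeof(*ctl));
		memcpy(save, src_save, sizeof(*save));

		/* ... validate and use ctl/save instead of 8 KiB of stack ... */
		ret = 0;

	out_free:
		/* free(NULL) is a no-op, so this is safe on the partial path. */
		free(save);
		free(ctl);
		return ret;
	}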

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-2-j...@8bytes.org
---
 arch/x86/kvm/svm/nested.c | 47 ++
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index fb68467..2803662 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1060,10 +1060,14 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
struct vmcb *hsave = svm->nested.hsave;
struct vmcb __user *user_vmcb = (struct vmcb __user *)
&user_kvm_nested_state->data.svm[0];
-   struct vmcb_control_area ctl;
-   struct vmcb_save_area save;
+   struct vmcb_control_area *ctl;
+   struct vmcb_save_area *save;
+   int ret;
u32 cr0;
 
+	BUILD_BUG_ON(sizeof(struct vmcb_control_area) + sizeof(struct vmcb_save_area) >
+KVM_STATE_NESTED_SVM_VMCB_SIZE);
+
if (kvm_state->format != KVM_STATE_NESTED_FORMAT_SVM)
return -EINVAL;
 
@@ -1095,13 +1099,22 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
return -EINVAL;
 	if (kvm_state->size < sizeof(*kvm_state) + KVM_STATE_NESTED_SVM_VMCB_SIZE)
return -EINVAL;
-   if (copy_from_user(&ctl, &user_vmcb->control, sizeof(ctl)))
-   return -EFAULT;
-   if (copy_from_user(&save, &user_vmcb->save, sizeof(save)))
-   return -EFAULT;
 
-   if (!nested_vmcb_check_controls(&ctl))
-   return -EINVAL;
+   ret  = -ENOMEM;
+   ctl  = kzalloc(sizeof(*ctl),  GFP_KERNEL);
+   save = kzalloc(sizeof(*save), GFP_KERNEL);
+   if (!ctl || !save)
+   goto out_free;
+
+   ret = -EFAULT;
+   if (copy_from_user(ctl, &user_vmcb->control, sizeof(*ctl)))
+   goto out_free;
+   if (copy_from_user(save, &user_vmcb->save, sizeof(*save)))
+   goto out_free;
+
+   ret = -EINVAL;
+   if (!nested_vmcb_check_controls(ctl))
+   goto out_free;
 
/*
 * Processor state contains L2 state.  Check that it is
@@ -1109,15 +1122,15 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 */
cr0 = kvm_read_cr0(vcpu);
 if (((cr0 & X86_CR0_CD) == 0) && (cr0 & X86_CR0_NW))
-return -EINVAL;
+   goto out_free;
 
/*
 * Validate host state saved from before VMRUN (see
 * nested_svm_check_permissions).
 * TODO: validate reserved bits for all saved state.
 */
-   if (!(save.cr0 & X86_CR0_PG))
-   return -EINVAL;
+   if (!(save->cr0 & X86_CR0_PG))
+   goto out_free;
 
/*
 * All checks done, we can enter guest mode.  L1 control fields
@@ -1126,15 +1139,21 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 * contains saved L1 state.
 */
copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
-   hsave->save = save;
+   hsave->save = *save;
 
svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
-   load_nested_vmcb_control(svm, &ctl);
+   load_nested_vmcb_control(svm, ctl);
nested_prepare_vmcb_control(svm);
 
 out_set_gif:
svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET));
-   return 0;
+
+   ret = 0;
+out_free:
+   kfree(save);
+   kfree(ctl);
+
+   return ret;
 }
 
 struct kvm_x86_nested_ops svm_nested_ops = {


[tip: x86/seves] x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 5f2bb01682b7b067783207994c7b8a3dbeb1cd83
Gitweb:
https://git.kernel.org/tip/5f2bb01682b7b067783207994c7b8a3dbeb1cd83
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:15 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c

The file contains only code related to identity-mapped page tables.
Rename the file and compile it always in.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-15-j...@8bytes.org
---
 arch/x86/boot/compressed/Makefile   |   2 +-
 arch/x86/boot/compressed/ident_map_64.c | 162 +++-
 arch/x86/boot/compressed/kaslr.c|   9 +-
 arch/x86/boot/compressed/kaslr_64.c | 153 +--
 arch/x86/boot/compressed/misc.h |   8 +-
 5 files changed, 171 insertions(+), 163 deletions(-)
 create mode 100644 arch/x86/boot/compressed/ident_map_64.c
 delete mode 100644 arch/x86/boot/compressed/kaslr_64.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index c661dc5..e7f3eba 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,7 +84,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
-   vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
+   vmlinux-objs-y += $(obj)/ident_map_64.o
vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
new file mode 100644
index 000..d9932a1
--- /dev/null
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This code is used on x86_64 to create page table identity mappings on
+ * demand by building up a new set of page tables (or appending to the
+ * existing ones), and then switching over to them when ready.
+ *
+ * Copyright (C) 2015-2016  Yinghai Lu
+ * Copyright (C)  2016  Kees Cook
+ */
+
+/*
+ * Since we're dealing with identity mappings, physical and virtual
+ * addresses are the same, so override these defines which are ultimately
+ * used by the headers in misc.h.
+ */
+#define __pa(x)  ((unsigned long)(x))
+#define __va(x)  ((void *)((unsigned long)(x)))
+
+/* No PAGE_TABLE_ISOLATION support needed either: */
+#undef CONFIG_PAGE_TABLE_ISOLATION
+
+#include "misc.h"
+
+/* These actually do the work of building the kernel identity maps. */
+#include 
+#include 
+/* Use the static base for this part of the boot process */
+#undef __PAGE_OFFSET
+#define __PAGE_OFFSET __PAGE_OFFSET_BASE
+#include "../../mm/ident_map.c"
+
+#ifdef CONFIG_X86_5LEVEL
+unsigned int __pgtable_l5_enabled;
+unsigned int pgdir_shift = 39;
+unsigned int ptrs_per_p4d = 1;
+#endif
+
+/* Used by PAGE_KERN* macros: */
+pteval_t __default_kernel_pte_mask __read_mostly = ~0;
+
+/* Used to track our page table allocation area. */
+struct alloc_pgt_data {
+   unsigned char *pgt_buf;
+   unsigned long pgt_buf_size;
+   unsigned long pgt_buf_offset;
+};
+
+/*
+ * Allocates space for a page table entry, using struct alloc_pgt_data
+ * above. Besides the local callers, this is used as the allocation
+ * callback in mapping_info below.
+ */
+static void *alloc_pgt_page(void *context)
+{
+   struct alloc_pgt_data *pages = (struct alloc_pgt_data *)context;
+   unsigned char *entry;
+
+   /* Validate there is space available for a new page. */
+   if (pages->pgt_buf_offset >= pages->pgt_buf_size) {
+   debug_putstr("out of pgt_buf in " __FILE__ "!?\n");
+   debug_putaddr(pages->pgt_buf_offset);
+   debug_putaddr(pages->pgt_buf_size);
+   return NULL;
+   }
+
+   entry = pages->pgt_buf + pages->pgt_buf_offset;
+   pages->pgt_buf_offset += PAGE_SIZE;
+
+   return entry;
+}
+
+/* Used to track our allocated page tables. */
+static struct alloc_pgt_data pgt_data;
+
+/* The top level page table entry pointer. */
+static unsigned long top_level_pgt;
+
+phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
+
+/*
+ * Mapping information structure passed to kernel_ident_mapping_init().
+ * Due to relocation, pointers must be assigned at run time not build time.
+ */
+static struct x86_mapping_info mapping_info;
+
+/* Locates and clears a region for a new top level page table. */
+void initialize_identity_maps(void)
+{
+   /* If running as an SEV guest, the encryption mask is required. */
+   set_sev_encryption_mask();
+
+   /* Exclud

[tip: x86/seves] x86/boot/compressed/64: Always switch to own page table

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: ca0e22d4f011a56e974fa3a712d76e86a791559d
Gitweb:
https://git.kernel.org/tip/ca0e22d4f011a56e974fa3a712d76e86a791559d
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:15:17 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 07 Sep 2020 19:45:25 +02:00

x86/boot/compressed/64: Always switch to own page table

When booted through startup_64(), the kernel keeps running on the EFI
page table until the KASLR code sets up its own page table. Without
KASLR, the pre-decompression boot code never switches off the EFI page
table. Change that by unconditionally switching to a kernel-controlled
page table after relocation.

This makes sure the kernel can make changes to the mapping when
necessary, for example map pages unencrypted in SEV and SEV-ES guests.

Also, remove the debug_putstr() calls in initialize_identity_maps()
because the function now runs before console_init() is called.

 [ bp: Massage commit message. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-17-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S  |  3 +-
 arch/x86/boot/compressed/ident_map_64.c | 51 ++--
 arch/x86/boot/compressed/kaslr.c|  3 +-
 3 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c634ed8..fb6c039 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -543,10 +543,11 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
rep stosq
 
 /*
- * Load stage2 IDT
+ * Load stage2 IDT and switch to our own page-table
  */
pushq   %rsi
callload_stage2_idt
+   callinitialize_identity_maps
popq%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index e3d980a..ecf9353 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -86,9 +86,31 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) 
- 1;
  */
 static struct x86_mapping_info mapping_info;
 
+/*
+ * Adds the specified range to what will become the new identity mappings.
+ * Once all ranges have been added, the new mapping is activated by calling
+ * finalize_identity_maps() below.
+ */
+void add_identity_map(unsigned long start, unsigned long size)
+{
+   unsigned long end = start + size;
+
+   /* Align boundary to 2M. */
+   start = round_down(start, PMD_SIZE);
+   end = round_up(end, PMD_SIZE);
+   if (start >= end)
+   return;
+
+   /* Build the mapping. */
+   kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
+ start, end);
+}
+
 /* Locates and clears a region for a new top level page table. */
 void initialize_identity_maps(void)
 {
+   unsigned long start, size;
+
/* If running as an SEV guest, the encryption mask is required. */
set_sev_encryption_mask();
 
@@ -121,37 +143,24 @@ void initialize_identity_maps(void)
 */
top_level_pgt = read_cr3_pa();
if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
-   debug_putstr("booted via startup_32()\n");
pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
} else {
-   debug_putstr("booted via startup_64()\n");
pgt_data.pgt_buf = _pgtable;
pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
}
-}
 
-/*
- * Adds the specified range to what will become the new identity mappings.
- * Once all ranges have been added, the new mapping is activated by calling
- * finalize_identity_maps() below.
- */
-void add_identity_map(unsigned long start, unsigned long size)
-{
-   unsigned long end = start + size;
-
-   /* Align boundary to 2M. */
-   start = round_down(start, PMD_SIZE);
-   end = round_up(end, PMD_SIZE);
-   if (start >= end)
-   return;
-
-   /* Build the mapping. */
-   kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
- start, end);
+   /*
+* New page-table is set up - map the kernel image and load it
+* into cr3.
+*/
+   start = (unsigned long)_head;
+   size  = _end - _head;
+   add_identity_map(start, size);
+   write_cr3(top_level_pgt);
 }
 
 /*
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index e27de98..8266286 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++
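
[ Illustration, not part of the patch: a worked example of the PMD_SIZE
  rounding done in add_identity_map() above. The macros below are simplified
  stand-ins for the kernel's round_down()/round_up() and assume 2 MiB PMD
  pages, as on x86-64. ]

#include <stdio.h>

#define PMD_SIZE	(2UL * 1024 * 1024)	/* assumption: 2 MiB */
#define RDOWN(x, a)	((x) & ~((a) - 1))
#define RUP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long start = 0x1234567UL, size = 0x1000UL;
	unsigned long end   = start + size;

	start = RDOWN(start, PMD_SIZE);
	end   = RUP(end, PMD_SIZE);

	/* Prints 0x1200000 - 0x1400000: one 2 MiB mapping covers the range. */
	printf("identity-map %#lx - %#lx\n", start, end);
	return 0;
}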

[tip: x86/seves] x86/sev-es: Handle #AC Events

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: a2d0171a9cf59637411281a929900fde80e6c1cb
Gitweb:
https://git.kernel.org/tip/a2d0171a9cf59637411281a929900fde80e6c1cb
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:01 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/sev-es: Handle #AC Events

Implement a handler for #VC exceptions caused by #AC exceptions. The
#AC exception is just forwarded to exc_alignment_check() and not pushed
down to the hypervisor, as requested by the SEV-ES GHCB Standardization
Specification.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-61-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 86cb4c5..8867c48 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -909,6 +909,19 @@ static enum es_result vc_handle_vmmcall(struct ghcb *ghcb,
return ES_OK;
 }
 
+static enum es_result vc_handle_trap_ac(struct ghcb *ghcb,
+   struct es_em_ctxt *ctxt)
+{
+   /*
+* Calling exc_alignment_check() directly does not work, because it
+* enables IRQs and the GHCB is active. Forward the exception and call
+* it later from vc_forward_exception().
+*/
+   ctxt->fi.vector = X86_TRAP_AC;
+   ctxt->fi.error_code = 0;
+   return ES_EXCEPTION;
+}
+
 static enum es_result vc_handle_exitcode(struct es_em_ctxt *ctxt,
 struct ghcb *ghcb,
 unsigned long exit_code)
@@ -922,6 +935,9 @@ static enum es_result vc_handle_exitcode(struct es_em_ctxt 
*ctxt,
case SVM_EXIT_WRITE_DR7:
result = vc_handle_dr7_write(ghcb, ctxt);
break;
+   case SVM_EXIT_EXCP_BASE + X86_TRAP_AC:
+   result = vc_handle_trap_ac(ghcb, ctxt);
+   break;
case SVM_EXIT_RDTSC:
case SVM_EXIT_RDTSCP:
result = vc_handle_rdtsc(ghcb, ctxt, exit_code);
@@ -981,6 +997,9 @@ static __always_inline void vc_forward_exception(struct 
es_em_ctxt *ctxt)
case X86_TRAP_UD:
exc_invalid_op(ctxt->regs);
break;
+   case X86_TRAP_AC:
+   exc_alignment_check(ctxt->regs, error_code);
+   break;
default:
pr_emerg("Unsupported exception in #VC instruction emulation - 
can't continue\n");
BUG();
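
[ Illustration, not part of the patch: a stand-alone sketch of the "record
  now, forward later" pattern used above for #AC. The handler only notes the
  vector in the context; the recorded exception is dispatched after the GHCB
  is no longer in use. All types and names below are simplified stand-ins for
  the kernel's es_em_ctxt and vc_forward_exception() machinery. ]

#include <stdio.h>

enum es_result { ES_OK, ES_EXCEPTION };

struct fault_info { int vector; unsigned long error_code; };
struct ctxt { struct fault_info fi; };

#define X86_TRAP_AC 17	/* alignment-check vector */

/* Must not invoke the #AC handler here: that would enable IRQs while the
 * GHCB is still active. Only record the exception. */
static enum es_result handle_trap_ac(struct ctxt *c)
{
	c->fi.vector = X86_TRAP_AC;
	c->fi.error_code = 0;
	return ES_EXCEPTION;
}

/* Called later, once the GHCB has been released. */
static void forward_exception(struct ctxt *c)
{
	if (c->fi.vector == X86_TRAP_AC)
		printf("would call exc_alignment_check(regs, %lu)\n",
		       c->fi.error_code);
}

int main(void)
{
	struct ctxt c = { { 0, 0 } };

	if (handle_trap_ac(&c) == ES_EXCEPTION)
		forward_exception(&c);
	return 0;
}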


[tip: x86/seves] x86/sev-es: Support CPU offline/online

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 094794f59720d7e877a1eeb372ecedeed6b441ab
Gitweb:
https://git.kernel.org/tip/094794f59720d7e877a1eeb372ecedeed6b441ab
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:10 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/sev-es: Support CPU offline/online

Add a play_dead handler when running under SEV-ES. This is needed
because the hypervisor can't deliver an SIPI request to restart the AP.
Instead, the kernel has to issue a VMGEXIT to halt the VCPU until the
hypervisor wakes it up again.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-70-j...@8bytes.org
---
 arch/x86/include/uapi/asm/svm.h |  1 +-
 arch/x86/kernel/sev-es.c| 63 -
 2 files changed, 64 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 346b8a7..c1dcf3e 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -84,6 +84,7 @@
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ              0x80000001
 #define SVM_VMGEXIT_MMIO_WRITE             0x80000002
+#define SVM_VMGEXIT_AP_HLT_LOOP            0x80000004
 #define SVM_VMGEXIT_AP_JUMP_TABLE          0x80000005
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE  0
 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE  1
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 9ad36ce..d1bcd0d 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -29,6 +29,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define DR7_RESET_VALUE        0x400
 
@@ -518,6 +520,65 @@ static bool __init sev_es_setup_ghcb(void)
return true;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void sev_es_ap_hlt_loop(void)
+{
+   struct ghcb_state state;
+   struct ghcb *ghcb;
+
+   ghcb = sev_es_get_ghcb(&state);
+
+   while (true) {
+   vc_ghcb_invalidate(ghcb);
+   ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_HLT_LOOP);
+   ghcb_set_sw_exit_info_1(ghcb, 0);
+   ghcb_set_sw_exit_info_2(ghcb, 0);
+
+   sev_es_wr_ghcb_msr(__pa(ghcb));
+   VMGEXIT();
+
+   /* Wakeup signal? */
+   if (ghcb_sw_exit_info_2_is_valid(ghcb) &&
+   ghcb->save.sw_exit_info_2)
+   break;
+   }
+
+   sev_es_put_ghcb(&state);
+}
+
+/*
+ * Play_dead handler when running under SEV-ES. This is needed because
+ * the hypervisor can't deliver an SIPI request to restart the AP.
+ * Instead the kernel has to issue a VMGEXIT to halt the VCPU until the
+ * hypervisor wakes it up again.
+ */
+static void sev_es_play_dead(void)
+{
+   play_dead_common();
+
+   /* IRQs now disabled */
+
+   sev_es_ap_hlt_loop();
+
+   /*
+* If we get here, the VCPU was woken up again. Jump to CPU
+* startup code to get it back online.
+*/
+   start_cpu0();
+}
+#else  /* CONFIG_HOTPLUG_CPU */
+#define sev_es_play_dead   native_play_dead
+#endif /* CONFIG_HOTPLUG_CPU */
+
+#ifdef CONFIG_SMP
+static void __init sev_es_setup_play_dead(void)
+{
+   smp_ops.play_dead = sev_es_play_dead;
+}
+#else
+static inline void sev_es_setup_play_dead(void) { }
+#endif
+
 static void __init alloc_runtime_data(int cpu)
 {
struct sev_es_runtime_data *data;
@@ -566,6 +627,8 @@ void __init sev_es_init_vc_handling(void)
setup_vc_stacks(cpu);
}
 
+   sev_es_setup_play_dead();
+
/* Secondary CPUs use the runtime #VC handler */
initial_vc_handler = (unsigned long)safe_stack_exc_vmm_communication;
 }
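
[ Illustration, not part of the patch: a user-space mock-up of the GHCB
  request sequence that sev_es_ap_hlt_loop() above follows. Every function
  here is a stub standing in for the kernel helper of the same or similar
  name, and the stubbed vmgexit() takes the GHCB as a parameter only so it
  can fake a hypervisor wakeup; the exit code value is the one from the
  svm.h hunk above. ]

#include <stdbool.h>
#include <stdio.h>

#define SVM_VMGEXIT_AP_HLT_LOOP	0x80000004UL

struct ghcb { unsigned long sw_exit_code, sw_exit_info_1, sw_exit_info_2; };

static int exits;

/* Stubs: clear the GHCB, fill in the request, "exit" to the hypervisor. */
static void vc_ghcb_invalidate(struct ghcb *g) { *g = (struct ghcb){ 0, 0, 0 }; }
static void set_exit_code(struct ghcb *g, unsigned long c) { g->sw_exit_code = c; }
static void sev_es_wr_ghcb_msr(struct ghcb *g) { (void)g; }
static void vmgexit(struct ghcb *g)
{
	/* Pretend the hypervisor signals a wakeup on the third exit. */
	if (++exits == 3)
		g->sw_exit_info_2 = 1;
}

/* Mirrors the loop in sev_es_ap_hlt_loop(): re-issue AP_HLT_LOOP until the
 * hypervisor reports a wakeup in sw_exit_info_2. */
static void ap_hlt_loop(struct ghcb *ghcb)
{
	while (true) {
		vc_ghcb_invalidate(ghcb);
		set_exit_code(ghcb, SVM_VMGEXIT_AP_HLT_LOOP);
		sev_es_wr_ghcb_msr(ghcb);
		vmgexit(ghcb);

		if (ghcb->sw_exit_info_2)	/* wakeup signal? */
			break;
	}
}

int main(void)
{
	struct ghcb g;

	ap_hlt_loop(&g);
	printf("woken up after %d VMGEXITs\n", exits);
	return 0;
}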


[tip: x86/seves] x86/realmode: Add SEV-ES specific trampoline entry point

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: bf5ff276448f64f1f9ef9ffc9e231026e3887d3d
Gitweb:
https://git.kernel.org/tip/bf5ff276448f64f1f9ef9ffc9e231026e3887d3d
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:06 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/realmode: Add SEV-ES specific trampoline entry point

The code at the trampoline entry point is executed in real-mode. In
real-mode, #VC exceptions can't be handled so anything that might cause
such an exception must be avoided.

In the standard trampoline entry code these are the WBINVD instruction and
the call to verify_cpu(), neither of which is needed anyway when running
as an SEV-ES guest.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-66-j...@8bytes.org
---
 arch/x86/include/asm/realmode.h  |  3 +++
 arch/x86/realmode/rm/header.S|  3 +++
 arch/x86/realmode/rm/trampoline_64.S | 20 
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 96118fb..4d4d853 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -21,6 +21,9 @@ struct real_mode_header {
/* SMP trampoline */
u32 trampoline_start;
u32 trampoline_header;
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   u32 sev_es_trampoline_start;
+#endif
 #ifdef CONFIG_X86_64
u32 trampoline_pgd;
 #endif
diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
index af04512..8c1db5b 100644
--- a/arch/x86/realmode/rm/header.S
+++ b/arch/x86/realmode/rm/header.S
@@ -20,6 +20,9 @@ SYM_DATA_START(real_mode_header)
/* SMP trampoline */
.long   pa_trampoline_start
.long   pa_trampoline_header
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   .long   pa_sev_es_trampoline_start
+#endif
 #ifdef CONFIG_X86_64
.long   pa_trampoline_pgd;
 #endif
diff --git a/arch/x86/realmode/rm/trampoline_64.S 
b/arch/x86/realmode/rm/trampoline_64.S
index 251758e..84c5d1b 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -56,6 +56,7 @@ SYM_CODE_START(trampoline_start)
testl   %eax, %eax  # Check for return code
jnz no_longmode
 
+.Lswitch_to_protected:
/*
 * GDT tables in non default location kernel can be beyond 16MB and
 * lgdt will not be able to load the address as in real mode default
@@ -80,6 +81,25 @@ no_longmode:
jmp no_longmode
 SYM_CODE_END(trampoline_start)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/* SEV-ES supports non-zero IP for entry points - no alignment needed */
+SYM_CODE_START(sev_es_trampoline_start)
+   cli # We should be safe anyway
+
+   LJMPW_RM(1f)
+1:
+   mov %cs, %ax# Code and data in the same place
+   mov %ax, %ds
+   mov %ax, %es
+   mov %ax, %ss
+
+   # Setup stack
+   movl$rm_stack_end, %esp
+
+   jmp .Lswitch_to_protected
+SYM_CODE_END(sev_es_trampoline_start)
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
 #include "../kernel/verify_cpu.S"
 
.section ".text32","ax"


[tip: x86/seves] x86/sev-es: Handle NMI State

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 4ca68e023b11e4d5908bf9ee326fab0d77d5
Gitweb:
https://git.kernel.org/tip/4ca68e023b11e4d5908bf9ee326fab0d77d5
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:11 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 18:02:35 +02:00

x86/sev-es: Handle NMI State

When running under SEV-ES, the kernel has to tell the hypervisor when to
open the NMI window again after an NMI was injected. This is done with
an NMI-complete message to the hypervisor.

Add code to the kernel's NMI handler to send this message right at the
beginning of do_nmi(). This means NMIs can nest, but nested NMIs can be
handled safely.

 [ bp: Mark __sev_es_nmi_complete() noinstr:
   vmlinux.o: warning: objtool: exc_nmi()+0x17: call to __sev_es_nmi_complete()
leaves .noinstr.text section
   While at it, use __pa_nodebug() for the same reason due to
   CONFIG_DEBUG_VIRTUAL=y:
   vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0xd9: call to 
__phys_addr()
leaves .noinstr.text section ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200907131613.12703-71-j...@8bytes.org
---
 arch/x86/include/asm/sev-es.h   |  7 +++
 arch/x86/include/uapi/asm/svm.h |  1 +
 arch/x86/kernel/nmi.c   |  6 ++
 arch/x86/kernel/sev-es.c| 18 ++
 4 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index db88e1c..e919f09 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -96,10 +96,17 @@ static __always_inline void sev_es_ist_exit(void)
__sev_es_ist_exit();
 }
 extern int sev_es_setup_ap_jump_table(struct real_mode_header *rmh);
+extern void __sev_es_nmi_complete(void);
+static __always_inline void sev_es_nmi_complete(void)
+{
+   if (static_branch_unlikely(&sev_es_enable_key))
+   __sev_es_nmi_complete();
+}
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
 static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { 
return 0; }
+static inline void sev_es_nmi_complete(void) { }
 #endif
 
 #endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index c1dcf3e..a7a3403 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -84,6 +84,7 @@
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ              0x80000001
 #define SVM_VMGEXIT_MMIO_WRITE             0x80000002
+#define SVM_VMGEXIT_NMI_COMPLETE           0x80000003
 #define SVM_VMGEXIT_AP_HLT_LOOP            0x80000004
 #define SVM_VMGEXIT_AP_JUMP_TABLE          0x80000005
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE  0
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4c89c4d..56b64d7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -478,6 +478,12 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 {
bool irq_state;
 
+   /*
+* Re-enable NMIs right here when running as an SEV-ES guest. This might
+* cause nested NMIs, but those can be handled safely.
+*/
+   sev_es_nmi_complete();
+
if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
return;
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index d1bcd0d..b6518e9 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -408,6 +408,24 @@ static bool vc_slow_virt_to_phys(struct ghcb *ghcb, struct 
es_em_ctxt *ctxt,
 /* Include code shared with pre-decompression boot stage */
 #include "sev-es-shared.c"
 
+void noinstr __sev_es_nmi_complete(void)
+{
+   struct ghcb_state state;
+   struct ghcb *ghcb;
+
+   ghcb = sev_es_get_ghcb(&state);
+
+   vc_ghcb_invalidate(ghcb);
+   ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
+   ghcb_set_sw_exit_info_1(ghcb, 0);
+   ghcb_set_sw_exit_info_2(ghcb, 0);
+
+   sev_es_wr_ghcb_msr(__pa_nodebug(ghcb));
+   VMGEXIT();
+
+   sev_es_put_ghcb(&state);
+}
+
 static u64 get_jump_table_addr(void)
 {
struct ghcb_state state;


[tip: x86/seves] x86/head/64: Don't call verify_cpu() on starting APs

2020-09-10 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 3ecacdbd23956a549d93023f86adc87b4a9d6520
Gitweb:
https://git.kernel.org/tip/3ecacdbd23956a549d93023f86adc87b4a9d6520
Author:Joerg Roedel 
AuthorDate:Mon, 07 Sep 2020 15:16:09 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 09 Sep 2020 11:33:20 +02:00

x86/head/64: Don't call verify_cpu() on starting APs

The APs are not ready to handle exceptions when verify_cpu() is called
in secondary_startup_64().

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Link: https://lkml.kernel.org/r/20200907131613.12703-69-j...@8bytes.org
---
 arch/x86/include/asm/realmode.h |  1 +
 arch/x86/kernel/head_64.S   | 12 
 arch/x86/realmode/init.c|  6 ++
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 4d4d853..5db5d08 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -72,6 +72,7 @@ extern unsigned char startup_32_smp[];
 extern unsigned char boot_gdt[];
 #else
 extern unsigned char secondary_startup_64[];
+extern unsigned char secondary_startup_64_no_verify[];
 #endif
 
 static inline size_t real_mode_size_needed(void)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 1a71d0d..7eb2a1c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -126,6 +126,18 @@ SYM_CODE_START(secondary_startup_64)
call verify_cpu
 
/*
+* The secondary_startup_64_no_verify entry point is only used by
+* SEV-ES guests. In those guests the call to verify_cpu() would cause
+* #VC exceptions which can not be handled at this stage of secondary
+* CPU bringup.
+*
+* All non SEV-ES systems, especially Intel systems, need to execute
+* verify_cpu() above to make sure NX is enabled.
+*/
+SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
+   UNWIND_HINT_EMPTY
+
+   /*
 * Retrieve the modifier (SME encryption mask if SME is active) to be
 * added to the initial pgdir entry that will be programmed into CR3.
 */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 3fb9b60..22fda7d 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -46,6 +46,12 @@ static void sme_sev_setup_real_mode(struct trampoline_header 
*th)
th->flags |= TH_FLAGS_SME_ACTIVE;
 
if (sev_es_active()) {
+   /*
+* Skip the call to verify_cpu() in secondary_startup_64 as it
+* will cause #VC exceptions when the AP can't handle them yet.
+*/
+   th->start = (u64) secondary_startup_64_no_verify;
+
if (sev_es_setup_ap_jump_table(real_mode_header))
panic("Failed to get/update SEV-ES AP Jump Table");
}


[tip: x86/urgent] x86/mm/32: Bring back vmalloc faulting on x86_32

2020-09-03 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 4819e15f740ec884a50bdc431d7f1e7638b6f7d9
Gitweb:
https://git.kernel.org/tip/4819e15f740ec884a50bdc431d7f1e7638b6f7d9
Author:Joerg Roedel 
AuthorDate:Wed, 02 Sep 2020 17:59:04 +02:00
Committer: Ingo Molnar 
CommitterDate: Thu, 03 Sep 2020 11:23:35 +02:00

x86/mm/32: Bring back vmalloc faulting on x86_32

One cannot simply remove vmalloc faulting on x86-32. Upstream

commit: 7f0a002b5a21 ("x86/mm: remove vmalloc faulting")

removed it on x86 altogether because the arch_sync_kernel_mappings()
interface had been introduced earlier. This interface
added synchronization of vmalloc/ioremap page-table updates to all
page-tables in the system at creation time and was thought to make
vmalloc faulting obsolete.

But that assumption was incredibly naive.

It turned out that there is a race window between the time the vmalloc
or ioremap code establishes a mapping and the time it synchronizes
this change to other page-tables in the system.

During this race window another CPU or thread can establish a vmalloc
mapping which uses the same intermediate page-table entries (e.g. PMD
or PUD) and does no synchronization in the end, because it found all
necessary mappings already present in the kernel reference page-table.

But when these intermediate page-table entries are not yet
synchronized, the other CPU or thread will continue with a vmalloc
address that is not yet mapped in the page-table it currently uses,
causing an unhandled page fault and oops like below:

BUG: unable to handle page fault for address: fe80c000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
*pde = 33183067 *pte = a8648163
Oops: 0002 [#1] SMP
CPU: 1 PID: 13514 Comm: cve-2017-17053 Tainted: G
...
Call Trace:
 ldt_dup_context+0x66/0x80
 dup_mm+0x2b3/0x480
 copy_process+0x133b/0x15c0
 _do_fork+0x94/0x3e0
 __ia32_sys_clone+0x67/0x80
 __do_fast_syscall_32+0x3f/0x70
 do_fast_syscall_32+0x29/0x60
 do_SYSENTER_32+0x15/0x20
 entry_SYSENTER_32+0x9f/0xf2
EIP: 0xb7eef549

So the arch_sync_kernel_mappings() interface is racy, but removing it
would mean re-introducing the vmalloc_sync_all() interface, which is
even more awful. Keep arch_sync_kernel_mappings() in place and catch
the race condition in the page-fault handler instead.

Do a partial revert of above commit to get vmalloc faulting on x86-32
back in place.

Fixes: 7f0a002b5a21 ("x86/mm: remove vmalloc faulting")
Reported-by: Naresh Kamboju 
Signed-off-by: Joerg Roedel 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200902155904.17544-1-j...@8bytes.org
---
 arch/x86/mm/fault.c | 78 -
 1 file changed, 78 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 35f1498..6e3e8a1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -190,6 +190,53 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned 
long address)
return pmd_k;
 }
 
+/*
+ *   Handle a fault on the vmalloc or module mapping area
+ *
+ *   This is needed because there is a race condition between the time
+ *   when the vmalloc mapping code updates the PMD to the point in time
+ *   where it synchronizes this update with the other page-tables in the
+ *   system.
+ *
+ *   In this race window another thread/CPU can map an area on the same
+ *   PMD, finds it already present and does not synchronize it with the
+ *   rest of the system yet. As a result v[mz]alloc might return areas
+ *   which are not mapped in every page-table in the system, causing an
+ *   unhandled page-fault when they are accessed.
+ */
+static noinline int vmalloc_fault(unsigned long address)
+{
+   unsigned long pgd_paddr;
+   pmd_t *pmd_k;
+   pte_t *pte_k;
+
+   /* Make sure we are in vmalloc area: */
+   if (!(address >= VMALLOC_START && address < VMALLOC_END))
+   return -1;
+
+   /*
+* Synchronize this task's top level page-table
+* with the 'reference' page table.
+*
+* Do _not_ use "current" here. We might be inside
+* an interrupt in the middle of a task switch..
+*/
+   pgd_paddr = read_cr3_pa();
+   pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
+   if (!pmd_k)
+   return -1;
+
+   if (pmd_large(*pmd_k))
+   return 0;
+
+   pte_k = pte_offset_kernel(pmd_k, address);
+   if (!pte_present(*pte_k))
+   return -1;
+
+   return 0;
+}
+NOKPROBE_SYMBOL(vmalloc_fault);
+
 void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
 {
unsigned long addr;
@@ -1110,6 +1157,37 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long 
hw_error_code,
 */
WARN_ON_ONCE(hw_error_code & X86_PF_PK);

[tip: x86/mm] x86/mm/64: Do not sync vmalloc/ioremap mappings

2020-08-15 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 58a18fe95e83b8396605154db04d73b08063f31b
Gitweb:
https://git.kernel.org/tip/58a18fe95e83b8396605154db04d73b08063f31b
Author:Joerg Roedel 
AuthorDate:Fri, 14 Aug 2020 17:19:46 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:56:16 +02:00

x86/mm/64: Do not sync vmalloc/ioremap mappings

Remove the code to sync the vmalloc and ioremap ranges for x86-64. The
page-table pages are all pre-allocated so that synchronization is
no longer necessary.

This is a patch that already went into the kernel as:

commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap mappings")

But it had to be reverted later because it unveiled a bug from:

commit 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc 
area")

The bug in that commit causes the P4D/PUD pages not to be correctly
allocated, making the synchronization still necessary. That issue got
fixed meanwhile upstream:

commit 995909a4e22b ("x86/mm/64: Do not dereference non-present PGD 
entries")

With that fix it is safe again to remove the page-table synchronization
for vmalloc/ioremap ranges on x86-64.

Signed-off-by: Joerg Roedel 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200814151947.26229-2-j...@8bytes.org
---
 arch/x86/include/asm/pgtable_64_types.h | 2 --
 arch/x86/mm/init_64.c   | 5 -
 2 files changed, 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h 
b/arch/x86/include/asm/pgtable_64_types.h
index 8f63efb..52e5f5f 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -159,6 +159,4 @@ extern unsigned int ptrs_per_p4d;
 
 #define PGD_KERNEL_START   ((PAGE_SIZE / 2) / sizeof(pgd_t))
 
-#define ARCH_PAGE_TABLE_SYNC_MASK  (pgtable_l5_enabled() ? 
PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
-
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a4ac13c..777d835 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -217,11 +217,6 @@ static void sync_global_pgds(unsigned long start, unsigned 
long end)
sync_global_pgds_l4(start, end);
 }
 
-void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
-{
-   sync_global_pgds(start, end);
-}
-
 /*
  * NOTE: This function is marked __ref because it calls __init function
  * (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.


[tip: x86/mm] x86/mm/64: Update comment in preallocate_vmalloc_pages()

2020-08-15 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 7a27ef5e83089090f3a4073a9157c862ef00acfc
Gitweb:
https://git.kernel.org/tip/7a27ef5e83089090f3a4073a9157c862ef00acfc
Author:Joerg Roedel 
AuthorDate:Fri, 14 Aug 2020 17:19:47 +02:00
Committer: Ingo Molnar 
CommitterDate: Sat, 15 Aug 2020 13:56:16 +02:00

x86/mm/64: Update comment in preallocate_vmalloc_pages()

The comment explaining why 4-level systems only need to allocate on
the P4D level caused some confusion. Update it to better explain why
on 4-level systems the allocation on PUD level is necessary.

Signed-off-by: Joerg Roedel 
Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200814151947.26229-3-j...@8bytes.org
---
 arch/x86/mm/init_64.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 777d835..b5a3fa4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1252,14 +1252,19 @@ static void __init preallocate_vmalloc_pages(void)
if (!p4d)
goto failed;
 
-   /*
-* With 5-level paging the P4D level is not folded. So the PGDs
-* are now populated and there is no need to walk down to the
-* PUD level.
-*/
if (pgtable_l5_enabled())
continue;
 
+   /*
+* The goal here is to allocate all possibly required
+* hardware page tables pointed to by the top hardware
+* level.
+*
+* On 4-level systems, the P4D layer is folded away and
+* the above code does no preallocation.  Below, go down
+* to the pud _software_ level to ensure the second
+* hardware level is allocated on 4-level systems too.
+*/
lvl = "pud";
pud = pud_alloc(&init_mm, p4d, addr);
if (!pud)


[tip: x86/mm] x86/mm/64: Make sync_global_pgds() static

2020-07-27 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 2b32ab031e82a109e2c5b0d30ce563db0fe286b4
Gitweb:
https://git.kernel.org/tip/2b32ab031e82a109e2c5b0d30ce563db0fe286b4
Author:Joerg Roedel 
AuthorDate:Tue, 21 Jul 2020 11:59:53 +02:00
Committer: Ingo Molnar 
CommitterDate: Mon, 27 Jul 2020 12:32:29 +02:00

x86/mm/64: Make sync_global_pgds() static

The function is only called from within init_64.c and can be static.
Also remove it from pgtable_64.h.

Signed-off-by: Joerg Roedel 
Signed-off-by: Ingo Molnar 
Reviewed-by: Mike Rapoport 
Link: https://lore.kernel.org/r/20200721095953.6218-4-j...@8bytes.org
---
 arch/x86/include/asm/pgtable_64.h | 2 --
 arch/x86/mm/init_64.c | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h 
b/arch/x86/include/asm/pgtable_64.h
index 1b68d24..95ac911 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -168,8 +168,6 @@ static inline void native_pgd_clear(pgd_t *pgd)
native_set_pgd(pgd, native_make_pgd(0));
 }
 
-extern void sync_global_pgds(unsigned long start, unsigned long end);
-
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e0cd2df..e65b96f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -209,7 +209,7 @@ static void sync_global_pgds_l4(unsigned long start, 
unsigned long end)
  * When memory was added make sure all the processes MM have
  * suitable PGD entries in the local PGD level page.
  */
-void sync_global_pgds(unsigned long start, unsigned long end)
+static void sync_global_pgds(unsigned long start, unsigned long end)
 {
if (pgtable_l5_enabled())
sync_global_pgds_l5(start, end);


[tip: x86/mm] x86/mm/64: Do not sync vmalloc/ioremap mappings

2020-07-27 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 8bb9bf242d1fee925636353807c511d54fde8986
Gitweb:
https://git.kernel.org/tip/8bb9bf242d1fee925636353807c511d54fde8986
Author:Joerg Roedel 
AuthorDate:Tue, 21 Jul 2020 11:59:52 +02:00
Committer: Ingo Molnar 
CommitterDate: Mon, 27 Jul 2020 12:32:29 +02:00

x86/mm/64: Do not sync vmalloc/ioremap mappings

Remove the code to sync the vmalloc and ioremap ranges for x86-64. The
page-table pages are all pre-allocated now so that synchronization is
no longer necessary.

Signed-off-by: Joerg Roedel 
Signed-off-by: Ingo Molnar 
Reviewed-by: Mike Rapoport 
Link: https://lore.kernel.org/r/20200721095953.6218-3-j...@8bytes.org
---
 arch/x86/include/asm/pgtable_64_types.h | 2 --
 arch/x86/mm/init_64.c   | 5 -
 2 files changed, 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h 
b/arch/x86/include/asm/pgtable_64_types.h
index 8f63efb..52e5f5f 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -159,6 +159,4 @@ extern unsigned int ptrs_per_p4d;
 
 #define PGD_KERNEL_START   ((PAGE_SIZE / 2) / sizeof(pgd_t))
 
-#define ARCH_PAGE_TABLE_SYNC_MASK  (pgtable_l5_enabled() ? 
PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
-
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e76bdb0..e0cd2df 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -217,11 +217,6 @@ void sync_global_pgds(unsigned long start, unsigned long 
end)
sync_global_pgds_l4(start, end);
 }
 
-void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
-{
-   sync_global_pgds(start, end);
-}
-
 /*
  * NOTE: This function is marked __ref because it calls __init function
  * (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.


[tip: x86/mm] x86/mm: Pre-allocate P4D/PUD pages for vmalloc area

2020-07-27 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/mm branch of tip:

Commit-ID: 6eb82f9940267d3af260989d077a2833f588beae
Gitweb:
https://git.kernel.org/tip/6eb82f9940267d3af260989d077a2833f588beae
Author:Joerg Roedel 
AuthorDate:Tue, 21 Jul 2020 11:59:51 +02:00
Committer: Ingo Molnar 
CommitterDate: Mon, 27 Jul 2020 12:32:29 +02:00

x86/mm: Pre-allocate P4D/PUD pages for vmalloc area

Pre-allocate the page-table pages for the vmalloc area at the level
which needs synchronization on x86-64, which is P4D for 5-level and
PUD for 4-level paging.

Doing this at boot makes sure no synchronization of that area is
necessary at runtime. The synchronization takes the pgd_lock and
iterates over all page-tables in the system, so it can take quite long
and is better avoided.

Signed-off-by: Joerg Roedel 
Signed-off-by: Ingo Molnar 
Reviewed-by: Mike Rapoport 
Link: https://lore.kernel.org/r/20200721095953.6218-2-j...@8bytes.org
---
 arch/x86/mm/init_64.c | 52 ++-
 1 file changed, 52 insertions(+)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index dbae185..e76bdb0 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1238,6 +1238,56 @@ static void __init register_page_bootmem_info(void)
 #endif
 }
 
+/*
+ * Pre-allocates page-table pages for the vmalloc area in the kernel 
page-table.
+ * Only the level which needs to be synchronized between all page-tables is
+ * allocated because the synchronization can be expensive.
+ */
+static void __init preallocate_vmalloc_pages(void)
+{
+   unsigned long addr;
+   const char *lvl;
+
+   for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, 
PGDIR_SIZE)) {
+   pgd_t *pgd = pgd_offset_k(addr);
+   p4d_t *p4d;
+   pud_t *pud;
+
+   p4d = p4d_offset(pgd, addr);
+   if (p4d_none(*p4d)) {
+   /* Can only happen with 5-level paging */
+   p4d = p4d_alloc(&init_mm, pgd, addr);
+   if (!p4d) {
+   lvl = "p4d";
+   goto failed;
+   }
+   }
+
+   if (pgtable_l5_enabled())
+   continue;
+
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud)) {
+   /* Ends up here only with 4-level paging */
+   pud = pud_alloc(&init_mm, p4d, addr);
+   if (!pud) {
+   lvl = "pud";
+   goto failed;
+   }
+   }
+   }
+
+   return;
+
+failed:
+
+   /*
+* The pages have to be there now or they will be missing in
+* process page-tables later.
+*/
+   panic("Failed to pre-allocate %s pages for vmalloc area\n", lvl);
+}
+
 void __init mem_init(void)
 {
pci_iommu_alloc();
@@ -1261,6 +1311,8 @@ void __init mem_init(void)
if (get_gate_vma(&init_mm))
kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR, PAGE_SIZE, 
KCORE_USER);
 
+   preallocate_vmalloc_pages();
+
mem_init_print_info(NULL);
 }
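
[ Illustration, not part of the patch: a back-of-the-envelope check of the
  preallocation cost. The constants are assumptions for x86-64: a 32 TiB
  vmalloc/ioremap area and 512 GiB per PGD entry with 4-level paging, one
  4 KiB PUD page preallocated per PGD entry. ]

#include <stdio.h>

int main(void)
{
	unsigned long long vmalloc_size = 32ULL << 40;	/* assumed: 32 TiB  */
	unsigned long long pgdir_size   = 512ULL << 30;	/* assumed: 512 GiB */
	unsigned long long entries      = vmalloc_size / pgdir_size;

	/* 64 PGD entries -> 64 PUD pages -> 256 KiB of preallocated tables. */
	printf("%llu PUD pages, %llu KiB total\n", entries, entries * 4);
	return 0;
}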
 


[tip: x86/urgent] x86, vmlinux.lds: Page-align end of ..page_aligned sections

2020-07-22 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: de2b41be8fcccb2f5b6c480d35df590476344201
Gitweb:
https://git.kernel.org/tip/de2b41be8fcccb2f5b6c480d35df590476344201
Author:Joerg Roedel 
AuthorDate:Tue, 21 Jul 2020 11:34:48 +02:00
Committer: Thomas Gleixner 
CommitterDate: Wed, 22 Jul 2020 09:38:37 +02:00

x86, vmlinux.lds: Page-align end of ..page_aligned sections

On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
page-aligned, but the end of the .bss..page_aligned section is not
guaranteed to be page-aligned.

As a result, objects from other .bss sections may end up on the same 4k
page as the idt_table, and will accidentally get mapped read-only during
boot, causing unexpected page-faults when the kernel writes to them.

This could be worked around by making the objects in the page-aligned
sections page-sized, but that's wrong.

Explicit sections which store only page aligned objects have an implicit
guarantee that the object is alone in the page in which it is placed. That
works for all objects except the last one. That's inconsistent.

Enforcing page-sized objects for these sections would wreck memory
sanitizers, because the object becomes artificially larger than it should
be and out-of-bounds accesses become legitimate.

Align the end of the .bss..page_aligned and .data..page_aligned sections
on a page boundary so all objects placed in these sections are guaranteed
to have their own page.

[ tglx: Amended changelog ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Kees Cook 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20200721093448.10417-1-j...@8bytes.org
---
 arch/x86/kernel/vmlinux.lds.S | 1 +
 include/asm-generic/vmlinux.lds.h | 5 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3bfc8dd..9a03e5b 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -358,6 +358,7 @@ SECTIONS
.bss : AT(ADDR(.bss) - LOAD_OFFSET) {
__bss_start = .;
*(.bss..page_aligned)
+   . = ALIGN(PAGE_SIZE);
*(BSS_MAIN)
BSS_DECRYPTED
. = ALIGN(PAGE_SIZE);
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index db600ef..052e0f0 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -341,7 +341,8 @@
 
 #define PAGE_ALIGNED_DATA(page_align)  \
. = ALIGN(page_align);  \
-   *(.data..page_aligned)
+   *(.data..page_aligned)  \
+   . = ALIGN(page_align);
 
 #define READ_MOSTLY_DATA(align)
\
. = ALIGN(align);   \
@@ -737,7 +738,9 @@
. = ALIGN(bss_align);   \
.bss : AT(ADDR(.bss) - LOAD_OFFSET) {   \
BSS_FIRST_SECTIONS  \
+   . = ALIGN(PAGE_SIZE);   \
*(.bss..page_aligned)   \
+   . = ALIGN(PAGE_SIZE);   \
*(.dynbss)  \
*(BSS_MAIN) \
*(COMMON)   \
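
[ Illustration, not part of the patch: the layout problem described above,
  assuming 4 KiB pages and an 8-byte gate descriptor so that the 256-entry
  idt_table fills only half of its page. gate_desc and the alignment
  attribute are stand-ins for the kernel's definitions. ]

#include <stdio.h>

#define PAGE_SIZE 4096UL
typedef struct { unsigned char bytes[8]; } gate_desc;	/* 8-byte stand-in */

/* Starts on a page boundary, like objects in .bss..page_aligned, but only
 * fills 256 * 8 = 2048 bytes of its page. */
static gate_desc idt_table[256] __attribute__((aligned(PAGE_SIZE)));

int main(void)
{
	unsigned long used = sizeof(idt_table) % PAGE_SIZE;

	/* Without the trailing ALIGN(PAGE_SIZE) the next .bss object can be
	 * placed in the remaining bytes of this page and then gets
	 * write-protected together with the IDT during boot. */
	printf("idt_table at %p uses %lu bytes, %lu bytes of its page are left\n",
	       (void *)idt_table, used, PAGE_SIZE - used);
	return 0;
}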


[tip: x86/boot] x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded

2020-05-04 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 34bb49229f19399a5b45c323afb5749f31f7876c
Gitweb:
https://git.kernel.org/tip/34bb49229f19399a5b45c323afb5749f31f7876c
Author:Joerg Roedel 
AuthorDate:Tue, 28 Apr 2020 17:16:22 +02:00
Committer: Borislav Petkov 
CommitterDate: Mon, 04 May 2020 19:53:08 +02:00

x86/boot/compressed/64: Switch to __KERNEL_CS after GDT is loaded

When the pre-decompression code loads its first GDT in startup_64(), it
is still running on the CS value of the previous GDT. In the case of
SEV-ES, this is the EFI GDT but it can be anything depending on what has
loaded the kernel (boot loader, container runtime, etc.)

To make exception handling work (especially IRET) the CPU needs to
switch to a CS value in the current GDT, so jump to __KERNEL_CS after
the first GDT is loaded. This is also prudent as a general sanitization
of CS to a known good value.

 [ bp: Massage commit message. ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20200428151725.31091-13-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 4f7e6b8..6b11060 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -393,6 +393,14 @@ SYM_CODE_START(startup_64)
addq%rax, 2(%rax)
lgdt(%rax)
 
+   /* Reload CS so IRET returns to a CS actually in the GDT */
+   pushq   $__KERNEL_CS
+   leaq.Lon_kernel_cs(%rip), %rax
+   pushq   %rax
+   lretq
+
+.Lon_kernel_cs:
+
/*
 * paging_prepare() sets up the trampoline and checks if we need to
 * enable 5-level paging.