[PATCH 0/4] Paravirtualized Control Register pinning

2020-06-17 Thread John Andersen
When kexec support is not compiled in and the host KVM supports pinning,
Linux will benefit from this protection without the need to specify
pv_cr_pin on the command line.

Pinning of sensitive CR bits has already been implemented to protect
against exploits directly calling native_write_cr*(). The current
protection cannot stop ROP attacks which jump directly to a MOV CR
instruction. Guests running with paravirtualized CR pinning are now
protected against the use of ROP to disable CR bits. The same bits that
are being pinned natively may be pinned via the CR pinned MSRs. These
bits are WP in CR0, and SMEP, SMAP, and UMIP in CR4.

Future patches could implement similar MSRs to protect bits elsewhere,
such as in other MSRs. The NXE bit of the EFER MSR is a prime
candidate.
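
For illustration, here is a minimal guest-side sketch of how these MSRs
are meant to be used, assuming the MSR indices defined in patch 2 and
the kernel's usual rdmsrl()/wrmsrl() helpers (the function name is
invented, not code from the series):

	#define MSR_KVM_CR0_PIN_ALLOWED		0x4b564d08
	#define MSR_KVM_CR0_PINNED_HIGH		0x4b564d0b

	/* Ask KVM to pin CR0.WP high if the host allows it. */
	static void example_pin_cr0_wp(void)
	{
		u64 allowed;

		rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, allowed);
		if (allowed & X86_CR0_WP)
			wrmsrl(MSR_KVM_CR0_PINNED_HIGH, X86_CR0_WP);
	}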



=== Plan for kexec support ===

Andy's suggestion of a boot option has been incorporated as the
pv_cr_pin command line option. Boris mentioned that short-term
solutions become immutable. However, for the reasons outlined below
we need a way for the user to opt-in to pinning over kexec if both
are compiled in, and the command line parameter seems to be a good
way to do that. Liran's proposed solution of a flag within the ELF,
which would allow us to identify which kernels have support, is assumed
to be implemented in the following scenarios.

We then have the following cases (without the addition of pv_cr_pin):


- Kernel running without pinning enabled kexecs kernel with pinning.

  - Loaded kernel has kexec

    - Do not enable pinning

  - Loaded kernel lacks kexec

    - Enable pinning

- Kernel running with pinning enabled kexecs kernel with pinning (as
  identified by ELF addition).

  - Okay

- Kernel running with pinning enabled kexecs kernel without pinning
  (as identified by lack of ELF addition).

  - User is presented with an error saying that they may not kexec
    a kernel without pinning support.


With the addition of pv_cr_pin we have the following situations (a
condensed sketch follows this list):


- Kernel running without pinning enabled kexecs kernel with pinning.

  - Loaded kernel has kexec

    - pv_cr_pin command line parameter present for new kernel

      - Enable pinning

    - pv_cr_pin command line parameter not present for new kernel

      - Do not enable pinning

  - Loaded kernel lacks kexec

    - Enable pinning

- Kernel running with pinning enabled kexecs kernel with pinning (as
  identified by ELF addition).

  - Okay

- Kernel running with kexec and pinning enabled (opt-in via pv_cr_pin)
  kexecs kernel without pinning (as identified by lack of ELF addition).

  - User is presented with an error saying that they have opted
    into pinning support and may not kexec a kernel without pinning
    support.
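
Condensing the cases above into one hypothetical helper (names invented
for illustration; this is not code from the series):

	static int check_kexec_target(bool pinning_active, bool target_has_elf_flag)
	{
		/* A pinned kernel must never load one that cannot run pinned. */
		if (pinning_active && !target_has_elf_flag)
			return -EPERM;

		/*
		 * Otherwise the kexec proceeds. Whether the loaded kernel
		 * enables pinning is its own decision: pv_cr_pin present, or
		 * kexec not compiled in, means enable; otherwise leave it off.
		 */
		return 0;
	}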


Without the command line parameter I'm not sure how we could preserve
users' workflows which might rely on kexecing older kernels (ones
which wouldn't have support). I see the benefit here being that users
have to opt in to the possibility of breaking their workflow via
their addition of the pv_cr_pin command line flag (which could of
course also be called nokexec). A deprecation period could then be
chosen where eventually pinning takes preference over kexec and users
are presented with the error if they try to kexec an older kernel.
Input on this would be much appreciated, as well as on whether this is
the best way to handle things or if there's another way that would be
preferred. This is just what we were able to come up with to ensure
users didn't get anything broken they didn't agree to have broken.


Thanks,
John

John Andersen (4):
  X86: Update mmu_cr4_features during feature identification
  KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  selftests: kvm: add test for CR pinning with SMM
  X86: Use KVM CR pin MSRs

 .../admin-guide/kernel-parameters.txt         |  11 +
 Documentation/virt/kvm/msr.rst                |  53 +++++
 arch/x86/Kconfig                              |  10 +
 arch/x86/include/asm/kvm_host.h               |   7 +
 arch/x86/include/asm/kvm_para.h               |  28 +++
 arch/x86/include/uapi/asm/kvm_para.h          |   7 +
 arch/x86/kernel/cpu/common.c                  |  11 +-
 arch/x86/kernel/kvm.c                         |  39 ++++
 arch/x86/kernel/setup.c                       |  12 +-
 arch/x86/kvm/cpuid.c                          |   3 +-
 arch/x86/kvm/emulate.c                        |   3 +-
 arch/x86/kvm/kvm_emulate.h                    |   2 +-
 arch/x86/kvm/svm/nested.c                     |  11 +-
 arch/x86/kvm/vmx/nested.c                     |  10 +-
 arch/x86/kvm/x86.c                            | 106 ++++++++-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |  13 ++
 .../selftests/kvm/x86_64/smm_cr_pin_test.c    | 207 ++++++++++++++
 19 files changed, 521 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c


base-commit: 49b3deaad3452217d62dbd78da8df24eb0c7e169
-- 
2.21.0



[PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning

2020-06-17 Thread John Andersen
Add a CR pin feature bit to the KVM CPUID. Add read-only MSRs to KVM
which guests use to identify which bits they may request be pinned. Add
CR pinned MSRs to KVM. Allow guests to request that KVM pin certain
bits within control register 0 or 4 via the CR pinned MSRs. Writes to
the MSRs fail if they include bits which aren't allowed. Host userspace
may clear or modify pinned bits at any time. Once pinned bits are set,
the guest may pin additional allowed bits, but not clear any. Clear
pinning on vCPU reset.
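
The write rules above condense to roughly the following host-side
check. This is a hypothetical sketch, not code from the patch; the
function and parameter names are invented:

	static int cr_pinned_msr_write(u64 *pinned, u64 pinned_other_half,
				       u64 allowed, u64 data)
	{
		if (data & ~allowed)
			return 1;	/* bit may not be pinned: fail the write */
		if (data & pinned_other_half)
			return 1;	/* already pinned to the opposite value */
		*pinned |= data;	/* pins accumulate; they never clear */
		return 0;
	}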

In the event that the guest vCPU attempts to disable any of the pinned
bits, send that vCPU a general protection fault, and leave the register
unchanged. Clear pinning on vCPU reset to avoid faulting non-boot
CPUs when they are disabled and then re-enabled, which is done when
hibernating.
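
The check applied to guest CR writes is then a simple mask test; again
a hypothetical sketch with invented names:

	static bool cr_write_violates_pinning(u64 pinned_high, u64 pinned_low,
					      unsigned long val)
	{
		/* Pinned-high bits must stay set; pinned-low bits must stay clear. */
		return (pinned_high & val) != pinned_high || (pinned_low & val);
	}

When this returns true, KVM injects a general protection fault and
leaves the register unchanged, as described above.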

Pinning is not active when running in SMM. Entering SMM disables pinned
bits. Writes to control registers within SMM would therefore trigger
general protection faults if pinning was enforced. Upon exit from SMM,
SMRAM is modified to ensure the values of CR0/4 that will be restored
contain the correct values for pinned bits. CR0/4 values are then
restored from SMRAM as usual.
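
The SMRAM fixup on exit from SMM amounts to masking the saved CR value
before it is restored; a hypothetical sketch (names invented):

	static u64 smram_cr_with_pinning(u64 smram_val, u64 pinned_high,
					 u64 pinned_low)
	{
		/* Force pinned-high bits on and pinned-low bits off. */
		return (smram_val | pinned_high) & ~pinned_low;
	}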

When running with nested virtualization, should pinned bits be cleared
from the host VMCS / VMCB, they will be silently restored on VM-Exit.

Should userspace expose the CR pinning CPUID feature bit, it must zero
CR pinned MSRs on reboot. If it does not, it runs the risk of having the
guest enable pinning and subsequently cause general protection faults on
next boot due to early boot code setting control registers to values
which do not contain the pinned bits. Userspace is responsible for
migrating the contents of the CR* pinned MSRs; if userspace fails to
migrate the MSRs, the protection will no longer be active.
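
For example, a VMM could zero all four pinned MSRs on system reset with
a single KVM_SET_MSRS call. A hypothetical userspace sketch, assuming
vcpu_fd is the vCPU file descriptor and with error handling omitted:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entries[4];
	} msrs = {
		.hdr.nmsrs = 4,
		.entries = {
			{ .index = 0x4b564d0a },	/* MSR_KVM_CR0_PINNED_LOW */
			{ .index = 0x4b564d0b },	/* MSR_KVM_CR0_PINNED_HIGH */
			{ .index = 0x4b564d0c },	/* MSR_KVM_CR4_PINNED_LOW */
			{ .index = 0x4b564d0d },	/* MSR_KVM_CR4_PINNED_HIGH */
		},					/* .data defaults to 0 */
	};
	ioctl(vcpu_fd, KVM_SET_MSRS, &msrs.hdr);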

Pinning of sensitive CR bits has already been implemented to protect
against exploits directly calling native_write_cr*(). The current
protection cannot stop ROP attacks which jump directly to a MOV CR
instruction.

https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/

Guests running with paravirtualized CR pinning can now be protected
against the use of ROP to disable CR bits. The same bits that are being
pinned natively may be pinned via the CR pinned MSRs. These bits are WP
in CR0, and SMEP, SMAP, and UMIP in CR4.

Other hypervisors, such as Hyper-V, have implemented similar
protections for control registers and MSRs, which security researchers
have found effective.

https://www.abatchy.com/2018/01/kernel-exploitation-4

Future work could implement similar MSRs to protect bits elsewhere,
such as in other MSRs. The NXE bit of the EFER MSR is a prime
candidate.

Changes to QEMU are required to expose the CR pin CPUID feature bit, as
well as to clear the MSRs on reboot and enable migration.

https://github.com/qemu/qemu/commit/1b26f03653669c97dd8729f9f59be561d68e2b2d
https://github.com/qemu/qemu/commit/3af36d57457892c3088ee88de759d4f258c159a7

Signed-off-by: John Andersen 
---
 Documentation/virt/kvm/msr.rst       |  53 ++++++++
 arch/x86/include/asm/kvm_host.h      |   7 ++
 arch/x86/include/uapi/asm/kvm_para.h |   7 ++
 arch/x86/kvm/cpuid.c                 |   3 +-
 arch/x86/kvm/emulate.c               |   3 +-
 arch/x86/kvm/kvm_emulate.h           |   2 +-
 arch/x86/kvm/svm/nested.c            |  11 ++-
 arch/x86/kvm/vmx/nested.c            |  10 ++-
 arch/x86/kvm/x86.c                   | 106 +++++++++++++-
 9 files changed, 193 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..9fa43c4a5895 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -376,3 +376,56 @@ data:
write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
and check if there are more notifications pending. The MSR is available
if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+MSR_KVM_CR0_PIN_ALLOWED:
+   0x4b564d08
+MSR_KVM_CR4_PIN_ALLOWED:
+   0x4b564d09
+
+   Read-only registers informing the guest which bits may be pinned for
+   each control register respectively via the CR pinned MSRs.
+
+data:
+   Bits which may be pinned.
+
+   Attempting to pin bits other than these will result in a failure when
+   writing to the respective CR pinned MSR.
+
+   Bits which are allowed to be pinned are WP for CR0 and SMEP, SMAP, and
+   UMIP for CR4.
+
+MSR_KVM_CR0_PINNED_LOW:
+   0x4b564d0a
+MSR_KVM_CR0_PINNED_HIGH:
+   0x4b564d0b
+MSR_KVM_CR4_PINNED_LOW:
+   0x4b564d0c
+MSR_KVM_CR4_PINNED_HIGH:
+   0x4b564d0d
+
+   Used to configure pinned bits in the respective control register.
+
+data:
+   Bits to be pinned.
+
+   Fails if data contains bits which are not allowed to be pinned, or if
+   attempting to pin bits high that are already pinned low, or vice versa.
+   Bits which are allowed to be pinned are WP for CR0 and SMEP, SMAP, and
+   UMIP for CR4.

[PATCH 3/4] selftests: kvm: add test for CR pinning with SMM

2020-06-17 Thread John Andersen
Check that paravirtualized control register pinning blocks modifications
of pinned CR values stored in SMRAM on exit from SMM.

Signed-off-by: John Andersen 
---
 tools/testing/selftests/kvm/.gitignore       |   1 +
 tools/testing/selftests/kvm/Makefile         |   1 +
 .../selftests/kvm/include/x86_64/processor.h |  13 ++
 .../selftests/kvm/x86_64/smm_cr_pin_test.c   | 207 ++++++++++++++
 4 files changed, 222 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 452787152748..c0666c3efcbb 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -10,6 +10,7 @@
 /x86_64/platform_info_test
 /x86_64/set_sregs_test
 /x86_64/smm_test
+/x86_64/smm_cr_pin_test
 /x86_64/state_test
 /x86_64/vmx_preemption_timer_test
 /x86_64/svm_vmcall_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4a166588d99f..c5c205637f38 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -45,6 +45,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
 TEST_GEN_PROGS_x86_64 += x86_64/smm_test
+TEST_GEN_PROGS_x86_64 += x86_64/smm_cr_pin_test
 TEST_GEN_PROGS_x86_64 += x86_64/state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 82b7fe16a824..8a2da0449772 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -200,6 +200,11 @@ static inline uint64_t get_cr0(void)
return cr0;
 }
 
+static inline void set_cr0(uint64_t val)
+{
+   __asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory");
+}
+
 static inline uint64_t get_cr3(void)
 {
uint64_t cr3;
@@ -383,4 +388,12 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 /* VMX_EPT_VPID_CAP bits */
 #define VMX_EPT_VPID_CAP_AD_BITS   (1ULL << 21)
 
+/* KVM MSRs */
+#define MSR_KVM_CR0_PIN_ALLOWED0x4b564d08
+#define MSR_KVM_CR4_PIN_ALLOWED0x4b564d09
+#define MSR_KVM_CR0_PINNED_LOW 0x4b564d0a
+#define MSR_KVM_CR0_PINNED_HIGH0x4b564d0b
+#define MSR_KVM_CR4_PINNED_LOW 0x4b564d0c
+#define MSR_KVM_CR4_PINNED_HIGH0x4b564d0d
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c b/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c
new file mode 100644
index ..a32f577ca1e5
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests for control register pinning not being affected by SMRAM writes.
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <linux/stringify.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+
+#include "processor.h"
+
+#define VCPU_ID  1
+
+#define PAGE_SIZE  4096
+
+#define SMRAM_SIZE 65536
+#define SMRAM_MEMSLOT ((1 << 16) | 1)
+#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE)
+#define SMRAM_GPA 0x100
+#define SMRAM_STAGE_SUCCESS 0xfe
+#define SMRAM_STAGE_FAILURE 0xfd
+
+#define XSTR(s) __stringify(s)
+
+#define SYNC_PORT 0xe
+#define DONE 0xff
+
+#define CR0_PINNED X86_CR0_WP
+#define CR4_PINNED (X86_CR4_SMAP | X86_CR4_UMIP)
+#define CR4_ALL (CR4_PINNED | X86_CR4_SMEP)
+
+/*
+ * This is compiled as normal 64-bit code, however, the SMI handlers are
+ * executed in real-address mode. To stay simple we're limiting ourselves
+ * to a mode independent subset of asm here.
+ * Each SMI handler reports back a fixed stage: SMRAM_STAGE_SUCCESS or
+ * SMRAM_STAGE_FAILURE.
+ */
+uint8_t smi_handler_success[] = {
+   0xb0, SMRAM_STAGE_SUCCESS,/* mov $SMRAM_STAGE_SUCCESS, %al */
+   0xe4, SYNC_PORT,  /* in $SYNC_PORT, %al */
+   0x0f, 0xaa,   /* rsm */
+};
+uint8_t smi_handler_fault[] = {
+   0xb0, SMRAM_STAGE_FAILURE,/* mov $SMRAM_STAGE_FAILURE, %al */
+   0xe4, SYNC_PORT,  /* in $SYNC_PORT, %al */
+   0x0f, 0xaa,   /* rsm */
+};
+
+/* We opt not to use GUEST_SYNC() here because we also have to make a sync call
+ * from SMM. As such, the address of the ucall struct we'd need to pass isn't
+ * something we can put into the above machine code in a maintainable way.
+ */
+static inline void sync_with_host(uint64_t phase)
+{
+   asm volatile("in $" XSTR(SYNC_PORT)", %%al\n"
+: "+a" (phase));
+}
+
+void self_smi(void)
+{
+	wrmsr(APIC_BASE_MSR + (APIC_ICR >> 4),
+	      APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
+}

[PATCH 4/4] X86: Use KVM CR pin MSRs

2020-06-17 Thread John Andersen
Strengthen existing control register pinning when running
paravirtualized under KVM. Check which bits KVM supports pinning for
each control register and only pin supported bits which are already
pinned via the existing native protection. Write to KVM CR0/4 pinned
MSRs to enable pinning.

Initiate KVM-assisted pinning directly following the setup of native
pinning on the boot CPU. For non-boot CPUs, initiate paravirtualized
pinning on CPU identification.

Identification of non-boot CPUs takes place after the boot CPU has set
up native CR pinning. Therefore, non-boot CPUs access pinned bits set
up by the boot CPU and request that those be pinned. All CPUs request
paravirtualized pinning of the same bits which are already pinned
natively.
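
A hypothetical sketch of the per-CPU request this describes (function
and parameter names invented; rdmsrl()/wrmsrl() are the usual kernel
helpers):

	static void example_request_cr4_pinning(unsigned long native_pinned_bits)
	{
		u64 allowed, mask;

		rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, allowed);

		/* Request only bits that are both allowed and natively pinned. */
		mask = allowed & native_pinned_bits;
		if (mask)
			wrmsrl(MSR_KVM_CR4_PINNED_HIGH, mask);
	}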

Guests using the kexec system call currently do not support
paravirtualized control register pinning. This is because early boot
code writes known good values to the control registers, and these
values do not contain the protected bits: CPU feature identification is
done at a later time, when the kernel properly checks whether it can
enable protections. As such, the pv_cr_pin command line option has been
added, which instructs the kernel to disable kexec in favor of enabling
paravirtualized control register pinning. crashkernel is also disabled
when the pv_cr_pin parameter is specified, due to its reliance on
kexec.
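
The opt-in itself can be wired up as an ordinary early parameter; a
simplified sketch (not the exact code from this patch):

	static bool pv_cr_pin_requested;

	static int __init pv_cr_pin_parse(char *arg)
	{
		/* Checked later, before kexec/crashkernel are set up. */
		pv_cr_pin_requested = true;
		return 0;
	}
	early_param("pv_cr_pin", pv_cr_pin_parse);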

When we fix kexec, we will still need a way for a kernel with support to
know if the kernel it is attempting to load has support. If a kernel
with this enabled attempts to kexec a kernel where this is not
supported, it would trigger a fault almost immediately.

Liran suggested adding a section to the built image acting as a flag to
signify support for being kexec'd by a kernel with pinning enabled.
Should that approach be implemented, it is likely that the command line
flag (pv_cr_pin) would still be desired for some deprecation period. We
wouldn't want the default behavior to change from being able to kexec
older kernels to not being able to, as this might break some users'
workflows.

Signed-off-by: John Andersen 
---
 .../admin-guide/kernel-parameters.txt | 11 +++++
 arch/x86/Kconfig                      | 10 ++++
 arch/x86/include/asm/kvm_para.h       | 28 ++++++++
 arch/x86/kernel/cpu/common.c          |  5 +++
 arch/x86/kernel/kvm.c                 | 39 ++++++++++
 arch/x86/kernel/setup.c               |  8 ++++
 6 files changed, 101 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 89386f6f3ab6..54fb2b5ab8fc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3926,6 +3926,17 @@
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
 
+   pv_cr_pin   [SECURITY,X86]
+   Enable paravirtualized control register pinning. When
+   running paravirtualized under KVM, request that KVM not
+   allow the guest to disable kernel protection features
+   set in CPU control registers. Specifying this option
+   will disable kexec (and crashkernel). If kexec support
+   has not been compiled into the kernel and host KVM
+   supports paravirtualized control register pinning, it
+   will be active by default without the need to specify
+   this parameter.
+
quiet   [KNL] Disable most log messages
 
r128=   [HW,DRM]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67f6a40b5e93..bc0b27483001 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -800,6 +800,7 @@ config KVM_GUEST
bool "KVM Guest support (including kvmclock)"
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select PARAVIRT_CR_PIN
select ARCH_CPUIDLE_HALTPOLL
default y
---help---
@@ -835,6 +836,15 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
bool
 
+config PARAVIRT_CR_PIN
+   bool "Paravirtual bit pinning for CR0 and CR4"
+   depends on KVM_GUEST
+   help
+ Select this option to have the virtualised guest request that the
+ hypervisor disallow it from disabling protections set in control
+ registers. The hypervisor will prevent exploits from disabling
+ features such as SMEP, SMAP, UMIP, and WP.
+
 config JAILHOUSE_GUEST
bool "Jailhouse non-root cell support"
depends on X86_64 && PCI
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 57fd1966c4ea..f021531e98dc 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -112,6 +112,23 @@ static inline void kvm_spinlock_init(void)

[PATCH 1/4] X86: Update mmu_cr4_features during feature identification

2020-06-17 Thread John Andersen
In identify_cpu, when setting up SMEP/SMAP/UMIP, call
cr4_set_bits_and_update_boot instead of cr4_set_bits. This ensures that
mmu_cr4_features contains those bits, and that those protections are
not disabled in the hibernation asm.

setup_arch updates mmu_cr4_features to save which identified features
are supported, for later use in the hibernation asm when CR4 needs to
be modified to toggle PGE. The CR4 writes happen in restore_image and
restore_registers. Because setup_arch occurs before identify_cpu,
mmu_cr4_features did not contain some of the CR4 features which were
enabled via identify_cpu by the time the hibernation asm was executed.
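
For reference, the helper does roughly the following (simplified from
arch/x86/include/asm/tlbflush.h at the time of writing):

	static inline void cr4_set_bits_and_update_boot(unsigned long mask)
	{
		/* Remember the bits for the hibernation/trampoline CR4 images. */
		mmu_cr4_features |= mask;
		if (trampoline_cr4_features)
			*trampoline_cr4_features = mmu_cr4_features;
		cr4_set_bits(mask);
	}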

mmu_cr4_features is now written to on CPU bringup whenever
cr4_set_bits_and_update_boot is called. For the boot CPU, the
__ro_after_init on mmu_cr4_features does not cause a fault. However,
__ro_after_init had to be removed because it triggers faults on
non-boot CPUs.

Signed-off-by: John Andersen 
---
 arch/x86/kernel/cpu/common.c | 6 +++---
 arch/x86/kernel/setup.c      | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d07809286b95..921e67086a00 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -297,7 +297,7 @@ __setup("nosmep", setup_disable_smep);
 static __always_inline void setup_smep(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_SMEP))
-   cr4_set_bits(X86_CR4_SMEP);
+   cr4_set_bits_and_update_boot(X86_CR4_SMEP);
 }
 
 static __init int setup_disable_smap(char *arg)
@@ -316,7 +316,7 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 
if (cpu_has(c, X86_FEATURE_SMAP)) {
 #ifdef CONFIG_X86_SMAP
-   cr4_set_bits(X86_CR4_SMAP);
+   cr4_set_bits_and_update_boot(X86_CR4_SMAP);
 #else
cr4_clear_bits(X86_CR4_SMAP);
 #endif
@@ -333,7 +333,7 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
if (!cpu_has(c, X86_FEATURE_UMIP))
goto out;
 
-   cr4_set_bits(X86_CR4_UMIP);
+   cr4_set_bits_and_update_boot(X86_CR4_UMIP);
 
	pr_info_once("x86/cpu: User Mode Instruction Prevention (UMIP) activated\n");
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a3767e74c758..d9c678b37a9b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -138,9 +138,9 @@ EXPORT_SYMBOL(boot_cpu_data);
 
 
 #if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
-__visible unsigned long mmu_cr4_features __ro_after_init;
+__visible unsigned long mmu_cr4_features;
 #else
-__visible unsigned long mmu_cr4_features __ro_after_init = X86_CR4_PAE;
+__visible unsigned long mmu_cr4_features = X86_CR4_PAE;
 #endif
 
 /* Boot loader ID and version as integers, for the benefit of proc_dointvec */
-- 
2.21.0