Re: [RFC PATCH v3 0/5] Hypervisor-Enforced Kernel Integrity - CR pinning

2024-05-03 Thread Sean Christopherson
On Fri, May 03, 2024, Mickaël Salaün wrote:
> Hi,
> 
> This patch series implements control-register (CR) pinning for KVM and
> provides an hypervisor-agnostic API to protect guests.  It includes the
> guest interface, the host interface, and the KVM implementation.
> 
> It's not ready for mainline yet (see the current limitations), but we
> think the overall design and interfaces are good and we'd like to have
> some feedback on that.

...

> # Current limitations
> 
> This patch series doesn't handle VM reboot, kexec, nor hybernate yet.
> We'd like to leverage the realated feature from KVM CR-pinning patch
> series [3].  Help appreciated!

Until you have a story for those scenarios, I don't expect you'll get a lot of
valuable feedback, or much feedback at all.  They were the hot topic for KVM CR
pinning, and they'll likely be the hot topic now.



[RFC PATCH v3 0/5] Hypervisor-Enforced Kernel Integrity - CR pinning

2024-05-03 Thread Mickaël Salaün
Hi,

This patch series implements control-register (CR) pinning for KVM and
provides an hypervisor-agnostic API to protect guests.  It includes the
guest interface, the host interface, and the KVM implementation.

It's not ready for mainline yet (see the current limitations), but we
think the overall design and interfaces are good and we'd like to have
some feedback on that.

# Changes since previous version

We choose to remove as much as possible from the previous version of
this patch series to only keep the CR pinning feature and the API.  This
makes the patches simpler and brings the foundation for future
enhancement.  This will also enables us to quickly iterate on new
versions.  We are still working on memory protection but that should be
part of another patch series, if possible once this one land.

We implemented proper KUnit tests and we are also improving the test
framework to make it easier to run tests (and another series is planed):
https://lore.kernel.org/r/20240408074625.65017-1-...@digikod.net
It makes sense to use KUnit for hypervisor-agnostic features.

This series is rebased on top of v6.9-rc6 . guest_memfd is now merged
in mainline, which will help upcoming memory-related changes.

# Overview

The main idea being that kernel self-protection mechanisms should be
delegated to a more privileged part of the system, that is the
hypervisor (see the Threat model below for more details).  It is still
the role of the guest kernel to request such restrictions according to
its configuration. The high-level security guarantees provided by the
hypervisor are semantically the same as a subset of those the kernel
already enforces on itself (CR pinning hardening), but with much strong
guarantees.

The guest kernel API layer contains a global struct heki_hypervisor to
share data and functions between the common code and the hypervisor
support code.  The struct heki_hypervisor enables to plug in different
backend implementations that are initialized with the heki_early_init()
and heki_late_init() calls.

We took inspiration from previous patches, mainly the KVMI [1] [2] and
KVM CR-pinning [3] series, revamped and simplified relevant parts to fit
well with our goal, added one hypercall, and created a kernel API for
VMs to request protection in a generic way that can be leveraged by any
hypervisor.

When a guest request to change one of its previously protected CR, KVM
creates a GP fault.

Because the VMM needs to be involved and know the guests' requested
memory permissions, we implemented two new kind of VM exits to be able
to notify the VMM about guests' Heki configurations and policy
violations.  Indeed, forwarding such signals to the VMM could help
improve attack detection, and react to such attempt (e.g. log events,
stop the VM).  Giving visibility to the VMM would also enable us to
migrate VMs.

# Threat model

The main threat model is a malicious user space process exploiting a
kernel vulnerability to gain more privileges or to bypass the
access-control system.  This threat also covers attacks coming from
network or storage data (e.g., malformed network packet, inconsistent
drive content).

Considering all potential ways to compromise a kernel, Heki's goal is to
harden a sane kernel before a runtime attack to make it more difficult,
and potentially to cause such an attack to fail. Because current attack
mitigations are only mitigations, we consider the kernel itself to be
partially malicious during its lifetime e.g., because a ROP attack that
could disable kernel self-protection mechanisms and make kernel
exploitation much easier. Indeed, an exploit is often split into several
stages, each bypassing some security measures (including CFI).

CR pinning should already be enforced by the guest kernel and the reason
to pin such registers is the same.  With this patch series it
significantly improve such protection.

Getting the guarantee that these control registers cannot be changed
increases the cost of an attack.

# Prerequisites

For this new security layer to be effective, guest kernels must be
trusted by the VM owners at boot time, before launching any user space
processes nor receiving potentially malicious network packets. It is
then required to have a security mechanism to provide or check this
initial trust (e.g., secure boot, kernel module signing).  To protect
against persistent attacks, complementary security mechanisms should be
used (e.g., IMA, IPE, Lockdown).

# How does it work?

The KVM_HC_LOCK_CR_UPDATE hypercall enables guests to pin some of its
CPU control register flags (e.g., X86_CR0_WP, X86_CR4_SMEP,
X86_CR4_SMAP).

Two new kinds of VM exits are implemented: one for a guest Heki request
(i.e. hypercall), and another for a guest attempt to change its pinned
CRs.  When the guest attempts to update pinned CRs or to access memory
in a way that is not allowed, the VMM can then be notified and react to
such attack attempt. After that, if the VM is still running, KVM sends
a GP fault