Re: [PATCH v6 21/22] KVM: Allow for different capacities in kvm_mmu_memory_cache structs

2022-05-19 Thread Anup Patel
On Tue, May 17, 2022 at 4:52 AM David Matlack  wrote:
>
> Allow the capacity of the kvm_mmu_memory_cache struct to be chosen at
> declaration time rather than being fixed for all declarations. This will
> be used in a follow-up commit to declare a cache in x86 with a capacity
> of 512+ objects without having to increase the capacity of all caches in
> KVM.
>
> This change requires each cache now specify its capacity at runtime,
> since the cache struct itself no longer has a fixed capacity known at
> compile time. To protect against someone accidentally defining a
> kvm_mmu_memory_cache struct directly (without the extra storage), this
> commit includes a WARN_ON() in kvm_mmu_topup_memory_cache().
>
> In order to support different capacities, this commit changes the
> objects pointer array to be dynamically allocated the first time the
> cache is topped-up.
>
> While here, opportunistically clean up the stack-allocated
> kvm_mmu_memory_cache structs in riscv and arm64 to use designated
> initializers.
>
> No functional change intended.
>
> Reviewed-by: Marc Zyngier 
> Signed-off-by: David Matlack 

Looks good to me for KVM RISC-V.

Reviewed-by: Anup Patel 

A small heads-up that function stage2_ioremap() is going to be
renamed for Linux-5.19 so you might have to rebase one more time.

Thanks,
Anup

> ---
>  arch/arm64/kvm/mmu.c  |  2 +-
>  arch/riscv/kvm/mmu.c  |  5 +
>  include/linux/kvm_types.h |  6 +-
>  virt/kvm/kvm_main.c   | 33 ++---
>  4 files changed, 37 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 53ae2c0640bc..f443ed845f85 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -764,7 +764,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
> guest_ipa,
>  {
> phys_addr_t addr;
> int ret = 0;
> -   struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> +   struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
> struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
>  KVM_PGTABLE_PROT_R |
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index f80a34fbf102..4d95ebe4114f 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -347,10 +347,7 @@ static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, 
> phys_addr_t hpa,
> int ret = 0;
> unsigned long pfn;
> phys_addr_t addr, end;
> -   struct kvm_mmu_memory_cache pcache;
> -
> -   memset(&pcache, 0, sizeof(pcache));
> -   pcache.gfp_zero = __GFP_ZERO;
> +   struct kvm_mmu_memory_cache pcache = { .gfp_zero = __GFP_ZERO };
>
> end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
> pfn = __phys_to_pfn(hpa);
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index ac1ebb37a0ff..68529884eaf8 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -83,12 +83,16 @@ struct gfn_to_pfn_cache {
>   * MMU flows is problematic, as is triggering reclaim, I/O, etc... while
>   * holding MMU locks.  Note, these caches act more like prefetch buffers than
>   * classical caches, i.e. objects are not returned to the cache on being 
> freed.
> + *
> + * The @capacity field and @objects array are lazily initialized when the 
> cache
> + * is topped up (__kvm_mmu_topup_memory_cache()).
>   */
>  struct kvm_mmu_memory_cache {
> int nobjs;
> gfp_t gfp_zero;
> struct kmem_cache *kmem_cache;
> -   void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE];
> +   int capacity;
> +   void **objects;
>  };
>  #endif
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e089db822c12..5e2e75014256 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -369,14 +369,31 @@ static inline void *mmu_memory_cache_alloc_obj(struct 
> kvm_mmu_memory_cache *mc,
> return (void *)__get_free_page(gfp_flags);
>  }
>
> -int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
> +static int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int 
> capacity, int min)
>  {
> +   gfp_t gfp = GFP_KERNEL_ACCOUNT;
> void *obj;
>
> if (mc->nobjs >= min)
> return 0;
> -   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
> -   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
> +
> +   if (unlikely(!mc->objects)) {
> +   if (WARN_ON_ONCE(!capacity))
> +   return -EIO;
> +
> +   mc->objects = kvmalloc_array(sizeof(void *), capacity, gfp);
> +   if (!mc->objects)
> +   return -ENOMEM;
> +
> +   mc->capacity = capacity;
> +   }
> +
> +   /* It is illegal to request a different capacity across topups. */
> +   if (WARN_ON_ONCE(mc->capacity != capacity))
> +   return -EIO;
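
For readers following the API change, here is a minimal usage sketch (not part of the patch; kvm_mmu_cache_min_pages() is the existing arm64/riscv helper used by the call sites above, and the cleanup behaviour of kvm_mmu_free_memory_cache() with the now-dynamic array is assumed from the rest of the series). It shows the default-capacity path inside a mapping routine that has a struct kvm *kvm in scope:

  struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
  int ret;

  /* The first top-up lazily allocates cache.objects[] at the default capacity. */
  ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
  if (ret)
          return ret;

  /* ... consume pre-allocated objects via kvm_mmu_memory_cache_alloc(&cache) ... */

  /* Releases any remaining objects and, with this change, the objects[] array. */
  kvm_mmu_free_memory_cache(&cache);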

[PATCH 89/89] Documentation: KVM: Add some documentation for Protected KVM on arm64

2022-05-19 Thread Will Deacon
Add some initial documentation for the Protected KVM (pKVM) feature on
arm64, describing the user ABI for creating protected VMs as well as
their limitations.

Signed-off-by: Will Deacon 
---
 .../admin-guide/kernel-parameters.txt |  4 +-
 Documentation/virt/kvm/arm/index.rst  |  1 +
 Documentation/virt/kvm/arm/pkvm.rst   | 96 +++
 3 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/virt/kvm/arm/pkvm.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 63a764ec7fec..b8841a969f59 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2437,7 +2437,9 @@
  protected guests.
 
protected: nVHE-based mode with support for guests whose
-  state is kept private from the host.
+  state is kept private from the host. See
+  Documentation/virt/kvm/arm/pkvm.rst for more
+  information about this mode of operation.
 
Defaults to VHE/nVHE based on hardware support. Setting
mode to "protected" will disable kexec and hibernation
diff --git a/Documentation/virt/kvm/arm/index.rst 
b/Documentation/virt/kvm/arm/index.rst
index b4067da3fcb6..49c388df662a 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -9,6 +9,7 @@ ARM
 
hyp-abi
hypercalls
+   pkvm
psci
pvtime
ptp_kvm
diff --git a/Documentation/virt/kvm/arm/pkvm.rst 
b/Documentation/virt/kvm/arm/pkvm.rst
new file mode 100644
index ..64f099a5ac2e
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pkvm.rst
@@ -0,0 +1,96 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Protected virtual machines (pKVM)
+=
+
+Introduction
+
+
+Protected KVM (pKVM) is a KVM/arm64 extension which uses the two-stage
+translation capability of the Armv8 MMU to isolate guest memory from the host
+system. This allows for the creation of a confidential computing environment
+without relying on whizz-bang features in hardware, but still allowing room for
+complementary technologies such as memory encryption and hardware-backed
+attestation.
+
+The major implementation change brought about by pKVM is that the hypervisor
+code running at EL2 is now largely independent of (and isolated from) the rest
+of the host kernel running at EL1 and therefore additional hypercalls are
+introduced to manage manipulation of guest stage-2 page tables, creation of VM
+data structures and reclamation of memory on teardown. An immediate consequence
+of this change is that the host itself runs with an identity mapping enabled
+at stage-2, providing the hypervisor code with a mechanism to restrict host
+access to an arbitrary physical page.
+
+Enabling pKVM
+-
+
+The pKVM hypervisor is enabled by booting the host kernel at EL2 with
+"``kvm-arm.mode=protected``" on the command-line. Once enabled, VMs can be 
spawned
+in either protected or non-protected state, although the hypervisor is still
+responsible for managing most of the VM metadata in either case.
+
+Limitations
+---
+
+Enabling pKVM places some significant limitations on KVM guests, regardless of
+whether they are spawned in protected state. It is therefore recommended only
+to enable pKVM if protected VMs are required, with non-protected state acting
+primarily as a debug and development aid.
+
+If you're still keen, then here is an incomplete list of caveats that apply
+to all VMs running under pKVM:
+
+- Guest memory cannot be file-backed (with the exception of shmem/memfd) and is
+  pinned as it is mapped into the guest. This prevents the host from
+  swapping-out, migrating, merging or generally doing anything useful with the
+  guest pages. It also requires that the VMM has either ``CAP_IPC_LOCK`` or
+  sufficient ``RLIMIT_MEMLOCK`` to account for this pinned memory.
+
+- GICv2 is not supported and therefore GICv3 hardware is required in order
+  to expose a virtual GICv3 to the guest.
+
+- Read-only memslots are unsupported and therefore dirty logging cannot be
+  enabled.
+
+- Memslot configuration is fixed once a VM has started running, with subsequent
+  move or deletion requests being rejected with ``-EPERM``.
+
+- There are probably many others.
+
+Since the host is unable to tear down the hypervisor when pKVM is enabled,
+hibernation (``CONFIG_HIBERNATION``) and kexec (``CONFIG_KEXEC``) will fail
+with ``-EBUSY``.
+
+If you are not happy with these limitations, then please don't enable pKVM :)
+
+VM creation
+---
+
+When pKVM is enabled, protected VMs can be created by specifying the
+``KVM_VM_TYPE_ARM_PROTECTED`` flag in the machine type identifier parameter
+passed to ``KVM_CREATE_VM``.
+
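For illustration (not part of the series), a minimal userspace sketch of the flag usage described above; error handling is elided, and leaving the IPA-size bits at zero selects the default limit:

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  int create_protected_vm(void)
  {
          /* Bit 8 requests a protected VM; a low byte of 0 keeps the default IPA size. */
          unsigned long type = KVM_VM_TYPE_ARM_PROTECTED;
          int kvm_fd, vm_fd;

          kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);
          if (kvm_fd < 0)
                  return -1;

          /* Fails with EINVAL unless the host booted with kvm-arm.mode=protected. */
          vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, type);
          return vm_fd;
  }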

[PATCH 88/89] KVM: arm64: Introduce KVM_VM_TYPE_ARM_PROTECTED machine type for PVMs

2022-05-19 Thread Will Deacon
Introduce a new virtual machine type, KVM_VM_TYPE_ARM_PROTECTED, which
specifies that the guest memory pages are to be unmapped from the host
stage-2 by the hypervisor.

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_pkvm.h |  2 +-
 arch/arm64/kvm/arm.c  |  5 -
 arch/arm64/kvm/mmu.c  |  3 ---
 arch/arm64/kvm/pkvm.c | 10 +-
 include/uapi/linux/kvm.h  |  6 ++
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index 062ae2ffbdfb..952e3c3fa32d 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -16,7 +16,7 @@
 
 #define HYP_MEMBLOCK_REGIONS 128
 
-int kvm_init_pvm(struct kvm *kvm);
+int kvm_init_pvm(struct kvm *kvm, unsigned long type);
 int kvm_shadow_create(struct kvm *kvm);
 void kvm_shadow_destroy(struct kvm *kvm);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9c5a935a9a73..26fd69727c81 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -141,11 +141,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
int ret;
 
+   if (type & ~KVM_VM_TYPE_MASK)
+   return -EINVAL;
+
ret = kvm_share_hyp(kvm, kvm + 1);
if (ret)
return ret;
 
-   ret = kvm_init_pvm(kvm);
+   ret = kvm_init_pvm(kvm, type);
if (ret)
goto err_unshare_kvm;
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 137d4382ed1c..392ff7b2362d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -652,9 +652,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu 
*mmu, unsigned long t
u64 mmfr0, mmfr1;
u32 phys_shift;
 
-   if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
-   return -EINVAL;
-
phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
if (is_protected_kvm_enabled()) {
phys_shift = kvm_ipa_limit;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 67aad91dc3e5..ebf93ff6a77e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -218,8 +218,16 @@ void kvm_shadow_destroy(struct kvm *kvm)
}
 }
 
-int kvm_init_pvm(struct kvm *kvm)
+int kvm_init_pvm(struct kvm *kvm, unsigned long type)
 {
mutex_init(&kvm->arch.pkvm.shadow_lock);
+
+   if (!(type & KVM_VM_TYPE_ARM_PROTECTED))
+   return 0;
+
+   if (!is_protected_kvm_enabled())
+   return -EINVAL;
+
+   kvm->arch.pkvm.enabled = true;
return 0;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 91a6fe4e02c0..fdb0289cfecc 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -887,6 +887,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK  0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)\
((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED  (1UL << 8)
+
+#define KVM_VM_TYPE_MASK   (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+KVM_VM_TYPE_ARM_PROTECTED)
+
 /*
  * ioctls for /dev/kvm fds:
  */
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 87/89] KVM: arm64: Expose memory sharing hypercalls to protected guests

2022-05-19 Thread Will Deacon
Extend our KVM "vendor" hypercalls to expose three new hypercalls to
protected guests for the purpose of opening and closing shared memory
windows with the host:

  MEMINFO:  Query the stage-2 page size (i.e. the minimum granule at
which memory can be shared)

  MEM_SHARE:Share a page RWX with the host, faulting the page in if
necessary.

  MEM_UNSHARE:  Unshare a page with the host. Subsequent host accesses
to the page will result in a fault being injected by the
hypervisor.

Signed-off-by: Will Deacon 
---
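For illustration only, a sketch of how a protected guest could drive these calls through the standard SMCCC helpers. The *_FUNC_ID names are assumed to follow this series' arm-smccc.h additions, and the return convention is the one documented below:

  #include <linux/arm-smccc.h>

  /* Sketch: share one protection granule of guest memory with the host. */
  static int pkvm_guest_share_page(phys_addr_t ipa)
  {
          struct arm_smccc_res res;

          /* MEMINFO: R0 is negative on error, else the granule size in bytes. */
          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID,
                               0, 0, 0, &res);
          if ((long)res.a0 < 0 || !IS_ALIGNED(ipa, res.a0))
                  return -EINVAL;

          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID,
                               ipa, 0, 0, &res);
          return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EPERM;
  }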
 Documentation/virt/kvm/arm/hypercalls.rst |  72 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c|  24 -
 arch/arm64/kvm/hyp/nvhe/pkvm.c| 109 +-
 include/linux/arm-smccc.h |  21 +
 4 files changed, 223 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/arm/hypercalls.rst 
b/Documentation/virt/kvm/arm/hypercalls.rst
index 17be111f493f..d96c9fd7d8c5 100644
--- a/Documentation/virt/kvm/arm/hypercalls.rst
+++ b/Documentation/virt/kvm/arm/hypercalls.rst
@@ -44,3 +44,75 @@ Provides a discovery mechanism for other KVM/arm64 
hypercalls.
 
 
 See ptp_kvm.rst
+
+``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``
+--
+
+Query the memory protection parameters for a protected virtual machine.
+
++-+-+
+| Presence:   | Optional; protected guests only.   
 |
++-+-+
+| Calling convention: | HVC64  
 |
++-+--+--+
+| Function ID:| (uint32) | 0xC6000002
 |
++-+--++-+
+| Arguments:  | (uint64) | R1 | Reserved / Must be zero
 |
+| 
+--++-+
+| | (uint64) | R2 | Reserved / Must be zero
 |
+| 
+--++-+
+| | (uint64) | R3 | Reserved / Must be zero
 |
++-+--++-+
+| Return Values:  | (int64)  | R0 | ``INVALID_PARAMETER (-3)`` on error, 
else   |
+| |  || memory protection granule in bytes 
 |
++-+--++-+
+
+``ARM_SMCCC_KVM_FUNC_MEM_SHARE``
+
+
+Share a region of memory with the KVM host, granting it read, write and execute
+permissions. The size of the region is equal to the memory protection granule
+advertised by ``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``.
+
++-+-+
+| Presence:   | Optional; protected guests only.   
 |
++-+-+
+| Calling convention: | HVC64  
 |
++-+--+--+
+| Function ID:| (uint32) | 0xC6000003
 |
++-+--++-+
+| Arguments:  | (uint64) | R1 | Base IPA of memory region to share 
 |
+| 
+--++-+
+| | (uint64) | R2 | Reserved / Must be zero
 |
+| 
+--++-+
+| | (uint64) | R3 | Reserved / Must be zero
 |
++-+--++-+
+| Return Values:  | (int64)  | R0 | ``SUCCESS (0)``
 |
+| |  |
+-+
+| |  || ``INVALID_PARAMETER (-3)`` 
 |
++-+--++-+
+
+``ARM_SMCCC_KVM_FUNC_MEM_UNSHARE``
+--
+
+Revoke access permission from the KVM host to a memory region previously shared
+with ``ARM_SMCCC_KVM_FUNC_MEM_SHARE``. The size of the region is equal to the
+memory protection granule advertised by ``ARM_SMCCC_KVM_FUNC_HYP_MEMINFO``.
+
++-+-+
+| 

[PATCH 85/89] KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst

2022-05-19 Thread Will Deacon
KVM/arm64 makes use of the SMCCC "Vendor Specific Hypervisor Service
Call Range" to expose KVM-specific hypercalls to guests in a
discoverable and extensible fashion.

Document the existence of this interface and the discovery hypercall.

Signed-off-by: Will Deacon 
---
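For readers unfamiliar with the discovery flow described above, a hedged guest-side sketch using the existing SMCCC helpers and the KVM UID register constants already present in include/linux/arm-smccc.h (it mirrors what the kernel's own probing code does; treat it as illustrative):

  #include <linux/arm-smccc.h>

  static bool kvm_arm64_hyp_services_present(void)
  {
          struct arm_smccc_res res;

          /* Standard "Call UID" query for the vendor hypervisor service range. */
          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
          if (res.a0 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 ||
              res.a1 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 ||
              res.a2 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 ||
              res.a3 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3)
                  return false;

          /* R0-R3 of this call form the feature bitmap documented below. */
          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID, &res);
          return res.a0 & BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
  }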
 Documentation/virt/kvm/arm/hypercalls.rst | 46 +++
 Documentation/virt/kvm/arm/index.rst  |  1 +
 2 files changed, 47 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/hypercalls.rst

diff --git a/Documentation/virt/kvm/arm/hypercalls.rst 
b/Documentation/virt/kvm/arm/hypercalls.rst
new file mode 100644
index ..17be111f493f
--- /dev/null
+++ b/Documentation/virt/kvm/arm/hypercalls.rst
@@ -0,0 +1,46 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+KVM/arm64-specific hypercalls exposed to guests
+===
+
+This file documents the KVM/arm64-specific hypercalls which may be
+exposed by KVM/arm64 to guest operating systems. These hypercalls are
+issued using the HVC instruction according to version 1.1 of the Arm SMC
+Calling Convention (DEN0028/C):
+
+https://developer.arm.com/docs/den0028/c
+
+All KVM/arm64-specific hypercalls are allocated within the "Vendor
+Specific Hypervisor Service Call" range with a UID of
+``28b46fb6-2ec5-11e9-a9ca-4b564d003a74``. This UID should be queried by the
+guest using the standard "Call UID" function for the service range in
+order to determine that the KVM/arm64-specific hypercalls are available.
+
+``ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID``
+-
+
+Provides a discovery mechanism for other KVM/arm64 hypercalls.
+
++-+-+
+| Presence:   | Mandatory for the KVM/arm64 UID
 |
++-+-+
+| Calling convention: | HVC32  
 |
++-+--+--+
+| Function ID:| (uint32) | 0x86000000
 |
++-+--+--+
+| Arguments:  | None   
 |
++-+--++-+
+| Return Values:  | (uint32) | R0 | Bitmap of available function numbers 
0-31   |
+| 
+--++-+
+| | (uint32) | R1 | Bitmap of available function numbers 
32-63  |
+| 
+--++-+
+| | (uint32) | R2 | Bitmap of available function numbers 
64-95  |
+| 
+--++-+
+| | (uint32) | R3 | Bitmap of available function numbers 
96-127 |
++-+--++-+
+
+``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
+
+
+See ptp_kvm.rst
diff --git a/Documentation/virt/kvm/arm/index.rst 
b/Documentation/virt/kvm/arm/index.rst
index 78a9b670aafe..b4067da3fcb6 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -8,6 +8,7 @@ ARM
:maxdepth: 2
 
hyp-abi
+   hypercalls
psci
pvtime
ptp_kvm
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 86/89] KVM: arm64: Reformat/beautify PTP hypercall documentation

2022-05-19 Thread Will Deacon
The PTP hypercall documentation doesn't produce the best-looking table
when formatting in HTML as all of the return value definitions end up
on the same line.

Reformat the PTP hypercall documentation to follow the formatting used
by hypercalls.rst.

Signed-off-by: Will Deacon 
---
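As a usage note, a sketch of the guest side (modelled on what drivers/ptp/ptp_kvm already does; the counter-type constants are assumed from arm-smccc.h) showing how the four return registers documented below combine:

  #include <linux/arm-smccc.h>

  /* Sketch: read the host wall clock and the chosen counter in one hypercall. */
  static int ptp_kvm_query(u64 *wall_ns, u64 *cycles)
  {
          struct arm_smccc_res res;

          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
                               KVM_PTP_VIRT_COUNTER, &res);
          if ((int)res.a0 < 0)
                  return -EOPNOTSUPP;

          *wall_ns = ((u64)res.a0 << 32) | res.a1;   /* R0:R1, wall clock in ns */
          *cycles  = ((u64)res.a2 << 32) | res.a3;   /* R2:R3, counter value    */
          return 0;
  }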
 Documentation/virt/kvm/arm/ptp_kvm.rst | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst 
b/Documentation/virt/kvm/arm/ptp_kvm.rst
index aecdc80ddcd8..7c0960970a0e 100644
--- a/Documentation/virt/kvm/arm/ptp_kvm.rst
+++ b/Documentation/virt/kvm/arm/ptp_kvm.rst
@@ -7,19 +7,29 @@ PTP_KVM is used for high precision time sync between host and 
guests.
 It relies on transferring the wall clock and counter value from the
 host to the guest using a KVM-specific hypercall.
 
-* ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: 0x86000001
+``ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID``
+
 
-This hypercall uses the SMC32/HVC32 calling convention:
+Retrieve current time information for the specific counter. There are no
+endianness restrictions.
 
-ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID
-===
-Function ID:  (uint32)0x86000001
-Arguments:(uint32)KVM_PTP_VIRT_COUNTER(0)
-  KVM_PTP_PHYS_COUNTER(1)
-Return Values:(int32) NOT_SUPPORTED(-1) on error, or
-  (uint32)Upper 32 bits of wall clock time (r0)
-  (uint32)Lower 32 bits of wall clock time (r1)
-  (uint32)Upper 32 bits of counter (r2)
-  (uint32)Lower 32 bits of counter (r3)
-Endianness:   No Restrictions.
-===
++-+---+
+| Presence:   | Optional  |
++-+---+
+| Calling convention: | HVC32 |
++-+--++
+| Function ID:| (uint32) | 0x86000001 |
++-+--++---+
+| Arguments:  | (uint32) | R1 | ``KVM_PTP_VIRT_COUNTER (0)``  |
+| |  |+---+
+| |  || ``KVM_PTP_PHYS_COUNTER (1)``  |
++-+--++---+
+| Return Values:  | (int32)  | R0 | ``NOT_SUPPORTED (-1)`` on error, else |
+| |  || upper 32 bits of wall clock time  |
+| +--++---+
+| | (uint32) | R1 | Lower 32 bits of wall clock time  |
+| +--++---+
+| | (uint32) | R2 | Upper 32 bits of counter  |
+| +--++---+
+| | (uint32) | R3 | Lower 32 bits of counter  |
++-+--++---+
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 84/89] KVM: arm64: Extend memory sharing to allow guest-to-host transitions

2022-05-19 Thread Will Deacon
A guest that can only operate on private memory is pretty useless, as it
has no way to share buffers with the host for things like virtio.

Extend our memory protection mechanisms to support the sharing and
unsharing of guest pages from the guest to the host. For now, this
functionality is unused but will later be exposed to the guest via
hypercalls.

Signed-off-by: Will Deacon 
---
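To show where this is headed (the guest-facing wiring arrives later in the series), a sketch of the assumed EL2 guest-hypercall handler shape that would consume the new transition; everything except __pkvm_guest_share_host() and the smccc_* helpers is illustrative:

  static bool pkvm_handle_memshare(struct kvm_vcpu *vcpu)
  {
          u64 ipa = smccc_get_arg1(vcpu);
          int ret = __pkvm_guest_share_host(vcpu, ipa);

          smccc_set_retval(vcpu, ret ? SMCCC_RET_INVALID_PARAMETER
                                     : SMCCC_RET_SUCCESS, 0, 0, 0);
          return true;    /* handled entirely at EL2, no exit to the host */
  }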
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 232 ++
 2 files changed, 234 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index b01b5cdb38de..e0bbb1726fa3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -70,6 +70,8 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu);
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu);
+int __pkvm_guest_share_host(struct kvm_vcpu *vcpu, u64 ipa);
+int __pkvm_guest_unshare_host(struct kvm_vcpu *vcpu, u64 ipa);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot 
prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 8459dc33e460..d839bb573b49 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -780,11 +780,41 @@ static int __host_ack_transition(u64 addr, const struct 
pkvm_mem_transition *tx,
return __host_check_page_state_range(addr, size, state);
 }
 
+static int host_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
+ enum kvm_pgtable_prot perms)
+{
+   if (perms != PKVM_HOST_MEM_PROT)
+   return -EPERM;
+
+   return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
 static int host_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
 {
return __host_ack_transition(addr, tx, PKVM_NOPAGE);
 }
 
+static int host_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+   return __host_ack_transition(addr, tx, PKVM_PAGE_SHARED_BORROWED);
+}
+
+static int host_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
+  enum kvm_pgtable_prot perms)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   return __host_set_page_state_range(addr, size, 
PKVM_PAGE_SHARED_BORROWED);
+}
+
+static int host_complete_unshare(u64 addr, const struct pkvm_mem_transition 
*tx)
+{
+   u8 owner_id = tx->initiator.id;
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   return host_stage2_set_owner_locked(addr, size, owner_id);
+}
+
 static int host_complete_donation(u64 addr, const struct pkvm_mem_transition 
*tx)
 {
u64 size = tx->nr_pages * PAGE_SIZE;
@@ -970,6 +1000,120 @@ static int guest_complete_donation(u64 addr, const 
struct pkvm_mem_transition *t
   prot, &vcpu->arch.pkvm_memcache);
 }
 
+static int __guest_get_completer_addr(u64 *completer_addr, phys_addr_t phys,
+ const struct pkvm_mem_transition *tx)
+{
+   switch (tx->completer.id) {
+   case PKVM_ID_HOST:
+   *completer_addr = phys;
+   break;
+   case PKVM_ID_HYP:
+   *completer_addr = (u64)__hyp_va(phys);
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+static int __guest_request_page_transition(u64 *completer_addr,
+  const struct pkvm_mem_transition *tx,
+  enum pkvm_page_state desired)
+{
+   struct kvm_vcpu *vcpu = tx->initiator.guest.vcpu;
+   struct kvm_shadow_vm *vm = get_shadow_vm(vcpu);
+   enum pkvm_page_state state;
+   phys_addr_t phys;
+   kvm_pte_t pte;
+   u32 level;
+   int ret;
+
+   if (tx->nr_pages != 1)
+   return -E2BIG;
+
+   ret = kvm_pgtable_get_leaf(&vm->pgt, tx->initiator.addr, &pte, &level);
+   if (ret)
+   return ret;
+
+   state = guest_get_page_state(pte);
+   if (state == PKVM_NOPAGE)
+   return -EFAULT;
+
+   if (state != desired)
+   return -EPERM;
+
+   /*
+* We only deal with page granular mappings in the guest for now as
+* the pgtable code relies on being able to recreate page mappings
+* lazily after zapping a block mapping, which doesn't work once the
+* pages have been donated.
+*/
+   if (level != KVM_PGTABLE_MAX_LEVELS - 1)
+   return -EINVAL;
+
+   phys = kvm_pte_to_phys(pte);
+   if (!addr_is_allowed_memory(phys))
+   return -EINVAL;
+
+   return __guest_get_completer_addr(completer_addr, phys, tx);
+}
+
+static int 

[PATCH 83/89] KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE

2022-05-19 Thread Will Deacon
Break-before-make (BBM) can be expensive, as transitioning via an
invalid mapping (i.e. the "break" step) requires the completion of TLB
invalidation and can also cause other agents to fault concurrently on
the invalid mapping.

Since BBM is not required when changing only the software bits of a PTE,
avoid the sequence in this case and just update the PTE directly.

Signed-off-by: Will Deacon 
---
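A compact restatement of the new condition for readers skimming the diff (sketch only; KVM_PTE_LEAF_ATTR_HI_SW is the existing mask for the software-defined PTE bits [58:55]):

  /* BBM, and hence the TLBI, can be skipped iff the old and new valid
   * PTEs differ only in the software-defined bits. */
  static bool stage2_pte_sw_bits_only_change(kvm_pte_t old, kvm_pte_t new)
  {
          return !((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW);
  }

  /* Example: toggling an annotation in bit 55 gives old ^ new == BIT(55),
   * which lies inside the SW field, so the mapping can be updated in place
   * with a single smp_store_release(). */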
 arch/arm64/kvm/hyp/pgtable.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2069e6833831..756bbb15c1f3 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -744,6 +744,13 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, 
u32 level,
if (!stage2_pte_needs_update(old, new))
return -EAGAIN;
 
+   /*
+* If we're only changing software bits, then we don't need to
+* do anything else.
+*/
+   if (!((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
+   goto out_set_pte;
+
stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
}
 
@@ -754,9 +761,11 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, 
u32 level,
if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
-   smp_store_release(ptep, new);
if (stage2_pte_is_counted(new))
mm_ops->get_page(ptep);
+
+out_set_pte:
+   smp_store_release(ptep, new);
if (kvm_phys_is_valid(phys))
data->phys += granule;
return 0;
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 82/89] KVM: arm64: Support TLB invalidation in guest context

2022-05-19 Thread Will Deacon
Typically, TLB invalidation of guest stage-2 mappings using nVHE is
performed by a hypercall originating from the host. For the invalidation
instruction to be effective, therefore, __tlb_switch_to_{guest,host}()
swizzle the active stage-2 context around the TLBI instruction.

With guest-to-host memory sharing and unsharing hypercalls originating
from the guest under pKVM, there is now a need to support both guest
and host VMID invalidations issued from guest context.

Replace the __tlb_switch_to_{guest,host}() functions with a more general
{enter,exit}_vmid_context() implementation which supports being invoked
from guest context and acts as a no-op if the target context matches the
running context.

Signed-off-by: Will Deacon 
---
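For context, a sketch of how the TLBI helpers in this file are expected to use the new pair; the body mirrors the existing __kvm_tlb_flush_vmid_ipa() and should be read as illustrative rather than the exact post-patch code:

  void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
                                phys_addr_t ipa, int level)
  {
          struct tlb_inv_context cxt;

          dsb(ishst);

          /* No-op if the target VMID is already the running context. */
          enter_vmid_context(mmu, &cxt);

          ipa >>= 12;
          __tlbi_level(ipas2e1is, ipa, level);
          dsb(ish);
          __tlbi(vmalle1is);
          dsb(ish);
          isb();

          exit_vmid_context(&cxt);
  }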
 arch/arm64/kvm/hyp/nvhe/tlb.c | 96 ---
 1 file changed, 78 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index d296d617f589..3f5601176fab 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -11,26 +11,62 @@
 #include 
 
 struct tlb_inv_context {
-   u64 tcr;
+   struct kvm_s2_mmu   *mmu;
+   u64 tcr;
+   u64 sctlr;
 };
 
-static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
- struct tlb_inv_context *cxt)
+static void enter_vmid_context(struct kvm_s2_mmu *mmu,
+  struct tlb_inv_context *cxt)
 {
+   struct kvm_s2_mmu *host_mmu = &host_kvm.arch.mmu;
+   struct kvm_cpu_context *host_ctxt;
+   struct kvm_vcpu *vcpu;
+
+   host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+   vcpu = host_ctxt->__hyp_running_vcpu;
+   cxt->mmu = NULL;
+
+   /*
+* If we're already in the desired context, then there's nothing
+* to do.
+*/
+   if (vcpu) {
+   if (mmu == vcpu->arch.hw_mmu || WARN_ON(mmu != host_mmu))
+   return;
+   } else if (mmu == host_mmu) {
+   return;
+   }
+
+   cxt->mmu = mmu;
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
u64 val;
 
/*
 * For CPUs that are affected by ARM 1319367, we need to
-* avoid a host Stage-1 walk while we have the guest's
-* VMID set in the VTTBR in order to invalidate TLBs.
-* We're guaranteed that the S1 MMU is enabled, so we can
-* simply set the EPD bits to avoid any further TLB fill.
+* avoid a Stage-1 walk with the old VMID while we have
+* the new VMID set in the VTTBR in order to invalidate TLBs.
+* We're guaranteed that the host S1 MMU is enabled, so
+* we can simply set the EPD bits to avoid any further
+* TLB fill. For guests, we ensure that the S1 MMU is
+* temporarily enabled in the next context.
 */
val = cxt->tcr = read_sysreg_el1(SYS_TCR);
val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
write_sysreg_el1(val, SYS_TCR);
isb();
+
+   if (vcpu) {
+   val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
+   if (!(val & SCTLR_ELx_M)) {
+   val |= SCTLR_ELx_M;
+   write_sysreg_el1(val, SYS_SCTLR);
+   isb();
+   }
+   } else {
+   /* The host S1 MMU is always enabled. */
+   cxt->sctlr = SCTLR_ELx_M;
+   }
}
 
/*
@@ -39,20 +75,44 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
 * ensuring that we always have an ISB, but not two ISBs back
 * to back.
 */
-   __load_stage2(mmu, kern_hyp_va(mmu->arch));
+   if (vcpu)
+   __load_host_stage2();
+   else
+   __load_stage2(mmu, kern_hyp_va(mmu->arch));
+
asm(ALTERNATIVE("isb", "nop", ARM64_WORKAROUND_SPECULATIVE_AT));
 }
 
-static void __tlb_switch_to_host(struct tlb_inv_context *cxt)
+static void exit_vmid_context(struct tlb_inv_context *cxt)
 {
-   __load_host_stage2();
+   struct kvm_s2_mmu *mmu = cxt->mmu;
+   struct kvm_cpu_context *host_ctxt;
+   struct kvm_vcpu *vcpu;
+
+   host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+   vcpu = host_ctxt->__hyp_running_vcpu;
+
+   if (!mmu)
+   return;
+
+   if (vcpu)
+   __load_stage2(mmu, kern_hyp_va(mmu->arch));
+   else
+   __load_host_stage2();
 
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
-   /* Ensure write of the host VMID */
+   /* Ensure write of the old VMID */
isb();
-   /* Restore the host's TCR_EL1 */
+
+   if (!(cxt->sctlr & 

[PATCH 81/89] KVM: arm64: Inject SIGSEGV on illegal accesses

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.

To prevent this, inject the abort back in the host with S1PTW set in the
ESR, hence allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 50 ++-
 arch/arm64/mm/fault.c | 22 
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d0544259eb01..8459dc33e460 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -549,6 +549,50 @@ static int host_stage2_idmap(u64 addr)
return ret;
 }
 
+static void host_inject_abort(struct kvm_cpu_context *host_ctxt)
+{
+   u64 spsr = read_sysreg_el2(SYS_SPSR);
+   u64 esr = read_sysreg_el2(SYS_ESR);
+   u64 ventry, ec;
+
+   /* Repaint the ESR to report a same-level fault if taken from EL1 */
+   if ((spsr & PSR_MODE_MASK) != PSR_MODE_EL0t) {
+   ec = ESR_ELx_EC(esr);
+   if (ec == ESR_ELx_EC_DABT_LOW)
+   ec = ESR_ELx_EC_DABT_CUR;
+   else if (ec == ESR_ELx_EC_IABT_LOW)
+   ec = ESR_ELx_EC_IABT_CUR;
+   else
+   WARN_ON(1);
+   esr &= ~ESR_ELx_EC_MASK;
+   esr |= ec << ESR_ELx_EC_SHIFT;
+   }
+
+   /*
+* Since S1PTW should only ever be set for stage-2 faults, we're pretty
+* much guaranteed that it won't be set in ESR_EL1 by the hardware. So,
+* let's use that bit to allow the host abort handler to differentiate
+* this abort from normal userspace faults.
+*
+* Note: although S1PTW is RES0 at EL1, it is guaranteed by the
+* architecture to be backed by flops, so it should be safe to use.
+*/
+   esr |= ESR_ELx_S1PTW;
+
+   write_sysreg_el1(esr, SYS_ESR);
+   write_sysreg_el1(spsr, SYS_SPSR);
+   write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
+   write_sysreg_el1(read_sysreg_el2(SYS_FAR), SYS_FAR);
+
+   ventry = read_sysreg_el1(SYS_VBAR);
+   ventry += get_except64_offset(spsr, PSR_MODE_EL1h, except_type_sync);
+   write_sysreg_el2(ventry, SYS_ELR);
+
+   spsr = get_except64_cpsr(spsr, system_supports_mte(),
+read_sysreg_el1(SYS_SCTLR), PSR_MODE_EL1h);
+   write_sysreg_el2(spsr, SYS_SPSR);
+}
+
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 {
struct kvm_vcpu_fault_info fault;
@@ -560,7 +604,11 @@ void handle_host_mem_abort(struct kvm_cpu_context 
*host_ctxt)
 
addr = (fault.hpfar_el2 & HPFAR_MASK) << 8;
ret = host_stage2_idmap(addr);
-   BUG_ON(ret && ret != -EAGAIN);
+
+   if (ret == -EPERM)
+   host_inject_abort(host_ctxt);
+   else
+   BUG_ON(ret && ret != -EAGAIN);
 }
 
 struct pkvm_mem_transition {
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 77341b160aca..2b2c16a2535c 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct fault_info {
int (*fn)(unsigned long far, unsigned int esr,
@@ -257,6 +258,15 @@ static inline bool is_el1_permission_fault(unsigned long 
addr, unsigned int esr,
return false;
 }
 
+static bool is_pkvm_stage2_abort(unsigned int esr)
+{
+   /*
+* S1PTW should only ever be set in ESR_EL1 if the pkvm hypervisor
+* injected a stage-2 abort -- see host_inject_abort().
+*/
+   return is_pkvm_initialized() && (esr & ESR_ELx_S1PTW);
+}
+
 static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
unsigned int esr,
struct pt_regs *regs)
@@ -268,6 +278,9 @@ static bool __kprobes 
is_spurious_el1_translation_fault(unsigned long addr,
(esr & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_FAULT)
return false;
 
+   if (is_pkvm_stage2_abort(esr))
+   return false;
+
local_irq_save(flags);
asm volatile("at s1e1r, %0" :: "r" (addr));
isb();
@@ -383,6 +396,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned 
int esr,
msg = "read from unreadable memory";
} else if (addr < PAGE_SIZE) {
msg = "NULL pointer dereference";
+   } else if (is_pkvm_stage2_abort(esr)) {
+   msg = "access to hypervisor-protected memory";
} else {
if 

[PATCH 80/89] KVM: arm64: Refactor enter_exception64()

2022-05-19 Thread Will Deacon
From: Quentin Perret 

In order to simplify the injection of exceptions in the host in pkvm
context, let's factor out of enter_exception64() the code calculating
the exception offset from VBAR_EL1 and the cpsr.

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/kvm_emulate.h |  5 ++
 arch/arm64/kvm/hyp/exception.c   | 89 
 2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 2a79c861b8e0..8b6c391bbee8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -41,6 +41,11 @@ void kvm_inject_vabt(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+ enum exception_type type);
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+   unsigned long sctlr, unsigned long mode);
+
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
 static inline int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index c5d009715402..14a80b0e2f91 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -60,31 +60,12 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, 
u64 val)
vcpu->arch.ctxt.spsr_und = val;
 }
 
-/*
- * This performs the exception entry at a given EL (@target_mode), stashing PC
- * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
- * The EL passed to this function *must* be a non-secure, privileged mode with
- * bit 0 being set (PSTATE.SP == 1).
- *
- * When an exception is taken, most PSTATE fields are left unchanged in the
- * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
- * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
- * layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
- *
- * For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
- * For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
- *
- * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
- * MSB to LSB.
- */
-static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
- enum exception_type type)
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+ enum exception_type type)
 {
-   unsigned long sctlr, vbar, old, new, mode;
+   u64 mode = psr & (PSR_MODE_MASK | PSR_MODE32_BIT);
u64 exc_offset;
 
-   mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
if  (mode == target_mode)
exc_offset = CURRENT_EL_SP_ELx_VECTOR;
else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
@@ -94,28 +75,32 @@ static void enter_exception64(struct kvm_vcpu *vcpu, 
unsigned long target_mode,
else
exc_offset = LOWER_EL_AArch32_VECTOR;
 
-   switch (target_mode) {
-   case PSR_MODE_EL1h:
-   vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
-   sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
-   __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
-   break;
-   default:
-   /* Don't do that */
-   BUG();
-   }
-
-   *vcpu_pc(vcpu) = vbar + exc_offset + type;
+   return exc_offset + type;
+}
 
-   old = *vcpu_cpsr(vcpu);
-   new = 0;
+/*
+ * When an exception is taken, most PSTATE fields are left unchanged in the
+ * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
+ * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
+ * layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
+ *
+ * For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
+ * For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
+ *
+ * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
+ * MSB to LSB.
+ */
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+   unsigned long sctlr, unsigned long target_mode)
+{
+   u64 new = 0;
 
new |= (old & PSR_N_BIT);
new |= (old & PSR_Z_BIT);
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);
 
-   if (kvm_has_mte(vcpu->kvm))
+   if (has_mte)
new |= PSR_TCO_BIT;
 
new |= (old & PSR_DIT_BIT);
@@ -151,6 +136,36 @@ static void enter_exception64(struct kvm_vcpu *vcpu, 
unsigned long target_mode,
 
new |= target_mode;
 
+   return new;
+}
+
+/*
+ * This performs the exception entry at a given EL (@target_mode), stashing PC
+ * 

[PATCH 79/89] KVM: arm64: Add is_pkvm_initialized() helper

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Add a helper allowing to check when the pkvm static key is enabled to
ease the introduction of pkvm hooks in other parts of the code.

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/virt.h | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 0e80db4327b6..3d5bfcdb49aa 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -74,6 +74,12 @@ void __hyp_reset_vectors(void);
 
 DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 
+static inline bool is_pkvm_initialized(void)
+{
+   return IS_ENABLED(CONFIG_KVM) &&
+  static_branch_likely(&kvm_protected_mode_initialized);
+}
+
 /* Reports the availability of HYP mode */
 static inline bool is_hyp_mode_available(void)
 {
@@ -81,8 +87,7 @@ static inline bool is_hyp_mode_available(void)
 * If KVM protected mode is initialized, all CPUs must have been booted
 * in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
 */
-   if (IS_ENABLED(CONFIG_KVM) &&
-   static_branch_likely(&kvm_protected_mode_initialized))
+   if (is_pkvm_initialized())
return true;
 
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
@@ -96,8 +101,7 @@ static inline bool is_hyp_mode_mismatched(void)
 * If KVM protected mode is initialized, all CPUs must have been booted
 * in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
 */
-   if (IS_ENABLED(CONFIG_KVM) &&
-   static_branch_likely(&kvm_protected_mode_initialized))
+   if (is_pkvm_initialized())
return false;
 
return __boot_cpu_mode[0] != __boot_cpu_mode[1];
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 78/89] KVM: arm64: Don't expose TLBI hypercalls after de-privilege

2022-05-19 Thread Will Deacon
Now that TLBI invalidation is handled entirely at EL2 for both protected
and non-protected guests when protected KVM has initialised, unplug the
unused TLBI hypercalls.

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_asm.h   | 8 
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 7af0b7695a2c..d020c4cce888 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -59,6 +59,10 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
+   __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
+   __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
+   __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
+   __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
 
/* Hypercalls available after pKVM finalisation */
@@ -68,10 +72,6 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_map_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
-   __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
-   __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
-   __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
-   __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index c4778c7d8c4b..694e0071b13e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -1030,6 +1030,10 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_enable_ssbs),
HANDLE_FUNC(__vgic_v3_init_lrs),
HANDLE_FUNC(__vgic_v3_get_gic_config),
+   HANDLE_FUNC(__kvm_flush_vm_context),
+   HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
+   HANDLE_FUNC(__kvm_tlb_flush_vmid),
+   HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__pkvm_prot_finalize),
 
HANDLE_FUNC(__pkvm_host_share_hyp),
@@ -1038,10 +1042,6 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_map_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
-   HANDLE_FUNC(__kvm_flush_vm_context),
-   HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
-   HANDLE_FUNC(__kvm_tlb_flush_vmid),
-   HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 77/89] KVM: arm64: Handle PSCI for protected VMs in EL2

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Add PSCI 1.1 support for protected VMs at EL2.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  13 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |  69 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 320 -
 3 files changed, 399 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 33d34cc639ea..c1987115b217 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -28,6 +28,15 @@ struct kvm_shadow_vcpu_state {
/* Tracks exit code for the protected guest. */
u32 exit_code;
 
+   /*
+* Track the power state transition of a protected vcpu.
+* Can be in one of three states:
+* PSCI_0_2_AFFINITY_LEVEL_ON
+* PSCI_0_2_AFFINITY_LEVEL_OFF
+* PSCI_0_2_AFFINITY_LEVEL_ON_PENDING
+*/
+   int power_state;
+
/*
 * Points to the per-cpu pointer of the cpu where it's loaded, or NULL
 * if not loaded.
@@ -101,6 +110,10 @@ bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 
*exit_code);
 void kvm_reset_pvm_sys_regs(struct kvm_vcpu *vcpu);
 int kvm_check_pvm_sysreg_table(void);
 
+void pkvm_reset_vcpu(struct kvm_shadow_vcpu_state *shadow_state);
+
 bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
 
+struct kvm_shadow_vcpu_state *pkvm_mpidr_to_vcpu_state(struct kvm_shadow_vm 
*vm, unsigned long mpidr);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 26c8709f5494..c4778c7d8c4b 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -21,6 +21,7 @@
 #include 
 
 #include 
+#include 
 
 #include "../../sys_regs.h"
 
@@ -46,8 +47,37 @@ static void handle_pvm_entry_wfx(struct kvm_vcpu *host_vcpu, 
struct kvm_vcpu *sh
 
 static void handle_pvm_entry_hvc64(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
 {
+   u32 psci_fn = smccc_get_function(shadow_vcpu);
u64 ret = READ_ONCE(host_vcpu->arch.ctxt.regs.regs[0]);
 
+   switch (psci_fn) {
+   case PSCI_0_2_FN_CPU_ON:
+   case PSCI_0_2_FN64_CPU_ON:
+   /*
+* Check whether the cpu_on request to the host was successful.
+* If not, reset the vcpu state from ON_PENDING to OFF.
+* This could happen if this vcpu attempted to turn on the other
+* vcpu while the other one is in the process of turning itself
+* off.
+*/
+   if (ret != PSCI_RET_SUCCESS) {
+   unsigned long cpu_id = smccc_get_arg1(shadow_vcpu);
+   struct kvm_shadow_vm *shadow_vm;
+   struct kvm_shadow_vcpu_state *target_vcpu_state;
+
+   shadow_vm = get_shadow_vm(shadow_vcpu);
+   target_vcpu_state = pkvm_mpidr_to_vcpu_state(shadow_vm, 
cpu_id);
+
+   if (target_vcpu_state && 
READ_ONCE(target_vcpu_state->power_state) == PSCI_0_2_AFFINITY_LEVEL_ON_PENDING)
+   WRITE_ONCE(target_vcpu_state->power_state, 
PSCI_0_2_AFFINITY_LEVEL_OFF);
+
+   ret = PSCI_RET_INTERNAL_FAILURE;
+   }
+   break;
+   default:
+   break;
+   }
+
vcpu_set_reg(shadow_vcpu, 0, ret);
 }
 
@@ -206,13 +236,45 @@ static void handle_pvm_exit_sys64(struct kvm_vcpu 
*host_vcpu, struct kvm_vcpu *s
 
 static void handle_pvm_exit_hvc64(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
 {
-   int i;
+   int n, i;
+
+   switch (smccc_get_function(shadow_vcpu)) {
+   /*
+* CPU_ON takes 3 arguments, however, to wake up the target vcpu the
+* host only needs to know the target's cpu_id, which is passed as the
+* first argument. The processing of the reset state is done at hyp.
+*/
+   case PSCI_0_2_FN_CPU_ON:
+   case PSCI_0_2_FN64_CPU_ON:
+   n = 2;
+   break;
+
+   case PSCI_0_2_FN_CPU_OFF:
+   case PSCI_0_2_FN_SYSTEM_OFF:
+   case PSCI_0_2_FN_SYSTEM_RESET:
+   case PSCI_0_2_FN_CPU_SUSPEND:
+   case PSCI_0_2_FN64_CPU_SUSPEND:
+   n = 1;
+   break;
+
+   case PSCI_1_1_FN_SYSTEM_RESET2:
+   case PSCI_1_1_FN64_SYSTEM_RESET2:
+   n = 3;
+   break;
+
+   /*
+* The rest are either blocked or handled by HYP, so we should
+* really never be here.
+*/
+   default:
+   BUG();
+   }
 
WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
   shadow_vcpu->arch.fault.esr_el2);
 
/* Pass the hvc function id (r0) as well as any potential arguments. */
-   for (i = 0; i < 8; i++)
+   for (i = 0; i < n; i++)
WRITE_ONCE(host_vcpu->arch.ctxt.regs.regs[i],
   

[PATCH 76/89] KVM: arm64: Factor out vcpu_reset code for core registers and PSCI

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Factor out logic that resets a vcpu's core registers, including
additional PSCI handling. This code will be reused when resetting
VMs in protected mode.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_emulate.h | 41 +
 arch/arm64/kvm/reset.c   | 45 +---
 2 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 82515b015eb4..2a79c861b8e0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -522,4 +522,45 @@ static inline unsigned long psci_affinity_mask(unsigned 
long affinity_level)
return 0;
 }
 
+/* Reset a vcpu's core registers. */
+static inline void kvm_reset_vcpu_core(struct kvm_vcpu *vcpu)
+{
+   u32 pstate;
+
+   if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
+   pstate = VCPU_RESET_PSTATE_SVC;
+   } else {
+   pstate = VCPU_RESET_PSTATE_EL1;
+   }
+
+   /* Reset core registers */
+   memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
+   memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
+   vcpu->arch.ctxt.spsr_abt = 0;
+   vcpu->arch.ctxt.spsr_und = 0;
+   vcpu->arch.ctxt.spsr_irq = 0;
+   vcpu->arch.ctxt.spsr_fiq = 0;
+   vcpu_gp_regs(vcpu)->pstate = pstate;
+}
+
+/* PSCI reset handling for a vcpu. */
+static inline void kvm_reset_vcpu_psci(struct kvm_vcpu *vcpu,
+  struct vcpu_reset_state *reset_state)
+{
+   unsigned long target_pc = reset_state->pc;
+
+   /* Gracefully handle Thumb2 entry point */
+   if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
+   target_pc &= ~1UL;
+   vcpu_set_thumb(vcpu);
+   }
+
+   /* Propagate caller endianness */
+   if (reset_state->be)
+   kvm_vcpu_set_be(vcpu);
+
+   *vcpu_pc(vcpu) = target_pc;
+   vcpu_set_reg(vcpu, 0, reset_state->r0);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 6bc979aece3c..4d223fae996d 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -109,7 +109,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
kfree(buf);
return ret;
}
-   
+
vcpu->arch.sve_state = buf;
vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
return 0;
@@ -202,7 +202,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
struct vcpu_reset_state reset_state;
int ret;
bool loaded;
-   u32 pstate;
 
mutex_lock(&vcpu->kvm->lock);
reset_state = vcpu->arch.reset_state;
@@ -240,29 +239,13 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
goto out;
}
 
-   switch (vcpu->arch.target) {
-   default:
-   if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
-   pstate = VCPU_RESET_PSTATE_SVC;
-   } else {
-   pstate = VCPU_RESET_PSTATE_EL1;
-   }
-
-   if (kvm_vcpu_has_pmu(vcpu) && !kvm_arm_support_pmu_v3()) {
-   ret = -EINVAL;
-   goto out;
-   }
-   break;
+   if (kvm_vcpu_has_pmu(vcpu) && !kvm_arm_support_pmu_v3()) {
+   ret = -EINVAL;
+   goto out;
}
 
/* Reset core registers */
-   memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
-   memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
-   vcpu->arch.ctxt.spsr_abt = 0;
-   vcpu->arch.ctxt.spsr_und = 0;
-   vcpu->arch.ctxt.spsr_irq = 0;
-   vcpu->arch.ctxt.spsr_fiq = 0;
-   vcpu_gp_regs(vcpu)->pstate = pstate;
+   kvm_reset_vcpu_core(vcpu);
 
/* Reset system registers */
kvm_reset_sys_regs(vcpu);
@@ -271,22 +254,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 * Additional reset state handling that PSCI may have imposed on us.
 * Must be done after all the sys_reg reset.
 */
-   if (reset_state.reset) {
-   unsigned long target_pc = reset_state.pc;
-
-   /* Gracefully handle Thumb2 entry point */
-   if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
-   target_pc &= ~1UL;
-   vcpu_set_thumb(vcpu);
-   }
-
-   /* Propagate caller endianness */
-   if (reset_state.be)
-   kvm_vcpu_set_be(vcpu);
-
-   *vcpu_pc(vcpu) = target_pc;
-   vcpu_set_reg(vcpu, 0, reset_state.r0);
-   }
+   if (reset_state.reset)
+   kvm_reset_vcpu_psci(vcpu, &reset_state);
 
/* Reset timer */
ret = kvm_timer_vcpu_reset(vcpu);
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list

[PATCH 75/89] KVM: arm64: Move some kvm_psci functions to a shared header

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Move some PSCI functions and macros to a shared header to be used
by hyp in protected mode.

No functional change intended.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_emulate.h | 30 
 arch/arm64/kvm/psci.c| 28 --
 2 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index bb56aff4de95..82515b015eb4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -492,4 +492,34 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, 
int feature)
return test_bit(feature, vcpu->arch.features);
 }
 
+/* Narrow the PSCI register arguments (r1 to r3) to 32 bits. */
+static inline void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   /*
+* Zero the input registers' upper 32 bits. They will be fully
+* zeroed on exit, so we're fine changing them in place.
+*/
+   for (i = 1; i < 4; i++)
+   vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
+}
+
+static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
+  unsigned long affinity)
+{
+   return !(affinity & ~MPIDR_HWID_BITMASK);
+}
+
+
+#define AFFINITY_MASK(level)   ~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
+
+static inline unsigned long psci_affinity_mask(unsigned long affinity_level)
+{
+   if (affinity_level <= 3)
+   return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
+
+   return 0;
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 372da09a2fab..e7baacd696ad 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -21,16 +21,6 @@
  * as described in ARM document number ARM DEN 0022A.
  */
 
-#define AFFINITY_MASK(level)   ~((0x1UL << ((level) * MPIDR_LEVEL_BITS)) - 1)
-
-static unsigned long psci_affinity_mask(unsigned long affinity_level)
-{
-   if (affinity_level <= 3)
-   return MPIDR_HWID_BITMASK & AFFINITY_MASK(affinity_level);
-
-   return 0;
-}
-
 static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 {
/*
@@ -58,12 +48,6 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
kvm_vcpu_kick(vcpu);
 }
 
-static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
-  unsigned long affinity)
-{
-   return !(affinity & ~MPIDR_HWID_BITMASK);
-}
-
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 {
struct vcpu_reset_state *reset_state;
@@ -201,18 +185,6 @@ static void kvm_psci_system_reset2(struct kvm_vcpu *vcpu)
 KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2);
 }
 
-static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
-{
-   int i;
-
-   /*
-* Zero the input registers' upper 32 bits. They will be fully
-* zeroed on exit, so we're fine changing them in place.
-*/
-   for (i = 1; i < 4; i++)
-   vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i)));
-}
-
 static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, 
u32 fn)
 {
switch(fn) {
-- 
2.36.1.124.g0e6072fb45-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 74/89] KVM: arm64: Move pstate reset values to kvm_arm.h

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Move the macro defines of the pstate reset values to a shared
header to be used by hyp in protected mode.

No functional change intended.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_arm.h | 9 +
 arch/arm64/kvm/reset.c   | 9 -
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 98b60fa86853..056cda220bff 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -359,4 +359,13 @@
 #define CPACR_EL1_DEFAULT  (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |\
 CPACR_EL1_ZEN_EL1EN)
 
+/*
+ * ARMv8 Reset Values
+ */
+#define VCPU_RESET_PSTATE_EL1  (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
+PSR_F_BIT | PSR_D_BIT)
+
+#define VCPU_RESET_PSTATE_SVC  (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
+PSR_AA32_I_BIT | PSR_AA32_F_BIT)
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index cc25f540962b..6bc979aece3c 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -32,15 +32,6 @@
 /* Maximum phys_shift supported for any VM on this host */
 static u32 kvm_ipa_limit;
 
-/*
- * ARMv8 Reset Values
- */
-#define VCPU_RESET_PSTATE_EL1  (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
-PSR_F_BIT | PSR_D_BIT)
-
-#define VCPU_RESET_PSTATE_SVC  (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
-PSR_AA32_I_BIT | PSR_AA32_F_BIT)
-
 unsigned int kvm_sve_max_vl;
 
 int kvm_arm_init_sve(void)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 73/89] KVM: arm64: Add HVC handling for protected guests at EL2

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Rather than forwarding guest hypercalls back to the host for handling,
implement some basic handling at EL2 which will later be extended to
provide additional functionality such as PSCI.
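
For illustration, the immediate guest-visible effect is that the
standard SMCCC version query is answered entirely at EL2. A minimal
guest-side sketch, assuming the generic arm-smccc helpers rather than
anything added by this patch:

#include <linux/arm-smccc.h>

/*
 * Illustrative only: a protected guest querying the SMCCC version over
 * HVC. With the EL2 handler added below, the answer (SMCCC v1.1) comes
 * straight from the hypervisor, without a round trip to the host.
 */
static u32 pvm_query_smccc_version(void)
{
        struct arm_smccc_res res;

        arm_smccc_1_1_hvc(ARM_SMCCC_VERSION_FUNC_ID, &res);

        return (u32)res.a0;     /* e.g. ARM_SMCCC_VERSION_1_1 */
}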

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  2 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 24 
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 22 ++
 arch/arm64/kvm/hyp/nvhe/switch.c   |  1 +
 4 files changed, 49 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index e772f9835a86..33d34cc639ea 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -101,4 +101,6 @@ bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 
*exit_code);
 void kvm_reset_pvm_sys_regs(struct kvm_vcpu *vcpu);
 int kvm_check_pvm_sysreg_table(void);
 
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 1e39dc7eab4d..26c8709f5494 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -4,6 +4,8 @@
  * Author: Andrew Scull 
  */
 
+#include 
+
 #include 
 
 #include 
@@ -42,6 +44,13 @@ static void handle_pvm_entry_wfx(struct kvm_vcpu *host_vcpu, 
struct kvm_vcpu *sh
   KVM_ARM64_INCREMENT_PC;
 }
 
+static void handle_pvm_entry_hvc64(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   u64 ret = READ_ONCE(host_vcpu->arch.ctxt.regs.regs[0]);
+
+   vcpu_set_reg(shadow_vcpu, 0, ret);
+}
+
 static void handle_pvm_entry_sys64(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
 {
unsigned long host_flags;
@@ -195,6 +204,19 @@ static void handle_pvm_exit_sys64(struct kvm_vcpu 
*host_vcpu, struct kvm_vcpu *s
}
 }
 
+static void handle_pvm_exit_hvc64(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   int i;
+
+   WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+  shadow_vcpu->arch.fault.esr_el2);
+
+   /* Pass the hvc function id (r0) as well as any potential arguments. */
+   for (i = 0; i < 8; i++)
+   WRITE_ONCE(host_vcpu->arch.ctxt.regs.regs[i],
+  vcpu_get_reg(shadow_vcpu, i));
+}
+
 static void handle_pvm_exit_iabt(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
 {
WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
@@ -273,6 +295,7 @@ static void handle_vm_exit_abt(struct kvm_vcpu *host_vcpu, 
struct kvm_vcpu *shad
 static const shadow_entry_exit_handler_fn entry_pvm_shadow_handlers[] = {
[0 ... ESR_ELx_EC_MAX]  = NULL,
[ESR_ELx_EC_WFx]= handle_pvm_entry_wfx,
+   [ESR_ELx_EC_HVC64]  = handle_pvm_entry_hvc64,
[ESR_ELx_EC_SYS64]  = handle_pvm_entry_sys64,
[ESR_ELx_EC_IABT_LOW]   = handle_pvm_entry_iabt,
[ESR_ELx_EC_DABT_LOW]   = handle_pvm_entry_dabt,
@@ -281,6 +304,7 @@ static const shadow_entry_exit_handler_fn 
entry_pvm_shadow_handlers[] = {
 static const shadow_entry_exit_handler_fn exit_pvm_shadow_handlers[] = {
[0 ... ESR_ELx_EC_MAX]  = NULL,
[ESR_ELx_EC_WFx]= handle_pvm_exit_wfx,
+   [ESR_ELx_EC_HVC64]  = handle_pvm_exit_hvc64,
[ESR_ELx_EC_SYS64]  = handle_pvm_exit_sys64,
[ESR_ELx_EC_IABT_LOW]   = handle_pvm_exit_iabt,
[ESR_ELx_EC_DABT_LOW]   = handle_pvm_exit_dabt,
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 9feeb0b5433a..92e60ebeced5 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -7,6 +7,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 
 #include 
@@ -797,3 +799,23 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
hyp_spin_unlock(&shadow_lock);
return err;
 }
+
+/*
+ * Handler for protected VM HVC calls.
+ *
+ * Returns true if the hypervisor has handled the exit, and control should go
+ * back to the guest, or false if it hasn't.
+ */
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+   u32 fn = smccc_get_function(vcpu);
+
+   switch (fn) {
+   case ARM_SMCCC_VERSION_FUNC_ID:
+   /* Nothing to be handled by the host. Go back to the guest. */
+   smccc_set_retval(vcpu, ARM_SMCCC_VERSION_1_1, 0, 0, 0);
+   return true;
+   default:
+   return false;
+   }
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6bb979ee51cc..87338775288c 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -205,6 +205,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 
 static const exit_handler_fn pvm_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX]   

[PATCH 72/89] KVM: arm64: Track the SVE state in the shadow vcpu

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the shadow state is correctly
initialised (and then unpinned on teardown).

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |  9 
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 33 ++
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 5d6cee7436f4..1e39dc7eab4d 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -416,8 +416,7 @@ static void flush_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
if (host_flags & KVM_ARM64_PKVM_STATE_DIRTY)
__flush_vcpu_state(shadow_state);
 
-   shadow_vcpu->arch.sve_state = 
kern_hyp_va(host_vcpu->arch.sve_state);
-   shadow_vcpu->arch.sve_max_vl = host_vcpu->arch.sve_max_vl;
+   shadow_vcpu->arch.flags = host_flags;
 
shadow_vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS & ~(HCR_RW | 
HCR_TWI | HCR_TWE);
shadow_vcpu->arch.hcr_el2 |= READ_ONCE(host_vcpu->arch.hcr_el2);
@@ -488,8 +487,10 @@ static void sync_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state,
BUG();
}
 
-   host_flags = READ_ONCE(host_vcpu->arch.flags) &
-   ~(KVM_ARM64_PENDING_EXCEPTION | KVM_ARM64_INCREMENT_PC);
+   host_flags = shadow_vcpu->arch.flags;
+   if (shadow_state_is_protected(shadow_state))
+   host_flags &= ~(KVM_ARM64_PENDING_EXCEPTION | 
KVM_ARM64_INCREMENT_PC);
+
WRITE_ONCE(host_vcpu->arch.flags, host_flags);
shadow_state->exit_code = exit_reason;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 51da5c1d7e0d..9feeb0b5433a 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -372,7 +372,19 @@ static void unpin_host_vcpus(struct kvm_shadow_vcpu_state 
*shadow_vcpu_states,
 
for (i = 0; i < nr_vcpus; i++) {
struct kvm_vcpu *host_vcpu = shadow_vcpu_states[i].host_vcpu;
+   struct kvm_vcpu *shadow_vcpu = 
&shadow_vcpu_states[i].shadow_vcpu;
+   size_t sve_state_size;
+   void *sve_state;
+
hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
+
+   if (!test_bit(KVM_ARM_VCPU_SVE, shadow_vcpu->arch.features))
+   continue;
+
+   sve_state = shadow_vcpu->arch.sve_state;
+   sve_state = kern_hyp_va(sve_state);
+   sve_state_size = vcpu_sve_state_size(shadow_vcpu);
+   hyp_unpin_shared_mem(sve_state, sve_state + sve_state_size);
}
 }
 
@@ -448,6 +460,27 @@ static int init_shadow_structs(struct kvm *kvm, struct 
kvm_shadow_vm *vm,
if (ret)
return ret;
 
+   if (test_bit(KVM_ARM_VCPU_SVE, shadow_vcpu->arch.features)) {
+   size_t sve_state_size;
+   void *sve_state;
+
+   shadow_vcpu->arch.sve_state = 
READ_ONCE(host_vcpu->arch.sve_state);
+   shadow_vcpu->arch.sve_max_vl = 
READ_ONCE(host_vcpu->arch.sve_max_vl);
+
+   sve_state = kern_hyp_va(shadow_vcpu->arch.sve_state);
+   sve_state_size = vcpu_sve_state_size(shadow_vcpu);
+
+   if (!shadow_vcpu->arch.sve_state || !sve_state_size ||
+   hyp_pin_shared_mem(sve_state,
+  sve_state + sve_state_size)) {
+   clear_bit(KVM_ARM_VCPU_SVE,
+ shadow_vcpu->arch.features);
+   shadow_vcpu->arch.sve_state = NULL;
+   shadow_vcpu->arch.sve_max_vl = 0;
+   return -EINVAL;
+   }
+   }
+
pkvm_vcpu_init_traps(shadow_vcpu, host_vcpu);
kvm_reset_pvm_sys_regs(shadow_vcpu);
}
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 71/89] KVM: arm64: Initialize shadow vm state at hyp

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Do not rely on the state of the vm as provided by the host, but
initialize it instead at EL2 to a known good and safe state.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 71 ++
 1 file changed, 71 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 839506a546c7..51da5c1d7e0d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -6,6 +6,9 @@
 
 #include 
 #include 
+
+#include 
+
 #include 
 #include 
 #include 
@@ -315,6 +318,53 @@ struct kvm_shadow_vcpu_state 
*pkvm_loaded_shadow_vcpu_state(void)
return __this_cpu_read(loaded_shadow_state);
 }
 
+/* Check and copy the supported features for the vcpu from the host. */
+static void copy_features(struct kvm_vcpu *shadow_vcpu, struct kvm_vcpu 
*host_vcpu)
+{
+   DECLARE_BITMAP(allowed_features, KVM_VCPU_MAX_FEATURES);
+
+   /* No restrictions for non-protected VMs. */
+   if (!kvm_vm_is_protected(shadow_vcpu->kvm)) {
+   bitmap_copy(shadow_vcpu->arch.features,
+   host_vcpu->arch.features,
+   KVM_VCPU_MAX_FEATURES);
+   return;
+   }
+
+   bitmap_zero(allowed_features, KVM_VCPU_MAX_FEATURES);
+
+   /*
+* For protected vms, always allow:
+* - CPU starting in poweroff state
+* - PSCI v0.2
+*/
+   set_bit(KVM_ARM_VCPU_POWER_OFF, allowed_features);
+   set_bit(KVM_ARM_VCPU_PSCI_0_2, allowed_features);
+
+   /*
+* Check if remaining features are allowed:
+* - Performance Monitoring
+* - Scalable Vectors
+* - Pointer Authentication
+*/
+   if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_PMUVER), 
PVM_ID_AA64DFR0_ALLOW))
+   set_bit(KVM_ARM_VCPU_PMU_V3, allowed_features);
+
+   if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_SVE), 
PVM_ID_AA64PFR0_ALLOW))
+   set_bit(KVM_ARM_VCPU_SVE, allowed_features);
+
+   if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_API), 
PVM_ID_AA64ISAR1_ALLOW) &&
+   FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_APA), 
PVM_ID_AA64ISAR1_ALLOW))
+   set_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, allowed_features);
+
+   if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_GPI), 
PVM_ID_AA64ISAR1_ALLOW) &&
+   FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_GPA), 
PVM_ID_AA64ISAR1_ALLOW))
+   set_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, allowed_features);
+
+   bitmap_and(shadow_vcpu->arch.features, host_vcpu->arch.features,
+  allowed_features, KVM_VCPU_MAX_FEATURES);
+}
+
 static void unpin_host_vcpus(struct kvm_shadow_vcpu_state *shadow_vcpu_states,
 unsigned int nr_vcpus)
 {
@@ -350,6 +400,17 @@ static int set_host_vcpus(struct kvm_shadow_vcpu_state 
*shadow_vcpu_states,
return 0;
 }
 
+static int init_ptrauth(struct kvm_vcpu *shadow_vcpu)
+{
+   int ret = 0;
+
+   if (test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, shadow_vcpu->arch.features) 
||
+   test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, shadow_vcpu->arch.features))
+   ret = kvm_vcpu_enable_ptrauth(shadow_vcpu);
+
+   return ret;
+}
+
 static int init_shadow_structs(struct kvm *kvm, struct kvm_shadow_vm *vm,
   struct kvm_vcpu **vcpu_array,
   int *last_ran,
@@ -357,10 +418,12 @@ static int init_shadow_structs(struct kvm *kvm, struct 
kvm_shadow_vm *vm,
   unsigned int nr_vcpus)
 {
int i;
+   int ret;
 
vm->host_kvm = kvm;
vm->kvm.created_vcpus = nr_vcpus;
vm->kvm.arch.vtcr = host_kvm.arch.vtcr;
+   vm->kvm.arch.pkvm.enabled = READ_ONCE(kvm->arch.pkvm.enabled);
vm->kvm.arch.mmu.last_vcpu_ran = last_ran;
vm->last_ran_size = last_ran_size;
memset(vm->kvm.arch.mmu.last_vcpu_ran, -1, sizeof(int) * hyp_nr_cpus);
@@ -377,8 +440,16 @@ static int init_shadow_structs(struct kvm *kvm, struct 
kvm_shadow_vm *vm,
shadow_vcpu->vcpu_idx = i;
 
shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;
+   shadow_vcpu->arch.power_off = true;
+
+   copy_features(shadow_vcpu, host_vcpu);
+
+   ret = init_ptrauth(shadow_vcpu);
+   if (ret)
+   return ret;
 
pkvm_vcpu_init_traps(shadow_vcpu, host_vcpu);
+   kvm_reset_pvm_sys_regs(shadow_vcpu);
}
 
return 0;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 70/89] KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Move kvm_vcpu_enable_ptrauth() to a shared header to be used by
hyp in protected mode.

No functional change intended.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_emulate.h | 16 
 arch/arm64/kvm/reset.c   | 16 
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index d62405ce3e6d..bb56aff4de95 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -43,6 +43,22 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long 
addr);
 
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
+static inline int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
+{
+   /*
+* For now make sure that both address/generic pointer authentication
+* features are requested by the userspace together and the system
+* supports these capabilities.
+*/
+   if (!test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
+   !test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features) ||
+   !system_has_full_ptr_auth())
+   return -EINVAL;
+
+   vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_PTRAUTH;
+   return 0;
+}
+
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
return !(vcpu->arch.hcr_el2 & HCR_RW);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index c07265ea72fd..cc25f540962b 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -165,22 +165,6 @@ static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
memset(vcpu->arch.sve_state, 0, vcpu_sve_state_size(vcpu));
 }
 
-static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
-{
-   /*
-* For now make sure that both address/generic pointer authentication
-* features are requested by the userspace together and the system
-* supports these capabilities.
-*/
-   if (!test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
-   !test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features) ||
-   !system_has_full_ptr_auth())
-   return -EINVAL;
-
-   vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_PTRAUTH;
-   return 0;
-}
-
 static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
 {
struct kvm_vcpu *tmp;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 69/89] KVM: arm64: Do not update virtual timer state for protected VMs

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Protected vCPUs always run with a virtual counter offset of 0, so don't
bother trying to update it from the host.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/arch_timer.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 6e542e2eae32..63d06f372eb1 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -88,7 +88,9 @@ static u64 timer_get_offset(struct arch_timer_context *ctxt)
 
switch(arch_timer_ctx_index(ctxt)) {
case TIMER_VTIMER:
-   return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+   if (likely(!kvm_vm_is_protected(vcpu->kvm)))
+   return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+   fallthrough;
default:
return 0;
}
@@ -753,6 +755,9 @@ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, 
u64 cntvoff)
struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *tmp;
 
+   if (unlikely(kvm_vm_is_protected(vcpu->kvm)))
+   cntvoff = 0;
+
mutex_lock(&kvm->lock);
kvm_for_each_vcpu(i, tmp, kvm)
timer_set_offset(vcpu_vtimer(tmp), cntvoff);
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 68/89] KVM: arm64: Move vgic state between host and shadow vcpu structures

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Since the world switch vgic code operates on the shadow data
structure, move the state back and forth between the host and
shadow vcpu.

This is currently limited to the VMCR and APR registers, but further
patches will deal with the rest of the state.

Note that some of the control settings (such as SRE) are always
set to the same value. This will eventually be moved to the shadow
initialisation.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 65 --
 1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 692576497ed9..5d6cee7436f4 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -619,6 +619,17 @@ static struct kvm_vcpu *__get_current_vcpu(struct kvm_vcpu 
*vcpu,
__get_current_vcpu(__vcpu, statepp);\
})
 
+#define get_current_vcpu_from_cpu_if(ctxt, regnr, statepp) \
+   ({  \
+   DECLARE_REG(struct vgic_v3_cpu_if *, cif, ctxt, regnr); \
+   struct kvm_vcpu *__vcpu;\
+   __vcpu = container_of(cif,  \
+ struct kvm_vcpu,  \
+ arch.vgic_cpu.vgic_v3);   \
+   \
+   __get_current_vcpu(__vcpu, statepp);\
+   })
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
struct kvm_shadow_vcpu_state *shadow_state;
@@ -778,16 +789,62 @@ static void handle___kvm_get_mdcr_el2(struct 
kvm_cpu_context *host_ctxt)
 
 static void handle___vgic_v3_save_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
 {
-   DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *vcpu;
+
+   vcpu = get_current_vcpu_from_cpu_if(host_ctxt, 1, &shadow_state);
+   if (!vcpu)
+   return;
+
+   if (shadow_state) {
+   struct vgic_v3_cpu_if *shadow_cpu_if, *cpu_if;
+   int i;
+
+   shadow_cpu_if = 
&shadow_state->shadow_vcpu.arch.vgic_cpu.vgic_v3;
+   __vgic_v3_save_vmcr_aprs(shadow_cpu_if);
+
+   cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
-   __vgic_v3_save_vmcr_aprs(kern_hyp_va(cpu_if));
+   cpu_if->vgic_vmcr = shadow_cpu_if->vgic_vmcr;
+   for (i = 0; i < ARRAY_SIZE(cpu_if->vgic_ap0r); i++) {
+   cpu_if->vgic_ap0r[i] = shadow_cpu_if->vgic_ap0r[i];
+   cpu_if->vgic_ap1r[i] = shadow_cpu_if->vgic_ap1r[i];
+   }
+   } else {
+   __vgic_v3_save_vmcr_aprs(&vcpu->arch.vgic_cpu.vgic_v3);
+   }
 }
 
 static void handle___vgic_v3_restore_vmcr_aprs(struct kvm_cpu_context 
*host_ctxt)
 {
-   DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *vcpu;
 
-   __vgic_v3_restore_vmcr_aprs(kern_hyp_va(cpu_if));
+   vcpu = get_current_vcpu_from_cpu_if(host_ctxt, 1, &shadow_state);
+   if (!vcpu)
+   return;
+
+   if (shadow_state) {
+   struct vgic_v3_cpu_if *shadow_cpu_if, *cpu_if;
+   int i;
+
+   shadow_cpu_if = 
&shadow_state->shadow_vcpu.arch.vgic_cpu.vgic_v3;
+   cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+
+   shadow_cpu_if->vgic_vmcr = cpu_if->vgic_vmcr;
+   /* Should be a one-off */
+   shadow_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
+  ICC_SRE_EL1_DFB |
+  ICC_SRE_EL1_SRE);
+   for (i = 0; i < ARRAY_SIZE(cpu_if->vgic_ap0r); i++) {
+   shadow_cpu_if->vgic_ap0r[i] = cpu_if->vgic_ap0r[i];
+   shadow_cpu_if->vgic_ap1r[i] = cpu_if->vgic_ap1r[i];
+   }
+
+   __vgic_v3_restore_vmcr_aprs(shadow_cpu_if);
+   } else {
+   __vgic_v3_restore_vmcr_aprs(&vcpu->arch.vgic_cpu.vgic_v3);
+   }
 }
 
 static void handle___pkvm_init(struct kvm_cpu_context *host_ctxt)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 67/89] KVM: arm64: Add EL2 entry/exit handlers for pKVM guests

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Introduce separate EL2 entry/exit handlers for protected and
non-protected guests under pKVM and hook up the protected handlers to
expose the minimum amount of data to the host required for EL1 handling.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 230 -
 1 file changed, 228 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e987f34641dd..692576497ed9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -20,6 +20,8 @@
 
 #include 
 
+#include "../../sys_regs.h"
+
 /*
  * Host FPSIMD state. Written to when the guest accesses its own FPSIMD state,
  * and read when the guest state is live and we need to switch back to the 
host.
@@ -34,6 +36,207 @@ void __kvm_hyp_host_forward_smc(struct kvm_cpu_context 
*host_ctxt);
 
 typedef void (*shadow_entry_exit_handler_fn)(struct kvm_vcpu *, struct 
kvm_vcpu *);
 
+static void handle_pvm_entry_wfx(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   shadow_vcpu->arch.flags |= READ_ONCE(host_vcpu->arch.flags) &
+  KVM_ARM64_INCREMENT_PC;
+}
+
+static void handle_pvm_entry_sys64(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   unsigned long host_flags;
+
+   host_flags = READ_ONCE(host_vcpu->arch.flags);
+
+   /* Exceptions have priority on anything else */
+   if (host_flags & KVM_ARM64_PENDING_EXCEPTION) {
+   /* Exceptions caused by this should be undef exceptions. */
+   u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
+
+   __vcpu_sys_reg(shadow_vcpu, ESR_EL1) = esr;
+   shadow_vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+KVM_ARM64_EXCEPT_MASK);
+   shadow_vcpu->arch.flags |= (KVM_ARM64_PENDING_EXCEPTION |
+   KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+   KVM_ARM64_EXCEPT_AA64_EL1);
+
+   return;
+   }
+
+   if (host_flags & KVM_ARM64_INCREMENT_PC) {
+   shadow_vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+KVM_ARM64_EXCEPT_MASK);
+   shadow_vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
+   }
+
+   if (!esr_sys64_to_params(shadow_vcpu->arch.fault.esr_el2).is_write) {
+   /* r0 as transfer register between the guest and the host. */
+   u64 rt_val = READ_ONCE(host_vcpu->arch.ctxt.regs.regs[0]);
+   int rt = kvm_vcpu_sys_get_rt(shadow_vcpu);
+
+   vcpu_set_reg(shadow_vcpu, rt, rt_val);
+   }
+}
+
+static void handle_pvm_entry_iabt(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   unsigned long cpsr = *vcpu_cpsr(shadow_vcpu);
+   unsigned long host_flags;
+   u32 esr = ESR_ELx_IL;
+
+   host_flags = READ_ONCE(host_vcpu->arch.flags);
+
+   if (!(host_flags & KVM_ARM64_PENDING_EXCEPTION))
+   return;
+
+   /*
+* If the host wants to inject an exception, get the syndrome and
+* fault address.
+*/
+   if ((cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
+   esr |= (ESR_ELx_EC_IABT_LOW << ESR_ELx_EC_SHIFT);
+   else
+   esr |= (ESR_ELx_EC_IABT_CUR << ESR_ELx_EC_SHIFT);
+
+   esr |= ESR_ELx_FSC_EXTABT;
+
+   __vcpu_sys_reg(shadow_vcpu, ESR_EL1) = esr;
+   __vcpu_sys_reg(shadow_vcpu, FAR_EL1) = kvm_vcpu_get_hfar(shadow_vcpu);
+
+   /* Tell the run loop that we want to inject something */
+   shadow_vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+KVM_ARM64_EXCEPT_MASK);
+   shadow_vcpu->arch.flags |= (KVM_ARM64_PENDING_EXCEPTION |
+   KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+   KVM_ARM64_EXCEPT_AA64_EL1);
+}
+
+static void handle_pvm_entry_dabt(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   unsigned long host_flags;
+   bool rd_update;
+
+   host_flags = READ_ONCE(host_vcpu->arch.flags);
+
+   /* Exceptions have priority over anything else */
+   if (host_flags & KVM_ARM64_PENDING_EXCEPTION) {
+   unsigned long cpsr = *vcpu_cpsr(shadow_vcpu);
+   u32 esr = ESR_ELx_IL;
+
+   if ((cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
+   esr |= (ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT);
+   else
+   esr |= (ESR_ELx_EC_DABT_CUR << ESR_ELx_EC_SHIFT);
+
+   esr |= ESR_ELx_FSC_EXTABT;
+
+   __vcpu_sys_reg(shadow_vcpu, ESR_EL1) = esr;
+   __vcpu_sys_reg(shadow_vcpu, FAR_EL1) = 
kvm_vcpu_get_hfar(shadow_vcpu);
+   /* Tell the run loop that we want to inject something */
+   shadow_vcpu->arch.flags &= 

[PATCH 66/89] KVM: arm64: Donate memory to protected guests

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Instead of sharing memory with protected guests, which still leaves the
host with r/w access, donate the underlying pages so that they are
unmapped from the host stage-2.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index c1939dd2294f..e987f34641dd 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -465,7 +465,10 @@ static void handle___pkvm_host_map_guest(struct 
kvm_cpu_context *host_ctxt)
if (ret)
goto out;
 
-   ret = __pkvm_host_share_guest(pfn, gfn, shadow_vcpu);
+   if (shadow_state_is_protected(shadow_state))
+   ret = __pkvm_host_donate_guest(pfn, gfn, shadow_vcpu);
+   else
+   ret = __pkvm_host_share_guest(pfn, gfn, shadow_vcpu);
 out:
cpu_reg(host_ctxt, 1) =  ret;
 }
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 65/89] KVM: arm64: Force injection of a data abort on NISV MMIO exit

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

If a vcpu exits for a data abort with an invalid syndrome, the
expectations are that userspace has a chance to save the day if
it has requested to see such exits.

However, this is completely futile in the case of a protected VM,
as none of the state is available. In this particular case, inject
a data abort directly into the vcpu, consistent with what userspace
could do.

This also helps with pKVM, which discards all syndrome information when
forwarding data aborts that are not known to be MMIO.

Finally, hide the RETURN_NISV_IO_ABORT_TO_USER cap from userspace on
protected VMs, and document this tweak to the API.
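
For userspace, the observable difference is confined to the VM file
descriptor. A hedged sketch of how a VMM might probe and enable the
capability, using only the standard KVM ioctls (error handling elided):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Illustrative only: on a protected VM fd the per-VM query returns 0 and
 * KVM_ENABLE_CAP fails with EINVAL, even though the same query on
 * /dev/kvm still reports the capability.
 */
static int try_enable_nisv_to_user(int vm_fd)
{
        struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_NISV_TO_USER };

        if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_NISV_TO_USER) <= 0)
                return -1;      /* not offered for this VM */

        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}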

Signed-off-by: Marc Zyngier 
---
 Documentation/virt/kvm/api.rst |  7 +++
 arch/arm64/kvm/arm.c   | 14 ++
 arch/arm64/kvm/mmio.c  |  9 +
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d13fa6600467..207706260f67 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6115,6 +6115,13 @@ Note that KVM does not skip the faulting instruction as 
it does for
 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
 if it decides to decode and emulate the instruction.
 
+This feature isn't available to protected VMs, as userspace does not
+have access to the state that is required to perform the emulation.
+Instead, a data abort exception is directly injected in the guest.
+Note that although KVM_CAP_ARM_NISV_TO_USER will be reported if
+queried outside of a protected VM context, the feature will not be
+exposed if queried on a protected VM file descriptor.
+
 ::
 
/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 65af1757e73a..9c5a935a9a73 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -84,9 +84,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
-   r = 0;
-   set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
-   &kvm->arch.flags);
+   if (kvm_vm_is_protected(kvm)) {
+   r = -EINVAL;
+   } else {
+   r = 0;
+   set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+   &kvm->arch.flags);
+   }
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
@@ -217,13 +221,15 @@ static int kvm_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_VCPU_EVENTS:
case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
-   case KVM_CAP_ARM_NISV_TO_USER:
case KVM_CAP_ARM_INJECT_EXT_DABT:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
r = 1;
break;
+   case KVM_CAP_ARM_NISV_TO_USER:
+   r = !kvm || !kvm_vm_is_protected(kvm);
+   break;
case KVM_CAP_SET_GUEST_DEBUG2:
return KVM_GUESTDBG_VALID_MASK;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 3dd38a151d2a..db6630c70f8b 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -133,8 +133,17 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t 
fault_ipa)
/*
 * No valid syndrome? Ask userspace for help if it has
 * volunteered to do so, and bail out otherwise.
+*
+* In the protected VM case, there isn't much userspace can do
+* though, so directly deliver an exception to the guest.
 */
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
+   if (is_protected_kvm_enabled() &&
+   kvm_vm_is_protected(vcpu->kvm)) {
+   kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+   return 1;
+   }
+
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 64/89] KVM: arm64: Advertise GICv3 sysreg interface to protected guests

2022-05-19 Thread Will Deacon
Advertise the system register GICv3 CPU interface to protected guests
as that is the only supported configuration under pKVM.

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_pkvm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index 6f13f62558dd..062ae2ffbdfb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -43,11 +43,13 @@ void kvm_shadow_destroy(struct kvm *kvm);
 /*
  * Allow for protected VMs:
  * - Floating-point and Advanced SIMD
+ * - GICv3(+) system register interface
  * - Data Independent Timing
  */
 #define PVM_ID_AA64PFR0_ALLOW (\
ARM64_FEATURE_MASK(ID_AA64PFR0_FP) | \
ARM64_FEATURE_MASK(ID_AA64PFR0_ASIMD) | \
+   ARM64_FEATURE_MASK(ID_AA64PFR0_GIC) | \
ARM64_FEATURE_MASK(ID_AA64PFR0_DIT) \
)
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 63/89] KVM: arm64: Fix initializing traps in protected mode

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

The values of the trapping registers for protected VMs should be
computed from the ground up, and not depend on potentially
preexisting values.

Moreover, non-protected VMs should not be restricted in protected
mode in the same manner as protected VMs.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 48 ++
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 2c13ba0f2bf2..839506a546c7 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -168,34 +168,48 @@ static void pvm_init_traps_aa64mmfr1(struct kvm_vcpu 
*vcpu)
  */
 static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
 {
-   const u64 hcr_trap_feat_regs = HCR_TID3;
-   const u64 hcr_trap_impdef = HCR_TACR | HCR_TIDCP | HCR_TID1;
-
/*
 * Always trap:
 * - Feature id registers: to control features exposed to guests
 * - Implementation-defined features
 */
-   vcpu->arch.hcr_el2 |= hcr_trap_feat_regs | hcr_trap_impdef;
+   vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
+HCR_TID3 | HCR_TACR | HCR_TIDCP | HCR_TID1;
+
+   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+   /* route synchronous external abort exceptions to EL2 */
+   vcpu->arch.hcr_el2 |= HCR_TEA;
+   /* trap error record accesses */
+   vcpu->arch.hcr_el2 |= HCR_TERR;
+   }
 
-   /* Clear res0 and set res1 bits to trap potential new features. */
-   vcpu->arch.hcr_el2 &= ~(HCR_RES0);
-   vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_RES0);
-   vcpu->arch.cptr_el2 |= CPTR_NVHE_EL2_RES1;
-   vcpu->arch.cptr_el2 &= ~(CPTR_NVHE_EL2_RES0);
+   if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+   vcpu->arch.hcr_el2 |= HCR_FWB;
+
+   if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE))
+   vcpu->arch.hcr_el2 |= HCR_TID2;
 }
 
 /*
  * Initialize trap register values for protected VMs.
  */
-static void pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
+static void pkvm_vcpu_init_traps(struct kvm_vcpu *shadow_vcpu, struct kvm_vcpu 
*host_vcpu)
 {
-   pvm_init_trap_regs(vcpu);
-   pvm_init_traps_aa64pfr0(vcpu);
-   pvm_init_traps_aa64pfr1(vcpu);
-   pvm_init_traps_aa64dfr0(vcpu);
-   pvm_init_traps_aa64mmfr0(vcpu);
-   pvm_init_traps_aa64mmfr1(vcpu);
+   shadow_vcpu->arch.cptr_el2 = CPTR_EL2_DEFAULT;
+   shadow_vcpu->arch.mdcr_el2 = 0;
+
+   if (!vcpu_is_protected(shadow_vcpu)) {
+   shadow_vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS |
+   READ_ONCE(host_vcpu->arch.hcr_el2);
+   return;
+   }
+
+   pvm_init_trap_regs(shadow_vcpu);
+   pvm_init_traps_aa64pfr0(shadow_vcpu);
+   pvm_init_traps_aa64pfr1(shadow_vcpu);
+   pvm_init_traps_aa64dfr0(shadow_vcpu);
+   pvm_init_traps_aa64mmfr0(shadow_vcpu);
+   pvm_init_traps_aa64mmfr1(shadow_vcpu);
 }
 
 /*
@@ -364,7 +378,7 @@ static int init_shadow_structs(struct kvm *kvm, struct 
kvm_shadow_vm *vm,
 
shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;
 
-   pkvm_vcpu_init_traps(shadow_vcpu);
+   pkvm_vcpu_init_traps(shadow_vcpu, host_vcpu);
}
 
return 0;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 62/89] KVM: arm64: Move pkvm_vcpu_init_traps to shadow vcpu init

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Move the initialization of traps to the initialization of the
shadow vcpu, and remove the associated hypercall.

No functional change intended.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_asm.h   | 1 -
 arch/arm64/kvm/arm.c   | 8 
 arch/arm64/kvm/hyp/include/nvhe/trap_handler.h | 2 --
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +++-
 5 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ea3b3a60bedb..7af0b7695a2c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -73,7 +73,6 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
-   __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_init_shadow,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8a1b4ba1dfa7..65af1757e73a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -664,14 +664,6 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
static_branch_inc(_irqchip_in_use);
}
 
-   /*
-* Initialize traps for protected VMs.
-* NOTE: Move to run in EL2 directly, rather than via a hypercall, once
-* the code is in place for first run initialization at EL2.
-*/
-   if (kvm_vm_is_protected(kvm))
-   kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);
-
mutex_lock(>lock);
set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, >arch.flags);
mutex_unlock(>lock);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h 
b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
index 45a84f0ade04..1e6d995968a1 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
@@ -15,6 +15,4 @@
 #define DECLARE_REG(type, name, ctxt, reg) \
type name = (type)cpu_reg(ctxt, (reg))
 
-void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu);
-
 #endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 0f1c9d27f6eb..c1939dd2294f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -620,13 +620,6 @@ static void handle___pkvm_prot_finalize(struct 
kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
 }
 
-static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
-{
-   DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
-
-   __pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
-}
-
 static void handle___pkvm_init_shadow(struct kvm_cpu_context *host_ctxt)
 {
DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
@@ -674,7 +667,6 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_tlb_flush_vmid),
HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
-   HANDLE_FUNC(__pkvm_vcpu_init_traps),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
HANDLE_FUNC(__pkvm_init_shadow),
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index cd0712e13ab0..2c13ba0f2bf2 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -188,7 +188,7 @@ static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
 /*
  * Initialize trap register values for protected VMs.
  */
-void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
+static void pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
 {
pvm_init_trap_regs(vcpu);
pvm_init_traps_aa64pfr0(vcpu);
@@ -363,6 +363,8 @@ static int init_shadow_structs(struct kvm *kvm, struct 
kvm_shadow_vm *vm,
shadow_vcpu->vcpu_idx = i;
 
shadow_vcpu->arch.hw_mmu = &vm->kvm.arch.mmu;
+
+   pkvm_vcpu_init_traps(shadow_vcpu);
}
 
return 0;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 61/89] KVM: arm64: Reset sysregs for protected VMs

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Create a framework for resetting protected VM system registers to
their architecturally defined reset values.

No functional change intended as these are not hooked in yet.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  1 +
 arch/arm64/kvm/hyp/nvhe/sys_regs.c | 84 +-
 2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index d070400b5616..e772f9835a86 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -98,6 +98,7 @@ struct kvm_shadow_vcpu_state 
*pkvm_loaded_shadow_vcpu_state(void);
 u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id);
 bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
 bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
+void kvm_reset_pvm_sys_regs(struct kvm_vcpu *vcpu);
 int kvm_check_pvm_sysreg_table(void);
 
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c 
b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index e732826f9624..aeea565d84b8 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -470,8 +470,85 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
/* Performance Monitoring Registers are restricted. */
 };
 
+/* A structure to track reset values for system registers in protected vcpus. 
*/
+struct sys_reg_desc_reset {
+   /* Index into sys_reg[]. */
+   int reg;
+
+   /* Reset function. */
+   void (*reset)(struct kvm_vcpu *, const struct sys_reg_desc_reset *);
+
+   /* Reset value. */
+   u64 value;
+};
+
+static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset 
*r)
+{
+   __vcpu_sys_reg(vcpu, r->reg) = read_sysreg(actlr_el1);
+}
+
+static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc_reset *r)
+{
+   __vcpu_sys_reg(vcpu, r->reg) = read_sysreg(amair_el1);
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset 
*r)
+{
+   __vcpu_sys_reg(vcpu, r->reg) = calculate_mpidr(vcpu);
+}
+
+static void reset_value(struct kvm_vcpu *vcpu, const struct sys_reg_desc_reset 
*r)
+{
+   __vcpu_sys_reg(vcpu, r->reg) = r->value;
+}
+
+/* Specify the register's reset value. */
+#define RESET_VAL(REG, RESET_VAL) {  REG, reset_value, RESET_VAL }
+
+/* Specify a function that calculates the register's reset value. */
+#define RESET_FUNC(REG, RESET_FUNC) {  REG, RESET_FUNC, 0 }
+
+/*
+ * Architected system registers reset values for Protected VMs.
+ * Important: Must be sorted ascending by REG (index into sys_reg[])
+ */
+static const struct sys_reg_desc_reset pvm_sys_reg_reset_vals[] = {
+   RESET_FUNC(MPIDR_EL1, reset_mpidr),
+   RESET_VAL(SCTLR_EL1, 0x00C50078),
+   RESET_FUNC(ACTLR_EL1, reset_actlr),
+   RESET_VAL(CPACR_EL1, 0),
+   RESET_VAL(ZCR_EL1, 0),
+   RESET_VAL(TCR_EL1, 0),
+   RESET_VAL(VBAR_EL1, 0),
+   RESET_VAL(CONTEXTIDR_EL1, 0),
+   RESET_FUNC(AMAIR_EL1, reset_amair_el1),
+   RESET_VAL(CNTKCTL_EL1, 0),
+   RESET_VAL(MDSCR_EL1, 0),
+   RESET_VAL(MDCCINT_EL1, 0),
+   RESET_VAL(DISR_EL1, 0),
+   RESET_VAL(PMCCFILTR_EL0, 0),
+   RESET_VAL(PMUSERENR_EL0, 0),
+};
+
 /*
- * Checks that the sysreg table is unique and in-order.
+ * Sets system registers to reset value
+ *
+ * This function finds the right entry and sets the registers on the protected
+ * vcpu to their architecturally defined reset values.
+ */
+void kvm_reset_pvm_sys_regs(struct kvm_vcpu *vcpu)
+{
+   unsigned long i;
+
+   for (i = 0; i < ARRAY_SIZE(pvm_sys_reg_reset_vals); i++) {
+   const struct sys_reg_desc_reset *r = &pvm_sys_reg_reset_vals[i];
+
+   r->reset(vcpu, r);
+   }
+}
+
+/*
+ * Checks that the sysreg tables are unique and in-order.
  *
  * Returns 0 if the table is consistent, or 1 otherwise.
  */
@@ -484,6 +561,11 @@ int kvm_check_pvm_sysreg_table(void)
return 1;
}
 
+   for (i = 1; i < ARRAY_SIZE(pvm_sys_reg_reset_vals); i++) {
+   if (pvm_sys_reg_reset_vals[i-1].reg >= 
pvm_sys_reg_reset_vals[i].reg)
+   return 1;
+   }
+
return 0;
 }
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 60/89] KVM: arm64: Refactor reset_mpidr to extract its computation

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Move the computation of the mpidr to its own function in a shared
header, as the computation will be used by hyp in protected mode.

No functional change intended.
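
For reference, the packing done by calculate_mpidr() can be checked with
a small worked example (illustrative vcpu_id; MPIDR_LEVEL_SHIFT(n) is
assumed to be 8 * n, as on arm64):

/*
 * Illustrative only: vcpu_id = 0x1234 maps to
 *   Aff0 =  vcpu_id        & 0x0f = 0x04  -> MPIDR bits [7:0]
 *   Aff1 = (vcpu_id >> 4)  & 0xff = 0x23  -> MPIDR bits [15:8]
 *   Aff2 = (vcpu_id >> 12) & 0xff = 0x01  -> MPIDR bits [23:16]
 * plus the RES1 bit 31, giving MPIDR_EL1 = 0x80012304.
 */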

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/sys_regs.c | 14 +-
 arch/arm64/kvm/sys_regs.h | 19 +++
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7886989443b9..d2b1ad662546 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -598,19 +598,7 @@ static void reset_actlr(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *r)
 
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
-   u64 mpidr;
-
-   /*
-* Map the vcpu_id into the first three affinity level fields of
-* the MPIDR. We limit the number of VCPUs in level 0 due to a
-* limitation to 16 CPUs in that level in the ICC_SGIxR registers
-* of the GICv3 to be able to address each CPU directly when
-* sending IPIs.
-*/
-   mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
-   mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
-   mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-   vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
+   vcpu_write_sys_reg(vcpu, calculate_mpidr(vcpu), MPIDR_EL1);
 }
 
 static unsigned int pmu_visibility(const struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index cc0cc95a0280..9b32772f398e 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -183,6 +183,25 @@ find_reg(const struct sys_reg_params *params, const struct 
sys_reg_desc table[],
return __inline_bsearch((void *)pval, table, num, sizeof(table[0]), 
match_sys_reg);
 }
 
+static inline u64 calculate_mpidr(const struct kvm_vcpu *vcpu)
+{
+   u64 mpidr;
+
+   /*
+* Map the vcpu_id into the first three affinity level fields of
+* the MPIDR. We limit the number of VCPUs in level 0 due to a
+* limitation to 16 CPUs in that level in the ICC_SGIxR registers
+* of the GICv3 to be able to address each CPU directly when
+* sending IPIs.
+*/
+   mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+   mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+   mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+   mpidr |= (1ULL << 31);
+
+   return mpidr;
+}
+
 const struct sys_reg_desc *find_reg_by_id(u64 id,
  struct sys_reg_params *params,
  const struct sys_reg_desc table[],
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 59/89] KVM: arm64: Do not support MTE for protected VMs

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Return an error (-EINVAL) when trying to enable MTE on a protected
VM.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/arm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 10e036bf06e3..8a1b4ba1dfa7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -90,7 +90,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
-   if (!system_supports_mte() || kvm->created_vcpus) {
+   if (!system_supports_mte() ||
+   kvm_vm_is_protected(kvm) ||
+   kvm->created_vcpus) {
r = -EINVAL;
} else {
r = 0;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 58/89] KVM: arm64: Restrict protected VM capabilities

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Restrict protected VM capabilities based on the
fixed configuration for protected VMs.

No functional change intended in current KVM-supported modes
(nVHE, VHE).
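
The net effect for userspace is that KVM_CHECK_EXTENSION on a protected
VM's file descriptor reports the clamped values. A hedged sketch using
only standard KVM ioctls:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Illustrative only: on a protected VM fd these queries reflect the fixed
 * pKVM configuration (clamped breakpoint/watchpoint counts, SVE/PMU only
 * if allowed) rather than the raw host capability; 0 means unavailable.
 */
static void probe_pvm_caps(int vm_fd)
{
        printf("HW breakpoints: %d\n",
               ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_DEBUG_HW_BPS));
        printf("HW watchpoints: %d\n",
               ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_DEBUG_HW_WPS));
        printf("SVE supported:  %d\n",
               ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE));
}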

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_pkvm.h | 27 
 arch/arm64/kvm/arm.c  | 69 ++-
 2 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index b92440cfb5b4..6f13f62558dd 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -208,6 +208,33 @@ void kvm_shadow_destroy(struct kvm *kvm);
ARM64_FEATURE_MASK(ID_AA64ISAR2_APA3) \
)
 
+/*
+ * Returns the maximum number of breakpoints supported for protected VMs.
+ */
+static inline int pkvm_get_max_brps(void)
+{
+   int num = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_BRPS),
+   PVM_ID_AA64DFR0_ALLOW);
+
+   /*
+* If breakpoints are supported, the maximum number is 1 + the field.
+* Otherwise, return 0, which is not compliant with the architecture,
+* but is reserved and is used here to indicate no debug support.
+*/
+   return num ? num + 1 : 0;
+}
+
+/*
+ * Returns the maximum number of watchpoints supported for protected VMs.
+ */
+static inline int pkvm_get_max_wrps(void)
+{
+   int num = FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_WRPS),
+   PVM_ID_AA64DFR0_ALLOW);
+
+   return num ? num + 1 : 0;
+}
+
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7c57c14e173a..10e036bf06e3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -194,9 +194,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_unshare_hyp(kvm, kvm + 1);
 }
 
-int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+static int kvm_check_extension(struct kvm *kvm, long ext)
 {
int r;
+
switch (ext) {
case KVM_CAP_IRQCHIP:
r = vgic_present;
@@ -294,6 +295,72 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
return r;
 }
 
+/*
+ * Checks whether the extension specified in ext is supported in protected
+ * mode for the specified vm.
+ * The capabilities supported by kvm in general are passed in kvm_cap.
+ */
+static int pkvm_check_extension(struct kvm *kvm, long ext, int kvm_cap)
+{
+   int r;
+
+   switch (ext) {
+   case KVM_CAP_IRQCHIP:
+   case KVM_CAP_ARM_PSCI:
+   case KVM_CAP_ARM_PSCI_0_2:
+   case KVM_CAP_NR_VCPUS:
+   case KVM_CAP_MAX_VCPUS:
+   case KVM_CAP_MAX_VCPU_ID:
+   case KVM_CAP_MSI_DEVID:
+   case KVM_CAP_ARM_VM_IPA_SIZE:
+   r = kvm_cap;
+   break;
+   case KVM_CAP_GUEST_DEBUG_HW_BPS:
+   r = min(kvm_cap, pkvm_get_max_brps());
+   break;
+   case KVM_CAP_GUEST_DEBUG_HW_WPS:
+   r = min(kvm_cap, pkvm_get_max_wrps());
+   break;
+   case KVM_CAP_ARM_PMU_V3:
+   r = kvm_cap && FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_PMUVER),
+PVM_ID_AA64DFR0_ALLOW);
+   break;
+   case KVM_CAP_ARM_SVE:
+   r = kvm_cap && FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_SVE),
+PVM_ID_AA64PFR0_RESTRICT_UNSIGNED);
+   break;
+   case KVM_CAP_ARM_PTRAUTH_ADDRESS:
+   r = kvm_cap &&
+   FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_API),
+ PVM_ID_AA64ISAR1_ALLOW) &&
+   FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_APA),
+ PVM_ID_AA64ISAR1_ALLOW);
+   break;
+   case KVM_CAP_ARM_PTRAUTH_GENERIC:
+   r = kvm_cap &&
+   FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_GPI),
+ PVM_ID_AA64ISAR1_ALLOW) &&
+   FIELD_GET(ARM64_FEATURE_MASK(ID_AA64ISAR1_GPA),
+ PVM_ID_AA64ISAR1_ALLOW);
+   break;
+   default:
+   r = 0;
+   break;
+   }
+
+   return r;
+}
+
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+{
+   int r = kvm_check_extension(kvm, ext);
+
+   if (kvm && kvm_vm_is_protected(kvm))
+   r = pkvm_check_extension(kvm, ext, r);
+
+   return r;
+}
+
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
 {
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 57/89] KVM: arm64: Trap debug break and watch from guest

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/nvhe/pkvm.c |  2 +-
 arch/arm64/kvm/hyp/nvhe/sys_regs.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index c0403416ce1d..cd0712e13ab0 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -108,7 +108,7 @@ static void pvm_init_traps_aa64dfr0(struct kvm_vcpu *vcpu)
 
/* Trap Debug */
if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_DEBUGVER), feature_ids))
-   mdcr_set |= MDCR_EL2_TDRA | MDCR_EL2_TDA | MDCR_EL2_TDE;
+   mdcr_set |= MDCR_EL2_TDRA | MDCR_EL2_TDA;
 
/* Trap OS Double Lock */
if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_DOUBLELOCK), feature_ids))
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c 
b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index ddea42d7baf9..e732826f9624 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -356,6 +356,17 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
/* Cache maintenance by set/way operations are restricted. */
 
/* Debug and Trace Registers are restricted. */
+   RAZ_WI(SYS_DBGBVRn_EL1(0)),
+   RAZ_WI(SYS_DBGBCRn_EL1(0)),
+   RAZ_WI(SYS_DBGWVRn_EL1(0)),
+   RAZ_WI(SYS_DBGWCRn_EL1(0)),
+   RAZ_WI(SYS_MDSCR_EL1),
+   RAZ_WI(SYS_OSLAR_EL1),
+   RAZ_WI(SYS_OSLSR_EL1),
+   RAZ_WI(SYS_OSDLR_EL1),
+
+   /* Group 1 ID registers */
+   RAZ_WI(SYS_REVIDR_EL1),
 
/* AArch64 mappings of the AArch32 ID registers */
/* CRm=1 */
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 56/89] KVM: arm64: Check directly whether the vcpu is protected

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

This simplifies the code and ensures we're always looking at the hyp state.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/nvhe/switch.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 9d2b971e8613..6bb979ee51cc 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -215,7 +215,7 @@ static const exit_handler_fn pvm_exit_handlers[] = {
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
 {
-   if (unlikely(kvm_vm_is_protected(kern_hyp_va(vcpu->kvm
+   if (unlikely(vcpu_is_protected(vcpu)))
return pvm_exit_handlers;
 
return hyp_exit_handlers;
@@ -234,9 +234,7 @@ static const exit_handler_fn 
*kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
  */
 static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
-   struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-
-   if (kvm_vm_is_protected(kvm) && vcpu_mode_is_32bit(vcpu)) {
+   if (unlikely(vcpu_is_protected(vcpu) && vcpu_mode_is_32bit(vcpu))) {
/*
 * As we have caught the guest red-handed, decide that it isn't
 * fit for purpose anymore by making the vcpu invalid. The VMM
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 55/89] KVM: arm64: Do not pass the vcpu to __pkvm_host_map_guest()

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

__pkvm_host_map_guest() always applies to the loaded vcpu in hyp, and
should not trust the host to provide the vcpu.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ---
 arch/arm64/kvm/mmu.c   |  6 +++---
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e82c0faf6c81..0f1c9d27f6eb 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -445,20 +445,15 @@ static void handle___pkvm_host_map_guest(struct 
kvm_cpu_context *host_ctxt)
 {
DECLARE_REG(u64, pfn, host_ctxt, 1);
DECLARE_REG(u64, gfn, host_ctxt, 2);
-   DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 3);
-   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *host_vcpu;
struct kvm_vcpu *shadow_vcpu;
-   struct kvm *host_kvm;
-   unsigned int handle;
+   struct kvm_shadow_vcpu_state *shadow_state;
int ret = -EINVAL;
 
if (!is_protected_kvm_enabled())
goto out;
 
-   host_vcpu = kern_hyp_va(host_vcpu);
-   host_kvm = kern_hyp_va(host_vcpu->kvm);
-   handle = host_kvm->arch.pkvm.shadow_handle;
-   shadow_state = pkvm_load_shadow_vcpu_state(handle, host_vcpu->vcpu_idx);
+   shadow_state = pkvm_loaded_shadow_vcpu_state();
if (!shadow_state)
goto out;
 
@@ -468,11 +463,9 @@ static void handle___pkvm_host_map_guest(struct 
kvm_cpu_context *host_ctxt)
/* Topup shadow memcache with the host's */
ret = pkvm_refill_memcache(shadow_vcpu, host_vcpu);
if (ret)
-   goto out_put_state;
+   goto out;
 
ret = __pkvm_host_share_guest(pfn, gfn, shadow_vcpu);
-out_put_state:
-   pkvm_put_shadow_vcpu_state(shadow_state);
 out:
cpu_reg(host_ctxt, 1) =  ret;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c74c431588a3..137d4382ed1c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1143,9 +1143,9 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t 
pfn,
return 0;
 }
 
-static int pkvm_host_map_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu)
+static int pkvm_host_map_guest(u64 pfn, u64 gfn)
 {
-   int ret = kvm_call_hyp_nvhe(__pkvm_host_map_guest, pfn, gfn, vcpu);
+   int ret = kvm_call_hyp_nvhe(__pkvm_host_map_guest, pfn, gfn);
 
/*
 * Getting -EPERM at this point implies that the pfn has already been
@@ -1211,7 +1211,7 @@ static int pkvm_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
 
write_lock(&kvm->mmu_lock);
pfn = page_to_pfn(page);
-   ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT, vcpu);
+   ret = pkvm_host_map_guest(pfn, fault_ipa >> PAGE_SHIFT);
if (ret) {
if (ret == -EAGAIN)
ret = 0;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 54/89] KVM: arm64: Reduce host/shadow vcpu state copying

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

When running with pKVM enabled, protected guests run with a fixed CPU
configuration and therefore features such as hardware debug and SVE are
unavailable and their state does not need to be copied from the host
structures on each flush operation. Although non-protected guests do
require the host and shadow structures to be kept in-sync with each
other, we can defer writing back to the host to an explicit sync
hypercall, rather than doing it after every vCPU run.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 228736a9ab40..e82c0faf6c81 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -196,17 +196,18 @@ static void flush_shadow_state(struct 
kvm_shadow_vcpu_state *shadow_state)
 
if (host_flags & KVM_ARM64_PKVM_STATE_DIRTY)
__flush_vcpu_state(shadow_state);
-   }
 
-   shadow_vcpu->arch.sve_state = 
kern_hyp_va(host_vcpu->arch.sve_state);
-   shadow_vcpu->arch.sve_max_vl= host_vcpu->arch.sve_max_vl;
+   shadow_vcpu->arch.sve_state = 
kern_hyp_va(host_vcpu->arch.sve_state);
+   shadow_vcpu->arch.sve_max_vl = host_vcpu->arch.sve_max_vl;
 
-   shadow_vcpu->arch.hcr_el2   = host_vcpu->arch.hcr_el2;
-   shadow_vcpu->arch.mdcr_el2  = host_vcpu->arch.mdcr_el2;
+   shadow_vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS & ~(HCR_RW | 
HCR_TWI | HCR_TWE);
+   shadow_vcpu->arch.hcr_el2 |= READ_ONCE(host_vcpu->arch.hcr_el2);
 
-   shadow_vcpu->arch.debug_ptr = 
kern_hyp_va(host_vcpu->arch.debug_ptr);
+   shadow_vcpu->arch.mdcr_el2 = host_vcpu->arch.mdcr_el2;
+   shadow_vcpu->arch.debug_ptr = 
kern_hyp_va(host_vcpu->arch.debug_ptr);
+   }
 
-   shadow_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
+   shadow_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
 
flush_vgic_state(host_vcpu, shadow_vcpu);
flush_timer_state(shadow_state);
@@ -238,10 +239,10 @@ static void sync_shadow_state(struct 
kvm_shadow_vcpu_state *shadow_state,
unsigned long host_flags;
u8 esr_ec;
 
-   host_vcpu->arch.ctxt= shadow_vcpu->arch.ctxt;
-
-   host_vcpu->arch.hcr_el2 = shadow_vcpu->arch.hcr_el2;
-
+   /*
+* Don't sync the vcpu GPR/sysreg state after a run. Instead,
+* leave it in the shadow until someone actually requires it.
+*/
sync_vgic_state(host_vcpu, shadow_vcpu);
sync_timer_state(shadow_state);
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 53/89] KVM: arm64: Lazy host FP save/restore

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Implement lazy save/restore of the host FPSIMD register state at EL2.
This allows us to save/restore guest FPSIMD registers without involving
the host and means that we can avoid having to repopulate the shadow
register state on every flush.

Signed-off-by: Marc Zyngier 
---
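The lazy switching logic is easiest to follow as a small state machine: the
guest's first FP access traps to EL2, which parks the host FPSIMD state in the
per-CPU buffer, and the host's next FP access (now trapped via CPTR_EL2) swaps
it back. Below is a minimal standalone model of that flow, for illustration
only; it is plain C, not the hypervisor code.

/* Illustrative model of lazy FP save/restore, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

struct fpstate { int regs; };

static struct fpstate host_fp = { .regs = 1 };
static struct fpstate guest_fp = { .regs = 2 };
static struct fpstate cpu_fp;		/* what the hardware currently holds */
static bool guest_fp_live;		/* KVM_ARM64_FP_ENABLED analogue */

static void guest_uses_fp(void)
{
	/* First guest FP access: save the host state lazily, load the guest's. */
	if (!guest_fp_live) {
		host_fp = cpu_fp;
		cpu_fp = guest_fp;
		guest_fp_live = true;	/* host FP accesses now trap */
	}
}

static void host_uses_fp(void)
{
	/* Host FP access traps while the guest state is live: switch back. */
	if (guest_fp_live) {
		guest_fp = cpu_fp;
		cpu_fp = host_fp;
		guest_fp_live = false;
	}
}

int main(void)
{
	cpu_fp = host_fp;
	guest_uses_fp();	/* host state parked, guest state live */
	host_uses_fp();		/* guest state saved, host state restored */
	printf("cpu holds %d (host)\n", cpu_fp.regs);
	return 0;
}
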
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 57 ++
 1 file changed, 51 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 2a12d6f710ef..228736a9ab40 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -20,6 +20,14 @@
 
 #include 
 
+/*
+ * Host FPSIMD state. Written to when the guest accesses its own FPSIMD state,
+ * and read when the guest state is live and we need to switch back to the 
host.
+ *
+ * Only valid when the KVM_ARM64_FP_ENABLED flag is set in the shadow 
structure.
+ */
+static DEFINE_PER_CPU(struct user_fpsimd_state, loaded_host_fpsimd_state);
+
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
@@ -195,10 +203,8 @@ static void flush_shadow_state(struct 
kvm_shadow_vcpu_state *shadow_state)
 
shadow_vcpu->arch.hcr_el2   = host_vcpu->arch.hcr_el2;
shadow_vcpu->arch.mdcr_el2  = host_vcpu->arch.mdcr_el2;
-   shadow_vcpu->arch.cptr_el2  = host_vcpu->arch.cptr_el2;
 
shadow_vcpu->arch.debug_ptr = 
kern_hyp_va(host_vcpu->arch.debug_ptr);
-   shadow_vcpu->arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
 
shadow_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
 
@@ -235,7 +241,6 @@ static void sync_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state,
host_vcpu->arch.ctxt= shadow_vcpu->arch.ctxt;
 
host_vcpu->arch.hcr_el2 = shadow_vcpu->arch.hcr_el2;
-   host_vcpu->arch.cptr_el2= shadow_vcpu->arch.cptr_el2;
 
sync_vgic_state(host_vcpu, shadow_vcpu);
sync_timer_state(shadow_state);
@@ -262,6 +267,27 @@ static void sync_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state,
shadow_state->exit_code = exit_reason;
 }
 
+static void fpsimd_host_restore(void)
+{
+   sysreg_clear_set(cptr_el2, CPTR_EL2_TZ | CPTR_EL2_TFP, 0);
+   isb();
+
+   if (unlikely(is_protected_kvm_enabled())) {
+   struct kvm_shadow_vcpu_state *shadow_state = 
pkvm_loaded_shadow_vcpu_state();
+   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+   struct user_fpsimd_state *host_fpsimd_state = 
this_cpu_ptr(_host_fpsimd_state);
+
+   __fpsimd_save_state(_vcpu->arch.ctxt.fp_regs);
+   __fpsimd_restore_state(host_fpsimd_state);
+
+   shadow_vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
+   shadow_vcpu->arch.flags |= KVM_ARM64_FP_HOST;
+   }
+
+   if (system_supports_sve())
+   sve_cond_update_zcr_vq(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
+}
+
 static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
 {
DECLARE_REG(unsigned int, shadow_handle, host_ctxt, 1);
@@ -291,6 +317,9 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context 
*host_ctxt)
*last_ran = shadow_vcpu->vcpu_id;
}
 
+   shadow_vcpu->arch.host_fpsimd_state = 
this_cpu_ptr(_host_fpsimd_state);
+   shadow_vcpu->arch.flags |= KVM_ARM64_FP_HOST;
+
if (shadow_state_is_protected(shadow_state)) {
/* Propagate WFx trapping flags, trap ptrauth */
shadow_vcpu->arch.hcr_el2 &= ~(HCR_TWE | HCR_TWI |
@@ -310,6 +339,10 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context 
*host_ctxt)
 
if (shadow_state) {
struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
+   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+
+   if (shadow_vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
+   fpsimd_host_restore();
 
if (!shadow_state_is_protected(shadow_state) &&
!(READ_ONCE(host_vcpu->arch.flags) & 
KVM_ARM64_PKVM_STATE_DIRTY))
@@ -377,6 +410,19 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context 
*host_ctxt)
ret = __kvm_vcpu_run(_state->shadow_vcpu);
 
sync_shadow_state(shadow_state, ret);
+
+   if (shadow_state->shadow_vcpu.arch.flags & 
KVM_ARM64_FP_ENABLED) {
+   /*
+* The guest has used the FP, trap all accesses
+* from the host (both FP and SVE).
+*/
+   u64 reg = CPTR_EL2_TFP;
+
+   if (system_supports_sve())
+   reg |= CPTR_EL2_TZ;
+
+   sysreg_clear_set(cptr_el2, 0, reg);
+   }
} else {
ret = __kvm_vcpu_run(vcpu);
}
@@ -707,10 +753,9 @@ void 

[PATCH 52/89] KVM: arm64: Introduce lazy-ish state sync for non-protected VMs

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Rather than blindly copying the register state between the shadow and
host vCPU structures, abstract this code into some helpers which are
called only for non-protected VMs running under pKVM. To facilitate
host accesses to guest registers within a get/put sequence, introduce a
new 'sync_state' hypercall to provide access to the registers of a
non-protected VM when handling traps.

Signed-off-by: Marc Zyngier 
---
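The intent is that the (comparatively expensive) register copy back to the host
only happens when the host actually needs to inspect a non-protected guest's
registers, and at most once per exit. The following is a small standalone model
of the dirty-flag logic, purely for illustration; it is not the hypervisor code.

/* Illustrative model of the dirty-flag guarded sync, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

struct regs { int x0; };

static struct regs shadow;	/* authoritative copy at EL2 */
static struct regs host;	/* host's view, only valid when dirty */
static bool host_dirty;		/* KVM_ARM64_PKVM_STATE_DIRTY analogue */

static void sync_state(void)	/* __pkvm_vcpu_sync_state analogue */
{
	host = shadow;
	host_dirty = true;	/* host may now read/modify its copy */
}

static void handle_trap(void)
{
	/* Only pay for the copy when the host actually needs the regs. */
	if (!host_dirty)
		sync_state();
	printf("host sees x0=%d\n", host.x0);
}

static void vcpu_run(void)
{
	if (host_dirty)		/* flush any host-side changes back first */
		shadow = host;
	host_dirty = false;	/* after the run, the shadow copy is newer */
	shadow.x0++;		/* guest execution mutates the shadow state */
}

int main(void)
{
	vcpu_run();
	handle_trap();		/* triggers exactly one sync */
	handle_trap();		/* no further copy needed */
	return 0;
}
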
 arch/arm64/include/asm/kvm_asm.h   |  1 +
 arch/arm64/include/asm/kvm_host.h  |  1 +
 arch/arm64/kvm/arm.c   |  7 
 arch/arm64/kvm/handle_exit.c   | 22 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 65 +-
 5 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 07ee95d0f97d..ea3b3a60bedb 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -80,6 +80,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_shadow,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
+   __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_sync_state,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)   extern char sym[]
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 066eb7234bdd..160cbf973bcb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -512,6 +512,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE(1 << 13) /* Save TRBE context 
if active  */
 #define KVM_ARM64_FP_FOREIGN_FPSTATE   (1 << 14)
 #define KVM_ARM64_ON_UNSUPPORTED_CPU   (1 << 15) /* Physical CPU not in 
supported_cpus */
+#define KVM_ARM64_PKVM_STATE_DIRTY (1 << 16)
 
 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
 KVM_GUESTDBG_USE_SW_BP | \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3bb379f15c07..7c57c14e173a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -448,6 +448,10 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_call_hyp(__vgic_v3_save_vmcr_aprs,
 >arch.vgic_cpu.vgic_v3);
kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+
+   /* __pkvm_vcpu_put implies a sync of the state */
+   if (!kvm_vm_is_protected(vcpu->kvm))
+   vcpu->arch.flags |= KVM_ARM64_PKVM_STATE_DIRTY;
}
 
kvm_arch_vcpu_put_debug_state_flags(vcpu);
@@ -575,6 +579,9 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
return ret;
 
if (is_protected_kvm_enabled()) {
+   /* Start with the vcpu in a dirty state */
+   if (!kvm_vm_is_protected(vcpu->kvm))
+   vcpu->arch.flags |= KVM_ARM64_PKVM_STATE_DIRTY;
ret = kvm_shadow_create(kvm);
if (ret)
return ret;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 97fe14aab1a3..9334c4a64007 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -203,6 +203,21 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
 {
int handled;
 
+   /*
+* If we run a non-protected VM when protection is enabled
+* system-wide, resync the state from the hypervisor and mark
+* it as dirty on the host side if it wasn't dirty already
+* (which could happen if preemption has taken place).
+*/
+   if (is_protected_kvm_enabled() && !kvm_vm_is_protected(vcpu->kvm)) {
+   preempt_disable();
+   if (!(vcpu->arch.flags & KVM_ARM64_PKVM_STATE_DIRTY)) {
+   kvm_call_hyp_nvhe(__pkvm_vcpu_sync_state);
+   vcpu->arch.flags |= KVM_ARM64_PKVM_STATE_DIRTY;
+   }
+   preempt_enable();
+   }
+
/*
 * See ARM ARM B1.14.1: "Hyp traps on instructions
 * that fail their condition code check"
@@ -270,6 +285,13 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 /* For exit types that need handling before we can be preempted */
 void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index)
 {
+   /*
+* We just exited, so the state is clean from a hypervisor
+* perspective.
+*/
+   if (is_protected_kvm_enabled())
+   vcpu->arch.flags &= ~KVM_ARM64_PKVM_STATE_DIRTY;
+
if (ARM_SERROR_PENDING(exception_index)) {
if (this_cpu_has_cap(ARM64_HAS_RAS_EXTN)) {
u64 disr = kvm_vcpu_get_disr(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index bbf2621f1862..2a12d6f710ef 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -140,6 +140,38 @@ static void sync_timer_state(struct kvm_shadow_vcpu_state 
*shadow_state)

[PATCH 51/89] KVM: arm64: Introduce per-EC entry/exit handlers

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Introduce per-EC entry/exit handlers at EL2 and provide initial
implementations to manage the 'flags' and fault information registers.

Signed-off-by: Marc Zyngier 
---
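The dispatch itself is a function-pointer table indexed by the exception class,
with a generic fallback installed for every entry via a GNU C range
initialiser. A standalone sketch of the pattern follows; EC_DABT_LOW mirrors
ESR_ELx_EC_DABT_LOW (0x24), everything else is illustrative.

/* Illustrative model of per-EC dispatch (uses GNU C range initialisers). */
#include <stdio.h>

#define EC_MAX		0x3f
#define EC_DABT_LOW	0x24

typedef void (*ec_handler_fn)(unsigned int ec);

static void handle_generic(unsigned int ec)
{
	printf("generic exit handler, EC=%#x\n", ec);
}

static void handle_dabt(unsigned int ec)
{
	printf("data abort handler, EC=%#x\n", ec);
}

/* Every EC gets the generic handler unless explicitly overridden. */
static const ec_handler_fn exit_handlers[EC_MAX + 1] = {
	[0 ... EC_MAX]	= handle_generic,
	[EC_DABT_LOW]	= handle_dabt,
};

int main(void)
{
	exit_handlers[0x01](0x01);			/* generic */
	exit_handlers[EC_DABT_LOW](EC_DABT_LOW);	/* specific */
	return 0;
}
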
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  3 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 99 +++---
 2 files changed, 94 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index da8de2b7afb4..d070400b5616 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -25,6 +25,9 @@ struct kvm_shadow_vcpu_state {
/* A pointer to the shadow vm. */
struct kvm_shadow_vm *shadow_vm;
 
+   /* Tracks exit code for the protected guest. */
+   u32 exit_code;
+
/*
 * Points to the per-cpu pointer of the cpu where it's loaded, or NULL
 * if not loaded.
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 229ef890d459..bbf2621f1862 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -24,6 +24,51 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
+typedef void (*shadow_entry_exit_handler_fn)(struct kvm_vcpu *, struct 
kvm_vcpu *);
+
+static void handle_vm_entry_generic(struct kvm_vcpu *host_vcpu, struct 
kvm_vcpu *shadow_vcpu)
+{
+   unsigned long host_flags = READ_ONCE(host_vcpu->arch.flags);
+
+   shadow_vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+KVM_ARM64_EXCEPT_MASK);
+
+   if (host_flags & KVM_ARM64_PENDING_EXCEPTION) {
+   shadow_vcpu->arch.flags |= KVM_ARM64_PENDING_EXCEPTION;
+   shadow_vcpu->arch.flags |= host_flags & KVM_ARM64_EXCEPT_MASK;
+   } else if (host_flags & KVM_ARM64_INCREMENT_PC) {
+   shadow_vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
+   }
+}
+
+static void handle_vm_exit_generic(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+  shadow_vcpu->arch.fault.esr_el2);
+}
+
+static void handle_vm_exit_abt(struct kvm_vcpu *host_vcpu, struct kvm_vcpu 
*shadow_vcpu)
+{
+   WRITE_ONCE(host_vcpu->arch.fault.esr_el2,
+  shadow_vcpu->arch.fault.esr_el2);
+   WRITE_ONCE(host_vcpu->arch.fault.far_el2,
+  shadow_vcpu->arch.fault.far_el2);
+   WRITE_ONCE(host_vcpu->arch.fault.hpfar_el2,
+  shadow_vcpu->arch.fault.hpfar_el2);
+   WRITE_ONCE(host_vcpu->arch.fault.disr_el1,
+  shadow_vcpu->arch.fault.disr_el1);
+}
+
+static const shadow_entry_exit_handler_fn entry_vm_shadow_handlers[] = {
+   [0 ... ESR_ELx_EC_MAX]  = handle_vm_entry_generic,
+};
+
+static const shadow_entry_exit_handler_fn exit_vm_shadow_handlers[] = {
+   [0 ... ESR_ELx_EC_MAX]  = handle_vm_exit_generic,
+   [ESR_ELx_EC_IABT_LOW]   = handle_vm_exit_abt,
+   [ESR_ELx_EC_DABT_LOW]   = handle_vm_exit_abt,
+};
+
 static void flush_vgic_state(struct kvm_vcpu *host_vcpu,
 struct kvm_vcpu *shadow_vcpu)
 {
@@ -99,6 +144,8 @@ static void flush_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
 {
struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
+   shadow_entry_exit_handler_fn ec_handler;
+   u8 esr_ec;
 
shadow_vcpu->arch.ctxt  = host_vcpu->arch.ctxt;
 
@@ -109,8 +156,6 @@ static void flush_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
shadow_vcpu->arch.mdcr_el2  = host_vcpu->arch.mdcr_el2;
shadow_vcpu->arch.cptr_el2  = host_vcpu->arch.cptr_el2;
 
-   shadow_vcpu->arch.flags = host_vcpu->arch.flags;
-
shadow_vcpu->arch.debug_ptr = 
kern_hyp_va(host_vcpu->arch.debug_ptr);
shadow_vcpu->arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
 
@@ -118,24 +163,62 @@ static void flush_shadow_state(struct 
kvm_shadow_vcpu_state *shadow_state)
 
flush_vgic_state(host_vcpu, shadow_vcpu);
flush_timer_state(shadow_state);
+
+   switch (ARM_EXCEPTION_CODE(shadow_state->exit_code)) {
+   case ARM_EXCEPTION_IRQ:
+   case ARM_EXCEPTION_EL1_SERROR:
+   case ARM_EXCEPTION_IL:
+   break;
+   case ARM_EXCEPTION_TRAP:
+   esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(shadow_vcpu));
+   ec_handler = entry_vm_shadow_handlers[esr_ec];
+   if (ec_handler)
+   ec_handler(host_vcpu, shadow_vcpu);
+   break;
+   default:
+   BUG();
+   }
+
+   shadow_state->exit_code = 0;
 }
 
-static void sync_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
+static void sync_shadow_state(struct kvm_shadow_vcpu_state *shadow_state,

[PATCH 50/89] KVM: arm64: Ensure that TLBs and I-cache are private to each vcpu

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Guarantee that both TLBs and I-cache are private to each vcpu.
Flush the CPU context if a different vcpu from the same vm is
loaded on the same physical CPU.

Signed-off-by: Fuad Tabba 
---
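The check boils down to a per-physical-CPU "last vcpu that ran here" array, as
modelled below. This is a standalone sketch, not the hypervisor code; the real
array lives behind the guest's hw_mmu and is indexed with hyp_smp_processor_id().

/* Illustrative model of the last_vcpu_ran check, not kernel code. */
#include <stdio.h>

#define NR_CPUS 4

static int last_vcpu_ran[NR_CPUS] = { [0 ... NR_CPUS - 1] = -1 };

static void flush_cpu_context(int cpu)
{
	printf("cpu%d: flushing TLBs and I-cache\n", cpu);
}

static void vcpu_load(int cpu, int vcpu_id)
{
	/*
	 * If a different vcpu of the same VM ran here last, its TLB and
	 * I-cache footprint must not leak into this vcpu.
	 */
	if (last_vcpu_ran[cpu] != vcpu_id) {
		flush_cpu_context(cpu);
		last_vcpu_ran[cpu] = vcpu_id;
	}
}

int main(void)
{
	vcpu_load(0, 0);	/* first load: flush */
	vcpu_load(0, 0);	/* same vcpu: no flush */
	vcpu_load(0, 1);	/* different vcpu: flush */
	return 0;
}
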
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  9 +--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 -
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 34 --
 arch/arm64/kvm/pkvm.c  | 20 +++
 4 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 343d87877aa2..da8de2b7afb4 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -45,6 +45,9 @@ struct kvm_shadow_vm {
/* The total size of the donated shadow area. */
size_t shadow_area_size;
 
+   /* The total size of the donated area for last_ran. */
+   size_t last_ran_size;
+
struct kvm_pgtable pgt;
struct kvm_pgtable_mm_ops mm_ops;
struct hyp_pool pool;
@@ -78,8 +81,10 @@ static inline bool vcpu_is_protected(struct kvm_vcpu *vcpu)
 }
 
 void hyp_shadow_table_init(void *tbl);
-int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
-  size_t shadow_size, unsigned long pgd_hva);
+int __pkvm_init_shadow(struct kvm *kvm,
+  unsigned long shadow_hva, size_t shadow_size,
+  unsigned long pgd_hva,
+  unsigned long last_ran_hva, size_t last_ran_size);
 int __pkvm_teardown_shadow(unsigned int shadow_handle);
 
 struct kvm_shadow_vcpu_state *
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 86dff0dc05f3..229ef890d459 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -145,6 +145,7 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context 
*host_ctxt)
DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
struct kvm_shadow_vcpu_state *shadow_state;
struct kvm_vcpu *shadow_vcpu;
+   int *last_ran;
 
if (!is_protected_kvm_enabled())
return;
@@ -155,6 +156,17 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context 
*host_ctxt)
 
shadow_vcpu = _state->shadow_vcpu;
 
+   /*
+* Guarantee that both TLBs and I-cache are private to each vcpu. If a
+* vcpu from the same VM has previously run on the same physical CPU,
+* nuke the relevant contexts.
+*/
+   last_ran = 
_vcpu->arch.hw_mmu->last_vcpu_ran[hyp_smp_processor_id()];
+   if (*last_ran != shadow_vcpu->vcpu_id) {
+   __kvm_flush_cpu_context(shadow_vcpu->arch.hw_mmu);
+   *last_ran = shadow_vcpu->vcpu_id;
+   }
+
if (shadow_state_is_protected(shadow_state)) {
/* Propagate WFx trapping flags, trap ptrauth */
shadow_vcpu->arch.hcr_el2 &= ~(HCR_TWE | HCR_TWI |
@@ -436,9 +448,12 @@ static void handle___pkvm_init_shadow(struct 
kvm_cpu_context *host_ctxt)
DECLARE_REG(unsigned long, host_shadow_va, host_ctxt, 2);
DECLARE_REG(size_t, shadow_size, host_ctxt, 3);
DECLARE_REG(unsigned long, pgd, host_ctxt, 4);
+   DECLARE_REG(unsigned long, last_ran, host_ctxt, 5);
+   DECLARE_REG(size_t, last_ran_size, host_ctxt, 6);
 
cpu_reg(host_ctxt, 1) = __pkvm_init_shadow(host_kvm, host_shadow_va,
-  shadow_size, pgd);
+  shadow_size, pgd,
+  last_ran, last_ran_size);
 }
 
 static void handle___pkvm_teardown_shadow(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index f18f622336b8..c0403416ce1d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -338,6 +338,8 @@ static int set_host_vcpus(struct kvm_shadow_vcpu_state 
*shadow_vcpu_states,
 
 static int init_shadow_structs(struct kvm *kvm, struct kvm_shadow_vm *vm,
   struct kvm_vcpu **vcpu_array,
+  int *last_ran,
+  size_t last_ran_size,
   unsigned int nr_vcpus)
 {
int i;
@@ -345,6 +347,9 @@ static int init_shadow_structs(struct kvm *kvm, struct 
kvm_shadow_vm *vm,
vm->host_kvm = kvm;
vm->kvm.created_vcpus = nr_vcpus;
vm->kvm.arch.vtcr = host_kvm.arch.vtcr;
+   vm->kvm.arch.mmu.last_vcpu_ran = last_ran;
+   vm->last_ran_size = last_ran_size;
+   memset(vm->kvm.arch.mmu.last_vcpu_ran, -1, sizeof(int) * hyp_nr_cpus);
 
for (i = 0; i < nr_vcpus; i++) {
struct kvm_shadow_vcpu_state *shadow_vcpu_state = 
>shadow_vcpu_states[i];
@@ -471,6 +476,15 @@ static int check_shadow_size(unsigned int nr_vcpus, size_t 
shadow_size)
return 0;
 }
 
+/*
+ * Check whether 

[PATCH 49/89] KVM: arm64: Add hyp per_cpu variable to track current physical cpu number

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Hyp cannot trust the equivalent per-CPU variable maintained by the host, so
track the current physical CPU number in a per-CPU variable owned by hyp and
initialised by the host before deprivileging.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_hyp.h  | 3 +++
 arch/arm64/kvm/arm.c  | 4 
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4adf7c2a77bd..e38869e88019 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -15,6 +15,9 @@
 DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);
 DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+DECLARE_PER_CPU(int, hyp_cpu_number);
+
+#define hyp_smp_processor_id() (__this_cpu_read(hyp_cpu_number))
 
 #define read_sysreg_elx(r,nvh,vh)  \
({  \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 514519563976..3bb379f15c07 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -52,6 +52,7 @@ DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+DECLARE_KVM_NVHE_PER_CPU(int, hyp_cpu_number);
 
 static bool vgic_present;
 
@@ -1487,6 +1488,9 @@ static void cpu_prepare_hyp_mode(int cpu)
 {
struct kvm_nvhe_init_params *params = 
per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
unsigned long tcr;
+   int *hyp_cpu_number_ptr = per_cpu_ptr_nvhe_sym(hyp_cpu_number, cpu);
+
+   *hyp_cpu_number_ptr = cpu;
 
/*
 * Calculate the raw per-cpu offset without a translation from the
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
index 04d194583f1e..9fcb92abd0b5 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 
+DEFINE_PER_CPU(int, hyp_cpu_number);
+
 /*
  * nVHE copy of data structures tracking available CPU cores.
  * Only entries for CPUs that were online at KVM init are populated.
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 48/89] KVM: arm64: Skip __kvm_adjust_pc() for protected vcpus

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Prevent the host from issuing arbitrary PC adjustments for protected
vCPUs.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 40cbf45800b7..86dff0dc05f3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -275,9 +275,22 @@ static void handle___pkvm_host_map_guest(struct 
kvm_cpu_context *host_ctxt)
 
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
 {
-   DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *vcpu;
+
+   vcpu = get_current_vcpu(host_ctxt, 1, _state);
+   if (!vcpu)
+   return;
+
+   if (shadow_state) {
+   /* This only applies to non-protected VMs */
+   if (shadow_state_is_protected(shadow_state))
+   return;
+
+   vcpu = _state->shadow_vcpu;
+   }
 
-   __kvm_adjust_pc(kern_hyp_va(vcpu));
+   __kvm_adjust_pc(vcpu);
 }
 
 static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 47/89] KVM: arm64: Add current vcpu and shadow_state lookup primitive

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

In order to be able to safely manipulate the loaded state,
add a helper that always returns the vcpu as mapped in the EL2 S1
address space as well as the pointer to the shadow state.

In case of failure, both pointers are returned as NULL values.

For non-protected setups, the state is always NULL and the vcpu is the
EL2 mapping of the input value.

handle___kvm_vcpu_run() is converted to this helper.

Signed-off-by: Marc Zyngier 
---
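For reference, the expected calling pattern for future hypercall handlers looks
roughly like the converted handle___kvm_vcpu_run(). The handler name below is
made up and the snippet only sketches the contract (NULL vcpu on failure, NULL
state for non-protected setups).

/* Hypothetical handler, sketching the intended use of get_current_vcpu(). */
static void handle___example_hcall(struct kvm_cpu_context *host_ctxt)
{
	struct kvm_shadow_vcpu_state *shadow_state;
	struct kvm_vcpu *vcpu;

	vcpu = get_current_vcpu(host_ctxt, 1, &shadow_state);
	if (!vcpu) {
		/* The host passed a vcpu that isn't the one loaded at EL2. */
		cpu_reg(host_ctxt, 1) = -EINVAL;
		return;
	}

	if (shadow_state) {
		/* pKVM: operate on hyp's private copy, not the host's. */
		vcpu = &shadow_state->shadow_vcpu;
	}

	/* ... vcpu is now safe to dereference from EL2 ... */
	cpu_reg(host_ctxt, 1) = 0;
}
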
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 41 +-
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 9e3a2aa6f737..40cbf45800b7 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -177,22 +177,51 @@ static void handle___pkvm_vcpu_put(struct kvm_cpu_context 
*host_ctxt)
}
 }
 
+static struct kvm_vcpu *__get_current_vcpu(struct kvm_vcpu *vcpu,
+  struct kvm_shadow_vcpu_state **state)
+{
+   struct kvm_shadow_vcpu_state *sstate = NULL;
+
+   vcpu = kern_hyp_va(vcpu);
+
+   if (unlikely(is_protected_kvm_enabled())) {
+   sstate = pkvm_loaded_shadow_vcpu_state();
+   if (!sstate || vcpu != sstate->host_vcpu) {
+   sstate = NULL;
+   vcpu = NULL;
+   }
+   }
+
+   *state = sstate;
+   return vcpu;
+}
+
+#define get_current_vcpu(ctxt, regnr, statepp) \
+   ({  \
+   DECLARE_REG(struct kvm_vcpu *, __vcpu, ctxt, regnr);\
+   __get_current_vcpu(__vcpu, statepp);\
+   })
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
-   DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *vcpu;
int ret;
 
-   if (unlikely(is_protected_kvm_enabled())) {
-   struct kvm_shadow_vcpu_state *shadow_state = 
pkvm_loaded_shadow_vcpu_state();
-   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+   vcpu = get_current_vcpu(host_ctxt, 1, _state);
+   if (!vcpu) {
+   cpu_reg(host_ctxt, 1) =  -EINVAL;
+   return;
+   }
 
+   if (unlikely(shadow_state)) {
flush_shadow_state(shadow_state);
 
-   ret = __kvm_vcpu_run(shadow_vcpu);
+   ret = __kvm_vcpu_run(_state->shadow_vcpu);
 
sync_shadow_state(shadow_state);
} else {
-   ret = __kvm_vcpu_run(kern_hyp_va(host_vcpu));
+   ret = __kvm_vcpu_run(vcpu);
}
 
cpu_reg(host_ctxt, 1) =  ret;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 46/89] KVM: arm64: Introduce the pkvm_vcpu_{load, put} hypercalls

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Rather than looking up the shadow vCPU on every run hypercall at EL2,
introduce a per-CPU 'loaded_shadow_state' which is updated by a pair of
load/put hypercalls that are called directly from
kvm_arch_vcpu_{load,put}() when pKVM is enabled.

Signed-off-by: Marc Zyngier 
---
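The win is that the handle-to-shadow-state lookup moves out of the run path and
into load/put, which are called far less often. A standalone model of that
pattern is sketched below (single CPU, illustration only, not the hypervisor
code).

/* Illustrative model of the load/put pattern, not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct shadow_state { int handle; };

/* One slot per physical CPU; this single-threaded model uses just one. */
static struct shadow_state *loaded_shadow_state;

static void vcpu_load(struct shadow_state *s)
{
	loaded_shadow_state = s;	/* the expensive lookup happens once, here */
}

static void vcpu_put(void)
{
	loaded_shadow_state = NULL;
}

static void vcpu_run(void)
{
	/* The run path no longer needs a handle -> state lookup. */
	assert(loaded_shadow_state);
	printf("running shadow vcpu %d\n", loaded_shadow_state->handle);
}

int main(void)
{
	struct shadow_state s = { .handle = 1 };

	vcpu_load(&s);
	vcpu_run();
	vcpu_run();
	vcpu_put();
	return 0;
}
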
 arch/arm64/include/asm/kvm_asm.h   |  2 +
 arch/arm64/kvm/arm.c   | 14 ++
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  7 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 65 ++
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 28 +++
 arch/arm64/kvm/vgic/vgic-v3.c  |  6 ++-
 6 files changed, 100 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 22b5ee9f2b5c..07ee95d0f97d 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -78,6 +78,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_init_shadow,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_shadow,
+   __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
+   __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)   extern char sym[]
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c9b8e2ca5cb5..514519563976 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -429,12 +429,26 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vcpu_ptrauth_disable(vcpu);
kvm_arch_vcpu_load_debug_state_flags(vcpu);
 
+   if (is_protected_kvm_enabled()) {
+   kvm_call_hyp_nvhe(__pkvm_vcpu_load,
+ vcpu->kvm->arch.pkvm.shadow_handle,
+ vcpu->vcpu_idx, vcpu->arch.hcr_el2);
+   kvm_call_hyp(__vgic_v3_restore_vmcr_aprs,
+>arch.vgic_cpu.vgic_v3);
+   }
+
if (!cpumask_test_cpu(smp_processor_id(), 
vcpu->kvm->arch.supported_cpus))
vcpu_set_on_unsupported_cpu(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   if (is_protected_kvm_enabled()) {
+   kvm_call_hyp(__vgic_v3_save_vmcr_aprs,
+>arch.vgic_cpu.vgic_v3);
+   kvm_call_hyp_nvhe(__pkvm_vcpu_put);
+   }
+
kvm_arch_vcpu_put_debug_state_flags(vcpu);
kvm_arch_vcpu_put_fp(vcpu);
if (has_vhe())
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 3997eb3dff55..343d87877aa2 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -24,6 +24,12 @@ struct kvm_shadow_vcpu_state {
 
/* A pointer to the shadow vm. */
struct kvm_shadow_vm *shadow_vm;
+
+   /*
+* Points to the per-cpu pointer of the cpu where it's loaded, or NULL
+* if not loaded.
+*/
+   struct kvm_shadow_vcpu_state **loaded_shadow_state;
 };
 
 /*
@@ -79,6 +85,7 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle);
 struct kvm_shadow_vcpu_state *
 pkvm_load_shadow_vcpu_state(unsigned int shadow_handle, unsigned int vcpu_idx);
 void pkvm_put_shadow_vcpu_state(struct kvm_shadow_vcpu_state *shadow_state);
+struct kvm_shadow_vcpu_state *pkvm_loaded_shadow_vcpu_state(void);
 
 u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id);
 bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 32e7e1cad00f..9e3a2aa6f737 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -138,40 +138,63 @@ static void sync_shadow_state(struct 
kvm_shadow_vcpu_state *shadow_state)
sync_timer_state(shadow_state);
 }
 
+static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
+{
+   DECLARE_REG(unsigned int, shadow_handle, host_ctxt, 1);
+   DECLARE_REG(unsigned int, vcpu_idx, host_ctxt, 2);
+   DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *shadow_vcpu;
+
+   if (!is_protected_kvm_enabled())
+   return;
+
+   shadow_state = pkvm_load_shadow_vcpu_state(shadow_handle, vcpu_idx);
+   if (!shadow_state)
+   return;
+
+   shadow_vcpu = _state->shadow_vcpu;
+
+   if (shadow_state_is_protected(shadow_state)) {
+   /* Propagate WFx trapping flags, trap ptrauth */
+   shadow_vcpu->arch.hcr_el2 &= ~(HCR_TWE | HCR_TWI |
+  HCR_API | HCR_APK);
+   shadow_vcpu->arch.hcr_el2 |= hcr_el2 & (HCR_TWE | HCR_TWI);
+   }
+}
+
+static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
+{
+   struct kvm_shadow_vcpu_state *shadow_state;
+
+   if (!is_protected_kvm_enabled())
+   return;
+
+   shadow_state = pkvm_loaded_shadow_vcpu_state();
+
+   

[PATCH 44/89] KVM: arm64: Introduce predicates to check for protected state

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

In order to determine whether or not a VM or (shadow) vCPU is protected,
introduce helper functions to query this state. For now, these will
always return 'false' as the underlying field is never configured.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_host.h  |  6 ++
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 13 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c55aadfdfd63..066eb7234bdd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -164,6 +164,7 @@ struct kvm_pinned_page {
 };
 
 struct kvm_protected_vm {
+   bool enabled;
unsigned int shadow_handle;
struct mutex shadow_lock;
struct kvm_hyp_memcache teardown_mc;
@@ -895,10 +896,7 @@ int kvm_set_ipa_limit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
 
-static inline bool kvm_vm_is_protected(struct kvm *kvm)
-{
-   return false;
-}
+#define kvm_vm_is_protected(kvm)   ((kvm)->arch.pkvm.enabled)
 
 void kvm_init_protected_traps(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index f76af6e0177a..3997eb3dff55 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -58,6 +58,19 @@ static inline struct kvm_shadow_vm *get_shadow_vm(struct 
kvm_vcpu *shadow_vcpu)
return get_shadow_state(shadow_vcpu)->shadow_vm;
 }
 
+static inline bool shadow_state_is_protected(struct kvm_shadow_vcpu_state 
*shadow_state)
+{
+   return shadow_state->shadow_vm->kvm.arch.pkvm.enabled;
+}
+
+static inline bool vcpu_is_protected(struct kvm_vcpu *vcpu)
+{
+   if (!is_protected_kvm_enabled())
+   return false;
+
+   return shadow_state_is_protected(get_shadow_state(vcpu));
+}
+
 void hyp_shadow_table_init(void *tbl);
 int __pkvm_init_shadow(struct kvm *kvm, unsigned long shadow_hva,
   size_t shadow_size, unsigned long pgd_hva);
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 45/89] KVM: arm64: Add the {flush, sync}_timer_state() primitives

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

In preparation for save/restore of the timer state at EL2 for protected
VMs, introduce a couple of sync/flush primitives for the architected
timer, in much the same way as we have for the GIC.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 58515e5d24ec..32e7e1cad00f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -63,6 +63,38 @@ static void sync_vgic_state(struct kvm_vcpu *host_vcpu,
WRITE_ONCE(host_cpu_if->vgic_lr[i], shadow_cpu_if->vgic_lr[i]);
 }
 
+static void flush_timer_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+
+   if (!shadow_state_is_protected(shadow_state))
+   return;
+
+   /*
+* A shadow vcpu has no offset, and sees vtime == ptime. The
+* ptimer is fully emulated by EL1 and cannot be trusted.
+*/
+   write_sysreg(0, cntvoff_el2);
+   isb();
+   write_sysreg_el0(__vcpu_sys_reg(shadow_vcpu, CNTV_CVAL_EL0), 
SYS_CNTV_CVAL);
+   write_sysreg_el0(__vcpu_sys_reg(shadow_vcpu, CNTV_CTL_EL0), 
SYS_CNTV_CTL);
+}
+
+static void sync_timer_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+
+   if (!shadow_state_is_protected(shadow_state))
+   return;
+
+   /*
+* Preserve the vtimer state so that it is always correct,
+* even if the host tries to make a mess.
+*/
+   __vcpu_sys_reg(shadow_vcpu, CNTV_CVAL_EL0) = 
read_sysreg_el0(SYS_CNTV_CVAL);
+   __vcpu_sys_reg(shadow_vcpu, CNTV_CTL_EL0) = 
read_sysreg_el0(SYS_CNTV_CTL);
+}
+
 static void flush_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
 {
struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
@@ -85,6 +117,7 @@ static void flush_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
shadow_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
 
flush_vgic_state(host_vcpu, shadow_vcpu);
+   flush_timer_state(shadow_state);
 }
 
 static void sync_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
@@ -102,6 +135,7 @@ static void sync_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
host_vcpu->arch.flags   = shadow_vcpu->arch.flags;
 
sync_vgic_state(host_vcpu, shadow_vcpu);
+   sync_timer_state(shadow_state);
 }
 
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 41/89] KVM: arm64: Make vcpu_{read, write}_sys_reg available to HYP code

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Allow vcpu_{read,write}_sys_reg() to be called from EL2 so that nVHE hyp
code can reuse existing helper functions for operations such as
resetting the vCPU state.

Signed-off-by: Marc Zyngier 
---
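As an example of what this enables, nVHE hyp code can now use the same
accessors when resetting or rewriting a guest system register. The snippet
below is hypothetical (it is not part of the patch, and shadow_vcpu is an
assumed local), it only shows the intended call shape.

	/*
	 * Hypothetical: from nVHE hyp code this resolves to the ctxt_sys_reg()
	 * copy, since sysregs_loaded_on_cpu is a VHE-only concept.
	 */
	u64 sctlr = vcpu_read_sys_reg(shadow_vcpu, SCTLR_EL1);

	vcpu_write_sys_reg(shadow_vcpu, sctlr & ~SCTLR_ELx_M, SCTLR_EL1);
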
 arch/arm64/include/asm/kvm_host.h | 30 +++---
 arch/arm64/kvm/sys_regs.c | 20 
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 9252841850e4..c55aadfdfd63 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -553,9 +553,6 @@ struct kvm_vcpu_arch {
 
 #define __vcpu_sys_reg(v,r)(ctxt_sys_reg(&(v)->arch.ctxt, (r)))
 
-u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg);
-void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
-
 static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
 {
/*
@@ -647,6 +644,33 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, 
int reg)
return true;
 }
 
+static inline u64 vcpu_arch_read_sys_reg(const struct kvm_vcpu_arch *vcpu_arch,
+int reg)
+{
+   u64 val = 0x8badf00d8badf00d;
+
+   /* sysregs_loaded_on_cpu is only used in VHE */
+   if (!is_nvhe_hyp_code() && vcpu_arch->sysregs_loaded_on_cpu &&
+   __vcpu_read_sys_reg_from_cpu(reg, ))
+   return val;
+
+   return ctxt_sys_reg(_arch->ctxt, reg);
+}
+
+static inline void vcpu_arch_write_sys_reg(struct kvm_vcpu_arch *vcpu_arch,
+  u64 val, int reg)
+{
+   /* sysregs_loaded_on_cpu is only used in VHE */
+   if (!is_nvhe_hyp_code() && vcpu_arch->sysregs_loaded_on_cpu &&
+   __vcpu_write_sys_reg_to_cpu(val, reg))
+   return;
+
+ctxt_sys_reg(_arch->ctxt, reg) = val;
+}
+
+#define vcpu_read_sys_reg(vcpu, reg) vcpu_arch_read_sys_reg(&((vcpu)->arch), 
reg)
+#define vcpu_write_sys_reg(vcpu, val, reg) 
vcpu_arch_write_sys_reg(&((vcpu)->arch), val, reg)
+
 struct kvm_vm_stat {
struct kvm_vm_stat_generic generic;
 };
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7b45c040cc27..7886989443b9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -68,26 +68,6 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
return false;
 }
 
-u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
-{
-   u64 val = 0x8badf00d8badf00d;
-
-   if (vcpu->arch.sysregs_loaded_on_cpu &&
-   __vcpu_read_sys_reg_from_cpu(reg, ))
-   return val;
-
-   return __vcpu_sys_reg(vcpu, reg);
-}
-
-void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
-{
-   if (vcpu->arch.sysregs_loaded_on_cpu &&
-   __vcpu_write_sys_reg_to_cpu(val, reg))
-   return;
-
-__vcpu_sys_reg(vcpu, reg) = val;
-}
-
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
 static u32 cache_levels;
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 43/89] KVM: arm64: Add the {flush, sync}_vgic_state() primitives

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Rather than blindly copying the vGIC state to/from the host at EL2,
introduce a couple of helpers to copy only what is needed and to
sanitise untrusted data passed by the host kernel.

Signed-off-by: Marc Zyngier 
---
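The important defensive detail is that the number of list registers claimed by
the host is clamped against what ICH_VTR_EL2 reports before it is used to index
the LR array. A standalone model of that clamping pattern is shown below,
purely for illustration.

/* Illustrative model of clamping an untrusted count, not kernel code. */
#include <stdio.h>

#define NR_LRS 16

static unsigned long lrs[NR_LRS];

static void copy_lrs(unsigned int host_used_lrs)
{
	unsigned int used_lrs = host_used_lrs;

	/* Never trust the host: cap the count at what the hardware has. */
	if (used_lrs > NR_LRS)
		used_lrs = NR_LRS;

	for (unsigned int i = 0; i < used_lrs; i++)
		lrs[i] = i;	/* stand-in for READ_ONCE() of a host LR */

	printf("copied %u LRs\n", used_lrs);
}

int main(void)
{
	copy_lrs(4);		/* normal case */
	copy_lrs(1u << 20);	/* malicious count is clamped */
	return 0;
}
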
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 50 +-
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 5b46742d9f9b..58515e5d24ec 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -18,10 +18,51 @@
 #include 
 #include 
 
+#include 
+
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
+static void flush_vgic_state(struct kvm_vcpu *host_vcpu,
+struct kvm_vcpu *shadow_vcpu)
+{
+   struct vgic_v3_cpu_if *host_cpu_if, *shadow_cpu_if;
+   unsigned int used_lrs, max_lrs, i;
+
+   host_cpu_if = _vcpu->arch.vgic_cpu.vgic_v3;
+   shadow_cpu_if   = _vcpu->arch.vgic_cpu.vgic_v3;
+
+   max_lrs = (read_gicreg(ICH_VTR_EL2) & 0xf) + 1;
+   used_lrs = READ_ONCE(host_cpu_if->used_lrs);
+   used_lrs = min(used_lrs, max_lrs);
+
+   shadow_cpu_if->vgic_hcr = READ_ONCE(host_cpu_if->vgic_hcr);
+   /* Should be a one-off */
+   shadow_cpu_if->vgic_sre = (ICC_SRE_EL1_DIB |
+  ICC_SRE_EL1_DFB |
+  ICC_SRE_EL1_SRE);
+   shadow_cpu_if->used_lrs = used_lrs;
+
+   for (i = 0; i < used_lrs; i++)
+   shadow_cpu_if->vgic_lr[i] = READ_ONCE(host_cpu_if->vgic_lr[i]);
+}
+
+static void sync_vgic_state(struct kvm_vcpu *host_vcpu,
+   struct kvm_vcpu *shadow_vcpu)
+{
+   struct vgic_v3_cpu_if *host_cpu_if, *shadow_cpu_if;
+   unsigned int i;
+
+   host_cpu_if = _vcpu->arch.vgic_cpu.vgic_v3;
+   shadow_cpu_if   = _vcpu->arch.vgic_cpu.vgic_v3;
+
+   WRITE_ONCE(host_cpu_if->vgic_hcr, shadow_cpu_if->vgic_hcr);
+
+   for (i = 0; i < shadow_cpu_if->used_lrs; i++)
+   WRITE_ONCE(host_cpu_if->vgic_lr[i], shadow_cpu_if->vgic_lr[i]);
+}
+
 static void flush_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
 {
struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
@@ -43,16 +84,13 @@ static void flush_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
 
shadow_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
 
-   shadow_vcpu->arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
+   flush_vgic_state(host_vcpu, shadow_vcpu);
 }
 
 static void sync_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
 {
struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
-   struct vgic_v3_cpu_if *shadow_cpu_if = 
_vcpu->arch.vgic_cpu.vgic_v3;
-   struct vgic_v3_cpu_if *host_cpu_if = _vcpu->arch.vgic_cpu.vgic_v3;
-   unsigned int i;
 
host_vcpu->arch.ctxt= shadow_vcpu->arch.ctxt;
 
@@ -63,9 +101,7 @@ static void sync_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
 
host_vcpu->arch.flags   = shadow_vcpu->arch.flags;
 
-   host_cpu_if->vgic_hcr   = shadow_cpu_if->vgic_hcr;
-   for (i = 0; i < shadow_cpu_if->used_lrs; ++i)
-   host_cpu_if->vgic_lr[i] = shadow_cpu_if->vgic_lr[i];
+   sync_vgic_state(host_vcpu, shadow_vcpu);
 }
 
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 39/89] KVM: arm64: Extend memory donation to allow host-to-guest transitions

2022-05-19 Thread Will Deacon
In preparation for supporting protected guests, where guest memory
defaults to being inaccessible to the host, extend our memory protection
mechanisms to support donation of pages from the host to a specific
guest.

Signed-off-by: Will Deacon 
---
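For context, a hypothetical hyp-side caller would mirror the existing
__pkvm_host_share_guest() usage. The wrapper below is made up and only
illustrates the intended semantics of donation: ownership moves to the guest
and the host loses access to the page.

/* Hypothetical caller, for illustration only. */
static int donate_page_to_protected_guest(u64 pfn, u64 gfn,
					  struct kvm_vcpu *shadow_vcpu)
{
	/*
	 * Ownership moves from the host to the guest: the page becomes
	 * PKVM_PAGE_OWNED in the guest stage-2 and the host loses access.
	 */
	return __pkvm_host_donate_guest(pfn, gfn, shadow_vcpu);
}
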
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 62 +++
 arch/arm64/kvm/hyp/pgtable.c  |  2 +-
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 364432276fe0..b01b5cdb38de 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -69,6 +69,7 @@ int __pkvm_host_reclaim_page(u64 pfn);
 int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot 
prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 2e92be8bb463..d0544259eb01 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -890,6 +890,14 @@ static int guest_ack_share(u64 addr, const struct 
pkvm_mem_transition *tx,
  size, PKVM_NOPAGE);
 }
 
+static int guest_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   return __guest_check_page_state_range(tx->completer.guest.vcpu, addr,
+ size, PKVM_NOPAGE);
+}
+
 static int guest_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
enum kvm_pgtable_prot perms)
 {
@@ -903,6 +911,17 @@ static int guest_complete_share(u64 addr, const struct 
pkvm_mem_transition *tx,
  prot, >arch.pkvm_memcache);
 }
 
+static int guest_complete_donation(u64 addr, const struct pkvm_mem_transition 
*tx)
+{
+   enum kvm_pgtable_prot prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, 
PKVM_PAGE_OWNED);
+   struct kvm_vcpu *vcpu = tx->completer.guest.vcpu;
+   struct kvm_shadow_vm *vm = get_shadow_vm(vcpu);
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   return kvm_pgtable_stage2_map(>pgt, addr, size, 
tx->completer.guest.phys,
+ prot, >arch.pkvm_memcache);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
const struct pkvm_mem_transition *tx = >tx;
@@ -1088,6 +1107,9 @@ static int check_donation(struct pkvm_mem_donation 
*donation)
case PKVM_ID_HYP:
ret = hyp_ack_donation(completer_addr, tx);
break;
+   case PKVM_ID_GUEST:
+   ret = guest_ack_donation(completer_addr, tx);
+   break;
default:
ret = -EINVAL;
}
@@ -1122,6 +1144,9 @@ static int __do_donate(struct pkvm_mem_donation *donation)
case PKVM_ID_HYP:
ret = hyp_complete_donation(completer_addr, tx);
break;
+   case PKVM_ID_GUEST:
+   ret = guest_complete_donation(completer_addr, tx);
+   break;
default:
ret = -EINVAL;
}
@@ -1362,6 +1387,43 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct 
kvm_vcpu *vcpu)
return ret;
 }
 
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu)
+{
+   int ret;
+   u64 host_addr = hyp_pfn_to_phys(pfn);
+   u64 guest_addr = hyp_pfn_to_phys(gfn);
+   struct kvm_shadow_vm *vm = get_shadow_vm(vcpu);
+   struct pkvm_mem_donation donation = {
+   .tx = {
+   .nr_pages   = 1,
+   .initiator  = {
+   .id = PKVM_ID_HOST,
+   .addr   = host_addr,
+   .host   = {
+   .completer_addr = guest_addr,
+   },
+   },
+   .completer  = {
+   .id = PKVM_ID_GUEST,
+   .guest  = {
+   .vcpu = vcpu,
+   .phys = host_addr,
+   },
+   },
+   },
+   };
+
+   host_lock_component();
+   guest_lock_component(vm);
+
+   ret = do_donate();
+
+   guest_unlock_component(vm);
+   host_unlock_component();
+
+   return ret;
+}
+
 static int hyp_zero_page(phys_addr_t phys)
 {
void *addr;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 

[PATCH 42/89] KVM: arm64: Simplify vgic-v3 hypercalls

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore
hypercalls so that all of the EL2 GICv3 state is covered by a single pair
of hypercalls.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_asm.h   |  8 ++--
 arch/arm64/include/asm/kvm_hyp.h   |  4 ++--
 arch/arm64/kvm/arm.c   |  7 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 24 ++--
 arch/arm64/kvm/hyp/vgic-v3-sr.c| 27 +++
 arch/arm64/kvm/vgic/vgic-v2.c  |  9 +
 arch/arm64/kvm/vgic/vgic-v3.c  | 26 --
 arch/arm64/kvm/vgic/vgic.c | 17 +++--
 arch/arm64/kvm/vgic/vgic.h |  6 ++
 include/kvm/arm_vgic.h |  3 +--
 10 files changed, 47 insertions(+), 84 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 35b9d590bb74..22b5ee9f2b5c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -73,11 +73,9 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
-   __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
-   __KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr,
-   __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
-   __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+   __KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
+   __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_init_shadow,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_shadow,
 };
@@ -218,8 +216,6 @@ extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
 
 extern u64 __vgic_v3_get_gic_config(void);
-extern u64 __vgic_v3_read_vmcr(void);
-extern void __vgic_v3_write_vmcr(u32 vmcr);
 extern void __vgic_v3_init_lrs(void);
 
 extern u64 __kvm_get_mdcr_el2(void);
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 6797eafe7890..4adf7c2a77bd 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -61,8 +61,8 @@ void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
-void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
-void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_save_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_NVHE_HYPERVISOR__
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 07d2ac6a5aff..c9b8e2ca5cb5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -440,7 +440,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
if (has_vhe())
kvm_vcpu_put_sysregs_vhe(vcpu);
kvm_timer_vcpu_put(vcpu);
-   kvm_vgic_put(vcpu);
+   kvm_vgic_put(vcpu, false);
kvm_vcpu_pmu_restore_host(vcpu);
kvm_arm_vmid_clear_active();
 
@@ -656,15 +656,14 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
 * doorbells to be signalled, should an interrupt become pending.
 */
preempt_disable();
-   kvm_vgic_vmcr_sync(vcpu);
-   vgic_v4_put(vcpu, true);
+   kvm_vgic_put(vcpu, true);
preempt_enable();
 
kvm_vcpu_halt(vcpu);
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 
preempt_disable();
-   vgic_v4_load(vcpu);
+   kvm_vgic_load(vcpu);
preempt_enable();
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 245d267064b3..5b46742d9f9b 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -205,16 +205,6 @@ static void handle___vgic_v3_get_gic_config(struct 
kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
 }
 
-static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
-{
-   cpu_reg(host_ctxt, 1) = __vgic_v3_read_vmcr();
-}
-
-static void handle___vgic_v3_write_vmcr(struct kvm_cpu_context *host_ctxt)
-{
-   __vgic_v3_write_vmcr(cpu_reg(host_ctxt, 1));
-}
-
 static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
 {
__vgic_v3_init_lrs();
@@ -225,18 +215,18 @@ static void handle___kvm_get_mdcr_el2(struct 
kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __kvm_get_mdcr_el2();
 }
 
-static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
+static void handle___vgic_v3_save_vmcr_aprs(struct kvm_cpu_context *host_ctxt)
 {
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
 
-   

[PATCH 40/89] KVM: arm64: Split up nvhe/fixed_config.h

2022-05-19 Thread Will Deacon
In preparation for using some of the pKVM fixed configuration register
definitions to filter the available VM CAPs in the host, split the
nvhe/fixed_config.h header so that the definitions can be shared
with the host, while keeping the hypervisor function prototypes in
the nvhe/ namespace.

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_pkvm.h | 190 
 .../arm64/kvm/hyp/include/nvhe/fixed_config.h | 205 --
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h|   6 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c|   1 -
 arch/arm64/kvm/hyp/nvhe/setup.c   |   1 -
 arch/arm64/kvm/hyp/nvhe/switch.c  |   2 +-
 arch/arm64/kvm/hyp/nvhe/sys_regs.c|   2 +-
 7 files changed, 197 insertions(+), 210 deletions(-)
 delete mode 100644 arch/arm64/kvm/hyp/include/nvhe/fixed_config.h

diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index 1dc7372950b1..b92440cfb5b4 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -2,12 +2,14 @@
 /*
  * Copyright (C) 2020 - Google LLC
  * Author: Quentin Perret 
+ * Author: Fuad Tabba 
  */
 #ifndef __ARM64_KVM_PKVM_H__
 #define __ARM64_KVM_PKVM_H__
 
 #include 
 #include 
+#include 
 
 /* Maximum number of protected VMs that can be created. */
 #define KVM_MAX_PVMS 255
@@ -18,6 +20,194 @@ int kvm_init_pvm(struct kvm *kvm);
 int kvm_shadow_create(struct kvm *kvm);
 void kvm_shadow_destroy(struct kvm *kvm);
 
+/*
+ * Definitions for features to be allowed or restricted for guest virtual
+ * machines, depending on the mode KVM is running in and on the type of guest
+ * that is running.
+ *
+ * The ALLOW masks represent a bitmask of feature fields that are allowed
+ * without any restrictions as long as they are supported by the system.
+ *
+ * The RESTRICT_UNSIGNED masks, if present, represent unsigned fields for
+ * features that are restricted to support at most the specified feature.
+ *
+ * If a feature field is not present in either, than it is not supported.
+ *
+ * The approach taken for protected VMs is to allow features that are:
+ * - Needed by common Linux distributions (e.g., floating point)
+ * - Trivial to support, e.g., supporting the feature does not introduce or
+ * require tracking of additional state in KVM
+ * - Cannot be trapped or prevent the guest from using anyway
+ */
+
+/*
+ * Allow for protected VMs:
+ * - Floating-point and Advanced SIMD
+ * - Data Independent Timing
+ */
+#define PVM_ID_AA64PFR0_ALLOW (\
+   ARM64_FEATURE_MASK(ID_AA64PFR0_FP) | \
+   ARM64_FEATURE_MASK(ID_AA64PFR0_ASIMD) | \
+   ARM64_FEATURE_MASK(ID_AA64PFR0_DIT) \
+   )
+
+/*
+ * Restrict to the following *unsigned* features for protected VMs:
+ * - AArch64 guests only (no support for AArch32 guests):
+ * AArch32 adds complexity in trap handling, emulation, condition codes,
+ * etc...
+ * - RAS (v1)
+ * Supported by KVM
+ */
+#define PVM_ID_AA64PFR0_RESTRICT_UNSIGNED (\
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL0), 
ID_AA64PFR0_ELx_64BIT_ONLY) | \
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1), 
ID_AA64PFR0_ELx_64BIT_ONLY) | \
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL2), 
ID_AA64PFR0_ELx_64BIT_ONLY) | \
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL3), 
ID_AA64PFR0_ELx_64BIT_ONLY) | \
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_RAS), ID_AA64PFR0_RAS_V1) \
+   )
+
+/*
+ * Allow for protected VMs:
+ * - Branch Target Identification
+ * - Speculative Store Bypassing
+ */
+#define PVM_ID_AA64PFR1_ALLOW (\
+   ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
+   ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
+   )
+
+/*
+ * Allow for protected VMs:
+ * - Mixed-endian
+ * - Distinction between Secure and Non-secure Memory
+ * - Mixed-endian at EL0 only
+ * - Non-context synchronizing exception entry and exit
+ */
+#define PVM_ID_AA64MMFR0_ALLOW (\
+   ARM64_FEATURE_MASK(ID_AA64MMFR0_BIGENDEL) | \
+   ARM64_FEATURE_MASK(ID_AA64MMFR0_SNSMEM) | \
+   ARM64_FEATURE_MASK(ID_AA64MMFR0_BIGENDEL0) | \
+   ARM64_FEATURE_MASK(ID_AA64MMFR0_EXS) \
+   )
+
+/*
+ * Restrict to the following *unsigned* features for protected VMs:
+ * - 40-bit IPA
+ * - 16-bit ASID
+ */
+#define PVM_ID_AA64MMFR0_RESTRICT_UNSIGNED (\
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_PARANGE), 
ID_AA64MMFR0_PARANGE_40) | \
+   FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_ASID), ID_AA64MMFR0_ASID_16) 
\
+   )
+
+/*
+ * Allow for protected VMs:
+ * - Hardware translation table updates to Access flag and Dirty state
+ * - Number of VMID bits from CPU
+ * - Hierarchical Permission Disables
+ * - Privileged Access Never
+ * - SError interrupt exceptions from speculative reads
+ * - Enhanced Translation Synchronization
+ */
+#define PVM_ID_AA64MMFR1_ALLOW (\
+   ARM64_FEATURE_MASK(ID_AA64MMFR1_HADBS) | \
+   ARM64_FEATURE_MASK(ID_AA64MMFR1_VMIDBITS) | \
+   

[PATCH 38/89] KVM: arm64: Don't map host sections in pkvm

2022-05-19 Thread Will Deacon
From: Quentin Perret 

We no longer need to map the host's .rodata and .bss sections in the
pkvm hypervisor, so let's remove those mappings. This will avoid
creating dependencies at EL2 on host-controlled data-structures.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kernel/image-vars.h  |  6 --
 arch/arm64/kvm/hyp/nvhe/setup.c | 14 +++---
 2 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 3e2489d23ff0..2d4d6836ff47 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -115,12 +115,6 @@ KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
 KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
 #endif
 
-/* Kernel memory sections */
-KVM_NVHE_ALIAS(__start_rodata);
-KVM_NVHE_ALIAS(__end_rodata);
-KVM_NVHE_ALIAS(__bss_start);
-KVM_NVHE_ALIAS(__bss_stop);
-
 /* Hyp memory sections */
 KVM_NVHE_ALIAS(__hyp_idmap_text_start);
 KVM_NVHE_ALIAS(__hyp_idmap_text_end);
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index a851de624074..c55661976f64 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -119,23 +119,15 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
unsigned long size,
}
 
/*
-* Map the host's .bss and .rodata sections RO in the hypervisor, but
-* transfer the ownership from the host to the hypervisor itself to
-* make sure it can't be donated or shared with another entity.
+* Map the host sections RO in the hypervisor, but transfer the
+* ownership from the host to the hypervisor itself to make sure they
+* can't be donated or shared with another entity.
 *
 * The ownership transition requires matching changes in the host
 * stage-2. This will be done later (see finalize_host_mappings()) once
 * the hyp_vmemmap is addressable.
 */
prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
-   ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
-   if (ret)
-   return ret;
-
-   ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
-   if (ret)
-   return ret;
-
ret = pkvm_create_mappings(_vgic_global_state,
   _vgic_global_state + 1, prot);
if (ret)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 37/89] KVM: arm64: Explicitly map kvm_vgic_global_state at EL2

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The pkvm hypervisor may need to read the kvm_vgic_global_state variable
at EL2. Make sure to explicitly map it in its stage-1 page-table rather
than relying on mapping all of the host .rodata section.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 59a478dde533..a851de624074 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -136,6 +136,11 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
unsigned long size,
if (ret)
return ret;
 
+   ret = pkvm_create_mappings(_vgic_global_state,
+  _vgic_global_state + 1, prot);
+   if (ret)
+   return ret;
+
return 0;
 }
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 36/89] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2

2022-05-19 Thread Will Deacon
Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to
modify the variable arbitrarily, potentially leading to all sorts of
shenanigans as this value is used to configure the VTTBR register for
the guest stage-2.

In preparation for unmapping host sections entirely from EL2, maintain
a copy of 'kvm_arm_vmid_bits' and initialise it from the host value
while it is still trusted.

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_hyp.h | 2 ++
 arch/arm64/kernel/image-vars.h   | 3 ---
 arch/arm64/kvm/arm.c | 1 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c   | 3 +++
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index fd99cf09972d..6797eafe7890 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -124,4 +124,6 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
 extern unsigned long kvm_nvhe_sym(__icache_flags);
+extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
+
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 37a2d833851a..3e2489d23ff0 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -80,9 +80,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* VMID bits set by the KVM VMID allocator */
-KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
-
 /* Kernel symbols needed for cpus_have_final/const_caps checks. */
 KVM_NVHE_ALIAS(arm64_const_caps_ready);
 KVM_NVHE_ALIAS(cpu_hwcap_keys);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c7b362db692f..07d2ac6a5aff 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1839,6 +1839,7 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = 
read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = 
read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
kvm_nvhe_sym(__icache_flags) = __icache_flags;
+   kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
 
ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
if (ret)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 9cd2bf75ed88..b29142a09e36 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -15,6 +15,9 @@
 /* Used by icache_is_vpipt(). */
 unsigned long __icache_flags;
 
+/* Used by kvm_get_vttbr(). */
+unsigned int kvm_arm_vmid_bits;
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 35/89] KVM: arm64: Unmap kvm_arm_hyp_percpu_base from the host

2022-05-19 Thread Will Deacon
From: Quentin Perret 

In pKVM mode, we can't trust the host not to mess with the hypervisor
per-cpu offsets, so let's move the array containing them to the nVHE
code.

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/kvm_asm.h  | 4 ++--
 arch/arm64/kernel/image-vars.h| 3 ---
 arch/arm64/kvm/arm.c  | 9 -
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c | 2 ++
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 931a351da3f2..35b9d590bb74 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -110,7 +110,7 @@ enum __kvm_host_smccc_func {
 #define per_cpu_ptr_nvhe_sym(sym, cpu) 
\
({  
\
unsigned long base, off;
\
-   base = kvm_arm_hyp_percpu_base[cpu];
\
+   base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];  
\
off = (unsigned long)_NVHE_SYM(sym) -
\
  (unsigned long)_NVHE_SYM(__per_cpu_start); 
\
base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL;  
\
@@ -198,7 +198,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
 #define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
 #define __kvm_hyp_vector   CHOOSE_HYP_SYM(__kvm_hyp_vector)
 
-extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
 DECLARE_KVM_NVHE_SYM(__per_cpu_start);
 DECLARE_KVM_NVHE_SYM(__per_cpu_end);
 
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 4e3b6d618ac1..37a2d833851a 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -102,9 +102,6 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
 KVM_NVHE_ALIAS(__start___kvm_ex_table);
 KVM_NVHE_ALIAS(__stop___kvm_ex_table);
 
-/* Array containing bases of nVHE per-CPU memory regions. */
-KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
-
 /* PMU available static key */
 #ifdef CONFIG_HW_PERF_EVENTS
 KVM_NVHE_ALIAS(kvm_arm_pmu_available);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 5b41551a978b..c7b362db692f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -51,7 +51,6 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
-unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 static bool vgic_present;
@@ -1800,13 +1799,13 @@ static void teardown_hyp_mode(void)
free_hyp_pgds();
for_each_possible_cpu(cpu) {
free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
-   free_pages(kvm_arm_hyp_percpu_base[cpu], nvhe_percpu_order());
+   free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], 
nvhe_percpu_order());
}
 }
 
 static int do_pkvm_init(u32 hyp_va_bits)
 {
-   void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+   void *per_cpu_base = 
kvm_ksym_ref(kvm_nvhe_sym(kvm_arm_hyp_percpu_base));
int ret;
 
preempt_disable();
@@ -1907,7 +1906,7 @@ static int init_hyp_mode(void)
 
page_addr = page_address(page);
memcpy(page_addr, CHOOSE_NVHE_SYM(__per_cpu_start), 
nvhe_percpu_size());
-   kvm_arm_hyp_percpu_base[cpu] = (unsigned long)page_addr;
+   kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu] = (unsigned 
long)page_addr;
}
 
/*
@@ -1968,7 +1967,7 @@ static int init_hyp_mode(void)
}
 
for_each_possible_cpu(cpu) {
-   char *percpu_begin = (char *)kvm_arm_hyp_percpu_base[cpu];
+   char *percpu_begin = (char 
*)kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];
char *percpu_end = percpu_begin + nvhe_percpu_size();
 
/* Map Hyp percpu pages */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
index 9f54833af400..04d194583f1e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
@@ -23,6 +23,8 @@ u64 cpu_logical_map(unsigned int cpu)
return hyp_cpu_logical_map[cpu];
 }
 
+unsigned long __ro_after_init kvm_arm_hyp_percpu_base[NR_CPUS];
+
 unsigned long __hyp_per_cpu_offset(unsigned int cpu)
 {
unsigned long *cpu_base_array;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 34/89] KVM: arm64: Don't access kvm_arm_hyp_percpu_base at EL1

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The host KVM PMU code can currently index kvm_arm_hyp_percpu_base[]
through this_cpu_ptr_hyp_sym(), but will not actually dereference that
pointer when protected KVM is enabled. In preparation for making
kvm_arm_hyp_percpu_base[] inaccessible to the host, let's make sure the
indexing into the hyp per-cpu pages is also done after the static key
check to avoid spurious accesses to EL2-private data from EL1.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/pmu.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 03a6c1f4a09a..a8878fd8b696 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -31,9 +31,13 @@ static bool kvm_pmu_switch_needed(struct perf_event_attr 
*attr)
  */
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr)
 {
-   struct kvm_host_data *ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
+   struct kvm_host_data *ctx;
 
-   if (!kvm_arm_support_pmu_v3() || !ctx || !kvm_pmu_switch_needed(attr))
+   if (!kvm_arm_support_pmu_v3())
+   return;
+
+   ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
+   if (!ctx || !kvm_pmu_switch_needed(attr))
return;
 
if (!attr->exclude_host)
@@ -47,9 +51,13 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr 
*attr)
  */
 void kvm_clr_pmu_events(u32 clr)
 {
-   struct kvm_host_data *ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
+   struct kvm_host_data *ctx;
+
+   if (!kvm_arm_support_pmu_v3())
+   return;
 
-   if (!kvm_arm_support_pmu_v3() || !ctx)
+   ctx = this_cpu_ptr_hyp_sym(kvm_host_data);
+   if (!ctx)
return;
 
ctx->pmu_events.events_host &= ~clr;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 33/89] KVM: arm64: Handle guest stage-2 page-tables entirely at EL2

2022-05-19 Thread Will Deacon
Now that EL2 is able to manage guest stage-2 page-tables, avoid
allocating a separate MMU structure in the host and instead introduce a
new fault handler which responds to guest stage-2 faults by sharing
GUP-pinned pages with the guest via a hypercall. These pages are
recovered (and unpinned) on guest teardown via the page reclaim
hypercall.
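
As a rough illustration of that flow (the mmu.c hunk below is truncated in
this archive; the helper name and error handling here are simplified
assumptions), the host-side fault handler boils down to pinning the page and
handing it to EL2:

static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                          struct kvm_memory_slot *memslot)
{
        unsigned long hva = gfn_to_hva_memslot(memslot, fault_ipa >> PAGE_SHIFT);
        struct kvm_pinned_page *ppage;
        struct page *page;
        int ret;

        ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
        if (!ppage)
                return -ENOMEM;

        /* Pin the backing page so MMU notifiers can't pull it from under EL2. */
        ret = pin_user_pages_unlocked(hva, 1, &page, FOLL_WRITE);
        if (ret != 1) {
                kfree(ppage);
                return -EFAULT;
        }

        /* Ask EL2 to map the pinned page into the guest stage-2. */
        ret = kvm_call_hyp_nvhe(__pkvm_host_map_guest, page_to_pfn(page),
                                fault_ipa >> PAGE_SHIFT, vcpu);
        if (ret) {
                unpin_user_pages(&page, 1);
                kfree(ppage);
                return ret;
        }

        /* Remember the pin so it can be released on guest teardown. */
        ppage->page = page;
        list_add(&ppage->link, &vcpu->kvm->arch.pkvm.pinned_pages);
        return 0;
}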

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_asm.h   |   1 +
 arch/arm64/include/asm/kvm_host.h  |   6 ++
 arch/arm64/kvm/arm.c   |  10 ++-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |  49 ++-
 arch/arm64/kvm/mmu.c   | 137 +++--
 arch/arm64/kvm/pkvm.c  |  17 
 6 files changed, 212 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a68381699c40..931a351da3f2 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -65,6 +65,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_reclaim_page,
+   __KVM_HOST_SMCCC_FUNC___pkvm_host_map_guest,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 3c6ed1f3887d..9252841850e4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -158,10 +158,16 @@ struct kvm_s2_mmu {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_pinned_page {
+   struct list_headlink;
+   struct page *page;
+};
+
 struct kvm_protected_vm {
unsigned int shadow_handle;
struct mutex shadow_lock;
struct kvm_hyp_memcache teardown_mc;
+   struct list_head pinned_pages;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 694ba3792e9d..5b41551a978b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -358,7 +358,11 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
if (vcpu_has_run_once(vcpu) && unlikely(!irqchip_in_kernel(vcpu->kvm)))
static_branch_dec(_irqchip_in_use);
 
-   kvm_mmu_free_memory_cache(>arch.mmu_page_cache);
+   if (is_protected_kvm_enabled())
+   free_hyp_memcache(>arch.pkvm_memcache);
+   else
+   kvm_mmu_free_memory_cache(>arch.mmu_page_cache);
+
kvm_timer_vcpu_terminate(vcpu);
kvm_pmu_vcpu_destroy(vcpu);
 
@@ -385,6 +389,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct kvm_s2_mmu *mmu;
int *last_ran;
 
+   if (is_protected_kvm_enabled())
+   goto nommu;
+
mmu = vcpu->arch.hw_mmu;
last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
@@ -402,6 +409,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
*last_ran = vcpu->vcpu_id;
}
 
+nommu:
vcpu->cpu = cpu;
 
kvm_vgic_load(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 7a0d95e28e00..245d267064b3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -32,8 +32,6 @@ static void flush_shadow_state(struct kvm_shadow_vcpu_state 
*shadow_state)
shadow_vcpu->arch.sve_state = 
kern_hyp_va(host_vcpu->arch.sve_state);
shadow_vcpu->arch.sve_max_vl= host_vcpu->arch.sve_max_vl;
 
-   shadow_vcpu->arch.hw_mmu= host_vcpu->arch.hw_mmu;
-
shadow_vcpu->arch.hcr_el2   = host_vcpu->arch.hcr_el2;
shadow_vcpu->arch.mdcr_el2  = host_vcpu->arch.mdcr_el2;
shadow_vcpu->arch.cptr_el2  = host_vcpu->arch.cptr_el2;
@@ -107,6 +105,52 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context 
*host_ctxt)
cpu_reg(host_ctxt, 1) =  ret;
 }
 
+static int pkvm_refill_memcache(struct kvm_vcpu *shadow_vcpu,
+   struct kvm_vcpu *host_vcpu)
+{
+   struct kvm_shadow_vcpu_state *shadow_vcpu_state = 
get_shadow_state(shadow_vcpu);
+   u64 nr_pages = 
VTCR_EL2_LVLS(shadow_vcpu_state->shadow_vm->kvm.arch.vtcr) - 1;
+
+   return refill_memcache(_vcpu->arch.pkvm_memcache, nr_pages,
+  _vcpu->arch.pkvm_memcache);
+}
+
+static void handle___pkvm_host_map_guest(struct kvm_cpu_context *host_ctxt)
+{
+   DECLARE_REG(u64, pfn, host_ctxt, 1);
+   DECLARE_REG(u64, gfn, host_ctxt, 2);
+   DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 3);
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *shadow_vcpu;
+   struct kvm *host_kvm;
+   unsigned int handle;
+   int ret = -EINVAL;
+
+   if (!is_protected_kvm_enabled())
+   goto out;
+
+   host_vcpu = kern_hyp_va(host_vcpu);
+   host_kvm = kern_hyp_va(host_vcpu->kvm);
+   handle = 

[PATCH 32/89] KVM: arm64: Use the shadow vCPU structure in handle___kvm_vcpu_run()

2022-05-19 Thread Will Deacon
As a stepping stone towards deprivileging the host's access to the
guest's vCPU structures, introduce some naive flush/sync routines to
copy most of the host vCPU into the shadow vCPU on vCPU run and back
again on return to EL1.

This allows us to run using the shadow structure when KVM is initialised
in protected mode.

Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  4 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 82 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 27 +
 3 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index f841e2b252cd..e600dc4965c4 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -63,5 +63,9 @@ int __pkvm_init_shadow(struct kvm *kvm, unsigned long 
shadow_hva,
   size_t shadow_size, unsigned long pgd_hva);
 int __pkvm_teardown_shadow(unsigned int shadow_handle);
 
+struct kvm_shadow_vcpu_state *
+pkvm_load_shadow_vcpu_state(unsigned int shadow_handle, unsigned int vcpu_idx);
+void pkvm_put_shadow_vcpu_state(struct kvm_shadow_vcpu_state *shadow_state);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 629d306c91c0..7a0d95e28e00 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -22,11 +22,89 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, 
kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
+static void flush_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+   struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
+
+   shadow_vcpu->arch.ctxt  = host_vcpu->arch.ctxt;
+
+   shadow_vcpu->arch.sve_state = 
kern_hyp_va(host_vcpu->arch.sve_state);
+   shadow_vcpu->arch.sve_max_vl= host_vcpu->arch.sve_max_vl;
+
+   shadow_vcpu->arch.hw_mmu= host_vcpu->arch.hw_mmu;
+
+   shadow_vcpu->arch.hcr_el2   = host_vcpu->arch.hcr_el2;
+   shadow_vcpu->arch.mdcr_el2  = host_vcpu->arch.mdcr_el2;
+   shadow_vcpu->arch.cptr_el2  = host_vcpu->arch.cptr_el2;
+
+   shadow_vcpu->arch.flags = host_vcpu->arch.flags;
+
+   shadow_vcpu->arch.debug_ptr = 
kern_hyp_va(host_vcpu->arch.debug_ptr);
+   shadow_vcpu->arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
+
+   shadow_vcpu->arch.vsesr_el2 = host_vcpu->arch.vsesr_el2;
+
+   shadow_vcpu->arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
+}
+
+static void sync_shadow_state(struct kvm_shadow_vcpu_state *shadow_state)
+{
+   struct kvm_vcpu *shadow_vcpu = _state->shadow_vcpu;
+   struct kvm_vcpu *host_vcpu = shadow_state->host_vcpu;
+   struct vgic_v3_cpu_if *shadow_cpu_if = 
_vcpu->arch.vgic_cpu.vgic_v3;
+   struct vgic_v3_cpu_if *host_cpu_if = _vcpu->arch.vgic_cpu.vgic_v3;
+   unsigned int i;
+
+   host_vcpu->arch.ctxt= shadow_vcpu->arch.ctxt;
+
+   host_vcpu->arch.hcr_el2 = shadow_vcpu->arch.hcr_el2;
+   host_vcpu->arch.cptr_el2= shadow_vcpu->arch.cptr_el2;
+
+   host_vcpu->arch.fault   = shadow_vcpu->arch.fault;
+
+   host_vcpu->arch.flags   = shadow_vcpu->arch.flags;
+
+   host_cpu_if->vgic_hcr   = shadow_cpu_if->vgic_hcr;
+   for (i = 0; i < shadow_cpu_if->used_lrs; ++i)
+   host_cpu_if->vgic_lr[i] = shadow_cpu_if->vgic_lr[i];
+}
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
-   DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+   DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
+   int ret;
+
+   host_vcpu = kern_hyp_va(host_vcpu);
+
+   if (unlikely(is_protected_kvm_enabled())) {
+   struct kvm_shadow_vcpu_state *shadow_state;
+   struct kvm_vcpu *shadow_vcpu;
+   struct kvm *host_kvm;
+   unsigned int handle;
+
+   host_kvm = kern_hyp_va(host_vcpu->kvm);
+   handle = host_kvm->arch.pkvm.shadow_handle;
+   shadow_state = pkvm_load_shadow_vcpu_state(handle,
+  host_vcpu->vcpu_idx);
+   if (!shadow_state) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   shadow_vcpu = _state->shadow_vcpu;
+   flush_shadow_state(shadow_state);
+
+   ret = __kvm_vcpu_run(shadow_vcpu);
+
+   sync_shadow_state(shadow_state);
+   pkvm_put_shadow_vcpu_state(shadow_state);
+   } else {
+   ret = __kvm_vcpu_run(host_vcpu);
+   }
 
-   cpu_reg(host_ctxt, 1) =  __kvm_vcpu_run(kern_hyp_va(vcpu));
+out:
+   cpu_reg(host_ctxt, 1) =  ret;
 }
 
 static void 

[PATCH 31/89] KVM: arm64: Disallow dirty logging and RO memslots with pKVM

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The current implementation of pKVM doesn't support dirty logging or
read-only memslots. Although support for these features is desirable,
this will require future work, so let's cleanly report the limitations
to userspace by failing the ioctls until then.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/mmu.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 67cac3340d49..df92b5f7ac63 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1679,11 +1679,17 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t hva, reg_end;
int ret = 0;
 
-   /* In protected mode, cannot modify memslots once a VM has run. */
-   if (is_protected_kvm_enabled() &&
-   (change == KVM_MR_DELETE || change == KVM_MR_MOVE) &&
-   kvm->arch.pkvm.shadow_handle) {
-   return -EPERM;
+   if (is_protected_kvm_enabled()) {
+   /* In protected mode, cannot modify memslots once a VM has run. 
*/
+   if ((change == KVM_MR_DELETE || change == KVM_MR_MOVE) &&
+   kvm->arch.pkvm.shadow_handle) {
+   return -EPERM;
+   }
+
+   if (new &&
+   new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
+   return -EPERM;
+   }
}
 
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 30/89] KVM: arm64: Do not allow memslot changes after first VM run under pKVM

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

As the guest stage-2 page-tables will soon be managed entirely by EL2
when pKVM is enabled, guest memory will be pinned and the MMU notifiers
in the host will be unable to reconfigure mappings at EL2, short of
destroying the guest and reclaiming all of its memory.

Forbid memslot move/delete operations for VMs that have run under pKVM,
returning -EPERM to userspace if such an operation is requested.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/mmu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0071f035dde8..67cac3340d49 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1679,6 +1679,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t hva, reg_end;
int ret = 0;
 
+   /* In protected mode, cannot modify memslots once a VM has run. */
+   if (is_protected_kvm_enabled() &&
+   (change == KVM_MR_DELETE || change == KVM_MR_MOVE) &&
+   kvm->arch.pkvm.shadow_handle) {
+   return -EPERM;
+   }
+
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
change != KVM_MR_FLAGS_ONLY)
return 0;
@@ -1755,6 +1762,10 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
gpa_t gpa = slot->base_gfn << PAGE_SHIFT;
phys_addr_t size = slot->npages << PAGE_SHIFT;
 
+   /* Stage-2 is managed by hyp in protected mode. */
+   if (is_protected_kvm_enabled())
+   return;
+
write_lock(>mmu_lock);
unmap_stage2_range(>arch.mmu, gpa, size);
write_unlock(>mmu_lock);
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 29/89] KVM: arm64: Check for PTE validity when checking for executable/cacheable

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

Don't blindly assume that the PTE is valid when checking whether
it describes an executable or cacheable mapping.

This makes sure that we don't issue CMOs for invalid mappings.

Suggested-by: Will Deacon 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/pgtable.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 1d300313009d..a6676fd14cf9 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -700,12 +700,12 @@ static void stage2_put_pte(kvm_pte_t *ptep, struct 
kvm_s2_mmu *mmu, u64 addr,
 static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
-   return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
+   return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
 }
 
 static bool stage2_pte_executable(kvm_pte_t pte)
 {
-   return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
+   return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
 }
 
 static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
@@ -750,8 +750,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, 
u32 level,
/* Perform CMOs before installation of the guest stage-2 PTE */
if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
-   granule);
-
+  granule);
if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
 
@@ -1148,7 +1147,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 
level, kvm_pte_t *ptep,
struct kvm_pgtable_mm_ops *mm_ops = pgt->mm_ops;
kvm_pte_t pte = *ptep;
 
-   if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
+   if (!stage2_pte_cacheable(pgt, pte))
return 0;
 
if (mm_ops->dcache_clean_inval_poc)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 28/89] KVM: arm64: Consolidate stage-2 init in one function

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The initialization of stage-2 page-tables of guests is currently split
across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2().
That is presumably for historical reasons as kvm_arm_setup_stage2()
originates from the (now defunct) KVM port for 32bit Arm.

Simplify this code path by merging both functions into one, and while at
it make sure to map the kvm struct into the hypervisor stage-1 early on
to simplify the failure path.

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/kvm_arm.h  |  2 +-
 arch/arm64/include/asm/kvm_host.h |  2 --
 arch/arm64/include/asm/kvm_mmu.h  |  2 +-
 arch/arm64/kvm/arm.c  | 27 +--
 arch/arm64/kvm/mmu.c  | 27 ++-
 arch/arm64/kvm/reset.c| 29 -
 6 files changed, 41 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 1767ded83888..98b60fa86853 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -134,7 +134,7 @@
  * 40 bits wide (T0SZ = 24).  Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
- * The VTCR_EL2 is configured per VM and is initialised in 
kvm_arm_setup_stage2().
+ * The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu.
  *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 264b1d2c4eb6..3c6ed1f3887d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -865,8 +865,6 @@ int kvm_set_ipa_limit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
 
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
-
 static inline bool kvm_vm_is_protected(struct kvm *kvm)
 {
return false;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 74735a864eee..eaac8dec97de 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -162,7 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t 
size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long 
type);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
  phys_addr_t pa, unsigned long size, bool writable);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6a32eaf768e5..694ba3792e9d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -135,28 +135,24 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
int ret;
 
-   ret = kvm_arm_setup_stage2(kvm, type);
-   if (ret)
-   return ret;
-
-   ret = kvm_init_stage2_mmu(kvm, >arch.mmu);
-   if (ret)
-   return ret;
-
ret = kvm_share_hyp(kvm, kvm + 1);
if (ret)
-   goto out_free_stage2_pgd;
+   return ret;
 
ret = kvm_init_pvm(kvm);
if (ret)
-   goto out_free_stage2_pgd;
+   goto err_unshare_kvm;
 
if (!zalloc_cpumask_var(>arch.supported_cpus, GFP_KERNEL)) {
ret = -ENOMEM;
-   goto out_free_stage2_pgd;
+   goto err_unshare_kvm;
}
cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
 
+   ret = kvm_init_stage2_mmu(kvm, >arch.mmu, type);
+   if (ret)
+   goto err_free_cpumask;
+
kvm_vgic_early_init(kvm);
 
/* The maximum number of VCPUs is limited by the host's GIC model */
@@ -164,9 +160,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
set_default_spectre(kvm);
 
-   return ret;
-out_free_stage2_pgd:
-   kvm_free_stage2_pgd(>arch.mmu);
+   return 0;
+
+err_free_cpumask:
+   free_cpumask_var(kvm->arch.supported_cpus);
+err_unshare_kvm:
+   kvm_unshare_hyp(kvm, kvm + 1);
return ret;
 }
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e77686d54e9f..0071f035dde8 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -618,15 +618,40 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
  * kvm_init_stage2_mmu - Initialise a S2 MMU structure
  * @kvm:   The pointer to the KVM structure
  * @mmu:   The pointer to the s2 MMU structure
+ * @type:  The machine type of the virtual machine
  *
  * Allocates only the stage-2 HW PGD level table(s).
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
-int kvm_init_stage2_mmu(struct kvm *kvm, struct 

[PATCH 27/89] KVM: arm64: Extend memory sharing to allow host-to-guest transitions

2022-05-19 Thread Will Deacon
In preparation for handling guest stage-2 mappings at EL2, extend our
memory protection mechanisms to support sharing of pages from the host
to a specific guest.
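
As a rough sketch of how the pieces below fit together (the field layout
follows the existing pkvm_mem_transition code; the real function also has
extra checks that are elided here), the new host-to-guest share is just
another initiator/completer pair fed into do_share():

int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu)
{
        struct kvm_shadow_vm *vm = get_shadow_vm(vcpu);
        u64 host_addr = hyp_pfn_to_phys(pfn);
        struct pkvm_mem_share share = {
                .tx = {
                        .nr_pages  = 1,
                        .initiator = {
                                .id   = PKVM_ID_HOST,
                                .addr = host_addr,
                                .host = {
                                        .completer_addr = hyp_pfn_to_phys(gfn),
                                },
                        },
                        .completer = {
                                .id    = PKVM_ID_GUEST,
                                .guest = {
                                        .vcpu = vcpu,
                                        .phys = host_addr,
                                },
                        },
                },
                .completer_prot = KVM_PGTABLE_PROT_RWX,
        };
        int ret;

        host_lock_component();
        guest_lock_component(vm);

        ret = do_share(&share);        /* check_share() then __do_share() */

        guest_unlock_component(vm);
        host_unlock_component();

        return ret;
}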

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_host.h |   8 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 100 ++
 3 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 32ac88e60e6b..264b1d2c4eb6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -421,8 +421,12 @@ struct kvm_vcpu_arch {
/* Don't run the guest (internal implementation need) */
bool pause;
 
-   /* Cache some mmu pages needed inside spinlock regions */
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   union {
+   /* Cache some mmu pages needed inside spinlock regions */
+   struct kvm_mmu_memory_cache mmu_page_cache;
+   /* Pages to be donated to pkvm/EL2 if it runs out */
+   struct kvm_hyp_memcache pkvm_memcache;
+   };
 
/* Target CPU and feature flags */
int target;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index ecedc545e608..364432276fe0 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -57,6 +57,7 @@ extern struct host_kvm host_kvm;
 enum pkvm_component_id {
PKVM_ID_HOST,
PKVM_ID_HYP,
+   PKVM_ID_GUEST,
 };
 
 extern unsigned long hyp_nr_cpus;
@@ -67,6 +68,7 @@ int __pkvm_host_unshare_hyp(u64 pfn);
 int __pkvm_host_reclaim_page(u64 pfn);
 int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct kvm_vcpu *vcpu);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot 
prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index adb6a880c684..2e92be8bb463 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -579,11 +579,21 @@ struct pkvm_mem_transition {
struct {
u64 completer_addr;
} hyp;
+   struct {
+   struct kvm_vcpu *vcpu;
+   } guest;
};
} initiator;
 
struct {
enum pkvm_component_id  id;
+
+   union {
+   struct {
+   struct kvm_vcpu *vcpu;
+   phys_addr_t phys;
+   } guest;
+   };
} completer;
 };
 
@@ -847,6 +857,52 @@ static int hyp_complete_donation(u64 addr,
return pkvm_create_mappings_locked(start, end, prot);
 }
 
+static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte)
+{
+   if (!kvm_pte_valid(pte))
+   return PKVM_NOPAGE;
+
+   return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __guest_check_page_state_range(struct kvm_vcpu *vcpu, u64 addr,
+ u64 size, enum pkvm_page_state state)
+{
+   struct kvm_shadow_vm *vm = get_shadow_vm(vcpu);
+   struct check_walk_data d = {
+   .desired= state,
+   .get_page_state = guest_get_page_state,
+   };
+
+   hyp_assert_lock_held(>lock);
+   return check_page_state_range(>pgt, addr, size, );
+}
+
+static int guest_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
+  enum kvm_pgtable_prot perms)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   if (perms != KVM_PGTABLE_PROT_RWX)
+   return -EPERM;
+
+   return __guest_check_page_state_range(tx->completer.guest.vcpu, addr,
+ size, PKVM_NOPAGE);
+}
+
+static int guest_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
+   enum kvm_pgtable_prot perms)
+{
+   struct kvm_vcpu *vcpu = tx->completer.guest.vcpu;
+   struct kvm_shadow_vm *vm = get_shadow_vm(vcpu);
+   u64 size = tx->nr_pages * PAGE_SIZE;
+   enum kvm_pgtable_prot prot;
+
+   prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
+   return kvm_pgtable_stage2_map(>pgt, addr, size, 
tx->completer.guest.phys,
+ prot, >arch.pkvm_memcache);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
const struct pkvm_mem_transition *tx = >tx;
@@ -868,6 +924,9 @@ static int check_share(struct pkvm_mem_share *share)
case PKVM_ID_HYP:
ret = hyp_ack_share(completer_addr, tx, share->completer_prot);
  

[PATCH 26/89] KVM: arm64: Provide a hypercall for the host to reclaim guest memory

2022-05-19 Thread Will Deacon
Implement a new hypercall, __pkvm_host_reclaim_page(), so that the host
at EL1 can reclaim pages that were previously donated to EL2. This
allows EL2 to defer clearing of guest memory on teardown and allows
preemption in the host after reclaiming each page.
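
On the host side, the intended use is a simple loop that hands pages back one
at a time so the host can reschedule between hypercalls; a minimal sketch,
with the pfn array being purely illustrative:

static void host_reclaim_pages(u64 *pfns, unsigned long nr_pages)
{
        unsigned long i;

        for (i = 0; i < nr_pages; i++) {
                /* EL2 unmaps the page, poisons it if needed, and returns it. */
                WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_reclaim_page, pfns[i]));
                __free_page(pfn_to_page(pfns[i]));
                cond_resched();
        }
}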

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_asm.h  |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/include/nvhe/memory.h  |  7 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c|  8 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 91 ++-
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f5030e88eb58..a68381699c40 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -64,6 +64,7 @@ enum __kvm_host_smccc_func {
/* Hypercalls available after pKVM finalisation */
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+   __KVM_HOST_SMCCC_FUNC___pkvm_host_reclaim_page,
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 663019992b67..ecedc545e608 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -64,6 +64,7 @@ extern unsigned long hyp_nr_cpus;
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
+int __pkvm_host_reclaim_page(u64 pfn);
 int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h 
b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 29f2ebe306bc..15b719fefc86 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -7,6 +7,13 @@
 
 #include 
 
+/*
+ * Accesses to struct hyp_page flags are serialized by the host stage-2
+ * page-table lock.
+ */
+#define HOST_PAGE_NEED_POISONING   BIT(0)
+#define HOST_PAGE_PENDING_RECLAIM  BIT(1)
+
 struct hyp_page {
unsigned short refcount;
u8 order;
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 8e51cdab00b7..629d306c91c0 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -155,6 +155,13 @@ static void handle___pkvm_host_unshare_hyp(struct 
kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_host_unshare_hyp(pfn);
 }
 
+static void handle___pkvm_host_reclaim_page(struct kvm_cpu_context *host_ctxt)
+{
+   DECLARE_REG(u64, pfn, host_ctxt, 1);
+
+   cpu_reg(host_ctxt, 1) = __pkvm_host_reclaim_page(pfn);
+}
+
 static void handle___pkvm_create_private_mapping(struct kvm_cpu_context 
*host_ctxt)
 {
DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -211,6 +218,7 @@ static const hcall_t host_hcall[] = {
 
HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_host_unshare_hyp),
+   HANDLE_FUNC(__pkvm_host_reclaim_page),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bcf84e157d4b..adb6a880c684 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -260,15 +260,51 @@ int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, 
void *pgd)
return 0;
 }
 
+static int reclaim_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+   enum kvm_pgtable_walk_flags flag,
+   void * const arg)
+{
+   kvm_pte_t pte = *ptep;
+   struct hyp_page *page;
+
+   if (!kvm_pte_valid(pte))
+   return 0;
+
+   page = hyp_phys_to_page(kvm_pte_to_phys(pte));
+   switch (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte))) {
+   case PKVM_PAGE_OWNED:
+   page->flags |= HOST_PAGE_NEED_POISONING;
+   fallthrough;
+   case PKVM_PAGE_SHARED_BORROWED:
+   case PKVM_PAGE_SHARED_OWNED:
+   page->flags |= HOST_PAGE_PENDING_RECLAIM;
+   break;
+   default:
+   return -EPERM;
+   }
+
+   return 0;
+}
+
 void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc)
 {
+
+   struct kvm_pgtable_walker walker = {
+   .cb = reclaim_walker,
+   .flags  = KVM_PGTABLE_WALK_LEAF
+   };
void *addr;
 
-   /* Dump all pgtable pages in the hyp_pool */
+   host_lock_component();
guest_lock_component(vm);
+
+   /* Reclaim all guest pages and dump all pgtable pages in the hyp_pool */
+   BUG_ON(kvm_pgtable_walk(>pgt, 0, BIT(vm->pgt.ia_bits), ));
  

[PATCH 25/89] KVM: arm64: Add flags to struct hyp_page

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Add a 'flags' field to struct hyp_page, and reduce the size of the order
field to u8 to avoid growing the struct size.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h|  6 +++---
 arch/arm64/kvm/hyp/include/nvhe/memory.h |  3 ++-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c | 14 +++---
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/gfp.h 
b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
index 0a048dc06a7d..9330b13075f8 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/gfp.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/gfp.h
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-#define HYP_NO_ORDER   USHRT_MAX
+#define HYP_NO_ORDER   0xff
 
 struct hyp_pool {
/*
@@ -19,11 +19,11 @@ struct hyp_pool {
struct list_head free_area[MAX_ORDER];
phys_addr_t range_start;
phys_addr_t range_end;
-   unsigned short max_order;
+   u8 max_order;
 };
 
 /* Allocation */
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order);
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order);
 void hyp_split_page(struct hyp_page *page);
 void hyp_get_page(struct hyp_pool *pool, void *addr);
 void hyp_put_page(struct hyp_pool *pool, void *addr);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h 
b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index e8a78b72aabf..29f2ebe306bc 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -9,7 +9,8 @@
 
 struct hyp_page {
unsigned short refcount;
-   unsigned short order;
+   u8 order;
+   u8 flags;
 };
 
 extern u64 __hyp_vmemmap;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c 
b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 7804da89e55d..01976a58d850 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -32,7 +32,7 @@ u64 __hyp_vmemmap;
  */
 static struct hyp_page *__find_buddy_nocheck(struct hyp_pool *pool,
 struct hyp_page *p,
-unsigned short order)
+u8 order)
 {
phys_addr_t addr = hyp_page_to_phys(p);
 
@@ -51,7 +51,7 @@ static struct hyp_page *__find_buddy_nocheck(struct hyp_pool 
*pool,
 /* Find a buddy page currently available for allocation */
 static struct hyp_page *__find_buddy_avail(struct hyp_pool *pool,
   struct hyp_page *p,
-  unsigned short order)
+  u8 order)
 {
struct hyp_page *buddy = __find_buddy_nocheck(pool, p, order);
 
@@ -94,8 +94,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
  struct hyp_page *p)
 {
phys_addr_t phys = hyp_page_to_phys(p);
-   unsigned short order = p->order;
struct hyp_page *buddy;
+   u8 order = p->order;
 
memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
@@ -128,7 +128,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 
 static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
   struct hyp_page *p,
-  unsigned short order)
+  u8 order)
 {
struct hyp_page *buddy;
 
@@ -182,7 +182,7 @@ void hyp_get_page(struct hyp_pool *pool, void *addr)
 
 void hyp_split_page(struct hyp_page *p)
 {
-   unsigned short order = p->order;
+   u8 order = p->order;
unsigned int i;
 
p->order = 0;
@@ -194,10 +194,10 @@ void hyp_split_page(struct hyp_page *p)
}
 }
 
-void *hyp_alloc_pages(struct hyp_pool *pool, unsigned short order)
+void *hyp_alloc_pages(struct hyp_pool *pool, u8 order)
 {
-   unsigned short i = order;
struct hyp_page *p;
+   u8 i = order;
 
hyp_spin_lock(>lock);
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 24/89] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Rather than relying on the host to free the shadow VM pages explicitly
on teardown, introduce a dedicated teardown memcache which allows the
host to reclaim guest memory resources without having to keep track of
all of the allocations made by EL2.
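
The host-side counterpart then reduces to draining arch.pkvm.teardown_mc once
teardown has completed; something along these lines, where the function below
is hypothetical and free_hyp_memcache() is the generic helper introduced
earlier in the series:

static void pkvm_destroy_shadow_vm(struct kvm *host_kvm)
{
        if (host_kvm->arch.pkvm.shadow_handle)
                WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_shadow,
                                          host_kvm->arch.pkvm.shadow_handle));

        /* Everything EL2 pushed into the memcache now belongs to the host. */
        free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
}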

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/kvm_host.h |  6 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 17 +++--
 arch/arm64/kvm/hyp/nvhe/pkvm.c|  8 +++-
 arch/arm64/kvm/pkvm.c | 12 +---
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index f4272ce76084..32ac88e60e6b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -161,11 +161,7 @@ struct kvm_arch_memory_slot {
 struct kvm_protected_vm {
unsigned int shadow_handle;
struct mutex shadow_lock;
-
-   struct {
-   void *pgd;
-   void *shadow;
-   } hyp_donations;
+   struct kvm_hyp_memcache teardown_mc;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 36eea31a1c5f..663019992b67 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -76,7 +76,7 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
-void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache 
*mc);
 int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
struct kvm_hyp_memcache *host_mc);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 992ef4b668b4..bcf84e157d4b 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -260,19 +260,24 @@ int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, 
void *pgd)
return 0;
 }
 
-void reclaim_guest_pages(struct kvm_shadow_vm *vm)
+void reclaim_guest_pages(struct kvm_shadow_vm *vm, struct kvm_hyp_memcache *mc)
 {
-   unsigned long nr_pages, pfn;
-
-   nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-   pfn = hyp_virt_to_pfn(vm->pgt.pgd);
+   void *addr;
 
+   /* Dump all pgtable pages in the hyp_pool */
guest_lock_component(vm);
kvm_pgtable_stage2_destroy(>pgt);
vm->kvm.arch.mmu.pgd_phys = 0ULL;
guest_unlock_component(vm);
 
-   WARN_ON(__pkvm_hyp_donate_host(pfn, nr_pages));
+   /* Drain the hyp_pool into the memcache */
+   addr = hyp_alloc_pages(>pool, 0);
+   while (addr) {
+   memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+   push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+   WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+   addr = hyp_alloc_pages(>pool, 0);
+   }
 }
 
 int __pkvm_prot_finalize(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 114c5565de7d..a4a518b2a43b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -546,8 +546,10 @@ int __pkvm_init_shadow(struct kvm *kvm, unsigned long 
shadow_hva,
 
 int __pkvm_teardown_shadow(unsigned int shadow_handle)
 {
+   struct kvm_hyp_memcache *mc;
struct kvm_shadow_vm *vm;
size_t shadow_size;
+   void *addr;
int err;
 
/* Lookup then remove entry from the shadow table. */
@@ -569,7 +571,8 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
hyp_spin_unlock(_lock);
 
/* Reclaim guest pages (including page-table pages) */
-   reclaim_guest_pages(vm);
+   mc = >host_kvm->arch.pkvm.teardown_mc;
+   reclaim_guest_pages(vm, mc);
unpin_host_vcpus(vm->shadow_vcpu_states, vm->kvm.created_vcpus);
 
/* Push the metadata pages to the teardown memcache */
@@ -577,6 +580,9 @@ int __pkvm_teardown_shadow(unsigned int shadow_handle)
hyp_unpin_shared_mem(vm->host_kvm, vm->host_kvm + 1);
 
memset(vm, 0, shadow_size);
+   for (addr = vm; addr < (void *)vm + shadow_size; addr += PAGE_SIZE)
+   push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+
unmap_donated_memory_noclear(vm, shadow_size);
return 0;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index b4466b31d7c8..b174d6dfde36 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -160,8 +160,6 @@ static int __kvm_shadow_create(struct kvm *kvm)
 
/* Store the shadow handle given by hyp for future call reference. */
kvm->arch.pkvm.shadow_handle = shadow_handle;
-   kvm->arch.pkvm.hyp_donations.pgd = 

[PATCH 23/89] KVM: arm64: Instantiate guest stage-2 page-tables at EL2

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Extend the shadow initialisation at EL2 so that we instantiate a memory
pool and a full 'struct kvm_s2_mmu' structure for each VM, with a
stage-2 page-table entirely independent from the one managed by the host
at EL1.

For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |   6 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c  | 127 -
 2 files changed, 130 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index dc06b043bd83..f841e2b252cd 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -9,6 +9,9 @@
 
 #include 
 
+#include 
+#include 
+
 /*
  * Holds the relevant data for maintaining the vcpu state completely at hyp.
  */
@@ -37,6 +40,9 @@ struct kvm_shadow_vm {
size_t shadow_area_size;
 
struct kvm_pgtable pgt;
+   struct kvm_pgtable_mm_ops mm_ops;
+   struct hyp_pool pool;
+   hyp_spinlock_t lock;
 
/* Array of the shadow state per vcpu. */
struct kvm_shadow_vcpu_state shadow_vcpu_states[0];
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 707bd832145f..992ef4b668b4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -25,6 +25,21 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
+static DEFINE_PER_CPU(struct kvm_shadow_vm *, __current_vm);
+#define current_vm (*this_cpu_ptr(&__current_vm))
+
+static void guest_lock_component(struct kvm_shadow_vm *vm)
+{
+   hyp_spin_lock(>lock);
+   current_vm = vm;
+}
+
+static void guest_unlock_component(struct kvm_shadow_vm *vm)
+{
+   current_vm = NULL;
+   hyp_spin_unlock(>lock);
+}
+
 static void host_lock_component(void)
 {
hyp_spin_lock(_kvm.lock);
@@ -140,18 +155,124 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
return 0;
 }
 
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+ enum kvm_pgtable_prot prot)
+{
+   return true;
+}
+
+static void *guest_s2_zalloc_pages_exact(size_t size)
+{
+   void *addr = hyp_alloc_pages(_vm->pool, get_order(size));
+
+   WARN_ON(size != (PAGE_SIZE << get_order(size)));
+   hyp_split_page(hyp_virt_to_page(addr));
+
+   return addr;
+}
+
+static void guest_s2_free_pages_exact(void *addr, unsigned long size)
+{
+   u8 order = get_order(size);
+   unsigned int i;
+
+   for (i = 0; i < (1 << order); i++)
+   hyp_put_page(_vm->pool, addr + (i * PAGE_SIZE));
+}
+
+static void *guest_s2_zalloc_page(void *mc)
+{
+   struct hyp_page *p;
+   void *addr;
+
+   addr = hyp_alloc_pages(_vm->pool, 0);
+   if (addr)
+   return addr;
+
+   addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+   if (!addr)
+   return addr;
+
+   memset(addr, 0, PAGE_SIZE);
+   p = hyp_virt_to_page(addr);
+   memset(p, 0, sizeof(*p));
+   p->refcount = 1;
+
+   return addr;
+}
+
+static void guest_s2_get_page(void *addr)
+{
+   hyp_get_page(_vm->pool, addr);
+}
+
+static void guest_s2_put_page(void *addr)
+{
+   hyp_put_page(_vm->pool, addr);
+}
+
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+   __clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+   hyp_fixmap_unmap();
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+   __invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+   hyp_fixmap_unmap();
+}
+
 int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd)
 {
-   vm->pgt.pgd = pgd;
+   struct kvm_s2_mmu *mmu = >kvm.arch.mmu;
+   unsigned long nr_pages;
+   int ret;
+
+   nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+   ret = hyp_pool_init(>pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
+   if (ret)
+   return ret;
+
+   hyp_spin_lock_init(>lock);
+   vm->mm_ops = (struct kvm_pgtable_mm_ops) {
+   .zalloc_pages_exact = guest_s2_zalloc_pages_exact,
+   .free_pages_exact   = guest_s2_free_pages_exact,
+   .zalloc_page= guest_s2_zalloc_page,
+   .phys_to_virt   = hyp_phys_to_virt,
+   .virt_to_phys   = hyp_virt_to_phys,
+   .page_count = hyp_page_count,
+   .get_page   = guest_s2_get_page,
+   .put_page   = guest_s2_put_page,
+   .dcache_clean_inval_poc = clean_dcache_guest_page,
+   .icache_inval_pou   = invalidate_icache_guest_page,
+   };
+
+   guest_lock_component(vm);
+   ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, >mm_ops, 0,
+ 

[PATCH 22/89] KVM: arm64: Add generic hyp_memcache helpers

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The host and hypervisor will need to dynamically exchange memory pages
soon. Indeed, the hypervisor will rely on the host to donate memory
pages it can use to create guest stage-2 page-tables and to store
metadata. In order to ease this process, introduce a struct hyp_memcache
which is essentially a linked list of available pages, indexed by
physical addresses.
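
The host-side instantiation of these helpers (the mmu.c hunk below is cut off
in this archive) plausibly amounts to plugging the page allocator and the
linear map into the generic routines; a sketch, with the wrapper names being
assumptions:

static void *hyp_mc_alloc_fn(void *unused)
{
        return (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
}

static void hyp_mc_free_fn(void *addr, void *unused)
{
        free_page((unsigned long)addr);
}

static phys_addr_t hyp_mc_to_pa(void *virt)
{
        return virt_to_phys(virt);
}

static void *hyp_mc_to_va(phys_addr_t phys)
{
        return __va(phys);
}

int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
{
        return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
                                    hyp_mc_to_pa, NULL);
}

void free_hyp_memcache(struct kvm_hyp_memcache *mc)
{
        __free_hyp_memcache(mc, hyp_mc_free_fn, hyp_mc_to_va, NULL);
}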

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/kvm_host.h | 57 +++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mm.c  | 33 +++
 arch/arm64/kvm/mmu.c  | 26 +
 4 files changed, 118 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 13967fc9731a..f4272ce76084 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -72,6 +72,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+   phys_addr_t head;
+   unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+phys_addr_t *p,
+phys_addr_t (*to_pa)(void *virt))
+{
+   *p = mc->head;
+   mc->head = to_pa(p);
+   mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+void *(*to_va)(phys_addr_t phys))
+{
+   phys_addr_t *p = to_va(mc->head);
+
+   if (!mc->nr_pages)
+   return NULL;
+
+   mc->head = *p;
+   mc->nr_pages--;
+
+   return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+  unsigned long min_pages,
+  void *(*alloc_fn)(void *arg),
+  phys_addr_t (*to_pa)(void *virt),
+  void *arg)
+{
+   while (mc->nr_pages < min_pages) {
+   phys_addr_t *p = alloc_fn(arg);
+
+   if (!p)
+   return -ENOMEM;
+   push_hyp_memcache(mc, p, to_pa);
+   }
+
+   return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+  void (*free_fn)(void *virt, void *arg),
+  void *(*to_va)(phys_addr_t phys),
+  void *arg)
+{
+   while (mc->nr_pages)
+   free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
atomic64_t id;
 };
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index d11d9d68a680..36eea31a1c5f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
 void reclaim_guest_pages(struct kvm_shadow_vm *vm);
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+   struct kvm_hyp_memcache *host_mc);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index bdb39897343f..4e86a2123c05 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -307,3 +307,36 @@ int hyp_create_idmap(u32 hyp_va_bits)
 
return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
+
+static void *admit_host_page(void *arg)
+{
+   struct kvm_hyp_memcache *host_mc = arg;
+
+   if (!host_mc->nr_pages)
+   return NULL;
+
+   /*
+* The host still owns the pages in its memcache, so we need to go
+* through a full host-to-hyp donation cycle to change it. Fortunately,
+* __pkvm_host_donate_hyp() takes care of races for us, so if it
+* succeeds we're good to go.
+*/
+   if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+   return NULL;
+
+   return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
+}
+
+/* Refill our local memcache by popping pages from the one provided by the host. */
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+   struct kvm_hyp_memcache *host_mc)
+{
+   struct kvm_hyp_memcache tmp = *host_mc;
+   int ret;
+
+   ret =  __topup_hyp_memcache(mc, min_pages, admit_host_page,
+   hyp_virt_to_phys, );
+   *host_mc = tmp;
+
+   return ret;
+}
diff --git a/arch/arm64/kvm/mmu.c 

[PATCH 21/89] KVM: arm64: Allow non-coalescable pages in a hyp_pool

2022-05-19 Thread Will Deacon
From: Quentin Perret 

All the contiguous pages used to initialize a hyp_pool are considered
coalescable, which means that the hyp page allocator will actively
try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the initial
memory range given to a hyp_pool is currently unsupported.

In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy
donation of pages from the host to the hypervisor when allocating guest
stage-2 page-table pages at EL2.
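As a rough illustration (not part of this patch, and assuming an ambient hyp_pool named 'pool' plus the existing hyp_virt_to_page()/hyp_set_page_refcounted()/hyp_put_page() helpers), a page admitted at run-time from outside the pool's original range would end up on the order-0 free-list like so:

    /*
     * Sketch only: 'addr' is the hyp VA of a page donated at run-time,
     * i.e. outside [pool->range_start, pool->range_end).
     */
    struct hyp_page *p = hyp_virt_to_page(addr);

    p->order = 0;
    hyp_set_page_refcounted(p);
    hyp_put_page(&pool, addr);    /* inserted straight into free_area[0] */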

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/nvhe/page_alloc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c 
b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index dc87589440b8..7804da89e55d 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -93,11 +93,15 @@ static inline struct hyp_page *node_to_page(struct 
list_head *node)
 static void __hyp_attach_page(struct hyp_pool *pool,
  struct hyp_page *p)
 {
+   phys_addr_t phys = hyp_page_to_phys(p);
unsigned short order = p->order;
struct hyp_page *buddy;
 
memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
+   if (phys < pool->range_start || phys >= pool->range_end)
+   goto insert;
+
/*
 * Only the first struct hyp_page of a high-order page (otherwise known
 * as the 'head') should have p->order set. The non-head pages should
@@ -116,6 +120,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
p = min(p, buddy);
}
 
+insert:
/* Mark the new head, and insert it */
p->order = order;
 page_add_to_list(p, &pool->free_area[order]);
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 20/89] KVM: arm64: Provide I-cache invalidation by VA at EL2

2022-05-19 Thread Will Deacon
In preparation for handling cache maintenance of guest pages at EL2,
introduce an EL2 copy of icache_inval_pou() which will later be plumbed
into the stage-2 page-table cache maintenance callbacks.
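As a hedged sketch of the intended use (the wrapper name is made up for illustration), a stage-2 cache-maintenance hook at EL2 could simply wrap the new routine:

    static void hyp_icache_inval_range(void *va, size_t size)
    {
        /* Invalidate the I-cache by VA to the PoU, now callable at EL2. */
        icache_inval_pou((unsigned long)va, (unsigned long)va + size);
    }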

Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/kvm_hyp.h |  1 +
 arch/arm64/kernel/image-vars.h   |  3 ---
 arch/arm64/kvm/arm.c |  1 +
 arch/arm64/kvm/hyp/nvhe/cache.S  | 11 +++
 arch/arm64/kvm/hyp/nvhe/pkvm.c   |  3 +++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index aa7fa2a08f06..fd99cf09972d 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -123,4 +123,5 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
+extern unsigned long kvm_nvhe_sym(__icache_flags);
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 241c86b67d01..4e3b6d618ac1 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -80,9 +80,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* Kernel symbol used by icache_is_vpipt(). */
-KVM_NVHE_ALIAS(__icache_flags);
-
 /* VMID bits set by the KVM VMID allocator */
 KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 14adfd09e882..6a32eaf768e5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1832,6 +1832,7 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = 
read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = 
read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = 
read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+   kvm_nvhe_sym(__icache_flags) = __icache_flags;
 
ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
if (ret)
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
index 0c367eb5f4e2..85936c17ae40 100644
--- a/arch/arm64/kvm/hyp/nvhe/cache.S
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -12,3 +12,14 @@ SYM_FUNC_START(__pi_dcache_clean_inval_poc)
ret
 SYM_FUNC_END(__pi_dcache_clean_inval_poc)
 SYM_FUNC_ALIAS(dcache_clean_inval_poc, __pi_dcache_clean_inval_poc)
+
+SYM_FUNC_START(__pi_icache_inval_pou)
+alternative_if ARM64_HAS_CACHE_DIC
+   isb
+   ret
+alternative_else_nop_endif
+
+   invalidate_icache_by_line x0, x1, x2, x3
+   ret
+SYM_FUNC_END(__pi_icache_inval_pou)
+SYM_FUNC_ALIAS(icache_inval_pou, __pi_icache_inval_pou)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 77aeb787670b..114c5565de7d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -12,6 +12,9 @@
 #include 
 #include 
 
+/* Used by icache_is_vpipt(). */
+unsigned long __icache_flags;
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 19/89] KVM: arm64: Add pcpu fixmap infrastructure at EL2

2022-05-19 Thread Will Deacon
From: Quentin Perret 

We will soon need to temporarily map pages into the hypervisor stage-1
in nVHE protected mode. To do this efficiently, let's introduce a
per-cpu fixmap allowing us to map a single page without needing to take any
lock or to allocate memory.
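A minimal usage sketch, assuming only the hyp_fixmap_map()/hyp_fixmap_unmap() interface introduced below:

    void *va = hyp_fixmap_map(phys);

    if (va) {
        /* Temporary, lock-free access to the page from this CPU. */
        memset(va, 0, PAGE_SIZE);
        hyp_fixmap_unmap();
    }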

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/include/nvhe/mm.h  |  4 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  1 -
 arch/arm64/kvm/hyp/nvhe/mm.c  | 85 +++
 arch/arm64/kvm/hyp/nvhe/setup.c   |  4 +
 5 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3a0817b5c739..d11d9d68a680 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -59,6 +59,8 @@ enum pkvm_component_id {
PKVM_ID_HYP,
 };
 
+extern unsigned long hyp_nr_cpus;
+
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h 
b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 73309ccc192e..45b04bfee171 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -13,6 +13,10 @@
 extern struct kvm_pgtable pkvm_pgtable;
 extern hyp_spinlock_t pkvm_pgd_lock;
 
+int hyp_create_pcpu_fixmap(void);
+void *hyp_fixmap_map(phys_addr_t phys);
+int hyp_fixmap_unmap(void);
+
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
 int hyp_back_vmemmap(phys_addr_t back);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 502bc0d04858..707bd832145f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -21,7 +21,6 @@
 
 #define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
 
-extern unsigned long hyp_nr_cpus;
 struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 4377b067dc0e..bdb39897343f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -24,6 +25,7 @@ struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
 unsigned int hyp_memblock_nr;
 
 static u64 __io_map_base;
+static DEFINE_PER_CPU(void *, hyp_fixmap_base);
 
 static int __pkvm_create_mappings(unsigned long start, unsigned long size,
  unsigned long phys, enum kvm_pgtable_prot 
prot)
@@ -198,6 +200,89 @@ int hyp_map_vectors(void)
return 0;
 }
 
+void *hyp_fixmap_map(phys_addr_t phys)
+{
+   void *addr = *this_cpu_ptr(&hyp_fixmap_base);
+   int ret = kvm_pgtable_hyp_map(&pkvm_pgtable, (u64)addr, PAGE_SIZE,
+ phys, PAGE_HYP);
+   return ret ? NULL : addr;
+}
+
+int hyp_fixmap_unmap(void)
+{
+   void *addr = *this_cpu_ptr(&hyp_fixmap_base);
+   int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, (u64)addr, PAGE_SIZE);
+
+   return (ret != PAGE_SIZE) ? -EINVAL : 0;
+}
+
+static int __pin_pgtable_cb(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+   enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+   if (!kvm_pte_valid(*ptep) || level != KVM_PGTABLE_MAX_LEVELS - 1)
+   return -EINVAL;
+   hyp_page_ref_inc(hyp_virt_to_page(ptep));
+
+   return 0;
+}
+
+static int hyp_pin_pgtable_pages(u64 addr)
+{
+   struct kvm_pgtable_walker walker = {
+   .cb = __pin_pgtable_cb,
+   .flags  = KVM_PGTABLE_WALK_LEAF,
+   };
+
+   return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
+}
+
+int hyp_create_pcpu_fixmap(void)
+{
+   unsigned long i;
+   int ret = 0;
+   u64 addr;
+
+   hyp_spin_lock(&pkvm_pgd_lock);
+
+   for (i = 0; i < hyp_nr_cpus; i++) {
+   addr = hyp_alloc_private_va_range(PAGE_SIZE);
+   if (IS_ERR((void *)addr)) {
+   ret = -ENOMEM;
+   goto unlock;
+   }
+
+   /*
+* Create a dummy mapping, to get the intermediate page-table
+* pages allocated, then take a reference on the last level
+* page to keep it around at all times.
+*/
+   ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE,
+ __hyp_pa(__hyp_bss_start), PAGE_HYP);
+   if (ret) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
+   ret = hyp_pin_pgtable_pages(addr);
+   if (ret)
+   goto unlock;
+
+   ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, PAGE_SIZE);
+   if (ret != PAGE_SIZE) {
+   ret = -EINVAL;
+ 

[PATCH 18/89] KVM: arm64: Factor out private range VA allocation

2022-05-19 Thread Will Deacon
From: Quentin Perret 

__pkvm_create_private_mapping() is currently responsible for allocating
VA space in the hypervisor's "private" range and creating stage-1
mappings. In order to allow reusing the VA space allocation logic from
other places, let's factor it out into a standalone function.
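For illustration, a hypothetical new caller would follow the same pattern as __pkvm_create_private_mapping() below: allocate the VA range under the pgd lock, then map it (variable declarations omitted in this sketch):

    hyp_spin_lock(&pkvm_pgd_lock);
    addr = hyp_alloc_private_va_range(PAGE_SIZE);
    if (!IS_ERR((void *)addr))
        err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE, phys, PAGE_HYP);
    hyp_spin_unlock(&pkvm_pgd_lock);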

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/nvhe/mm.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 168e7fbe9a3c..4377b067dc0e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -37,6 +37,22 @@ static int __pkvm_create_mappings(unsigned long start, 
unsigned long size,
return err;
 }
 
+static unsigned long hyp_alloc_private_va_range(size_t size)
+{
+   unsigned long addr = __io_map_base;
+
+   hyp_assert_lock_held(&pkvm_pgd_lock);
+   __io_map_base += PAGE_ALIGN(size);
+
+   /* Are we overflowing on the vmemmap ? */
+   if (__io_map_base > __hyp_vmemmap) {
+   __io_map_base = addr;
+   addr = (unsigned long)ERR_PTR(-ENOMEM);
+   }
+
+   return addr;
+}
+
 unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
enum kvm_pgtable_prot prot)
 {
@@ -45,16 +61,10 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t 
phys, size_t size,
 
 hyp_spin_lock(&pkvm_pgd_lock);
 
-   size = PAGE_ALIGN(size + offset_in_page(phys));
-   addr = __io_map_base;
-   __io_map_base += size;
-
-   /* Are we overflowing on the vmemmap ? */
-   if (__io_map_base > __hyp_vmemmap) {
-   __io_map_base -= size;
-   addr = (unsigned long)ERR_PTR(-ENOMEM);
+   size = size + offset_in_page(phys);
+   addr = hyp_alloc_private_va_range(size);
+   if (IS_ERR((void *)addr))
goto out;
-   }
 
 err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
if (err) {
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 17/89] KVM: arm64: Make hyp stage-1 refcnt correct on the whole range

2022-05-19 Thread Will Deacon
From: Quentin Perret 

We currently fix up the hypervisor stage-1 refcount only for specific
portions of the hyp stage-1 VA space. In order to allow unmapping pages
outside of these ranges, let's fix up the refcount for the entire hyp VA
space.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 62 +++--
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e9e146e50254..b306da2b5dae 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -167,12 +167,11 @@ static void hpool_put_page(void *addr)
 hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
-kvm_pte_t *ptep,
-enum kvm_pgtable_walk_flags flag,
-void * const arg)
+static int fix_host_ownership_walker(u64 addr, u64 end, u32 level,
+kvm_pte_t *ptep,
+enum kvm_pgtable_walk_flags flag,
+void * const arg)
 {
-   struct kvm_pgtable_mm_ops *mm_ops = arg;
enum kvm_pgtable_prot prot;
enum pkvm_page_state state;
kvm_pte_t pte = *ptep;
@@ -181,15 +180,6 @@ static int finalize_host_mappings_walker(u64 addr, u64 
end, u32 level,
if (!kvm_pte_valid(pte))
return 0;
 
-   /*
-* Fix-up the refcount for the page-table pages as the early allocator
-* was unable to access the hyp_vmemmap and so the buddy allocator has
-* initialised the refcount to '1'.
-*/
-   mm_ops->get_page(ptep);
-   if (flag != KVM_PGTABLE_WALK_LEAF)
-   return 0;
-
if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
return -EINVAL;
 
@@ -218,12 +208,30 @@ static int finalize_host_mappings_walker(u64 addr, u64 
end, u32 level,
return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
 }
 
-static int finalize_host_mappings(void)
+static int fix_hyp_pgtable_refcnt_walker(u64 addr, u64 end, u32 level,
+kvm_pte_t *ptep,
+enum kvm_pgtable_walk_flags flag,
+void * const arg)
+{
+   struct kvm_pgtable_mm_ops *mm_ops = arg;
+   kvm_pte_t pte = *ptep;
+
+   /*
+* Fix-up the refcount for the page-table pages as the early allocator
+* was unable to access the hyp_vmemmap and so the buddy allocator has
+* initialised the refcount to '1'.
+*/
+   if (kvm_pte_valid(pte))
+   mm_ops->get_page(ptep);
+
+   return 0;
+}
+
+static int fix_host_ownership(void)
 {
struct kvm_pgtable_walker walker = {
-   .cb = finalize_host_mappings_walker,
-   .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-   .arg= pkvm_pgtable.mm_ops,
+   .cb = fix_host_ownership_walker,
+   .flags  = KVM_PGTABLE_WALK_LEAF,
};
int i, ret;
 
@@ -239,6 +247,18 @@ static int finalize_host_mappings(void)
return 0;
 }
 
+static int fix_hyp_pgtable_refcnt(void)
+{
+   struct kvm_pgtable_walker walker = {
+   .cb = fix_hyp_pgtable_refcnt_walker,
+   .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+   .arg= pkvm_pgtable.mm_ops,
+   };
+
+   return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits),
+   &walker);
+}
+
 void __noreturn __pkvm_init_finalise(void)
 {
 struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -268,7 +288,11 @@ void __noreturn __pkvm_init_finalise(void)
};
 pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
-   ret = finalize_host_mappings();
+   ret = fix_host_ownership();
+   if (ret)
+   goto out;
+
+   ret = fix_hyp_pgtable_refcnt();
if (ret)
goto out;
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 16/89] KVM: arm64: Instantiate VM shadow data from EL1

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Now that EL2 provides calls to create and destroy shadow VM structures,
plumb these into the KVM code at EL1 so that a shadow VM is created on
first vCPU run and destroyed later along with the 'struct kvm' at
teardown time.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_host.h  |   6 ++
 arch/arm64/include/asm/kvm_pkvm.h  |   4 ++
 arch/arm64/kvm/arm.c   |  14 
 arch/arm64/kvm/hyp/hyp-constants.c |   3 +
 arch/arm64/kvm/pkvm.c  | 112 +
 5 files changed, 139 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 9ba721fc1600..13967fc9731a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -103,6 +103,12 @@ struct kvm_arch_memory_slot {
 
 struct kvm_protected_vm {
unsigned int shadow_handle;
+   struct mutex shadow_lock;
+
+   struct {
+   void *pgd;
+   void *shadow;
+   } hyp_donations;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index 11526e89fe5c..1dc7372950b1 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,10 @@
 
 #define HYP_MEMBLOCK_REGIONS 128
 
+int kvm_init_pvm(struct kvm *kvm);
+int kvm_shadow_create(struct kvm *kvm);
+void kvm_shadow_destroy(struct kvm *kvm);
+
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7f8731306c2a..14adfd09e882 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -146,6 +147,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
goto out_free_stage2_pgd;
 
+   ret = kvm_init_pvm(kvm);
+   if (ret)
+   goto out_free_stage2_pgd;
+
 if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) {
ret = -ENOMEM;
goto out_free_stage2_pgd;
@@ -182,6 +187,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
kvm_vgic_destroy(kvm);
 
+   if (is_protected_kvm_enabled())
+   kvm_shadow_destroy(kvm);
+
kvm_destroy_vcpus(kvm);
 
kvm_unshare_hyp(kvm, kvm + 1);
@@ -545,6 +553,12 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
if (ret)
return ret;
 
+   if (is_protected_kvm_enabled()) {
+   ret = kvm_shadow_create(kvm);
+   if (ret)
+   return ret;
+   }
+
if (!irqchip_in_kernel(kvm)) {
/*
 * Tell the rest of the code that there are userspace irqchip
diff --git a/arch/arm64/kvm/hyp/hyp-constants.c 
b/arch/arm64/kvm/hyp/hyp-constants.c
index b3742a6691e8..eee79527f901 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -2,9 +2,12 @@
 
 #include 
 #include 
+#include 
 
 int main(void)
 {
DEFINE(STRUCT_HYP_PAGE_SIZE,sizeof(struct hyp_page));
+   DEFINE(KVM_SHADOW_VM_SIZE,  sizeof(struct kvm_shadow_vm));
+   DEFINE(KVM_SHADOW_VCPU_STATE_SIZE, sizeof(struct 
kvm_shadow_vcpu_state));
return 0;
 }
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 3947063cc3a1..b4466b31d7c8 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -6,6 +6,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -94,3 +95,114 @@ void __init kvm_hyp_reserve(void)
kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
 hyp_mem_base);
 }
+
+/*
+ * Allocates and donates memory for EL2 shadow structs.
+ *
+ * Allocates space for the shadow state, which includes the shadow vm as well 
as
+ * the shadow vcpu states.
+ *
+ * Stores an opaque handle in the kvm struct for future reference.
+ *
+ * Return 0 on success, negative error code on failure.
+ */
+static int __kvm_shadow_create(struct kvm *kvm)
+{
+   struct kvm_vcpu *vcpu, **vcpu_array;
+   unsigned int shadow_handle;
+   size_t pgd_sz, shadow_sz;
+   void *pgd, *shadow_addr;
+   unsigned long idx;
+   int ret;
+
+   if (kvm->created_vcpus < 1)
+   return -EINVAL;
+
+   pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.vtcr);
+   /*
+* The PGD pages will be reclaimed using a hyp_memcache which implies
+* page granularity. So, use alloc_pages_exact() to get individual
+* refcounts.
+*/
+   pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT);
+   if (!pgd)
+   return -ENOMEM;
+
+   /* Allocate memory to donate to hyp for the kvm and vcpu state. */
+   shadow_sz = PAGE_ALIGN(KVM_SHADOW_VM_SIZE +
+  KVM_SHADOW_VCPU_STATE_SIZE * kvm->created_vcpus);
+   shadow_addr = 

[PATCH 15/89] KVM: arm64: Introduce shadow VM state at EL2

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Introduce a table of shadow VM structures at EL2 and provide hypercalls
to the host for creating and destroying shadow VMs.

Signed-off-by: Fuad Tabba 
---
 arch/arm64/include/asm/kvm_asm.h  |   2 +
 arch/arm64/include/asm/kvm_host.h |   6 +
 arch/arm64/include/asm/kvm_pgtable.h  |   8 +
 arch/arm64/include/asm/kvm_pkvm.h |   8 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h|  61 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c|  21 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  14 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c| 398 ++
 arch/arm64/kvm/hyp/nvhe/setup.c   |   8 +
 arch/arm64/kvm/hyp/pgtable.c  |   9 +
 arch/arm64/kvm/pkvm.c |   1 +
 12 files changed, 539 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..f5030e88eb58 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -76,6 +76,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+   __KVM_HOST_SMCCC_FUNC___pkvm_init_shadow,
+   __KVM_HOST_SMCCC_FUNC___pkvm_teardown_shadow,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)   extern char sym[]
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 14ed7c7ad797..9ba721fc1600 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -101,6 +101,10 @@ struct kvm_s2_mmu {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_protected_vm {
+   unsigned int shadow_handle;
+};
+
 struct kvm_arch {
struct kvm_s2_mmu mmu;
 
@@ -140,6 +144,8 @@ struct kvm_arch {
 
u8 pfr0_csv2;
u8 pfr0_csv3;
+
+   struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
b/arch/arm64/include/asm/kvm_pgtable.h
index 9f339dffbc1a..2d6b5058f7d3 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -288,6 +288,14 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 
addr, u64 size);
  */
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
 
+/*
+ * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
+ * @vtcr:  Content of the VTCR register.
+ *
+ * Return: the size (in bytes) of the stage-2 PGD
+ */
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr);
+
 /**
  * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:   Uninitialised page-table structure to initialise.
diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index 8f7b8a2314bb..11526e89fe5c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -9,6 +9,9 @@
 #include 
 #include 
 
+/* Maximum number of protected VMs that can be created. */
+#define KVM_MAX_PVMS 255
+
 #define HYP_MEMBLOCK_REGIONS 128
 
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
@@ -40,6 +43,11 @@ static inline unsigned long hyp_vmemmap_pages(size_t 
vmemmap_entry_size)
return res >> PAGE_SHIFT;
 }
 
+static inline unsigned long hyp_shadow_table_pages(void)
+{
+   return PAGE_ALIGN(KVM_MAX_PVMS * sizeof(void *)) >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3bea816296dc..3a0817b5c739 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -68,10 +69,12 @@ bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot 
prot);
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
+int kvm_guest_prepare_stage2(struct kvm_shadow_vm *vm, void *pgd);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
+void reclaim_guest_pages(struct kvm_shadow_vm *vm);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h 
b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
new file mode 100644
index ..dc06b043bd83
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba 
+ */
+
+#ifndef __ARM64_KVM_NVHE_PKVM_H__
+#define 

[PATCH 14/89] KVM: arm64: Add hyp_spinlock_t static initializer

2022-05-19 Thread Will Deacon
From: Fuad Tabba 

Having a static initializer for hyp_spinlock_t simplifies its
use when there isn't an initializing function.

No functional change intended.
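For example (the lock name is chosen purely for illustration), a lock can now be defined statically without a runtime init call:

    static DEFINE_HYP_SPINLOCK(shadow_lock);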

Signed-off-by: Fuad Tabba 
---
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h 
b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 4652fd04bdbe..7c7ea8c55405 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -28,9 +28,17 @@ typedef union hyp_spinlock {
};
 } hyp_spinlock_t;
 
+#define __HYP_SPIN_LOCK_INITIALIZER \
+   { .__val = 0 }
+
+#define __HYP_SPIN_LOCK_UNLOCKED \
+   ((hyp_spinlock_t) __HYP_SPIN_LOCK_INITIALIZER)
+
+#define DEFINE_HYP_SPINLOCK(x) hyp_spinlock_t x = __HYP_SPIN_LOCK_UNLOCKED
+
 #define hyp_spin_lock_init(l)  \
 do {   \
-   *(l) = (hyp_spinlock_t){ .__val = 0 };  \
+   *(l) = __HYP_SPIN_LOCK_UNLOCKED;\
 } while (0)
 
 static inline void hyp_spin_lock(hyp_spinlock_t *lock)
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 13/89] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h

2022-05-19 Thread Will Deacon
nvhe/mem_protect.h refers to __load_stage2() in the definition of
__load_host_stage2() but doesn't include the relevant header.

Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter
don't have to do this themselves.

Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 998bf165af71..3bea816296dc 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -8,6 +8,7 @@
 #define __KVM_NVHE_MEM_PROTECT__
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 12/89] KVM: arm64: Add helpers to pin memory shared with hyp

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Add helpers allowing the hypervisor to check whether a range of pages
are currently shared by the host, and 'pin' them if so by blocking host
unshare operations until the memory has been unpinned. This will allow
the hypervisor to take references on host-provided data-structures
(struct kvm and such) and be guaranteed these pages will remain in a
stable state until it decides to release them, e.g. during guest
teardown.
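A hedged usage sketch, with copy_host_state() standing in for whatever EL2 consumer ends up using the pinned data ('kvm' is a host-provided struct kvm pointer that was previously shared with hyp):

    ret = hyp_pin_shared_mem(kvm, kvm + 1);
    if (!ret) {
        /* The pages backing 'kvm' cannot be unshared by the host while pinned. */
        copy_host_state(kvm);    /* assumed consumer, for illustration only */
        hyp_unpin_shared_mem(kvm, kvm + 1);
    }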

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  3 ++
 arch/arm64/kvm/hyp/include/nvhe/memory.h  |  7 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 48 +++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index c87b19b2d468..998bf165af71 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -69,6 +69,9 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, 
u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
+int hyp_pin_shared_mem(void *from, void *to);
+void hyp_unpin_shared_mem(void *from, void *to);
+
 static __always_inline void __load_host_stage2(void)
 {
if (static_branch_likely(_protected_mode_initialized))
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h 
b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 418b66a82a50..e8a78b72aabf 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -51,10 +51,15 @@ static inline void hyp_page_ref_inc(struct hyp_page *p)
p->refcount++;
 }
 
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+static inline void hyp_page_ref_dec(struct hyp_page *p)
 {
BUG_ON(!p->refcount);
p->refcount--;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+   hyp_page_ref_dec(p);
return (p->refcount == 0);
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a7156fd13bc8..1262dbae7f06 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -625,6 +625,9 @@ static int hyp_ack_unshare(u64 addr, const struct 
pkvm_mem_transition *tx)
 {
u64 size = tx->nr_pages * PAGE_SIZE;
 
+   if (tx->initiator.id == PKVM_ID_HOST && hyp_page_count((void *)addr))
+   return -EBUSY;
+
if (__hyp_ack_skip_pgtable_check(tx))
return 0;
 
@@ -1038,3 +1041,48 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
 
return ret;
 }
+
+int hyp_pin_shared_mem(void *from, void *to)
+{
+   u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+   u64 end = PAGE_ALIGN((u64)to);
+   u64 size = end - start;
+   int ret;
+
+   host_lock_component();
+   hyp_lock_component();
+
+   ret = __host_check_page_state_range(__hyp_pa(start), size,
+   PKVM_PAGE_SHARED_OWNED);
+   if (ret)
+   goto unlock;
+
+   ret = __hyp_check_page_state_range(start, size,
+  PKVM_PAGE_SHARED_BORROWED);
+   if (ret)
+   goto unlock;
+
+   for (cur = start; cur < end; cur += PAGE_SIZE)
+   hyp_page_ref_inc(hyp_virt_to_page(cur));
+
+unlock:
+   hyp_unlock_component();
+   host_unlock_component();
+
+   return ret;
+}
+
+void hyp_unpin_shared_mem(void *from, void *to)
+{
+   u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+   u64 end = PAGE_ALIGN((u64)to);
+
+   host_lock_component();
+   hyp_lock_component();
+
+   for (cur = start; cur < end; cur += PAGE_SIZE)
+   hyp_page_ref_dec(hyp_virt_to_page(cur));
+
+   hyp_unlock_component();
+   host_unlock_component();
+}
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 11/89] KVM: arm64: Prevent the donation of no-map pages

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Memory regions marked as no-map in DT routinely include TrustZone
carveouts and such. Although donating such pages to the hypervisor may
not breach confidentiality, it may be used to corrupt its state in
uncontrollable ways. To prevent this, let's block host-initiated memory
transitions targeting no-map pages altogether in nVHE protected mode as
there should be no valid reason to do this currently.

Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
of memblock regions, making it easy to check for the presence of the
MEMBLOCK_NOMAP flag on any given region at EL2.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c30402737548..a7156fd13bc8 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -193,7 +193,7 @@ struct kvm_mem_range {
u64 end;
 };
 
-static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
+static struct memblock_region *find_mem_range(phys_addr_t addr, struct 
kvm_mem_range *range)
 {
int cur, left = 0, right = hyp_memblock_nr;
struct memblock_region *reg;
@@ -216,18 +216,28 @@ static bool find_mem_range(phys_addr_t addr, struct 
kvm_mem_range *range)
} else {
range->start = reg->base;
range->end = end;
-   return true;
+   return reg;
}
}
 
-   return false;
+   return NULL;
 }
 
 bool addr_is_memory(phys_addr_t phys)
 {
struct kvm_mem_range range;
 
-   return find_mem_range(phys, &range);
+   return !!find_mem_range(phys, &range);
+}
+
+static bool addr_is_allowed_memory(phys_addr_t phys)
+{
+   struct memblock_region *reg;
+   struct kvm_mem_range range;
+
+   reg = find_mem_range(phys, &range);
+
+   return reg && !(reg->flags & MEMBLOCK_NOMAP);
 }
 
 static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
@@ -346,7 +356,7 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, 
enum kvm_pgtable_prot pr
 static int host_stage2_idmap(u64 addr)
 {
struct kvm_mem_range range;
-   bool is_memory = find_mem_range(addr, &range);
+   bool is_memory = !!find_mem_range(addr, &range);
enum kvm_pgtable_prot prot;
int ret;
 
@@ -424,7 +434,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, 
u32 level,
struct check_walk_data *d = arg;
kvm_pte_t pte = *ptep;
 
-   if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+   if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
return -EINVAL;
 
return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 10/89] KVM: arm64: Implement do_donate() helper for donating memory

2022-05-19 Thread Will Deacon
From: Quentin Perret 

Transferring ownership information of a memory region from one component
to another can be achieved using a "donate" operation, which results
in the previous owner losing access to the underlying pages entirely.

Implement a do_donate() helper, along the same lines as do_{un,}share,
and provide this functionality for the host-{to,from}-hyp cases as this
will later be used to donate/reclaim memory pages to store VM metadata
at EL2.
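As a rough, hedged sketch only (the field names follow the existing share/unshare transition code shown in the diff, but the exact layout consumed by do_donate() is an assumption here), a host-to-hyp donation of a single page might be described as:

    /* 'host_addr' is the page's PA in the host's (idmapped) address space. */
    struct pkvm_mem_donation donation = {
        .tx = {
            .nr_pages  = 1,
            .initiator = {
                .id                  = PKVM_ID_HOST,
                .addr                = host_addr,
                .host.completer_addr = (u64)__hyp_va(host_addr),
            },
            .completer = {
                .id = PKVM_ID_HYP,
            },
        },
    };

    ret = do_donate(&donation);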

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 239 ++
 2 files changed, 241 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index f5705a1e972f..c87b19b2d468 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -60,6 +60,8 @@ enum pkvm_component_id {
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot 
prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ff86f5bd230f..c30402737548 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -391,6 +391,9 @@ struct pkvm_mem_transition {
/* Address in the completer's address space */
u64 completer_addr;
} host;
+   struct {
+   u64 completer_addr;
+   } hyp;
};
} initiator;
 
@@ -404,6 +407,10 @@ struct pkvm_mem_share {
const enum kvm_pgtable_prot completer_prot;
 };
 
+struct pkvm_mem_donation {
+   const struct pkvm_mem_transition tx;
+};
+
 struct check_walk_data {
 enum pkvm_page_state desired;
 enum pkvm_page_state (*get_page_state)(kvm_pte_t pte);
@@ -503,6 +510,46 @@ static int host_initiate_unshare(u64 *completer_addr,
return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
 
+static int host_initiate_donation(u64 *completer_addr,
+ const struct pkvm_mem_transition *tx)
+{
+   u8 owner_id = tx->completer.id;
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   *completer_addr = tx->initiator.host.completer_addr;
+   return host_stage2_set_owner_locked(tx->initiator.addr, size, owner_id);
+}
+
+static bool __host_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+   return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+tx->initiator.id != PKVM_ID_HYP);
+}
+
+static int __host_ack_transition(u64 addr, const struct pkvm_mem_transition 
*tx,
+enum pkvm_page_state state)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+
+   if (__host_ack_skip_pgtable_check(tx))
+   return 0;
+
+   return __host_check_page_state_range(addr, size, state);
+}
+
+static int host_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+   return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
+static int host_complete_donation(u64 addr, const struct pkvm_mem_transition 
*tx)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+   u8 host_id = tx->completer.id;
+
+   return host_stage2_set_owner_locked(addr, size, host_id);
+}
+
 static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
 {
if (!kvm_pte_valid(pte))
@@ -523,6 +570,27 @@ static int __hyp_check_page_state_range(u64 addr, u64 size,
 return check_page_state_range(&pkvm_pgtable, addr, size, &d);
 }
 
+static int hyp_request_donation(u64 *completer_addr,
+   const struct pkvm_mem_transition *tx)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+   u64 addr = tx->initiator.addr;
+
+   *completer_addr = tx->initiator.hyp.completer_addr;
+   return __hyp_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int hyp_initiate_donation(u64 *completer_addr,
+const struct pkvm_mem_transition *tx)
+{
+   u64 size = tx->nr_pages * PAGE_SIZE;
+   int ret;
+
+   *completer_addr = tx->initiator.hyp.completer_addr;
+   ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, tx->initiator.addr, size);
+   return (ret != size) ? -EFAULT : 0;
+}
+
 static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
 {
return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
@@ -554,6 +622,16 @@ static int hyp_ack_unshare(u64 addr, const struct 
pkvm_mem_transition *tx)
PKVM_PAGE_SHARED_BORROWED);
 }
 
+static int hyp_ack_donation(u64 addr, 

[PATCH 08/89] KVM: arm64: Back hyp_vmemmap for all of memory

2022-05-19 Thread Will Deacon
From: Quentin Perret 

The EL2 vmemmap in nVHE Protected mode is currently very sparse: only
memory pages owned by the hypervisor itself have a matching struct
hyp_page. But since the size of these structs has been reduced
significantly, it appears that we can afford backing the vmemmap for all
of memory.

This will greatly simplify memory tracking, as the hypervisor will have a
place to store metadata (e.g. refcounts) that wouldn't otherwise fit in
the 4 SW bits we have in the host stage-2 page-table for instance.
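For illustration, once the vmemmap is backed for all of memory, any memory page can have per-page metadata attached at EL2, e.g. (sketch only, using the existing hyp_page helpers):

    /* Take a reference on an arbitrary memory page's EL2 metadata. */
    struct hyp_page *p = hyp_phys_to_page(phys);

    hyp_page_ref_inc(p);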

Signed-off-by: Quentin Perret 
---
 arch/arm64/include/asm/kvm_pkvm.h| 26 +++
 arch/arm64/kvm/hyp/include/nvhe/mm.h | 14 +
 arch/arm64/kvm/hyp/nvhe/mm.c | 31 
 arch/arm64/kvm/hyp/nvhe/page_alloc.c |  4 +---
 arch/arm64/kvm/hyp/nvhe/setup.c  |  7 +++
 arch/arm64/kvm/pkvm.c| 18 ++--
 6 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h 
b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a8df59..8f7b8a2314bb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,32 @@
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
+static inline unsigned long
+hyp_vmemmap_memblock_size(struct memblock_region *reg, size_t 
vmemmap_entry_size)
+{
+   unsigned long nr_pages = reg->size >> PAGE_SHIFT;
+   unsigned long start, end;
+
+   start = (reg->base >> PAGE_SHIFT) * vmemmap_entry_size;
+   end = start + nr_pages * vmemmap_entry_size;
+   start = ALIGN_DOWN(start, PAGE_SIZE);
+   end = ALIGN(end, PAGE_SIZE);
+
+   return end - start;
+}
+
+static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
+{
+   unsigned long res = 0, i;
+
+   for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+   res += hyp_vmemmap_memblock_size(&kvm_nvhe_sym(hyp_memory)[i],
+vmemmap_entry_size);
+   }
+
+   return res >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h 
b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 2d08510c6cc1..73309ccc192e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -15,23 +15,11 @@ extern hyp_spinlock_t pkvm_pgd_lock;
 
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_back_vmemmap(phys_addr_t back);
 int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot 
prot);
 unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
enum kvm_pgtable_prot prot);
 
-static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
-unsigned long *start, unsigned long *end)
-{
-   unsigned long nr_pages = size >> PAGE_SHIFT;
-   struct hyp_page *p = hyp_phys_to_page(phys);
-
-   *start = (unsigned long)p;
-   *end = *start + nr_pages * sizeof(struct hyp_page);
-   *start = ALIGN_DOWN(*start, PAGE_SIZE);
-   *end = ALIGN(*end, PAGE_SIZE);
-}
-
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index cdbe8e246418..168e7fbe9a3c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -105,13 +105,36 @@ int pkvm_create_mappings(void *from, void *to, enum 
kvm_pgtable_prot prot)
return ret;
 }
 
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+int hyp_back_vmemmap(phys_addr_t back)
 {
-   unsigned long start, end;
+   unsigned long i, start, size, end = 0;
+   int ret;
 
-   hyp_vmemmap_range(phys, size, &start, &end);
+   for (i = 0; i < hyp_memblock_nr; i++) {
+   start = hyp_memory[i].base;
+   start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
+   /*
+* The beginning of the hyp_vmemmap region for the current
+* memblock may already be backed by the page backing the end
+* of the previous region, so avoid mapping it twice.
+*/
+   start = max(start, end);
+
+   end = hyp_memory[i].base + hyp_memory[i].size;
+   end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
+   if (start >= end)
+   continue;
+
+   size = end - start;
+   ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
+   if (ret)
+   return ret;
+
+   

[PATCH 09/89] KVM: arm64: Unify identifiers used to distinguish host and hypervisor

2022-05-19 Thread Will Deacon
The 'pkvm_component_id' enum type provides constants to refer to the
host and the hypervisor, yet this information is duplicated by the
'pkvm_hyp_id' constant.

Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
type definition to 'mem_protect.h' so that it can be used outside of
the memory protection code.

Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 8 
 arch/arm64/kvm/hyp/nvhe/setup.c   | 2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 80e99836eac7..f5705a1e972f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -51,7 +51,11 @@ struct host_kvm {
 };
 extern struct host_kvm host_kvm;
 
-extern const u8 pkvm_hyp_id;
+/* This corresponds to page-table locking order */
+enum pkvm_component_id {
+   PKVM_ID_HOST,
+   PKVM_ID_HYP,
+};
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1e78acf9662e..ff86f5bd230f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -26,8 +26,6 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
-const u8 pkvm_hyp_id = 1;
-
 static void host_lock_component(void)
 {
 hyp_spin_lock(&host_kvm.lock);
@@ -380,12 +378,6 @@ void handle_host_mem_abort(struct kvm_cpu_context 
*host_ctxt)
BUG_ON(ret && ret != -EAGAIN);
 }
 
-/* This corresponds to locking order */
-enum pkvm_component_id {
-   PKVM_ID_HOST,
-   PKVM_ID_HYP,
-};
-
 struct pkvm_mem_transition {
u64 nr_pages;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 7d2b325efb50..311197a223e6 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -197,7 +197,7 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, 
u32 level,
state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
switch (state) {
case PKVM_PAGE_OWNED:
-   return host_stage2_set_owner_locked(phys, PAGE_SIZE, 
pkvm_hyp_id);
+   return host_stage2_set_owner_locked(phys, PAGE_SIZE, 
PKVM_ID_HYP);
case PKVM_PAGE_SHARED_OWNED:
prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, 
PKVM_PAGE_SHARED_BORROWED);
break;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 07/89] KVM: arm64: Move hyp refcount manipulation helpers

2022-05-19 Thread Will Deacon
From: Quentin Perret 

We will soon need to manipulate struct hyp_page refcounts from outside
page_alloc.c, so move the helpers to a header file.

Signed-off-by: Quentin Perret 
---
 arch/arm64/kvm/hyp/include/nvhe/memory.h | 18 ++
 arch/arm64/kvm/hyp/nvhe/page_alloc.c | 19 ---
 2 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h 
b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 592b7edb3edb..418b66a82a50 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -45,4 +45,22 @@ static inline int hyp_page_count(void *addr)
return p->refcount;
 }
 
+static inline void hyp_page_ref_inc(struct hyp_page *p)
+{
+   BUG_ON(p->refcount == USHRT_MAX);
+   p->refcount++;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+   BUG_ON(!p->refcount);
+   p->refcount--;
+   return (p->refcount == 0);
+}
+
+static inline void hyp_set_page_refcounted(struct hyp_page *p)
+{
+   BUG_ON(p->refcount);
+   p->refcount = 1;
+}
 #endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c 
b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index d40f0b30b534..1ded09fc9b10 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -144,25 +144,6 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool 
*pool,
return p;
 }
 
-static inline void hyp_page_ref_inc(struct hyp_page *p)
-{
-   BUG_ON(p->refcount == USHRT_MAX);
-   p->refcount++;
-}
-
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
-{
-   BUG_ON(!p->refcount);
-   p->refcount--;
-   return (p->refcount == 0);
-}
-
-static inline void hyp_set_page_refcounted(struct hyp_page *p)
-{
-   BUG_ON(p->refcount);
-   p->refcount = 1;
-}
-
 static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
 {
if (hyp_page_ref_dec_and_test(p))
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 06/89] KVM: arm64: Drop stale comment

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

The layout of 'struct kvm_vcpu_arch' has evolved significantly since
the initial port of KVM/arm64, so remove the stale comment suggesting
that a prefix of the structure is used exclusively from assembly code.

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_host.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e3b25dc6c367..14ed7c7ad797 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -339,11 +339,6 @@ struct kvm_vcpu_arch {
struct arch_timer_cpu timer_cpu;
struct kvm_pmu pmu;
 
-   /*
-* Anything that is not used directly from assembly code goes
-* here.
-*/
-
/*
 * Guest registers we preserve during guest debugging.
 *
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 05/89] KVM: arm64: Extend comment in has_vhe()

2022-05-19 Thread Will Deacon
has_vhe() expands to a compile-time constant when evaluated from the VHE
or nVHE code, alternatively checking a static key when called from
elsewhere in the kernel. At face value, this looks like a case of
premature optimization, but in fact this allows symbol references on
VHE-specific code paths to be dropped from the nVHE object.

Expand the comment in has_vhe() to make this clearer, hopefully
discouraging anybody from simplifying the code.
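For illustration, the property being relied upon is roughly the following (vhe_only_helper() is a made-up name used only to show the effect):

    /*
     * In nVHE hyp code, has_vhe() folds to a compile-time 'false', so the
     * call below, and the symbol reference it would otherwise create, is
     * discarded from the nVHE object at build time.
     */
    if (has_vhe())
        vhe_only_helper();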

Cc: David Brazdil 
Acked-by: Mark Rutland 
Signed-off-by: Will Deacon 
---
 arch/arm64/include/asm/virt.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 3c8af033a997..0e80db4327b6 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -113,6 +113,9 @@ static __always_inline bool has_vhe(void)
/*
 * Code only run in VHE/NVHE hyp context can assume VHE is present or
 * absent. Otherwise fall back to caps.
+* This allows the compiler to discard VHE-specific code from the
+* nVHE object, reducing the number of external symbol references
+* needed to link.
 */
if (is_vhe_hyp_code())
return true;
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 04/89] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE

2022-05-19 Thread Will Deacon
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
only returns KVM_MODE_PROTECTED on systems where the feature is available.

Cc: David Brazdil 
Acked-by: Mark Rutland 
Signed-off-by: Will Deacon 
---
 Documentation/admin-guide/kernel-parameters.txt |  1 -
 arch/arm64/kernel/cpufeature.c  | 10 +-
 arch/arm64/kvm/arm.c|  6 +-
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 3f1cc5e317ed..63a764ec7fec 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2438,7 +2438,6 @@
 
protected: nVHE-based mode with support for guests whose
   state is kept private from the host.
-  Not valid if the kernel is running in EL2.
 
Defaults to VHE/nVHE based on hardware support. Setting
mode to "protected" will disable kexec and hibernation
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d72c4b4d389c..1bbb7cfc76df 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1925,15 +1925,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities 
const *cap)
 #ifdef CONFIG_KVM
 static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities *entry, 
int __unused)
 {
-   if (kvm_get_mode() != KVM_MODE_PROTECTED)
-   return false;
-
-   if (is_kernel_in_hyp_mode()) {
-   pr_warn("Protected KVM not available with VHE\n");
-   return false;
-   }
-
-   return true;
+   return kvm_get_mode() == KVM_MODE_PROTECTED;
 }
 #endif /* CONFIG_KVM */
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 775b52871b51..7f8731306c2a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2167,7 +2167,11 @@ static int __init early_kvm_mode_cfg(char *arg)
return -EINVAL;
 
if (strcmp(arg, "protected") == 0) {
-   kvm_mode = KVM_MODE_PROTECTED;
+   if (!is_kernel_in_hyp_mode())
+   kvm_mode = KVM_MODE_PROTECTED;
+   else
+   pr_warn_once("Protected KVM not available with VHE\n");
+
return 0;
}
 
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 03/89] KVM: arm64: Return error from kvm_arch_init_vm() on allocation failure

2022-05-19 Thread Will Deacon
If we fail to allocate the 'supported_cpus' cpumask in kvm_arch_init_vm()
then be sure to return -ENOMEM instead of success (0) on the failure
path.

Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/arm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..775b52871b51 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -146,8 +146,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
goto out_free_stage2_pgd;
 
-   if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL))
+   if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) {
+   ret = -ENOMEM;
goto out_free_stage2_pgd;
+   }
cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
 
kvm_vgic_early_init(kvm);
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 00/89] KVM: arm64: Base support for the pKVM hypervisor at EL2

2022-05-19 Thread Will Deacon
Hi all,

This rather large series (based on -rc2) builds on top of the limited
pKVM support available upstream and gets us to a point where the
hypervisor code at EL2 is capable of running guests in both
non-protected and protected mode on the same system. For more background
information about pKVM, the following (slightly dated) LWN article may
be informative:

  https://lwn.net/Articles/836693/

The structure of this series is roughly as follows:

  * Patches 01-06 :
- Some small cleanups and minor fixes.

  * Patches 07-12 :
- Memory management changes at EL2 to allow the donation of memory
  from the host to the hypervisor and the "pinning" of shared memory
  at EL2.

  * Patches 13-16 :
- Introduction of shadow VM and vCPU state at EL2 so that the
  hypervisor can manage guest state using its own private data
  structures, initially populated from the host structures.

  * Patches 17-33 :
- Further memory management changes at EL2 to allow the allocation
  and reclaim of guest memory by the host. This then allows us to
  manage guest stage-2 page-tables entirely at EL2, with the host
  issuing hypercalls to map guest pages in response to faults.

  * Patches 34-78 :
- Gradual reduction of EL2 trust in host data; rather than copy
  blindly between the host and shadow structures, we instead
  selectively sync/flush between them and reduce the amount of host
  data that is accessed directly by EL2.

  * Patches 79-81 :
- Inject an abort into the host if it tries to access a guest page
  for which it does not have permission. This will then deliver a
  SEGV if the access originated from userspace.

  * Patches 82-87 :
- Expose hypercalls to protected guests for sharing memory back with
  the host

  * Patches 88-89 :
- Introduce the new machine type and add some documentation.

We considered splitting this into multiple series, but decided to keep
everything together initially so that reviewers can more easily get an
idea of what we're trying to do and also take it for a spin. The patches
are also available in our git tree here:

  
https://android-kvm.googlesource.com/linux/+/refs/heads/for-upstream/pkvm-base-v1

It's worth pointing out that, although we've been tracking the fd-based
proposal around KVM private memory [1], for now the approach taken here
interacts directly with anonymous pages using a longterm GUP pin. We're
expecting to prototype an fd-based implementation once the discussion at
[2] has converged. In the meantime, we hope to progress the non-protected
VM support.

Finally, there are still some features that we have not included in this
posting and will come later on:

  - Support for read-only memslots and dirty logging for non-protected
VMs. We currently document that this doesn't work (setting the
memslot flags will fail), but we're working to enable this.

  - Support for IOMMU configuration to protect guest memory from DMA
attacks by the host.

  - Support for optional loading of the guest's initial firmware by the
hypervisor.

  - Proxying of host interactions with Trustzone, intercepting and
validating FF-A [3] calls at EL2.

  - Support for restricted MMIO exits to only regions designated as
MMIO by the guest. An earlier version of this work was previously
posted at [4].

  - Hardware debug and PMU support for non-protected guests -- this
builds on the separate series posted at [5] and which is now queued
for 5.19.

  - Guest-side changes to issue the new pKVM hypercalls, for example
sharing back the SWIOTLB buffer with the host for virtio traffic.

Please enjoy,

Will, Quentin, Fuad and Marc

[1] 
https://lore.kernel.org/all/20220310140911.50924-1-chao.p.p...@linux.intel.com/
[2] https://lore.kernel.org/r/20220422105612.gb61...@chaop.bj.intel.com
[3] https://developer.arm.com/documentation/den0077/latest
[4] https://lore.kernel.org/all/20211004174849.2831548-1-...@kernel.org/
[5] https://lore.kernel.org/all/20220510095710.148178-1-ta...@google.com/

Cc: Ard Biesheuvel 
Cc: Sean Christopherson 
Cc: Will Deacon 
Cc: Alexandru Elisei 
Cc: Andy Lutomirski 
Cc: Catalin Marinas 
Cc: James Morse 
Cc: Chao Peng 
Cc: Quentin Perret 
Cc: Suzuki K Poulose 
Cc: Michael Roth 
Cc: Mark Rutland 
Cc: Fuad Tabba 
Cc: Oliver Upton 
Cc: Marc Zyngier 

Cc: kernel-t...@android.com
Cc: k...@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-arm-ker...@lists.infradead.org

--->8

Fuad Tabba (23):
  KVM: arm64: Add hyp_spinlock_t static initializer
  KVM: arm64: Introduce shadow VM state at EL2
  KVM: arm64: Instantiate VM shadow data from EL1
  KVM: arm64: Do not allow memslot changes after first VM run under pKVM
  KVM: arm64: Add hyp per_cpu variable to track current physical cpu
number
  KVM: arm64: Ensure that TLBs and I-cache are private to each vcpu
  KVM: arm64: Do not pass the vcpu to __pkvm_host_map_guest()
  KVM: arm64: Check directly whether the vcpu is 

[PATCH 02/89] KVM: arm64: Remove redundant hyp_assert_lock_held() assertions

2022-05-19 Thread Will Deacon
host_stage2_try() asserts that the KVM host lock is held, so there's no
need to duplicate the assertion in its wrappers.

Signed-off-by: Will Deacon 
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 78edf077fa3b..1e78acf9662e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -314,15 +314,11 @@ static int host_stage2_adjust_range(u64 addr, struct 
kvm_mem_range *range)
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
 enum kvm_pgtable_prot prot)
 {
-   hyp_assert_lock_held(&host_kvm.lock);
-
return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
 }
 
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
-   hyp_assert_lock_held(&host_kvm.lock);
-
	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
			       addr, size, &host_s2_pool, owner_id);
 }
-- 
2.36.1.124.g0e6072fb45-goog



[PATCH 01/89] KVM: arm64: Handle all ID registers trapped for a protected VM

2022-05-19 Thread Will Deacon
From: Marc Zyngier 

A protected VM accessing ID_AA64ISAR2_EL1 gets punished with an UNDEF,
while it really should only get a zero back if the register is not
handled by the hypervisor emulation (as mandated by the architecture).

Introduce all the missing ID registers (including the unallocated ones),
and have them return 0.

Reported-by: Will Deacon 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/nvhe/sys_regs.c | 42 --
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c 
b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 33f5181af330..188fed1c174b 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -246,15 +246,9 @@ u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id)
case SYS_ID_AA64MMFR2_EL1:
return get_pvm_id_aa64mmfr2(vcpu);
default:
-   /*
-* Should never happen because all cases are covered in
-* pvm_sys_reg_descs[].
-*/
-   WARN_ON(1);
-   break;
+   /* Unhandled ID register, RAZ */
+   return 0;
}
-
-   return 0;
 }
 
 static u64 read_id_reg(const struct kvm_vcpu *vcpu,
@@ -335,6 +329,16 @@ static bool pvm_gic_read_sre(struct kvm_vcpu *vcpu,
 /* Mark the specified system register as an AArch64 feature id register. */
 #define AARCH64(REG) { SYS_DESC(REG), .access = pvm_access_id_aarch64 }
 
+/*
+ * sys_reg_desc initialiser for architecturally unallocated cpufeature ID
+ * register with encoding Op0=3, Op1=0, CRn=0, CRm=crm, Op2=op2
+ * (1 <= crm < 8, 0 <= Op2 < 8).
+ */
+#define ID_UNALLOCATED(crm, op2) { \
+   Op0(3), Op1(0), CRn(0), CRm(crm), Op2(op2), \
+   .access = pvm_access_id_aarch64,\
+}
+
 /* Mark the specified system register as Read-As-Zero/Write-Ignored */
 #define RAZ_WI(REG) { SYS_DESC(REG), .access = pvm_access_raz_wi }
 
@@ -378,24 +382,46 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
AARCH32(SYS_MVFR0_EL1),
AARCH32(SYS_MVFR1_EL1),
AARCH32(SYS_MVFR2_EL1),
+   ID_UNALLOCATED(3,3),
AARCH32(SYS_ID_PFR2_EL1),
AARCH32(SYS_ID_DFR1_EL1),
AARCH32(SYS_ID_MMFR5_EL1),
+   ID_UNALLOCATED(3,7),
 
/* AArch64 ID registers */
/* CRm=4 */
AARCH64(SYS_ID_AA64PFR0_EL1),
AARCH64(SYS_ID_AA64PFR1_EL1),
+   ID_UNALLOCATED(4,2),
+   ID_UNALLOCATED(4,3),
AARCH64(SYS_ID_AA64ZFR0_EL1),
+   ID_UNALLOCATED(4,5),
+   ID_UNALLOCATED(4,6),
+   ID_UNALLOCATED(4,7),
AARCH64(SYS_ID_AA64DFR0_EL1),
AARCH64(SYS_ID_AA64DFR1_EL1),
+   ID_UNALLOCATED(5,2),
+   ID_UNALLOCATED(5,3),
AARCH64(SYS_ID_AA64AFR0_EL1),
AARCH64(SYS_ID_AA64AFR1_EL1),
+   ID_UNALLOCATED(5,6),
+   ID_UNALLOCATED(5,7),
AARCH64(SYS_ID_AA64ISAR0_EL1),
AARCH64(SYS_ID_AA64ISAR1_EL1),
+   AARCH64(SYS_ID_AA64ISAR2_EL1),
+   ID_UNALLOCATED(6,3),
+   ID_UNALLOCATED(6,4),
+   ID_UNALLOCATED(6,5),
+   ID_UNALLOCATED(6,6),
+   ID_UNALLOCATED(6,7),
AARCH64(SYS_ID_AA64MMFR0_EL1),
AARCH64(SYS_ID_AA64MMFR1_EL1),
AARCH64(SYS_ID_AA64MMFR2_EL1),
+   ID_UNALLOCATED(7,3),
+   ID_UNALLOCATED(7,4),
+   ID_UNALLOCATED(7,5),
+   ID_UNALLOCATED(7,6),
+   ID_UNALLOCATED(7,7),
 
/* Scalable Vector Registers are restricted. */
 
-- 
2.36.1.124.g0e6072fb45-goog



[GIT PULL] KVM/arm64 updates for 5.19

2022-05-19 Thread Marc Zyngier
Hi Paolo,

Here's the bulk of the KVM/arm64 updates for 5.19. Major features are
guard pages for the EL2 stacks, save/restore of the guest-visible
hypercall configuration and PSCI suspend support. Further details in
the tag description.

Note that this PR contains a shared branch with the arm64 tree
containing the SME patches to resolve conflicts with the WFxT support
branch.

Please pull,

M.

The following changes since commit 672c0c5173427e6b3e2a9bbb7be51ceeec78093a:

  Linux 5.18-rc5 (2022-05-01 13:57:58 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvmarm-5.19

for you to fetch changes up to 5c0ad551e9aa6188f2bda0977c1cb6768a2b74ef:

  Merge branch kvm-arm64/its-save-restore-fixes-5.19 into kvmarm-master/next 
(2022-05-16 17:48:36 +0100)


KVM/arm64 updates for 5.19

- Add support for the ARMv8.7 WFxT extension

- Guard pages for the EL2 stacks

- Trap and emulate AArch32 ID registers to hide unsupported features

- Ability to select and save/restore the set of hypercalls exposed
  to the guest

- Support for PSCI-initiated suspend in collaboration with userspace
  (a hedged userspace sketch follows this list)

- GICv3 register-based LPI invalidation support

- Move host PMU event merging into the vcpu data structure

- GICv3 ITS save/restore fixes

- The usual set of small-scale cleanups and fixes
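
For the PSCI-initiated suspend item above, the userspace-facing pieces
are a VM capability and a new system-event exit type. A minimal, hedged
sketch of how a VMM might opt in and react, assuming the names added by
the suspend series (KVM_CAP_ARM_SYSTEM_SUSPEND, KVM_SYSTEM_EVENT_SUSPEND)
and leaving the actual suspend policy to the VMM:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int enable_system_suspend(int vm_fd)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_ARM_SYSTEM_SUSPEND,
          };

          /* Opt in so PSCI SYSTEM_SUSPEND is forwarded to userspace. */
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }

  static void handle_exit(struct kvm_run *run)
  {
          if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
              run->system_event.type == KVM_SYSTEM_EVENT_SUSPEND) {
                  /* Suspend the VM here; resume the vCPU once woken. */
          }
  }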


Alexandru Elisei (3):
  KVM: arm64: Hide AArch32 PMU registers when not available
  KVM: arm64: Don't BUG_ON() if emulated register table is unsorted
  KVM: arm64: Print emulated register table name when it is unsorted

Ard Biesheuvel (1):
  KVM: arm64: Avoid unnecessary absolute addressing via literals

Fuad Tabba (4):
  KVM: arm64: Wrapper for getting pmu_events
  KVM: arm64: Repack struct kvm_pmu to reduce size
  KVM: arm64: Pass pmu events to hyp via vcpu
  KVM: arm64: Reenable pmu in Protected Mode

Kalesh Singh (6):
  KVM: arm64: Introduce hyp_alloc_private_va_range()
  KVM: arm64: Introduce pkvm_alloc_private_va_range()
  KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
  KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
  KVM: arm64: Detect and handle hypervisor stack overflows
  KVM: arm64: Symbolize the nVHE HYP addresses

Marc Zyngier (30):
  arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition
  arm64: Add RV and RN fields for ESR_ELx_WFx_ISS
  arm64: Add HWCAP advertising FEAT_WFXT
  arm64: Add wfet()/wfit() helpers
  arm64: Use WFxT for __delay() when possible
  KVM: arm64: Simplify kvm_cpu_has_pending_timer()
  KVM: arm64: Introduce kvm_counter_compute_delta() helper
  KVM: arm64: Handle blocking WFIT instruction
  KVM: arm64: Offer early resume for non-blocking WFxT instructions
  KVM: arm64: Expose the WFXT feature to guests
  KVM: arm64: Fix new instances of 32bit ESRs
  Merge remote-tracking branch 'arm64/for-next/sme' into kvmarm-master/next
  Merge branch kvm-arm64/wfxt into kvmarm-master/next
  Merge branch kvm-arm64/hyp-stack-guard into kvmarm-master/next
  Merge branch kvm-arm64/aarch32-idreg-trap into kvmarm-master/next
  Documentation: Fix index.rst after psci.rst renaming
  irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES}
  KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIs
  KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation
  KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR 
revision
  KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround
  KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
  KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected
  KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
  Merge branch kvm-arm64/hcall-selection into kvmarm-master/next
  Merge branch kvm-arm64/psci-suspend into kvmarm-master/next
  Merge branch kvm-arm64/vgic-invlpir into kvmarm-master/next
  Merge branch kvm-arm64/per-vcpu-host-pmu-data into kvmarm-master/next
  Merge branch kvm-arm64/misc-5.19 into kvmarm-master/next
  Merge branch kvm-arm64/its-save-restore-fixes-5.19 into kvmarm-master/next

Mark Brown (25):
  arm64/sme: Provide ABI documentation for SME
  arm64/sme: System register and exception syndrome definitions
  arm64/sme: Manually encode SME instructions
  arm64/sme: Early CPU setup for SME
  arm64/sme: Basic enumeration support
  arm64/sme: Identify supported SME vector lengths at boot
  arm64/sme: Implement sysctl to set the default vector length
  arm64/sme: Implement vector length configuration prctl()s
  arm64/sme: Implement support for TPIDR2
  arm64/sme: Implement SVCR context switching
  arm64/sme: Implement streaming SVE context