Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-09-13 Thread Oliver Upton
On Tue, Sep 13, 2022 at 01:41:56PM +0100, Alexandru Elisei wrote:
> Hi Oliver,
> 
> On Tue, Sep 13, 2022 at 11:58:47AM +0100, Oliver Upton wrote:
> > Hey Alex,
> > 
> > On Mon, Sep 12, 2022 at 03:50:46PM +0100, Alexandru Elisei wrote:
> > 
> > [...]
> > 
> > > > Yeah, that would be good to follow up on what other OSes are doing.
> > > 
> > > FreeBSD doesn't have a SPE driver.
> > > 
> > > Currently in the process of finding out how/if Windows implements the
> > > driver.
> > > 
> > > > You'll still have a nondestructive S2 fault handler for the SPE, right?
> > > > IOW, if PMBSR_EL1.DL=0 KVM will just unpin the old buffer and repin the
> > > > new one.
> > > 
> > > This is how I think about it: a S2 DABT where DL == 0 can happen because 
> > > of
> > > something that the VMM, KVM or the guest has done:
> > > 
> > > 1. If it's because of something that the host's userspace did (memslot was
> > > changed while the VM was running, memory was munmap'ed, etc). In this 
> > > case,
> > > there's no way for KVM to handle the SPE fault, so I would say that the
> > > sensible approach would be to inject an SPE external abort.
> > > 
> > > 2. If it's because of something that KVM did, that can only be because of 
> > > a
> > > bug in SPE emulation. In this case, it can happen again, which means
> > > arbitrary blackout windows which can skew the profiling results. I would
> > > much rather inject an SPE external abort then let the guest rely on
> > > potentially bad profiling information.
> > > 
> > > 3. The guest changes the mapping for the buffer when it shouldn't have: A.
> > > when the architecture does allow it, but KVM doesn't support, or B. when
> > > the architecture doesn't allow it. For both cases, I would much rather
> > > inject an SPE external abort for the reasons above. Furthermore, for B, I
> > > think it would be better to let the guest know as soon as possible that
> > > it's not following the architecture.
> > > 
> > > In conclusion, I would prefer to treat all SPE S2 faults as errors.
> > 
> > My main concern with treating S2 faults as a synthetic external abort is
> > how this behavior progresses in later versions of the architecture.
> > SPEv1p3 disallows implementations from reporting external aborts via the
> > SPU, instead allowing only for an SError to be delivered to the core.
> 
> Ah, yes, missed that bit for SPEv1p3 (ARM DDI 0487H.a, page D10-5180).
> 
> > 
> > I caught up with Will on this for a little bit:
> > 
> > Instead of an external abort, how about reporting an IMP DEF buffer
> > management event to the guest? At least for the Linux driver it should
> > have the same effect of killing the session but the VM will stay
> > running. This way there's no architectural requirement to promote to an
> > SError.
> 
> The only reason I proposed to inject an external abort is because KVM needs
> a way to tell the guest that something outside of the guest's control went
> wrong and it should drop the contents of the current profiling session. An
> external abort reported by the SPU seemed to fit the bit.
> 
> By IMP DEF buffer management event I assume you mean PMBSR_EL1.EC=0b01
> (Buffer management event for an IMPLEMENTATION DEFINED reason).

Yup, that's it. You also get two whole bytes of room in PMBSR_EL1.MSS
which is also IMP DEF, so we could even stick some ASCII in there to
tell the guest how we really feel! :-P

> I'm thinking that someone might run a custom kernel in a VM, like a vendor
> downstream kernel, with patches that actually handle this exception class,
> and injecting such an exception might not have the effects that KVM
> expects. Am I overthinking things? Is that something that KVM should take
> into consideration? I suppose KVM can and should also set
> PMBSR_EL1.DL = 1, as that means per the architecture that the buffer
> contents should be discarded.

I agree with you that PMBSR_EL1.DL=1 is the right call for this. With
that, I'd be surprised if there was a guest that tried to pull some
tricks other than blowing away the profile. The other option that I
find funny is if we plainly report the S2 abort to the guest, but that
wont work well when nested comes into the picture.

--
Thanks,
Oliver
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kernel] KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent

2022-09-13 Thread Alexey Kardashevskiy

Ping? It's been a while and probably got lost :-/

On 18/05/2022 16:27, Alexey Kardashevskiy wrote:



On 5/4/22 17:48, Alexey Kardashevskiy wrote:

When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.

This moved the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.

Signed-off-by: Alexey Kardashevskiy 
---


Or I could move this one together with KVM_CAP_IRQFD. Thoughts?



Ping?



---
  arch/arm64/kvm/arm.c   | 3 +++
  arch/mips/kvm/mips.c   | 3 +++
  arch/powerpc/kvm/powerpc.c | 6 ++
  arch/riscv/kvm/vm.c    | 3 +++
  arch/s390/kvm/kvm-s390.c   | 3 +++
  arch/x86/kvm/x86.c | 3 +++
  virt/kvm/kvm_main.c    | 1 -
  7 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..092f0614bae3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -210,6 +210,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
long ext)

  case KVM_CAP_SET_GUEST_DEBUG:
  case KVM_CAP_VCPU_ATTRIBUTES:
  case KVM_CAP_PTP_KVM:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+    case KVM_CAP_IRQFD_RESAMPLE:
+#endif
  r = 1;
  break;
  case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..0f3de470a73e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1071,6 +1071,9 @@ int kvm_vm_ioctl_check_extension(struct kvm 
*kvm, long ext)

  case KVM_CAP_READONLY_MEM:
  case KVM_CAP_SYNC_MMU:
  case KVM_CAP_IMMEDIATE_EXIT:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+    case KVM_CAP_IRQFD_RESAMPLE:
+#endif
  r = 1;
  break;
  case KVM_CAP_NR_VCPUS:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 875c30c12db0..87698ffef3be 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -591,6 +591,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
long ext)

  break;
  #endif
+#ifdef CONFIG_HAVE_KVM_IRQFD
+    case KVM_CAP_IRQFD_RESAMPLE:
+    r = !xive_enabled();
+    break;
+#endif
+
  case KVM_CAP_PPC_ALLOC_HTAB:
  r = hv_enabled;
  break;
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index c768f75279ef..b58579b386bb 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -63,6 +63,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
long ext)

  case KVM_CAP_READONLY_MEM:
  case KVM_CAP_MP_STATE:
  case KVM_CAP_IMMEDIATE_EXIT:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+    case KVM_CAP_IRQFD_RESAMPLE:
+#endif
  r = 1;
  break;
  case KVM_CAP_NR_VCPUS:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 156d1c25a3c1..85e093fc8d13 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -564,6 +564,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
long ext)

  case KVM_CAP_SET_GUEST_DEBUG:
  case KVM_CAP_S390_DIAG318:
  case KVM_CAP_S390_MEM_OP_EXTENSION:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+    case KVM_CAP_IRQFD_RESAMPLE:
+#endif
  r = 1;
  break;
  case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c0ca599a353..a0a7b769483d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4273,6 +4273,9 @@ int kvm_vm_ioctl_check_extension(struct kvm 
*kvm, long ext)

  case KVM_CAP_SYS_ATTRIBUTES:
  case KVM_CAP_VAPIC:
  case KVM_CAP_ENABLE_CAP:
+#ifdef CONFIG_HAVE_KVM_IRQFD
+    case KVM_CAP_IRQFD_RESAMPLE:
+#endif
  r = 1;
  break;
  case KVM_CAP_EXIT_HYPERCALL:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70e05af5ebea..885e72e668a5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4293,7 +4293,6 @@ static long 
kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)

  #endif
  #ifdef CONFIG_HAVE_KVM_IRQFD
  case KVM_CAP_IRQFD:
-    case KVM_CAP_IRQFD_RESAMPLE:
  #endif
  case KVM_CAP_IOEVENTFD_ANY_LENGTH:
  case KVM_CAP_CHECK_EXTENSION_VM:




--
Alexey
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-09-13 Thread Alexandru Elisei
Hi Oliver,

On Tue, Sep 13, 2022 at 11:58:47AM +0100, Oliver Upton wrote:
> Hey Alex,
> 
> On Mon, Sep 12, 2022 at 03:50:46PM +0100, Alexandru Elisei wrote:
> 
> [...]
> 
> > > Yeah, that would be good to follow up on what other OSes are doing.
> > 
> > FreeBSD doesn't have a SPE driver.
> > 
> > Currently in the process of finding out how/if Windows implements the
> > driver.
> > 
> > > You'll still have a nondestructive S2 fault handler for the SPE, right?
> > > IOW, if PMBSR_EL1.DL=0 KVM will just unpin the old buffer and repin the
> > > new one.
> > 
> > This is how I think about it: a S2 DABT where DL == 0 can happen because of
> > something that the VMM, KVM or the guest has done:
> > 
> > 1. If it's because of something that the host's userspace did (memslot was
> > changed while the VM was running, memory was munmap'ed, etc). In this case,
> > there's no way for KVM to handle the SPE fault, so I would say that the
> > sensible approach would be to inject an SPE external abort.
> > 
> > 2. If it's because of something that KVM did, that can only be because of a
> > bug in SPE emulation. In this case, it can happen again, which means
> > arbitrary blackout windows which can skew the profiling results. I would
> > much rather inject an SPE external abort then let the guest rely on
> > potentially bad profiling information.
> > 
> > 3. The guest changes the mapping for the buffer when it shouldn't have: A.
> > when the architecture does allow it, but KVM doesn't support, or B. when
> > the architecture doesn't allow it. For both cases, I would much rather
> > inject an SPE external abort for the reasons above. Furthermore, for B, I
> > think it would be better to let the guest know as soon as possible that
> > it's not following the architecture.
> > 
> > In conclusion, I would prefer to treat all SPE S2 faults as errors.
> 
> My main concern with treating S2 faults as a synthetic external abort is
> how this behavior progresses in later versions of the architecture.
> SPEv1p3 disallows implementations from reporting external aborts via the
> SPU, instead allowing only for an SError to be delivered to the core.

Ah, yes, missed that bit for SPEv1p3 (ARM DDI 0487H.a, page D10-5180).

> 
> I caught up with Will on this for a little bit:
> 
> Instead of an external abort, how about reporting an IMP DEF buffer
> management event to the guest? At least for the Linux driver it should
> have the same effect of killing the session but the VM will stay
> running. This way there's no architectural requirement to promote to an
> SError.

The only reason I proposed to inject an external abort is because KVM needs
a way to tell the guest that something outside of the guest's control went
wrong and it should drop the contents of the current profiling session. An
external abort reported by the SPU seemed to fit the bit.

By IMP DEF buffer management event I assume you mean PMBSR_EL1.EC=0b01
(Buffer management event for an IMPLEMENTATION DEFINED reason). I'm
thinking that someone might run a custom kernel in a VM, like a vendor
downstream kernel, with patches that actually handle this exception class,
and injecting such an exception might not have the effects that KVM
expects. Am I overthinking things? Is that something that KVM should take
into consideration? I suppose KVM can and should also set
PMBSR_EL1.DL = 1, as that means per the architecture that the buffer
contents should be discarded.

Thanks,
Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-09-13 Thread Oliver Upton
Hey Alex,

On Mon, Sep 12, 2022 at 03:50:46PM +0100, Alexandru Elisei wrote:

[...]

> > Yeah, that would be good to follow up on what other OSes are doing.
> 
> FreeBSD doesn't have a SPE driver.
> 
> Currently in the process of finding out how/if Windows implements the
> driver.
> 
> > You'll still have a nondestructive S2 fault handler for the SPE, right?
> > IOW, if PMBSR_EL1.DL=0 KVM will just unpin the old buffer and repin the
> > new one.
> 
> This is how I think about it: a S2 DABT where DL == 0 can happen because of
> something that the VMM, KVM or the guest has done:
> 
> 1. If it's because of something that the host's userspace did (memslot was
> changed while the VM was running, memory was munmap'ed, etc). In this case,
> there's no way for KVM to handle the SPE fault, so I would say that the
> sensible approach would be to inject an SPE external abort.
> 
> 2. If it's because of something that KVM did, that can only be because of a
> bug in SPE emulation. In this case, it can happen again, which means
> arbitrary blackout windows which can skew the profiling results. I would
> much rather inject an SPE external abort then let the guest rely on
> potentially bad profiling information.
> 
> 3. The guest changes the mapping for the buffer when it shouldn't have: A.
> when the architecture does allow it, but KVM doesn't support, or B. when
> the architecture doesn't allow it. For both cases, I would much rather
> inject an SPE external abort for the reasons above. Furthermore, for B, I
> think it would be better to let the guest know as soon as possible that
> it's not following the architecture.
> 
> In conclusion, I would prefer to treat all SPE S2 faults as errors.

My main concern with treating S2 faults as a synthetic external abort is
how this behavior progresses in later versions of the architecture.
SPEv1p3 disallows implementations from reporting external aborts via the
SPU, instead allowing only for an SError to be delivered to the core.

I caught up with Will on this for a little bit:

Instead of an external abort, how about reporting an IMP DEF buffer
management event to the guest? At least for the Linux driver it should
have the same effect of killing the session but the VM will stay
running. This way there's no architectural requirement to promote to an
SError.

--
Thanks,
Oliver
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 7/7] KVM: selftests: Add test for AArch32 ID registers

2022-09-13 Thread Oliver Upton
Add a test to assert that KVM handles the AArch64 views of the AArch32
ID registers as RAZ/WI (writable only from userspace). For registers
that were already hidden or unallocated, expect RAZ + invariant
behavior.

Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../selftests/kvm/aarch64/aarch32_id_regs.c   | 169 ++
 3 files changed, 171 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c

diff --git a/tools/testing/selftests/kvm/.gitignore 
b/tools/testing/selftests/kvm/.gitignore
index d625a3f83780..87d1a0b1bae0 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+/aarch64/aarch32_id_regs
 /aarch64/arch_timer
 /aarch64/debug-exceptions
 /aarch64/get-reg-list
diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index 4c122f1b1737..784abe7f0962 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -144,6 +144,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 # Compiled outputs used by test targets
 TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
 
+TEST_GEN_PROGS_aarch64 += aarch64/aarch32_id_regs
 TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
diff --git a/tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c 
b/tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c
new file mode 100644
index ..6f9c1f19c7f6
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * aarch32_id_regs - Test for ID register behavior on AArch64-only systems
+ *
+ * Copyright (c) 2022 Google LLC.
+ *
+ * Test that KVM handles the AArch64 views of the AArch32 ID registers as RAZ
+ * and WI from userspace.
+ */
+
+#include 
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define BAD_ID_REG_VAL 0x1badc0deul
+
+#define GUEST_ASSERT_REG_RAZ(reg)  GUEST_ASSERT_EQ(read_sysreg_s(reg), 0)
+
+static void guest_main(void)
+{
+   GUEST_ASSERT_REG_RAZ(SYS_ID_PFR0_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_PFR1_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_DFR0_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_AFR0_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_MMFR0_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_MMFR1_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_MMFR2_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_MMFR3_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR0_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR1_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR2_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR3_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR4_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR5_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_MMFR4_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_ISAR6_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_MVFR0_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_MVFR1_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_MVFR2_EL1);
+   GUEST_ASSERT_REG_RAZ(sys_reg(3, 0, 0, 3, 3));
+   GUEST_ASSERT_REG_RAZ(SYS_ID_PFR2_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_DFR1_EL1);
+   GUEST_ASSERT_REG_RAZ(SYS_ID_MMFR5_EL1);
+   GUEST_ASSERT_REG_RAZ(sys_reg(3, 0, 0, 3, 7));
+
+   GUEST_DONE();
+}
+
+static void test_guest_raz(struct kvm_vcpu *vcpu)
+{
+   struct ucall uc;
+
+   vcpu_run(vcpu);
+
+   switch (get_ucall(vcpu, &uc)) {
+   case UCALL_ABORT:
+   REPORT_GUEST_ASSERT(uc);
+   break;
+   case UCALL_DONE:
+   break;
+   default:
+   TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+   }
+}
+
+static uint64_t raz_wi_reg_ids[] = {
+   KVM_ARM64_SYS_REG(SYS_ID_PFR0_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_PFR1_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_DFR0_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_MMFR0_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_MMFR1_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_MMFR2_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_MMFR3_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR0_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR1_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR2_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR3_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR4_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR5_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_MMFR4_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_ISAR6_EL1),
+   KVM_ARM64_SYS_REG(SYS_MVFR0_EL1),
+   KVM_ARM64_SYS_REG(SYS_MVFR1_EL1),
+   KVM_ARM64_SYS_REG(SYS_MVFR2_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_PFR2_EL1),
+   KVM_ARM64_SYS_REG(SYS_ID_MMFR5_EL1),
+};
+
+static void test_user_raz_wi(struct kvm_vcpu *vcpu)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(raz_wi_reg_ids); i++) {
+   uint64_t reg_id = raz_wi_reg_ids[i];
+   uint64_t val;
+
+

[PATCH v3 6/7] KVM: arm64: Treat 32bit ID registers as RAZ/WI on 64bit-only system

2022-09-13 Thread Oliver Upton
One of the oddities of the architecture is that the AArch64 views of the
AArch32 ID registers are UNKNOWN if AArch32 isn't implemented at any EL.
Nonetheless, KVM exposes these registers to userspace for the sake of
save/restore. It is possible that the UNKNOWN value could differ between
systems, leading to a rejected write from userspace.

Avoid the issue altogether by handling the AArch32 ID registers as
RAZ/WI when on an AArch64-only system.

Signed-off-by: Oliver Upton 
---
 arch/arm64/kvm/sys_regs.c | 63 ++-
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6d0511247df4..9569772cf09a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1144,6 +1144,20 @@ static unsigned int id_visibility(const struct kvm_vcpu 
*vcpu,
return 0;
 }
 
+static unsigned int aa32_id_visibility(const struct kvm_vcpu *vcpu,
+  const struct sys_reg_desc *r)
+{
+   /*
+* AArch32 ID registers are UNKNOWN if AArch32 isn't implemented at any
+* EL. Promote to RAZ/WI in order to guarantee consistency between
+* systems.
+*/
+   if (!kvm_supports_32bit_el0())
+   return REG_RAZ | REG_USER_WI;
+
+   return id_visibility(vcpu, r);
+}
+
 static unsigned int raz_visibility(const struct kvm_vcpu *vcpu,
   const struct sys_reg_desc *r)
 {
@@ -1331,6 +1345,15 @@ static unsigned int mte_visibility(const struct kvm_vcpu 
*vcpu,
.visibility = id_visibility,\
 }
 
+/* sys_reg_desc initialiser for known cpufeature ID registers */
+#define AA32_ID_SANITISED(name) {  \
+   SYS_DESC(SYS_##name),   \
+   .access = access_id_reg,\
+   .get_user = get_id_reg, \
+   .set_user = set_id_reg, \
+   .visibility = aa32_id_visibility,   \
+}
+
 /*
  * sys_reg_desc initialiser for architecturally unallocated cpufeature ID
  * register with encoding Op0=3, Op1=0, CRn=0, CRm=crm, Op2=op2
@@ -1418,33 +1441,33 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
/* AArch64 mappings of the AArch32 ID registers */
/* CRm=1 */
-   ID_SANITISED(ID_PFR0_EL1),
-   ID_SANITISED(ID_PFR1_EL1),
-   ID_SANITISED(ID_DFR0_EL1),
+   AA32_ID_SANITISED(ID_PFR0_EL1),
+   AA32_ID_SANITISED(ID_PFR1_EL1),
+   AA32_ID_SANITISED(ID_DFR0_EL1),
ID_HIDDEN(ID_AFR0_EL1),
-   ID_SANITISED(ID_MMFR0_EL1),
-   ID_SANITISED(ID_MMFR1_EL1),
-   ID_SANITISED(ID_MMFR2_EL1),
-   ID_SANITISED(ID_MMFR3_EL1),
+   AA32_ID_SANITISED(ID_MMFR0_EL1),
+   AA32_ID_SANITISED(ID_MMFR1_EL1),
+   AA32_ID_SANITISED(ID_MMFR2_EL1),
+   AA32_ID_SANITISED(ID_MMFR3_EL1),
 
/* CRm=2 */
-   ID_SANITISED(ID_ISAR0_EL1),
-   ID_SANITISED(ID_ISAR1_EL1),
-   ID_SANITISED(ID_ISAR2_EL1),
-   ID_SANITISED(ID_ISAR3_EL1),
-   ID_SANITISED(ID_ISAR4_EL1),
-   ID_SANITISED(ID_ISAR5_EL1),
-   ID_SANITISED(ID_MMFR4_EL1),
-   ID_SANITISED(ID_ISAR6_EL1),
+   AA32_ID_SANITISED(ID_ISAR0_EL1),
+   AA32_ID_SANITISED(ID_ISAR1_EL1),
+   AA32_ID_SANITISED(ID_ISAR2_EL1),
+   AA32_ID_SANITISED(ID_ISAR3_EL1),
+   AA32_ID_SANITISED(ID_ISAR4_EL1),
+   AA32_ID_SANITISED(ID_ISAR5_EL1),
+   AA32_ID_SANITISED(ID_MMFR4_EL1),
+   AA32_ID_SANITISED(ID_ISAR6_EL1),
 
/* CRm=3 */
-   ID_SANITISED(MVFR0_EL1),
-   ID_SANITISED(MVFR1_EL1),
-   ID_SANITISED(MVFR2_EL1),
+   AA32_ID_SANITISED(MVFR0_EL1),
+   AA32_ID_SANITISED(MVFR1_EL1),
+   AA32_ID_SANITISED(MVFR2_EL1),
ID_UNALLOCATED(3,3),
-   ID_SANITISED(ID_PFR2_EL1),
+   AA32_ID_SANITISED(ID_PFR2_EL1),
ID_HIDDEN(ID_DFR1_EL1),
-   ID_SANITISED(ID_MMFR5_EL1),
+   AA32_ID_SANITISED(ID_MMFR5_EL1),
ID_UNALLOCATED(3,7),
 
/* AArch64 ID registers */
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 5/7] KVM: arm64: Add a visibility bit to ignore user writes

2022-09-13 Thread Oliver Upton
We're about to ignore writes to AArch32 ID registers on AArch64-only
systems. Add a bit to indicate a register is handled as write ignore
when accessed from userspace.

Signed-off-by: Oliver Upton 
Reviewed-by: Reiji Watanabe 
---
 arch/arm64/kvm/sys_regs.c | 3 +++
 arch/arm64/kvm/sys_regs.h | 7 +++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0e20a311ea20..6d0511247df4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2775,6 +2775,9 @@ int kvm_sys_reg_set_user(struct kvm_vcpu *vcpu, const 
struct kvm_one_reg *reg,
if (!r)
return -ENOENT;
 
+   if (sysreg_user_write_ignore(vcpu, r))
+   return 0;
+
if (r->set_user) {
ret = (r->set_user)(vcpu, r, val);
} else {
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index e78b51059622..e4ebb3a379fd 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -86,6 +86,7 @@ struct sys_reg_desc {
 
 #define REG_HIDDEN (1 << 0) /* hidden from userspace and guest */
 #define REG_RAZ(1 << 1) /* RAZ from userspace and 
guest */
+#define REG_USER_WI(1 << 2) /* WI from userspace only */
 
 static __printf(2, 3)
 inline void print_sys_reg_msg(const struct sys_reg_params *p,
@@ -157,6 +158,12 @@ static inline bool sysreg_visible_as_raz(const struct 
kvm_vcpu *vcpu,
return sysreg_visibility(vcpu, r) & REG_RAZ;
 }
 
+static inline bool sysreg_user_write_ignore(const struct kvm_vcpu *vcpu,
+   const struct sys_reg_desc *r)
+{
+   return sysreg_visibility(vcpu, r) & REG_USER_WI;
+}
+
 static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
  const struct sys_reg_desc *i2)
 {
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 4/7] KVM: arm64: Spin off helper for calling visibility hook

2022-09-13 Thread Oliver Upton
No functional change intended.

Reviewed-by: Reiji Watanabe 
Signed-off-by: Oliver Upton 
---
 arch/arm64/kvm/sys_regs.h | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index a8c4cc32eb9a..e78b51059622 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -136,22 +136,25 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *r
__vcpu_sys_reg(vcpu, r->reg) = r->val;
 }
 
-static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
-const struct sys_reg_desc *r)
+static inline unsigned int sysreg_visibility(const struct kvm_vcpu *vcpu,
+const struct sys_reg_desc *r)
 {
if (likely(!r->visibility))
-   return false;
+   return 0;
 
-   return r->visibility(vcpu, r) & REG_HIDDEN;
+   return r->visibility(vcpu, r);
+}
+
+static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
+const struct sys_reg_desc *r)
+{
+   return sysreg_visibility(vcpu, r) & REG_HIDDEN;
 }
 
 static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu,
 const struct sys_reg_desc *r)
 {
-   if (likely(!r->visibility))
-   return false;
-
-   return r->visibility(vcpu, r) & REG_RAZ;
+   return sysreg_visibility(vcpu, r) & REG_RAZ;
 }
 
 static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 3/7] KVM: arm64: Drop raz parameter from read_id_reg()

2022-09-13 Thread Oliver Upton
There is no longer a need for caller-specified RAZ visibility. Hoist the
call to sysreg_visible_as_raz() into read_id_reg() and drop the
parameter.

No functional change intended.

Suggested-by: Reiji Watanabe 
Signed-off-by: Oliver Upton 
Reviewed-by: Reiji Watanabe 
---
 arch/arm64/kvm/sys_regs.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 26210f3a0b27..0e20a311ea20 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1063,13 +1063,12 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 }
 
 /* Read a sanitised cpufeature ID register by sys_reg_desc */
-static u64 read_id_reg(const struct kvm_vcpu *vcpu,
-   struct sys_reg_desc const *r, bool raz)
+static u64 read_id_reg(const struct kvm_vcpu *vcpu, struct sys_reg_desc const 
*r)
 {
u32 id = reg_to_encoding(r);
u64 val;
 
-   if (raz)
+   if (sysreg_visible_as_raz(vcpu, r))
return 0;
 
val = read_sanitised_ftr_reg(id);
@@ -1157,12 +1156,10 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
  struct sys_reg_params *p,
  const struct sys_reg_desc *r)
 {
-   bool raz = sysreg_visible_as_raz(vcpu, r);
-
if (p->is_write)
return write_to_read_only(vcpu, p, r);
 
-   p->regval = read_id_reg(vcpu, r, raz);
+   p->regval = read_id_reg(vcpu, r);
return true;
 }
 
@@ -1199,7 +1196,7 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
return -EINVAL;
 
/* We can only differ with CSV[23], and anything else is an error */
-   val ^= read_id_reg(vcpu, rd, false);
+   val ^= read_id_reg(vcpu, rd);
val &= ~((0xFUL << ID_AA64PFR0_CSV2_SHIFT) |
 (0xFUL << ID_AA64PFR0_CSV3_SHIFT));
if (val)
@@ -1221,19 +1218,15 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
 static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
  u64 *val)
 {
-   bool raz = sysreg_visible_as_raz(vcpu, rd);
-
-   *val = read_id_reg(vcpu, rd, raz);
+   *val = read_id_reg(vcpu, rd);
return 0;
 }
 
 static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
  u64 val)
 {
-   bool raz = sysreg_visible_as_raz(vcpu, rd);
-
/* This is what we mean by invariant: you can't change it. */
-   if (val != read_id_reg(vcpu, rd, raz))
+   if (val != read_id_reg(vcpu, rd))
return -EINVAL;
 
return 0;
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 2/7] KVM: arm64: Remove internal accessor helpers for id regs

2022-09-13 Thread Oliver Upton
The internal accessors are only ever called once. Dump out their
contents in the caller.

No functional change intended.

Signed-off-by: Oliver Upton 
Reviewed-by: Reiji Watanabe 
---
 arch/arm64/kvm/sys_regs.c | 46 ++-
 1 file changed, 12 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e18efb9211f0..26210f3a0b27 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1153,25 +1153,17 @@ static unsigned int raz_visibility(const struct 
kvm_vcpu *vcpu,
 
 /* cpufeature ID register access trap handlers */
 
-static bool __access_id_reg(struct kvm_vcpu *vcpu,
-   struct sys_reg_params *p,
-   const struct sys_reg_desc *r,
-   bool raz)
-{
-   if (p->is_write)
-   return write_to_read_only(vcpu, p, r);
-
-   p->regval = read_id_reg(vcpu, r, raz);
-   return true;
-}
-
 static bool access_id_reg(struct kvm_vcpu *vcpu,
  struct sys_reg_params *p,
  const struct sys_reg_desc *r)
 {
bool raz = sysreg_visible_as_raz(vcpu, r);
 
-   return __access_id_reg(vcpu, p, r, raz);
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, r);
+
+   p->regval = read_id_reg(vcpu, r, raz);
+   return true;
 }
 
 /* Visibility overrides for SVE-specific control registers */
@@ -1226,31 +1218,13 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
  * are stored, and for set_id_reg() we don't allow the effective value
  * to be changed.
  */
-static int __get_id_reg(const struct kvm_vcpu *vcpu,
-   const struct sys_reg_desc *rd, u64 *val,
-   bool raz)
-{
-   *val = read_id_reg(vcpu, rd, raz);
-   return 0;
-}
-
-static int __set_id_reg(const struct kvm_vcpu *vcpu,
-   const struct sys_reg_desc *rd, u64 val,
-   bool raz)
-{
-   /* This is what we mean by invariant: you can't change it. */
-   if (val != read_id_reg(vcpu, rd, raz))
-   return -EINVAL;
-
-   return 0;
-}
-
 static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
  u64 *val)
 {
bool raz = sysreg_visible_as_raz(vcpu, rd);
 
-   return __get_id_reg(vcpu, rd, val, raz);
+   *val = read_id_reg(vcpu, rd, raz);
+   return 0;
 }
 
 static int set_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
@@ -1258,7 +1232,11 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *rd,
 {
bool raz = sysreg_visible_as_raz(vcpu, rd);
 
-   return __set_id_reg(vcpu, rd, val, raz);
+   /* This is what we mean by invariant: you can't change it. */
+   if (val != read_id_reg(vcpu, rd, raz))
+   return -EINVAL;
+
+   return 0;
 }
 
 static int get_raz_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 1/7] KVM: arm64: Use visibility hook to treat ID regs as RAZ

2022-09-13 Thread Oliver Upton
The generic id reg accessors already handle RAZ registers by way of the
visibility hook. Add a visibility hook that returns REG_RAZ
unconditionally and throw out the RAZ specific accessors.

Reviewed-by: Reiji Watanabe 
Signed-off-by: Oliver Upton 
---
 arch/arm64/kvm/sys_regs.c | 33 ++---
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3234f50b8c4b..e18efb9211f0 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1145,6 +1145,12 @@ static unsigned int id_visibility(const struct kvm_vcpu 
*vcpu,
return 0;
 }
 
+static unsigned int raz_visibility(const struct kvm_vcpu *vcpu,
+  const struct sys_reg_desc *r)
+{
+   return REG_RAZ;
+}
+
 /* cpufeature ID register access trap handlers */
 
 static bool __access_id_reg(struct kvm_vcpu *vcpu,
@@ -1168,13 +1174,6 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
return __access_id_reg(vcpu, p, r, raz);
 }
 
-static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
- struct sys_reg_params *p,
- const struct sys_reg_desc *r)
-{
-   return __access_id_reg(vcpu, p, r, true);
-}
-
 /* Visibility overrides for SVE-specific control registers */
 static unsigned int sve_visibility(const struct kvm_vcpu *vcpu,
   const struct sys_reg_desc *rd)
@@ -1262,12 +1261,6 @@ static int set_id_reg(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *rd,
return __set_id_reg(vcpu, rd, val, raz);
 }
 
-static int set_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
- u64 val)
-{
-   return __set_id_reg(vcpu, rd, val, true);
-}
-
 static int get_raz_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
   u64 *val)
 {
@@ -1374,9 +1367,10 @@ static unsigned int mte_visibility(const struct kvm_vcpu 
*vcpu,
  */
 #define ID_UNALLOCATED(crm, op2) { \
Op0(3), Op1(0), CRn(0), CRm(crm), Op2(op2), \
-   .access = access_raz_id_reg,\
-   .get_user = get_raz_reg,\
-   .set_user = set_raz_id_reg, \
+   .access = access_id_reg,\
+   .get_user = get_id_reg, \
+   .set_user = set_id_reg, \
+   .visibility = raz_visibility\
 }
 
 /*
@@ -1386,9 +1380,10 @@ static unsigned int mte_visibility(const struct kvm_vcpu 
*vcpu,
  */
 #define ID_HIDDEN(name) {  \
SYS_DESC(SYS_##name),   \
-   .access = access_raz_id_reg,\
-   .get_user = get_raz_reg,\
-   .set_user = set_raz_id_reg, \
+   .access = access_id_reg,\
+   .get_user = get_id_reg, \
+   .set_user = set_id_reg, \
+   .visibility = raz_visibility,   \
 }
 
 /*
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 0/7] KVM: arm64: Use visibility hook to treat ID regs as RAZ

2022-09-13 Thread Oliver Upton
For reasons unknown, the Arm architecture defines the 64-bit views of
the 32-bit ID registers as UNKNOWN [1]. This combines poorly with the
fact that KVM unconditionally exposes these registers to userspace,
which could throw a wrench in migration between 64-bit only systems.

This series reworks KVM's definition of these registers to RAZ/WI with
the goal of providing consistent register values across 64-bit machines.

Patches 1-3 clean up the ID register accessors, taking advantage of the
fact that the generic accessors know how to handle RAZ.

Patches 4-6 start switch the handling of potentially nonzero AArch32 ID
registers to RAZ/WI. RAZ covers up the architecturally UNKNOWN values,
and WI allows for migration off of kernels that may provide garbage.
Note that hidden AArch32 ID registers continue to have RAZ behavior with
the additional expectation of invariance.

Lastly, patch 7 includes a small test for the issue.

Applies to 6.0-rc3. Tested with KVM selftests under the fast model w/
asymmetric 32 bit support and no 32 bit support whatsoever.

[1]: DDI0487H.a Table D12-2 'Instruction encodings for non-Debug System 
Register accesses'

v2: 
https://lore.kernel.org/kvmarm/20220902154804.1939819-1-oliver.up...@linux.dev/

v2 -> v3:
 - Collect more of Reiji's r-bs (thanks again!)
 - Test the RAZ+invariant registers (AFR0, DFR1, unallocated AA32 ID
   registers) (Drew)
 - Give the selftest a more sensible name

v1 -> v2:
 - Collect Reiji's r-b tags (thanks!)
 - Call sysreg_visible_as_raz() from read_id_reg() (Reiji)
 - Hoist sysreg_user_write_ignore() into kvm_sys_reg_set_user() (Reiji)

Oliver Upton (7):
  KVM: arm64: Use visibility hook to treat ID regs as RAZ
  KVM: arm64: Remove internal accessor helpers for id regs
  KVM: arm64: Drop raz parameter from read_id_reg()
  KVM: arm64: Spin off helper for calling visibility hook
  KVM: arm64: Add a visibility bit to ignore user writes
  KVM: arm64: Treat 32bit ID registers as RAZ/WI on 64bit-only system
  KVM: selftests: Add test for AArch32 ID registers

 arch/arm64/kvm/sys_regs.c | 150 
 arch/arm64/kvm/sys_regs.h |  24 ++-
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../selftests/kvm/aarch64/aarch32_id_regs.c   | 169 ++
 5 files changed, 259 insertions(+), 86 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c


base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
-- 
2.37.2.789.g6183377224-goog

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm