Re: [PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state

2021-08-04 Thread Oliver Upton
On Wed, Aug 4, 2021 at 4:05 AM Oliver Upton  wrote:
>
> On Wed, Aug 4, 2021 at 1:58 AM Oliver Upton  wrote:
> >
> > KVM's current means of saving/restoring system counters is plagued with
> > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > system counter by-value through the respective guest system register
> > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > brittle as the state is not idempotent: the host system counter is still
> > oscillating between the attempted save and restore. Furthermore, VMMs
> > may wish to transparently live migrate guest VMs, meaning that they
> > include the elapsed time due to live migration blackout in the guest
> > system counter view. The VMM thread could be preempted for any number of
> > reasons (scheduler, L0 hypervisor under nested) between the time that
> > it calculates the desired guest counter value and when KVM actually sets
> > this counter state.
> >
> > Despite the value-based interface that we present to userspace, KVM
> > actually has idempotent guest controls by way of system counter offsets.
> > We can avoid all of the issues associated with a value-based interface
> > by abstracting these offset controls in new ioctls. This series
> > introduces new vCPU device attributes to provide userspace access to the
> > vCPU's system counter offset.
> >
> > Patch 1 addresses a possible race in KVM_GET_CLOCK where
> > use_master_clock is read outside of the pvclock_gtod_sync_lock.
> >
> > Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> > ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> > essential for a VMM to perform precise migration of the guest's system
> > counters.
> >
> > Patches 3-4 are some preparatory changes for exposing the TSC offset to
> > userspace. Patch 5 provides a vCPU attribute to provide userspace access
> > to the TSC offset.
> >
> > Patches 6-7 implement a test for the new additions to
> > KVM_{GET,SET}_CLOCK.
> >
> > Patch 8 fixes some assertions in the kvm device attribute helpers.
> >
> > Patches 9-10 implement a test for the TSC offset attribute introduced in
> > patch 5.
> >
> > Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
> > through the ONE_REG interface.
> >
> > Patches 14-15 add test cases for userspace manipulation of the virtual
> > counter-timer.
> >
> > Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
> > ARM vCPU, but only implements support for ECV hosts. Patches 18-19 add
> > support for non-ECV hosts by emulating physical counter offsetting.
> >
> > Patch 20 adds test cases for adjusting the host-guest offset, and
> > finally patch 21 adds a test to measure the emulation overhead of
> > CNTPCT_EL2.
> >
> > This series was tested on both Ampere Mt. Jade and Haswell systems.
> > Unfortunately, the ECV portions of this series are untested, as there is
> > no ECV-capable hardware and the ARM fast models only partially implement
> > ECV.
>
> Small correction: I was only using the foundation model. Apparently
> the AEM FVP provides full ECV support.

Ok. I've now tested this series on the FVP Base RevC fast model @ v8.6
+ ECV=2. Passes on VHE, fails on nVHE.

I'll respin this series with the fix for nVHE+ECV soon.

--
Thanks,
Oliver

>
> >
> > Physical counter benchmark
> > --
> >
> > The following data was collected by running repeated iterations of the
> > benchmark test from Patch 21 on an Ampere Mt. Jade reference server, a
> > 2S machine with two 80-core Ampere Altra SoCs. Measurements were
> > collected for both VHE and nVHE operation using the `kvm-arm.mode=`
> > command-line parameter.
> >
> > nVHE
> > ----
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 54ns   | 148ns   |
> > | Standard Deviation | 124ns  | 122ns   |
> > | 95th Percentile    | 258ns  | 348ns   |
> > +--------------------+--------+---------+
> >
> > VHE
> > ---
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 53ns   | 152ns   |
> > | Standard Deviation | 92ns   | 94ns    |
> > | 95th Percentile    | 204ns  | 307ns   |
> > +--------------------+--------+---------+
> >
> > This series applies cleanly to kvm/queue at the following commit:
> >
> > 6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")
> >
> > v1 -> v2:
> >   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
> >   - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
> >   - Changed the arm64 implementation to broadcast counter
> > offset values to all vCPUs in a guest. This upholds the
> > architectural expectations of a consistent counter-timer across CPUs.
> >   - Fixed a bug with traps in VHE mode. We now configure traps on every
> >     transition into a guest to handle differing VMs (trapped, emulated).

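For illustration, a VMM could drive the augmented KVM_{GET,SET}_CLOCK
roughly as follows (a minimal sketch, not code from the series; the VM
fds are placeholders, and in practice the kvm_clock_data struct would
travel in the migration stream between hosts):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static void migrate_kvm_clock(int src_vm_fd, int dst_vm_fd)
{
	struct kvm_clock_data data = { 0 };

	/* Snapshot kvm-clock paired with a realtime instant. */
	ioctl(src_vm_fd, KVM_GET_CLOCK, &data);

	/*
	 * If KVM_CLOCK_REALTIME was set by the GET, the SET advances the
	 * clock by the realtime elapsed since the snapshot, so blackout
	 * time is accounted for without stepping the clock backwards.
	 */
	ioctl(dst_vm_fd, KVM_SET_CLOCK, &data);
}
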
Re: [PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state

2021-08-04 Thread Oliver Upton
On Wed, Aug 4, 2021 at 1:58 AM Oliver Upton  wrote:
>
> [...]

Small correction: I was only using the foundation model. Apparently
the AEM FVP provides full ECV support.

>
> [...]
>
> v2 -> v3:
>   - Added documentation for additions to KVM_{GET,SET}_CLOCK
>   - Added documentation for all new vCPU attributes
>   - Added documentation for suggested algorithm to migrate a guest's
> TSC(s)
>   - Bug fixes throughout series
>   - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME
>
> v3 -> v4:
>   - Added patch to address incorrect 

Re: [PATCH v6 17/21] KVM: arm64: Allow userspace to configure a guest's counter-timer offset

2021-08-04 Thread Oliver Upton
On Wed, Aug 4, 2021 at 3:17 AM Andrew Jones  wrote:
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index a8815b09da3e..f15058612994 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -85,11 +85,15 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
> >  static u64 timer_get_offset(struct arch_timer_context *ctxt)
> >  {
> >   struct kvm_vcpu *vcpu = ctxt->vcpu;
> > + struct arch_timer_cpu *timer = vcpu_timer(vcpu);
>
> This new timer variable doesn't appear to get used.

Ooops, this is stale. Thanks for catching that.

>
> Reviewed-by: Andrew Jones 
>

Thanks for the quick review!

--
Oliver


[PATCH v6 10/21] selftests: KVM: Introduce system counter offset test

2021-08-04 Thread Oliver Upton
Introduce a KVM selftest to verify that userspace manipulation of the
TSC (via the new vCPU attribute) results in the correct behavior within
the guest.

Reviewed-by: Andrew Jones 
Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../kvm/system_counter_offset_test.c  | 132 ++
 3 files changed, 134 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 958a809c8de4..3d2585f0bffc 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -52,3 +52,4 @@
 /set_memory_region_test
 /steal_time
 /kvm_binary_stats_test
+/system_counter_offset_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 0f94b18b33ce..9f7060c02668 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -85,6 +85,7 @@ TEST_GEN_PROGS_x86_64 += memslot_perf_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
+TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c b/tools/testing/selftests/kvm/system_counter_offset_test.c
new file mode 100644
index ..b337bbbfa41f
--- /dev/null
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Tests for adjusting the system counter from userspace
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+
+#ifdef __x86_64__
+
+struct test_case {
+   uint64_t tsc_offset;
+};
+
+static struct test_case test_cases[] = {
+   { 0 },
+   { 180 * NSEC_PER_SEC },
+   { -180 * NSEC_PER_SEC },
+};
+
+static void check_preconditions(struct kvm_vm *vm)
+{
+   if (!_vcpu_has_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL, KVM_VCPU_TSC_OFFSET))
+   return;
+
+   print_skip("KVM_VCPU_TSC_OFFSET not supported; skipping test");
+   exit(KSFT_SKIP);
+}
+
+static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
+{
+   vcpu_access_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL,
+                           KVM_VCPU_TSC_OFFSET, &test->tsc_offset, true);
+}
+
+static uint64_t guest_read_system_counter(struct test_case *test)
+{
+   return rdtsc();
+}
+
+static uint64_t host_read_guest_system_counter(struct test_case *test)
+{
+   return rdtsc() + test->tsc_offset;
+}
+
+#else /* __x86_64__ */
+
+#error test not implemented for this architecture!
+
+#endif
+
+#define GUEST_SYNC_CLOCK(__stage, __val)   \
+   GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)
+
+static void guest_main(void)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+   struct test_case *test = &test_cases[i];
+
+   GUEST_SYNC_CLOCK(i, guest_read_system_counter(test));
+   }
+}
+
+static void handle_sync(struct ucall *uc, uint64_t start, uint64_t end)
+{
+   uint64_t obs = uc->args[2];
+
+   TEST_ASSERT(start <= obs && obs <= end,
+               "unexpected system counter value: %"PRIu64" expected range: [%"PRIu64", %"PRIu64"]",
+               obs, start, end);
+
+   pr_info("system counter value: %"PRIu64" expected range [%"PRIu64", %"PRIu64"]\n",
+           obs, start, end);
+}
+
+static void handle_abort(struct ucall *uc)
+{
+   TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
+ __FILE__, uc->args[1]);
+}
+
+static void enter_guest(struct kvm_vm *vm)
+{
+   uint64_t start, end;
+   struct ucall uc;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+   struct test_case *test = &test_cases[i];
+
+   setup_system_counter(vm, test);
+   start = host_read_guest_system_counter(test);
+   vcpu_run(vm, VCPU_ID);
+   end = host_read_guest_system_counter(test);
+
+   switch (get_ucall(vm, VCPU_ID, &uc)) {
+   case UCALL_SYNC:
+           handle_sync(&uc, start, end);
+           break;
+   case UCALL_ABORT:
+           handle_abort(&uc);
+           return;
+   default:
+           TEST_ASSERT(0, "unhandled ucall %ld\n",
+                       get_ucall(vm, VCPU_ID, &uc));
+   }
+   }
+}
+
+int main(void)
+{
+   struct kvm_vm *vm;
+
+   vm = vm_create_default(VCPU_ID, 0, guest_main);
+   check_preconditions(vm);
+   

[PATCH v6 09/21] selftests: KVM: Add helpers for vCPU device attributes

2021-08-04 Thread Oliver Upton
vCPU file descriptors are abstracted away from test code in KVM
selftests, meaning that tests cannot directly access a vCPU's device
attributes. Add helpers that tests can use to get at vCPU device
attributes.

Reviewed-by: Andrew Jones 
Signed-off-by: Oliver Upton 
---
 .../testing/selftests/kvm/include/kvm_util.h  |  9 +
 tools/testing/selftests/kvm/lib/kvm_util.c| 38 +++
 2 files changed, 47 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index a8ac5d52e17b..1b3ef5757819 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -240,6 +240,15 @@ int _kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
 int kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
  void *val, bool write);
 
+int _vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+ uint64_t attr);
+int vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr);
+int _vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+ uint64_t attr, void *val, bool write);
+int vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr, void *val, bool write);
+
 const char *exit_reason_str(unsigned int exit_reason);
 
 void virt_pgd_alloc(struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 0ffc2d39c80d..0fe66ca6139a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2040,6 +2040,44 @@ int kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
return ret;
 }
 
+int _vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+ uint64_t attr)
+{
+   struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+   TEST_ASSERT(vcpu, "nonexistent vcpu id: %d", vcpuid);
+
+   return _kvm_device_check_attr(vcpu->fd, group, attr);
+}
+
+int vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr)
+{
+   int ret = _vcpu_has_device_attr(vm, vcpuid, group, attr);
+
+   TEST_ASSERT(!ret, "KVM_HAS_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+   return ret;
+}
+
+int _vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr, void *val, bool write)
+{
+   struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+   TEST_ASSERT(vcpu, "nonexistent vcpu id: %d", vcpuid);
+
+   return _kvm_device_access(vcpu->fd, group, attr, val, write);
+}
+
+int vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+   uint64_t attr, void *val, bool write)
+{
+   int ret = _vcpu_access_device_attr(vm, vcpuid, group, attr, val, write);
+
+   TEST_ASSERT(!ret, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+   return ret;
+}
+
 /*
  * VM Dump
  *
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 17/21] KVM: arm64: Allow userspace to configure a guest's counter-timer offset

2021-08-04 Thread Oliver Upton
Presently, KVM provides no facilities for correctly migrating a guest
that depends on the physical counter-timer. Whie most guests (barring
NV, of course) should not depend on the physical counter-timer, an
operator may wish to provide a consistent view of the physical
counter-timer across migrations.

Provide userspace with a new vCPU attribute to modify the guest
counter-timer offset. Unlike KVM_REG_ARM_TIMER_OFFSET, this attribute is
hidden from the guest's architectural state. The value offsets *both*
the virtual and physical counter-timer views for the guest. Only support
this attribute on ECV systems as ECV is required for hardware offsetting
of the physical counter-timer.

Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/devices/vcpu.rst |  28 ++
 arch/arm64/include/asm/kvm_asm.h|   2 +
 arch/arm64/include/asm/sysreg.h |   2 +
 arch/arm64/include/uapi/asm/kvm.h   |   1 +
 arch/arm64/kvm/arch_timer.c | 122 +++-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c  |   6 ++
 arch/arm64/kvm/hyp/nvhe/timer-sr.c  |   5 +
 arch/arm64/kvm/hyp/vhe/timer-sr.c   |   5 +
 include/clocksource/arm_arch_timer.h|   1 +
 9 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 3b399d727c11..3ba35b9d9d03 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -139,6 +139,34 @@ configured values on other VCPUs.  Userspace should configure the interrupt
 numbers on at least one VCPU after creating all VCPUs and before running any
 VCPUs.
 
+2.2. ATTRIBUTE: KVM_ARM_VCPU_TIMER_OFFSET
+-----------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address for the timer offset is a
+ pointer to a __u64
+
+Returns:
+
+======= ==========================================
+-EFAULT Error reading/writing the provided
+        parameter address
+-ENXIO  Timer offsetting not implemented
+======= ==========================================
+
+Specifies the guest's counter-timer offset from the host's virtual counter.
+The guest's physical counter value is then derived by the following
+equation:
+
+  guest_cntpct = host_cntvct - KVM_ARM_VCPU_TIMER_OFFSET
+
+The guest's virtual counter value is derived by the following equation:
+
+  guest_cntvct = host_cntvct - KVM_REG_ARM_TIMER_OFFSET
+   - KVM_ARM_VCPU_TIMER_OFFSET
+
+KVM does not allow the use of varying offset values for different vCPUs;
+the last written offset value will be broadcast to all vCPUs in a VM.
+
 3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
 ==
 
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 9f0bf2109be7..ab1c8fdb0177 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -65,6 +65,7 @@
 #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize 19
 #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp  20
 #define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc  21
+#define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntpoff  22
 
 #ifndef __ASSEMBLY__
 
@@ -200,6 +201,7 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
+extern void __kvm_timer_set_cntpoff(u64 cntpoff);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 4dfc44066dfb..c34672aa65b9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -586,6 +586,8 @@
 #define SYS_ICH_LR14_EL2   __SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2   __SYS__LR8_EL2(7)
 
+#define SYS_CNTPOFF_EL2                sys_reg(3, 4, 14, 0, 6)
+
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 949a31bc10f0..15150f8224a1 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -366,6 +366,7 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM_VCPU_TIMER_CTRL        1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER        0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER        1
+#define   KVM_ARM_VCPU_TIMER_OFFSET    2
 #define KVM_ARM_VCPU_PVTIME_CTRL   2
 #define   KVM_ARM_VCPU_PVTIME_IPA  0
 
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index a8815b09da3e..f15058612994 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -85,11 +85,15 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 static u64 

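For illustration (not part of the patch), userspace would program the
new attribute through the standard vCPU device-attr ioctl, using the
constants added to the uapi header above:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: apply 'offset' so that, per the documentation above,
 * guest_cntpct = host_cntvct - offset, and the virtual view shifts by
 * the same amount on top of CNTVOFF_EL2.
 */
static int set_counter_offset(int vcpu_fd, __u64 offset)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_TIMER_CTRL,
		.attr  = KVM_ARM_VCPU_TIMER_OFFSET,
		.addr  = (__u64)&offset,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}
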
[PATCH v6 16/21] arm64: cpufeature: Enumerate support for Enhanced Counter Virtualization

2021-08-04 Thread Oliver Upton
Introduce a new cpucap to indicate if the system supports full enhanced
counter virtualization (i.e. ID_AA64MMFR0_EL1.ECV==0x2).

Signed-off-by: Oliver Upton 
---
 arch/arm64/include/asm/sysreg.h |  2 ++
 arch/arm64/kernel/cpufeature.c  | 10 ++
 arch/arm64/tools/cpucaps|  1 +
 3 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7b9c3acba684..4dfc44066dfb 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -847,6 +847,8 @@
 #define ID_AA64MMFR0_ASID_SHIFT        4
 #define ID_AA64MMFR0_PARANGE_SHIFT 0
 
+#define ID_AA64MMFR0_ECV_VIRT  0x1
+#define ID_AA64MMFR0_ECV_PHYS  0x2
 #define ID_AA64MMFR0_TGRAN4_NI 0xf
 #define ID_AA64MMFR0_TGRAN4_SUPPORTED  0x0
 #define ID_AA64MMFR0_TGRAN64_NI        0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0ead8bfedf20..94c349e179d3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2301,6 +2301,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature,
.min_field_value = 1,
},
+   {
+   .desc = "Enhanced Counter Virtualization (Physical)",
+   .capability = ARM64_ECV,
+   .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+   .sys_reg = SYS_ID_AA64MMFR0_EL1,
+   .sign = FTR_UNSIGNED,
+   .field_pos = ID_AA64MMFR0_ECV_SHIFT,
+   .matches = has_cpuid_feature,
+   .min_field_value = ID_AA64MMFR0_ECV_PHYS,
+   },
{},
 };
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 49305c2e6dfd..d819ea614da5 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -3,6 +3,7 @@
 # Internal CPU capabilities constants, keep this list sorted
 
 BTI
+ECV
 # Unreliable: use system_supports_32bit_el0() instead.
 HAS_32BIT_EL0_DO_NOT_USE
 HAS_32BIT_EL1
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 11/21] KVM: arm64: Refactor update_vtimer_cntvoff()

2021-08-04 Thread Oliver Upton
Make the implementation of update_vtimer_cntvoff() generic w.r.t. guest
timer context and spin off into a new helper method for later use.
Require callers of this new helper method to grab the kvm lock
beforehand.

No functional change intended.

Signed-off-by: Oliver Upton 
---
 arch/arm64/kvm/arch_timer.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 3df67c127489..c0101db75ad4 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -747,22 +747,32 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
return 0;
 }
 
-/* Make the updates of cntvoff for all vtimer contexts atomic */
-static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
+/* Make offset updates for all timer contexts atomic */
+static void update_timer_offset(struct kvm_vcpu *vcpu,
+   enum kvm_arch_timers timer, u64 offset)
 {
int i;
struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *tmp;
 
-   mutex_lock(&kvm->lock);
+   lockdep_assert_held(&kvm->lock);
+
kvm_for_each_vcpu(i, tmp, kvm)
-   timer_set_offset(vcpu_vtimer(tmp), cntvoff);
+   timer_set_offset(vcpu_get_timer(tmp, timer), offset);
 
/*
 * When called from the vcpu create path, the CPU being created is not
 * included in the loop above, so we just set it here as well.
 */
-   timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
+   timer_set_offset(vcpu_get_timer(vcpu, timer), offset);
+}
+
+static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
+{
+   struct kvm *kvm = vcpu->kvm;
+
+   mutex_lock(&kvm->lock);
+   update_timer_offset(vcpu, TIMER_VTIMER, cntvoff);
 mutex_unlock(&kvm->lock);
 }
 
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 21/21] selftests: KVM: Add counter emulation benchmark

2021-08-04 Thread Oliver Upton
Add a test case for counter emulation on arm64. A side effect of how KVM
handles physical counter offsetting on non-ECV systems is that the
virtual counter will always hit hardware and the physical could be
emulated. Force emulation by writing a nonzero offset to the physical
counter and compare the elapsed cycles to a direct read of the hardware
register.

Reviewed-by: Ricardo Koller 
Signed-off-by: Oliver Upton 
Reviewed-by: Andrew Jones 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../kvm/aarch64/counter_emulation_benchmark.c | 207 ++
 3 files changed, 209 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 3d2585f0bffc..a5953f92f6b1 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+/aarch64/counter_emulation_benchmark
 /aarch64/debug-exceptions
 /aarch64/get-reg-list
 /aarch64/vgic_init
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index fab42e7c23ee..d24f7a914992 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
 TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 
+TEST_GEN_PROGS_aarch64 += aarch64/counter_emulation_benchmark
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
diff --git a/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c b/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
new file mode 100644
index ..09ff79ab3d6f
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * counter_emulation_benchmark.c -- test to measure the effects of counter
+ * emulation on guest reads of the physical counter.
+ *
+ * Copyright (c) 2021, Google LLC.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+static struct counter_values {
+   uint64_t cntvct_start;
+   uint64_t cntpct;
+   uint64_t cntvct_end;
+} counter_values;
+
+static uint64_t nr_iterations = 1000;
+
+static void do_test(void)
+{
+   /*
+* Open-coded approach instead of using helper methods to keep a tight
+* interval around the physical counter read.
+*/
+   asm volatile("isb\n\t"
+"mrs %[cntvct_start], cntvct_el0\n\t"
+"isb\n\t"
+"mrs %[cntpct], cntpct_el0\n\t"
+"isb\n\t"
+"mrs %[cntvct_end], cntvct_el0\n\t"
+"isb\n\t"
+: [cntvct_start] "=r"(counter_values.cntvct_start),
+[cntpct] "=r"(counter_values.cntpct),
+[cntvct_end] "=r"(counter_values.cntvct_end));
+}
+
+static void guest_main(void)
+{
+   int i;
+
+   for (i = 0; i < nr_iterations; i++) {
+   do_test();
+   GUEST_SYNC(i);
+   }
+
+   for (i = 0; i < nr_iterations; i++) {
+   do_test();
+   GUEST_SYNC(i);
+   }
+}
+
+static void enter_guest(struct kvm_vm *vm)
+{
+   struct ucall uc;
+
+   vcpu_ioctl(vm, VCPU_ID, KVM_RUN, NULL);
+
+   switch (get_ucall(vm, VCPU_ID, &uc)) {
+   case UCALL_SYNC:
+   break;
+   case UCALL_ABORT:
+   TEST_ASSERT(false, "%s at %s:%ld", (const char *)uc.args[0],
+   __FILE__, uc.args[1]);
+   break;
+   default:
+           TEST_ASSERT(false, "unexpected exit: %s",
+                       exit_reason_str(vcpu_state(vm, VCPU_ID)->exit_reason));
+   break;
+   }
+}
+
+static double counter_frequency(void)
+{
+   uint32_t freq;
+
+   asm volatile("mrs %0, cntfrq_el0"
+: "=r" (freq));
+
+   return freq / 1000000.0;
+}
+
+static void log_csv(FILE *csv, bool trapped)
+{
+   double freq = counter_frequency();
+
+   fprintf(csv, "%s,%.02f,%lu,%lu,%lu\n",
+   trapped ? "true" : "false", freq,
+   counter_values.cntvct_start,
+   counter_values.cntpct,
+   counter_values.cntvct_end);
+}
+
+static double run_loop(struct kvm_vm *vm, FILE *csv, bool trapped)
+{
+   double avg = 0;
+   int i;
+
+   for (i = 0; i < nr_iterations; i++) {
+   uint64_t delta;
+
+   enter_guest(vm);
+   sync_global_from_guest(vm, counter_values);
+
+   if (csv)
+ 
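
For reference (a sketch, not from the patch): the nanosecond figures in
the cover letter's tables come from counter-cycle deltas scaled by the
frequency advertised in CNTFRQ_EL0, i.e. roughly:

	/* Sketch: convert a counter-cycle delta to nanoseconds. */
	static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
	{
		return cycles * 1000000000ULL / freq_hz;
	}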

[PATCH v6 19/21] KVM: arm64: Emulate physical counter offsetting on non-ECV systems

2021-08-04 Thread Oliver Upton
Unfortunately, ECV hasn't yet arrived in any tangible hardware. At the
same time, controlling the guest view of the physical counter-timer is
useful. Support guest counter-timer offsetting on non-ECV systems by
trapping guest accesses to the physical counter-timer. Emulate reads of
the physical counter in the fast exit path.

Signed-off-by: Oliver Upton 
---
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kvm/arch_timer.c | 53 +++--
 arch/arm64/kvm/hyp/include/hyp/switch.h | 29 ++
 arch/arm64/kvm/hyp/nvhe/timer-sr.c  | 11 -
 4 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -505,6 +505,7 @@
 #define SYS_AMEVCNTR0_MEM_STALLSYS_AMEVCNTR0_EL0(3)
 
 #define SYS_CNTFRQ_EL0 sys_reg(3, 3, 14, 0, 0)
+#define SYS_CNTPCT_EL0 sys_reg(3, 3, 14, 0, 1)
 
 #define SYS_CNTP_TVAL_EL0  sys_reg(3, 3, 14, 2, 0)
 #define SYS_CNTP_CTL_EL0   sys_reg(3, 3, 14, 2, 1)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 9ead94aa867d..b7cb63acf2a0 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -51,7 +51,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
  struct arch_timer_context *timer,
  enum kvm_arch_timer_regs treg);
-static void kvm_timer_enable_traps_vhe(void);
+static void kvm_timer_enable_traps_vhe(struct kvm_vcpu *vcpu);
 
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
@@ -175,6 +175,12 @@ static void timer_set_guest_offset(struct arch_timer_context *ctxt, u64 offset)
}
 }
 
+static bool ptimer_emulation_required(struct kvm_vcpu *vcpu)
+{
+   return timer_get_offset(vcpu_ptimer(vcpu)) &&
+   !cpus_have_const_cap(ARM64_ECV);
+}
+
 u64 kvm_phys_timer_read(void)
 {
return timecounter->cc->read(timecounter->cc);
@@ -184,8 +190,13 @@ static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
if (has_vhe()) {
map->direct_vtimer = vcpu_vtimer(vcpu);
-   map->direct_ptimer = vcpu_ptimer(vcpu);
-   map->emul_ptimer = NULL;
+   if (!ptimer_emulation_required(vcpu)) {
+   map->direct_ptimer = vcpu_ptimer(vcpu);
+   map->emul_ptimer = NULL;
+   } else {
+   map->direct_ptimer = NULL;
+   map->emul_ptimer = vcpu_ptimer(vcpu);
+   }
} else {
map->direct_vtimer = vcpu_vtimer(vcpu);
map->direct_ptimer = NULL;
@@ -671,7 +682,7 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
timer_emulate(map.emul_ptimer);
 
if (has_vhe())
-   kvm_timer_enable_traps_vhe();
+   kvm_timer_enable_traps_vhe(vcpu);
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -1392,22 +1403,29 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
  * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
  * and this makes those bits have no effect for the host kernel execution.
  */
-static void kvm_timer_enable_traps_vhe(void)
+static void kvm_timer_enable_traps_vhe(struct kvm_vcpu *vcpu)
 {
/* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
u32 cnthctl_shift = 10;
-   u64 val;
+   u64 val, mask;
+
+   mask = CNTHCTL_EL1PCEN << cnthctl_shift;
+   mask |= CNTHCTL_EL1PCTEN << cnthctl_shift;
 
-   /*
-* VHE systems allow the guest direct access to the EL1 physical
-* timer/counter.
-*/
val = read_sysreg(cnthctl_el2);
-   val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
-   val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
 
if (cpus_have_const_cap(ARM64_ECV))
val |= CNTHCTL_ECV;
+
+   /*
+* VHE systems allow the guest direct access to the EL1 physical
+* timer/counter if offsetting isn't requested on a non-ECV system.
+*/
+   if (ptimer_emulation_required(vcpu))
+   val &= ~mask;
+   else
+   val |= mask;
+
write_sysreg(val, cnthctl_el2);
 }
 
@@ -1462,9 +1480,6 @@ static int kvm_arm_timer_set_attr_offset(struct kvm_vcpu *vcpu,
u64 __user *uaddr = (u64 __user *)(long)attr->addr;
u64 offset;
 
-   if (!cpus_have_const_cap(ARM64_ECV))
-   return -ENXIO;
-
if (get_user(offset, uaddr))
return -EFAULT;
 
@@ -1513,9 +1528,6 @@ static int kvm_arm_timer_get_attr_offset(struct kvm_vcpu *vcpu,
u64 __user *uaddr = (u64 __user *)(long)attr->addr;
u64 offset;
 
-   if (!cpus_have_const_cap(ARM64_ECV))
-   

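A hedged sketch of the resulting read path (the handler itself falls
outside this excerpt): once CNTPCT accesses trap, the emulated value is
simply the hardware physical counter adjusted by the vCPU's offset:

	static u64 emulated_cntpct_read(struct kvm_vcpu *vcpu)
	{
		return kvm_phys_timer_read() - timer_get_offset(vcpu_ptimer(vcpu));
	}
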
[PATCH v6 08/21] selftests: KVM: Fix kvm device helper ioctl assertions

2021-08-04 Thread Oliver Upton
The KVM_CREATE_DEVICE and KVM_{GET,SET}_DEVICE_ATTR ioctls are defined
to return a value of zero on success. As such, tighten the assertions in
the helper functions to only pass if the return code is zero.

Suggested-by: Andrew Jones 
Reviewed-by: Andrew Jones 
Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 10a8ed691c66..0ffc2d39c80d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1984,7 +1984,7 @@ int kvm_device_check_attr(int dev_fd, uint32_t group, uint64_t attr)
 {
int ret = _kvm_device_check_attr(dev_fd, group, attr);
 
-   TEST_ASSERT(ret >= 0, "KVM_HAS_DEVICE_ATTR failed, rc: %i errno: %i", ret, errno);
+   TEST_ASSERT(!ret, "KVM_HAS_DEVICE_ATTR failed, rc: %i errno: %i", ret, errno);
return ret;
 }
 
@@ -2008,7 +2008,7 @@ int kvm_create_device(struct kvm_vm *vm, uint64_t type, bool test)
 ret = _kvm_create_device(vm, type, test, &fd);
 
if (!test) {
-   TEST_ASSERT(ret >= 0,
+   TEST_ASSERT(!ret,
"KVM_CREATE_DEVICE IOCTL failed, rc: %i errno: %i", 
ret, errno);
return fd;
}
@@ -2036,7 +2036,7 @@ int kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
 {
int ret = _kvm_device_access(dev_fd, group, attr, val, write);
 
-   TEST_ASSERT(ret >= 0, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+   TEST_ASSERT(!ret, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
return ret;
 }
 
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 03/21] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock

2021-08-04 Thread Oliver Upton
A later change requires that the pvclock sync lock be taken while
holding the tsc_write_lock. Change the locking in kvm_synchronize_tsc()
to align with the requirement to isolate the locking change to its own
commit.

Cc: Sean Christopherson 
Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/locking.rst | 11 +++
 arch/x86/kvm/x86.c |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8138201efb09..0bf346adac2a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -36,6 +36,9 @@ On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -222,6 +225,14 @@ time it will be set using the Dirty tracking mechanism described above.
 :Comment:  'raw' because hardware enabling/disabling must be atomic /wrt
migration.
 
+:Name: kvm_arch::pvclock_gtod_sync_lock
+:Type: raw_spinlock_t
+:Arch: x86
+:Protects: kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
+           cur_tsc_offset,nr_vcpus_matched_tsc}
+:Comment:  'raw' because updating the kvm master clock must not be
+   preempted.
+
 :Name: kvm_arch::tsc_write_lock
 :Type: raw_spinlock
 :Arch: x86
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26f1fa263192..93b449761fbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2529,7 +2529,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
 
kvm_vcpu_write_tsc_offset(vcpu, offset);
-   raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
if (!matched) {
@@ -2540,6 +2539,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 
kvm_track_tsc_matching(vcpu);
 spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+   raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 }
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
-- 
2.32.0.605.g8dce9f2422-goog

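
In sketch form, the nesting this change establishes inside
kvm_synchronize_tsc() (illustrative only; a second flags variable is
used here for clarity, unlike the actual code):

	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
	/* ... compute offset, kvm_vcpu_write_tsc_offset() ... */
	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags2);
	/* ... update cur_tsc_* generation and nr_vcpus_matched_tsc ... */
	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags2);
	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);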


[PATCH v6 15/21] selftests: KVM: Add support for aarch64 to system_counter_offset_test

2021-08-04 Thread Oliver Upton
KVM/arm64 now allows userspace to adjust the guest virtual counter-timer
via a vCPU register. Test that changes to the virtual counter-timer
offset result in the correct view being presented to the guest.

Reviewed-by: Andrew Jones 
Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/Makefile  |  1 +
 .../selftests/kvm/include/aarch64/processor.h | 12 
 .../kvm/system_counter_offset_test.c  | 56 ++-
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 9f7060c02668..fab42e7c23ee 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -98,6 +98,7 @@ TEST_GEN_PROGS_aarch64 += kvm_page_table_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
 TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
+TEST_GEN_PROGS_aarch64 += system_counter_offset_test
 
 TEST_GEN_PROGS_s390x = s390x/memop
 TEST_GEN_PROGS_s390x += s390x/resets
diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index 27dc5c2e56b9..3168cdbae6ee 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -129,4 +129,16 @@ void vm_install_sync_handler(struct kvm_vm *vm,
 
 #define isb()  asm volatile("isb" : : : "memory")
 
+static inline uint64_t read_cntvct_ordered(void)
+{
+   uint64_t r;
+
+   __asm__ __volatile__("isb\n\t"
+"mrs %0, cntvct_el0\n\t"
+"isb\n\t"
+: "=r"(r));
+
+   return r;
+}
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c b/tools/testing/selftests/kvm/system_counter_offset_test.c
index b337bbbfa41f..ac933db83d03 100644
--- a/tools/testing/selftests/kvm/system_counter_offset_test.c
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -53,7 +53,61 @@ static uint64_t host_read_guest_system_counter(struct test_case *test)
return rdtsc() + test->tsc_offset;
 }
 
-#else /* __x86_64__ */
+#elif __aarch64__ /* __x86_64__ */
+
+enum arch_counter {
+   VIRTUAL,
+};
+
+struct test_case {
+   enum arch_counter counter;
+   uint64_t offset;
+};
+
+static struct test_case test_cases[] = {
+   { .counter = VIRTUAL, .offset = 0 },
+   { .counter = VIRTUAL, .offset = 180 * NSEC_PER_SEC },
+   { .counter = VIRTUAL, .offset = -180 * NSEC_PER_SEC },
+};
+
+static void check_preconditions(struct kvm_vm *vm)
+{
+   if (vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_OFFSET))
+   return;
+
+   print_skip("KVM_REG_ARM_TIMER_OFFSET not supported; skipping test");
+   exit(KSFT_SKIP);
+}
+
+static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
+{
+   struct kvm_one_reg reg = {
+   .id = KVM_REG_ARM_TIMER_OFFSET,
+   .addr = (__u64)&test->offset,
+   };
+
+   vcpu_set_reg(vm, VCPU_ID, &reg);
+}
+
+static uint64_t guest_read_system_counter(struct test_case *test)
+{
+   switch (test->counter) {
+   case VIRTUAL:
+   return read_cntvct_ordered();
+   default:
+   GUEST_ASSERT(0);
+   }
+
+   /* unreachable */
+   return 0;
+}
+
+static uint64_t host_read_guest_system_counter(struct test_case *test)
+{
+   return read_cntvct_ordered() - test->offset;
+}
+
+#else /* __aarch64__ */
 
 #error test not implemented for this architecture!
 
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 18/21] KVM: arm64: Configure timer traps in vcpu_load() for VHE

2021-08-04 Thread Oliver Upton
In preparation for emulated physical counter-timer offsetting, configure
traps on every vcpu_load() for VHE systems. As before, these trap
settings do not affect host userspace, and are only active for the
guest.

Suggested-by: Marc Zyngier 
Signed-off-by: Oliver Upton 
---
 arch/arm64/kvm/arch_timer.c  | 10 +++---
 arch/arm64/kvm/arm.c |  4 +---
 include/kvm/arm_arch_timer.h |  2 --
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index f15058612994..9ead94aa867d 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -51,6 +51,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
  struct arch_timer_context *timer,
  enum kvm_arch_timer_regs treg);
+static void kvm_timer_enable_traps_vhe(void);
 
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
@@ -668,6 +669,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 
if (map.emul_ptimer)
timer_emulate(map.emul_ptimer);
+
+   if (has_vhe())
+   kvm_timer_enable_traps_vhe();
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -1383,12 +1387,12 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 }
 
 /*
- * On VHE system, we only need to configure the EL2 timer trap register once,
- * not for every world switch.
+ * On VHE system, we only need to configure the EL2 timer trap register on
+ * vcpu_load(), but not every world switch into the guest.
  * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
  * and this makes those bits have no effect for the host kernel execution.
  */
-void kvm_timer_init_vhe(void)
+static void kvm_timer_enable_traps_vhe(void)
 {
/* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
u32 cnthctl_shift = 10;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..47ea1e1ba80b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1558,9 +1558,7 @@ static void cpu_hyp_reinit(void)
 
cpu_hyp_reset();
 
-   if (is_kernel_in_hyp_mode())
-   kvm_timer_init_vhe();
-   else
+   if (!is_kernel_in_hyp_mode())
cpu_init_hyp_mode();
 
cpu_set_hyp_vector();
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 615f9314f6a5..254653b42da0 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -87,8 +87,6 @@ u64 kvm_phys_timer_read(void);
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
-void kvm_timer_init_vhe(void);
-
 bool kvm_arch_timer_get_input_level(int vintid);
 
 #define vcpu_timer(v)  (&(v)->arch.timer_cpu)
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 07/21] selftests: KVM: Add test for KVM_{GET,SET}_CLOCK

2021-08-04 Thread Oliver Upton
Add a selftest for the new KVM clock UAPI that was introduced. Ensure
that the KVM clock is consistent between userspace and the guest, and
that the difference in realtime will only ever cause the KVM clock to
advance forward.

Cc: Andrew Jones 
Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../testing/selftests/kvm/include/kvm_util.h  |   2 +
 .../selftests/kvm/x86_64/kvm_clock_test.c | 204 ++
 4 files changed, 208 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 36896d251977..958a809c8de4 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -11,6 +11,7 @@
 /x86_64/emulator_error_test
 /x86_64/get_cpuid_test
 /x86_64/get_msr_index_features
+/x86_64/kvm_clock_test
 /x86_64/kvm_pv_test
 /x86_64/hyperv_clock
 /x86_64/hyperv_cpuid
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c103873531e0..0f94b18b33ce 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -46,6 +46,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
+TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmu_role_test
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 010b59b13917..a8ac5d52e17b 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -19,6 +19,8 @@
 #define KVM_DEV_PATH "/dev/kvm"
 #define KVM_MAX_VCPUS 512
 
+#define NSEC_PER_SEC 1000000000L
+
 /*
  * Callers of kvm_util only have an incomplete/opaque description of the
  * structure kvm_util is using to maintain the state of a VM.
diff --git a/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c b/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
new file mode 100644
index ..e0dcc27ae9f1
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Tests for adjusting the KVM clock from userspace
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+
+struct test_case {
+   uint64_t kvmclock_base;
+   int64_t realtime_offset;
+};
+
+static struct test_case test_cases[] = {
+   { .kvmclock_base = 0 },
+   { .kvmclock_base = 180 * NSEC_PER_SEC },
+   { .kvmclock_base = 0, .realtime_offset = -180 * NSEC_PER_SEC },
+   { .kvmclock_base = 0, .realtime_offset = 180 * NSEC_PER_SEC },
+};
+
+#define GUEST_SYNC_CLOCK(__stage, __val)   \
+   GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)
+
+static void guest_main(vm_paddr_t pvti_pa, struct pvclock_vcpu_time_info *pvti)
+{
+   int i;
+
+   wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
+   for (i = 0; i < ARRAY_SIZE(test_cases); i++)
+   GUEST_SYNC_CLOCK(i, __pvclock_read_cycles(pvti, rdtsc()));
+}
+
+#define EXPECTED_FLAGS (KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
+static inline void assert_flags(struct kvm_clock_data *data)
+{
+   TEST_ASSERT((data->flags & EXPECTED_FLAGS) == EXPECTED_FLAGS,
+   "unexpected clock data flags: %x (want set: %x)",
+   data->flags, EXPECTED_FLAGS);
+}
+
+static void handle_sync(struct ucall *uc, struct kvm_clock_data *start,
+   struct kvm_clock_data *end)
+{
+   uint64_t obs, exp_lo, exp_hi;
+
+   obs = uc->args[2];
+   exp_lo = start->clock;
+   exp_hi = end->clock;
+
+   assert_flags(start);
+   assert_flags(end);
+
+   TEST_ASSERT(exp_lo <= obs && obs <= exp_hi,
+               "unexpected kvm-clock value: %"PRIu64" expected range: [%"PRIu64", %"PRIu64"]",
+               obs, exp_lo, exp_hi);
+
+   pr_info("kvm-clock value: %"PRIu64" expected range [%"PRIu64", 
%"PRIu64"]\n",
+   obs, exp_lo, exp_hi);
+}
+
+static void handle_abort(struct ucall *uc)
+{
+   TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
+ __FILE__, uc->args[1]);
+}
+
+static void setup_clock(struct kvm_vm *vm, struct test_case *test_case)
+{
+   struct kvm_clock_data data;
+
+   memset(&data, 0, sizeof(data));
+
+   data.clock = test_case->kvmclock_base;
+   if (test_case->realtime_offset) {
+   struct timespec ts;
+   int r;
+
+   data.flags |= 

[PATCH v6 12/21] KVM: arm64: Separate guest/host counter offset values

2021-08-04 Thread Oliver Upton
In some instances, a VMM may want to update the guest's counter-timer
offset in a transparent manner, meaning that changes to the hardware
value do not affect the synthetic register presented to the guest or the
VMM through said guest's architectural state. Lay the groundwork to
separate guest offset register writes from the hardware values utilized
by KVM.

Signed-off-by: Oliver Upton 
---
 arch/arm64/kvm/arch_timer.c  | 48 
 include/kvm/arm_arch_timer.h |  3 +++
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index c0101db75ad4..4c2b763a8849 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -87,6 +87,18 @@ static u64 timer_get_offset(struct arch_timer_context *ctxt)
struct kvm_vcpu *vcpu = ctxt->vcpu;
 
switch(arch_timer_ctx_index(ctxt)) {
+   case TIMER_VTIMER:
+   return ctxt->host_offset;
+   default:
+   return 0;
+   }
+}
+
+static u64 timer_get_guest_offset(struct arch_timer_context *ctxt)
+{
+   struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+   switch (arch_timer_ctx_index(ctxt)) {
case TIMER_VTIMER:
return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
default:
@@ -132,13 +144,31 @@ static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
 
switch(arch_timer_ctx_index(ctxt)) {
case TIMER_VTIMER:
-   __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
+   ctxt->host_offset = offset;
break;
default:
WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
}
 }
 
+static void timer_set_guest_offset(struct arch_timer_context *ctxt, u64 offset)
+{
+   struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+   switch (arch_timer_ctx_index(ctxt)) {
+   case TIMER_VTIMER: {
+   u64 host_offset = timer_get_offset(ctxt);
+
+   host_offset += offset - __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+   __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
+   timer_set_offset(ctxt, host_offset);
+   break;
+   }
+   default:
+   WARN_ONCE(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
+   }
+}
+
 u64 kvm_phys_timer_read(void)
 {
return timecounter->cc->read(timecounter->cc);
@@ -749,7 +779,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 
 /* Make offset updates for all timer contexts atomic */
 static void update_timer_offset(struct kvm_vcpu *vcpu,
-   enum kvm_arch_timers timer, u64 offset)
+   enum kvm_arch_timers timer, u64 offset,
+   bool guest_visible)
 {
int i;
struct kvm *kvm = vcpu->kvm;
@@ -758,13 +789,20 @@ static void update_timer_offset(struct kvm_vcpu *vcpu,
 lockdep_assert_held(&kvm->lock);
 
kvm_for_each_vcpu(i, tmp, kvm)
-   timer_set_offset(vcpu_get_timer(tmp, timer), offset);
+   if (guest_visible)
+   timer_set_guest_offset(vcpu_get_timer(tmp, timer),
+  offset);
+   else
+   timer_set_offset(vcpu_get_timer(tmp, timer), offset);
 
/*
 * When called from the vcpu create path, the CPU being created is not
 * included in the loop above, so we just set it here as well.
 */
-   timer_set_offset(vcpu_get_timer(vcpu, timer), offset);
+   if (guest_visible)
+   timer_set_guest_offset(vcpu_get_timer(vcpu, timer), offset);
+   else
+   timer_set_offset(vcpu_get_timer(vcpu, timer), offset);
 }
 
 static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
@@ -772,7 +810,7 @@ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
struct kvm *kvm = vcpu->kvm;
 
 mutex_lock(&kvm->lock);
-   update_timer_offset(vcpu, TIMER_VTIMER, cntvoff);
+   update_timer_offset(vcpu, TIMER_VTIMER, cntvoff, true);
 mutex_unlock(&kvm->lock);
 }
 
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 51c19381108c..9d65d4a29f81 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -42,6 +42,9 @@ struct arch_timer_context {
/* Duplicated state from arch_timer.c for convenience */
u32 host_timer_irq;
u32 host_timer_irq_flags;
+
+   /* offset relative to the host's physical counter-timer */
+   u64 host_offset;
 };
 
 struct timer_map {
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 02/21] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

2021-08-04 Thread Oliver Upton
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.

Suggested-by: Paolo Bonzini 
Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/api.rst  |  42 ---
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/x86.c  | 127 ++--
 include/uapi/linux/kvm.h|   7 +-
 4 files changed, 112 insertions(+), 67 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called.  If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
 
 ::
 
   struct kvm_clock_data {
__u64 clock;  /* kvmclock current value */
__u32 flags;
-   __u32 pad[9];
+   __u32 pad0;
+   __u64 realtime;
+   __u64 host_tsc;
+   __u32 pad[4];
   };
 
 
@@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value 
specified in its parameter.
 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on 
scenarios
 such as migration.
 
+FLAGS:
+
+KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` field
+with the value of the host's real time clocksource at the instant when
+KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+kvmclock value that will be provided to guests.
+
 ::
 
   struct kvm_clock_data {
__u64 clock;  /* kvmclock current value */
__u32 flags;
-   __u32 pad[9];
+   __u32 pad0;
+   __u64 realtime;
+   __u64 host_tsc;
+   __u32 pad[4];
   };
 
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6818095dd157..d6376ca8efce 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1926,4 +1926,7 @@ int kvm_cpu_dirty_log_size(void);
 
 int alloc_all_memslots_rmaps(struct kvm *kvm);
 
+#define KVM_CLOCK_VALID_FLAGS  \
+   (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 34287c522f4e..26f1fa263192 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2804,10 +2804,20 @@ static void get_kvmclock(struct kvm *kvm, struct 
kvm_clock_data *data)
get_cpu();
 
if (__this_cpu_read(cpu_tsc_khz)) {
+#ifdef CONFIG_X86_64
+   struct timespec64 ts;
+
+   if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
+   data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
+   data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
+   } else
+#endif
+   data->host_tsc = rdtsc();
+
kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
   &hv_clock.tsc_shift,
   &hv_clock.tsc_to_system_mul);
- 
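For illustration, a sketch of how a VMM might consume the new flags and
fields (assuming the extended struct kvm_clock_data from this patch is in
the installed uapi headers; error handling trimmed):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* returns 0 iff KVM produced a coherent (clock, realtime, host_tsc) triple */
  static int capture_clock_triple(int vm_fd, struct kvm_clock_data *data)
  {
      if (ioctl(vm_fd, KVM_GET_CLOCK, data))
          return -1;

      /* both flags must be set for the values to describe one instant */
      if (!(data->flags & KVM_CLOCK_REALTIME) ||
          !(data->flags & KVM_CLOCK_HOST_TSC))
          return -1;

      return 0;
  }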

[PATCH v6 04/21] KVM: x86: Refactor tsc synchronization code

2021-08-04 Thread Oliver Upton
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

Signed-off-by: Oliver Upton 
---
 arch/x86/kvm/x86.c | 105 ++---
 1 file changed, 61 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93b449761fbe..91aea751d621 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2443,13 +2443,71 @@ static inline bool kvm_check_tsc_unstable(void)
return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's tsc from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+ u64 ns, bool matched)
+{
+   struct kvm *kvm = vcpu->kvm;
+   bool already_matched;
+
+   lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+   already_matched =
+  (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+   /*
+* We track the most recent recorded KHZ, write and time to
+* allow the matching interval to be extended at each write.
+*/
+   kvm->arch.last_tsc_nsec = ns;
+   kvm->arch.last_tsc_write = tsc;
+   kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+   vcpu->arch.last_guest_tsc = tsc;
+
+   /* Keep track of which generation this VCPU has synchronized to */
+   vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+   vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+   vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+   kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+   if (!matched) {
+   /*
+* We split periods of matched TSC writes into generations.
+* For each generation, we track the original measured
+* nanosecond time, offset, and write, so if TSCs are in
+* sync, we can match exact offset, and if not, we can match
+* exact software computation in compute_guest_tsc()
+*
+* These values are tracked in kvm->arch.cur_xxx variables.
+*/
+   kvm->arch.cur_tsc_generation++;
+   kvm->arch.cur_tsc_nsec = ns;
+   kvm->arch.cur_tsc_write = tsc;
+   kvm->arch.cur_tsc_offset = offset;
+
+   spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+   kvm->arch.nr_vcpus_matched_tsc = 0;
+   } else if (!already_matched) {
+   spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+   kvm->arch.nr_vcpus_matched_tsc++;
+   }
+
+   kvm_track_tsc_matching(vcpu);
+   spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
struct kvm *kvm = vcpu->kvm;
u64 offset, ns, elapsed;
unsigned long flags;
-   bool matched;
-   bool already_matched;
+   bool matched = false;
bool synchronizing = false;
 
raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2495,50 +2553,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, 
u64 data)
offset = kvm_compute_l1_tsc_offset(vcpu, data);
}
matched = true;
-   already_matched = (vcpu->arch.this_tsc_generation == 
kvm->arch.cur_tsc_generation);
-   } else {
-   /*
-* We split periods of matched TSC writes into generations.
-* For each generation, we track the original measured
-* nanosecond time, offset, and write, so if TSCs are in
-* sync, we can match exact offset, and if not, we can match
-* exact software computation in compute_guest_tsc()
-*
-* These values are tracked in kvm->arch.cur_xxx variables.
-*/
-   kvm->arch.cur_tsc_generation++;
-   kvm->arch.cur_tsc_nsec = ns;
-   kvm->arch.cur_tsc_write = data;
-   kvm->arch.cur_tsc_offset = offset;
-   matched = false;
}
 
-   /*
-* We also track th most recent recorded KHZ, write and time to
-* allow the matching interval to be extended at each write.
-*/
-   kvm->arch.last_tsc_nsec = ns;
-   kvm->arch.last_tsc_write = data;
-   kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-   vcpu->arch.last_guest_tsc = data;
-
-   /* Keep track of which generation this VCPU has synchronized to */
-   vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-   vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-   vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
-   kvm_vcpu_write_tsc_offset(vcpu, offset);
-
-   
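For intuition, a toy user-space model of the generation bookkeeping that the
new helper owns (heavily simplified and hypothetical; locking, timestamps and
the offset write are omitted):

  #include <stdint.h>
  #include <stdbool.h>

  struct tsc_sync_state {
      uint64_t cur_generation;
      unsigned int nr_vcpus_matched;
  };

  static void sync_one_write(struct tsc_sync_state *s, uint64_t *vcpu_gen,
                             bool matched)
  {
      bool already_matched = (*vcpu_gen == s->cur_generation);

      /* the vCPU records the generation it synchronized against */
      *vcpu_gen = s->cur_generation;

      if (!matched) {
          /* an unmatched write opens a fresh generation */
          s->cur_generation++;
          s->nr_vcpus_matched = 0;
      } else if (!already_matched) {
          s->nr_vcpus_matched++;
      }
  }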

[PATCH v6 20/21] selftests: KVM: Test physical counter offsetting

2021-08-04 Thread Oliver Upton
Test that userspace adjustment of the guest physical counter-timer
results in the correct view within the guest.

Cc: Andrew Jones 
Signed-off-by: Oliver Upton 
---
 .../selftests/kvm/include/aarch64/processor.h | 12 +++
 .../kvm/system_counter_offset_test.c  | 31 +--
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h 
b/tools/testing/selftests/kvm/include/aarch64/processor.h
index 3168cdbae6ee..7f53d90e9512 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -141,4 +141,16 @@ static inline uint64_t read_cntvct_ordered(void)
return r;
 }
 
+static inline uint64_t read_cntpct_ordered(void)
+{
+   uint64_t r;
+
+   __asm__ __volatile__("isb\n\t"
+"mrs %0, cntpct_el0\n\t"
+"isb\n\t"
+: "=r"(r));
+
+   return r;
+}
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c 
b/tools/testing/selftests/kvm/system_counter_offset_test.c
index ac933db83d03..82d26a45cc48 100644
--- a/tools/testing/selftests/kvm/system_counter_offset_test.c
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -57,6 +57,9 @@ static uint64_t host_read_guest_system_counter(struct 
test_case *test)
 
 enum arch_counter {
VIRTUAL,
+   PHYSICAL,
+   /* offset physical, read virtual */
+   PHYSICAL_READ_VIRTUAL,
 };
 
 struct test_case {
@@ -68,32 +71,54 @@ static struct test_case test_cases[] = {
{ .counter = VIRTUAL, .offset = 0 },
{ .counter = VIRTUAL, .offset = 180 * NSEC_PER_SEC },
{ .counter = VIRTUAL, .offset = -180 * NSEC_PER_SEC },
+   { .counter = PHYSICAL, .offset = 0 },
+   { .counter = PHYSICAL, .offset = 180 * NSEC_PER_SEC },
+   { .counter = PHYSICAL, .offset = -180 * NSEC_PER_SEC },
+   { .counter = PHYSICAL_READ_VIRTUAL, .offset = 0 },
+   { .counter = PHYSICAL_READ_VIRTUAL, .offset = 180 * NSEC_PER_SEC },
+   { .counter = PHYSICAL_READ_VIRTUAL, .offset = -180 * NSEC_PER_SEC },
 };
 
 static void check_preconditions(struct kvm_vm *vm)
 {
-   if (vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_OFFSET))
+   if (vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_OFFSET) &&
+   !_vcpu_has_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
+  KVM_ARM_VCPU_TIMER_OFFSET))
return;
 
-   print_skip("KVM_REG_ARM_TIMER_OFFSET not supported; skipping test");
+   print_skip("KVM_REG_ARM_TIMER_OFFSET|KVM_ARM_VCPU_TIMER_OFFSET not 
supported; skipping test");
exit(KSFT_SKIP);
 }
 
 static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
 {
+   uint64_t cntvoff, cntpoff;
struct kvm_one_reg reg = {
.id = KVM_REG_ARM_TIMER_OFFSET,
-   .addr = (__u64)&test->offset,
+   .addr = (__u64)&cntvoff,
};
 
+   if (test->counter == VIRTUAL) {
+   cntvoff = test->offset;
+   cntpoff = 0;
+   } else {
+   cntvoff = 0;
+   cntpoff = test->offset;
+   }
+
vcpu_set_reg(vm, VCPU_ID, &reg);
+   vcpu_access_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
+   KVM_ARM_VCPU_TIMER_OFFSET, &cntpoff, true);
 }
 
 static uint64_t guest_read_system_counter(struct test_case *test)
 {
switch (test->counter) {
case VIRTUAL:
+   case PHYSICAL_READ_VIRTUAL:
return read_cntvct_ordered();
+   case PHYSICAL:
+   return read_cntpct_ordered();
default:
GUEST_ASSERT(0);
}
-- 
2.32.0.605.g8dce9f2422-goog



[PATCH v6 01/21] KVM: x86: Fix potential race in KVM_GET_CLOCK

2021-08-04 Thread Oliver Upton
Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
outside of the pvclock sync lock. This is problematic, as the clock
value written to the user may or may not actually correspond to a stable
TSC.

Fix the race by populating the entire kvm_clock_data structure behind
the pvclock_gtod_sync_lock.

Suggested-by: Sean Christopherson 
Signed-off-by: Oliver Upton 
---
 arch/x86/kvm/x86.c | 39 ---
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8cdf4ac6990b..34287c522f4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2782,19 +2782,20 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 #endif
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 {
struct kvm_arch *ka = &kvm->arch;
struct pvclock_vcpu_time_info hv_clock;
unsigned long flags;
-   u64 ret;
 
spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
if (!ka->use_master_clock) {
spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
-   return get_kvmclock_base_ns() + ka->kvmclock_offset;
+   data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+   return;
}
 
+   data->flags |= KVM_CLOCK_TSC_STABLE;
hv_clock.tsc_timestamp = ka->master_cycle_now;
hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
@@ -2806,13 +2807,26 @@ u64 get_kvmclock_ns(struct kvm *kvm)
kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
   &hv_clock.tsc_shift,
   &hv_clock.tsc_to_system_mul);
-   ret = __pvclock_read_cycles(&hv_clock, rdtsc());
-   } else
-   ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
+   data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+   } else {
+   data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+   }
 
put_cpu();
+}
 
-   return ret;
+u64 get_kvmclock_ns(struct kvm *kvm)
+{
+   struct kvm_clock_data data;
+
+   /*
+* Zero flags as it's accessed RMW, leave everything else uninitialized
+* as clock is always written and no other fields are consumed.
+*/
+   data.flags = 0;
+
+   get_kvmclock(kvm, &data);
+   return data.clock;
 }
 
 static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
@@ -6104,11 +6118,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
case KVM_GET_CLOCK: {
struct kvm_clock_data user_ns;
-   u64 now_ns;
 
-   now_ns = get_kvmclock_ns(kvm);
-   user_ns.clock = now_ns;
-   user_ns.flags = kvm->arch.use_master_clock ? 
KVM_CLOCK_TSC_STABLE : 0;
+   /*
+* Zero flags as it is accessed RMW, leave everything else
+* uninitialized as clock is always written and no other fields
+* are consumed.
+*/
+   user_ns.flags = 0;
+   get_kvmclock(kvm, &user_ns);
memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
r = -EFAULT;
-- 
2.32.0.605.g8dce9f2422-goog
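Schematically, the race being closed looked like this (illustrative pseudo-C
only; compute_clock() stands in for the pvclock arithmetic above):

  /* before this patch: value and flags sampled at two different instants */
  spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
  now_ns = compute_clock(ka);             /* consistent under the lock */
  spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
  /* ...use_master_clock may flip here... */
  user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;

With the change, get_kvmclock() produces both the clock value and the
KVM_CLOCK_TSC_STABLE flag inside the same critical section.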



[PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state

2021-08-04 Thread Oliver Upton
KVM's current means of saving/restoring system counters is plagued with
temporal issues. At least on ARM64 and x86, we migrate the guest's
system counter by-value through the respective guest system register
values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
brittle as the state is not idempotent: the host system counter is still
oscillating between the attempted save and restore. Furthermore, VMMs
may wish to transparently live migrate guest VMs, meaning that they
include the elapsed time due to live migration blackout in the guest
system counter view. The VMM thread could be preempted for any number of
reasons (scheduler, L0 hypervisor under nested) between the time that
it calculates the desired guest counter value and when KVM actually sets
this counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of system counter offsets.
We can avoid all of the issues associated with a value-based interface
by abstracting these offset controls in new ioctls. This series
introduces new vCPU device attributes to provide userspace access to the
vCPU's system counter offset.

Patch 1 addresses a possible race in KVM_GET_CLOCK where
use_master_clock is read outside of the pvclock_gtod_sync_lock.

Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
ioctls to provide userspace with a (host_tsc, realtime) instant. This is
essential for a VMM to perform precise migration of the guest's system
counters.

Patches 3-4 are some preparatory changes for exposing the TSC offset to
userspace. Patch 5 provides a vCPU attribute to provide userspace access
to the TSC offset.

Patches 6-7 implement a test for the new additions to
KVM_{GET,SET}_CLOCK.

Patch 8 fixes some assertions in the kvm device attribute helpers.

Patches 9-10 implement a test for the tsc offset attribute introduced in
patch 5.

Patches 11-12 lay the groundwork for patch 13, which exposes CNTVOFF_EL2
through the ONE_REG interface.

Patches 14-15 add test cases for userspace manipulation of the virtual
counter-timer.

Patches 16-17 add a vCPU attribute to adjust the host-guest offset of an
ARM vCPU, but only implements support for ECV hosts. Patches 18-19 add
support for non-ECV hosts by emulating physical counter offsetting.

Patch 20 adds test cases for adjusting the host-guest offset, and
finally patch 21 adds a test to measure the emulation overhead of
CNTPCT_EL2.

This series was tested on both Ampere Mt. Jade and Haswell systems.
Unfortunately, the ECV portions of this series are untested, as there is
no ECV-capable hardware and the ARM fast models only partially implement
ECV.

Physical counter benchmark
--------------------------

The following data was collected by running 1 iterations of the
benchmark test from Patch 21 on an Ampere Mt. Jade reference server, a 2S
machine with two 80-core Ampere Altra SoCs. Measurements were collected
for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
parameter.

nVHE
----

+--------------------+--------+---------+
|       Metric       | Native | Trapped |
+--------------------+--------+---------+
| Average            | 54ns   | 148ns   |
| Standard Deviation | 124ns  | 122ns   |
| 95th Percentile    | 258ns  | 348ns   |
+--------------------+--------+---------+

VHE
---

+--------------------+--------+---------+
|       Metric       | Native | Trapped |
+--------------------+--------+---------+
| Average            | 53ns   | 152ns   |
| Standard Deviation | 92ns   | 94ns    |
| 95th Percentile    | 204ns  | 307ns   |
+--------------------+--------+---------+

This series applies cleanly to kvm/queue at the following commit:

6cd974485e25 ("KVM: selftests: Add a test of an unbacked nested PI descriptor")

v1 -> v2:
  - Reimplemented as vCPU device attributes instead of a distinct ioctl.
  - Added the (realtime, host_tsc) instant support to KVM_{GET,SET}_CLOCK
  - Changed the arm64 implementation to broadcast counter
offset values to all vCPUs in a guest. This upholds the
architectural expectations of a consistent counter-timer across CPUs.
  - Fixed a bug with traps in VHE mode. We now configure traps on every
transition into a guest to handle differing VMs (trapped, emulated).

v2 -> v3:
  - Added documentation for additions to KVM_{GET,SET}_CLOCK
  - Added documentation for all new vCPU attributes
  - Added documentation for suggested algorithm to migrate a guest's
TSC(s)
  - Bug fixes throughout series
  - Rename KVM_CLOCK_REAL_TIME -> KVM_CLOCK_REALTIME

v3 -> v4:
  - Added patch to address incorrect device helper assertions (Drew)
  - Carried Drew's r-b tags where appropriate
  - x86 selftest cleanup
  - Removed stale kvm_timer_init_vhe() function
  - Removed unnecessary GUEST_DONE() from selftests

v4 -> v5:
  - Fix typo in TSC migration algorithm
  - Carry more of Drew's r-b tags
  - clean up run loop logic in counter emulation benchmark (missed from
Drew's 

[PATCH v6 14/21] selftests: KVM: Add helper to check for register presence

2021-08-04 Thread Oliver Upton
The KVM_GET_REG_LIST vCPU ioctl returns a list of supported registers
for a given vCPU. Add a helper to check if a register exists in the list
of supported registers.

Signed-off-by: Oliver Upton 
---
 .../testing/selftests/kvm/include/kvm_util.h  |  2 ++
 tools/testing/selftests/kvm/lib/kvm_util.c| 19 +++
 2 files changed, 21 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
b/tools/testing/selftests/kvm/include/kvm_util.h
index 1b3ef5757819..077082dd2ca7 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -215,6 +215,8 @@ void vcpu_fpu_get(struct kvm_vm *vm, uint32_t vcpuid,
  struct kvm_fpu *fpu);
 void vcpu_fpu_set(struct kvm_vm *vm, uint32_t vcpuid,
  struct kvm_fpu *fpu);
+
+bool vcpu_has_reg(struct kvm_vm *vm, uint32_t vcpuid, uint64_t reg_id);
 void vcpu_get_reg(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_one_reg *reg);
 void vcpu_set_reg(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_one_reg *reg);
 #ifdef __KVM_HAVE_VCPU_EVENTS
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
b/tools/testing/selftests/kvm/lib/kvm_util.c
index 0fe66ca6139a..a5801d4ed37d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1823,6 +1823,25 @@ void vcpu_fpu_set(struct kvm_vm *vm, uint32_t vcpuid, 
struct kvm_fpu *fpu)
ret, errno, strerror(errno));
 }
 
+bool vcpu_has_reg(struct kvm_vm *vm, uint32_t vcpuid, uint64_t reg_id)
+{
+   struct kvm_reg_list *list;
+   bool ret = false;
+   uint64_t i;
+
+   list = vcpu_get_reg_list(vm, vcpuid);
+
+   for (i = 0; i < list->n; i++) {
+   if (list->reg[i] == reg_id) {
+   ret = true;
+   break;
+   }
+   }
+
+   free(list);
+   return ret;
+}
+
 void vcpu_get_reg(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_one_reg *reg)
 {
int ret;
-- 
2.32.0.605.g8dce9f2422-goog
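A minimal usage sketch, mirroring how later patches in this series gate their
tests (VCPU_ID, print_skip() and KSFT_SKIP come from the selftest framework):

  static void check_preconditions(struct kvm_vm *vm)
  {
      if (vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_OFFSET))
          return;

      print_skip("KVM_REG_ARM_TIMER_OFFSET not supported; skipping test");
      exit(KSFT_SKIP);
  }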



[PATCH v6 06/21] tools: arch: x86: pull in pvclock headers

2021-08-04 Thread Oliver Upton
Copy over approximately clean versions of the pvclock headers into
tools. Reconcile headers/symbols missing in tools that are unneeded.

Signed-off-by: Oliver Upton 
---
 tools/arch/x86/include/asm/pvclock-abi.h |  48 +++
 tools/arch/x86/include/asm/pvclock.h | 103 +++
 2 files changed, 151 insertions(+)
 create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
 create mode 100644 tools/arch/x86/include/asm/pvclock.h

diff --git a/tools/arch/x86/include/asm/pvclock-abi.h 
b/tools/arch/x86/include/asm/pvclock-abi.h
new file mode 100644
index ..1436226efe3e
--- /dev/null
+++ b/tools/arch/x86/include/asm/pvclock-abi.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PVCLOCK_ABI_H
+#define _ASM_X86_PVCLOCK_ABI_H
+#ifndef __ASSEMBLY__
+
+/*
+ * These structs MUST NOT be changed.
+ * They are the ABI between hypervisor and guest OS.
+ * Both Xen and KVM are using this.
+ *
+ * pvclock_vcpu_time_info holds the system time and the tsc timestamp
+ * of the last update. So the guest can use the tsc delta to get a
+ * more precise system time.  There is one per virtual cpu.
+ *
+ * pvclock_wall_clock references the point in time when the system
+ * time was zero (usually boot time), thus the guest calculates the
+ * current wall clock by adding the system time.
+ *
+ * Protocol for the "version" fields is: hypervisor raises it (making
+ * it uneven) before it starts updating the fields and raises it again
+ * (making it even) when it is done.  Thus the guest can make sure the
+ * time values it got are consistent by checking the version before
+ * and after reading them.
+ */
+
+struct pvclock_vcpu_time_info {
+   u32   version;
+   u32   pad0;
+   u64   tsc_timestamp;
+   u64   system_time;
+   u32   tsc_to_system_mul;
+   s8tsc_shift;
+   u8flags;
+   u8pad[2];
+} __attribute__((__packed__)); /* 32 bytes */
+
+struct pvclock_wall_clock {
+   u32   version;
+   u32   sec;
+   u32   nsec;
+} __attribute__((__packed__));
+
+#define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_GUEST_STOPPED  (1 << 1)
+/* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
+#define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/tools/arch/x86/include/asm/pvclock.h 
b/tools/arch/x86/include/asm/pvclock.h
new file mode 100644
index ..2628f9a6330b
--- /dev/null
+++ b/tools/arch/x86/include/asm/pvclock.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PVCLOCK_H
+#define _ASM_X86_PVCLOCK_H
+
+#include 
+#include 
+
+/* some helper functions for xen and kvm pv clock sources */
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
+unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
+void pvclock_resume(void);
+
+void pvclock_touch_watchdogs(void);
+
+static __always_inline
+unsigned pvclock_read_begin(const struct pvclock_vcpu_time_info *src)
+{
+   unsigned version = src->version & ~1;
+   /* Make sure that the version is read before the data. */
+   rmb();
+   return version;
+}
+
+static __always_inline
+bool pvclock_read_retry(const struct pvclock_vcpu_time_info *src,
+   unsigned version)
+{
+   /* Make sure that the version is re-read after the data. */
+   rmb();
+   return version != src->version;
+}
+
+/*
+ * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
+ * yielding a 64-bit result.
+ */
+static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
+{
+   u64 product;
+#ifdef __i386__
+   u32 tmp1, tmp2;
+#else
+   unsigned long tmp;
+#endif
+
+   if (shift < 0)
+   delta >>= -shift;
+   else
+   delta <<= shift;
+
+#ifdef __i386__
+   __asm__ (
+   "mul  %5   ; "
+   "mov  %4,%%eax ; "
+   "mov  %%edx,%4 ; "
+   "mul  %5   ; "
+   "xor  %5,%5; "
+   "add  %4,%%eax ; "
+   "adc  %5,%%edx ; "
+   : "=A" (product), "=r" (tmp1), "=r" (tmp2)
+   : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) );
+#elif defined(__x86_64__)
+   __asm__ (
+   "mulq %[mul_frac] ; shrd $32, %[hi], %[lo]"
+   : [lo]"=a"(product),
+ [hi]"=d"(tmp)
+   : "0"(delta),
+ [mul_frac]"rm"((u64)mul_frac));
+#else
+#error implement me!
+#endif
+
+   return product;
+}
+
+static __always_inline
+u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc)
+{
+   u64 delta = tsc - src->tsc_timestamp;
+   u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
+src->tsc_shift);
+   return 
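For readers less fluent in the inline assembly, the scaling above is ordinary
fixed-point arithmetic; an equivalent portable sketch (assuming a compiler
with unsigned __int128, which the hand-written asm exists to avoid):

  #include <stdint.h>

  /* computes ((delta << shift) * mul_frac) >> 32, as pvclock_scale_delta() */
  static inline uint64_t scale_delta_ref(uint64_t delta, uint32_t mul_frac,
                                         int shift)
  {
      if (shift < 0)
          delta >>= -shift;
      else
          delta <<= shift;

      return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
  }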

[PATCH v6 13/21] KVM: arm64: Allow userspace to configure a vCPU's virtual offset

2021-08-04 Thread Oliver Upton
Allow userspace to access the guest's virtual counter-timer offset
through the ONE_REG interface. The value read or written is defined to
be an offset from the guest's physical counter-timer. Add some
documentation to clarify how a VMM should use this and the existing
CNTVCT_EL0.

Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/api.rst| 10 ++
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 arch/arm64/kvm/arch_timer.c   | 11 +++
 arch/arm64/kvm/guest.c|  6 +-
 include/kvm/arm_arch_timer.h  |  1 +
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8d4a3471ad9e..28a65dc89985 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2487,6 +2487,16 @@ arm64 system registers have the following id bit 
patterns::
  derived from the register encoding for CNTV_CVAL_EL0.  As this is
  API, it must remain this way.
 
+.. warning::
+
+ The value of KVM_REG_ARM_TIMER_OFFSET is defined as an offset from
+ the guest's view of the physical counter-timer.
+
+ Userspace should use either KVM_REG_ARM_TIMER_OFFSET or
+ KVM_REG_ARM_TIMER_CVAL to pause and resume a guest's virtual
+ counter-timer. Mixed use of these registers could result in an
+ unpredictable guest counter value.
+
 arm64 firmware pseudo-registers have the following bit pattern::
 
   0x6030  0014 
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..949a31bc10f0 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -255,6 +255,7 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_REG_ARM_TIMER_CTL  ARM64_SYS_REG(3, 3, 14, 3, 1)
 #define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2)
 #define KVM_REG_ARM_TIMER_CNT  ARM64_SYS_REG(3, 3, 14, 3, 2)
+#define KVM_REG_ARM_TIMER_OFFSET   ARM64_SYS_REG(3, 4, 14, 0, 3)
 
 /* KVM-as-firmware specific pseudo-registers */
 #define KVM_REG_ARM_FW (0x0014 << KVM_REG_ARM_COPROC_SHIFT)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 4c2b763a8849..a8815b09da3e 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -868,6 +868,10 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 
regid, u64 value)
timer = vcpu_vtimer(vcpu);
kvm_arm_timer_write(vcpu, timer, TIMER_REG_CVAL, value);
break;
+   case KVM_REG_ARM_TIMER_OFFSET:
+   timer = vcpu_vtimer(vcpu);
+   update_vtimer_cntvoff(vcpu, value);
+   break;
case KVM_REG_ARM_PTIMER_CTL:
timer = vcpu_ptimer(vcpu);
kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
@@ -912,6 +916,9 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *vcpu, u64 regid)
case KVM_REG_ARM_TIMER_CVAL:
return kvm_arm_timer_read(vcpu,
  vcpu_vtimer(vcpu), TIMER_REG_CVAL);
+   case KVM_REG_ARM_TIMER_OFFSET:
+   return kvm_arm_timer_read(vcpu,
+ vcpu_vtimer(vcpu), TIMER_REG_OFFSET);
case KVM_REG_ARM_PTIMER_CTL:
return kvm_arm_timer_read(vcpu,
  vcpu_ptimer(vcpu), TIMER_REG_CTL);
@@ -949,6 +956,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
val = kvm_phys_timer_read() - timer_get_offset(timer);
break;
 
+   case TIMER_REG_OFFSET:
+   val = timer_get_offset(timer);
+   break;
+
default:
BUG();
}
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 1dfb83578277..17fc06e2b422 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -591,7 +591,7 @@ static unsigned long num_core_regs(const struct kvm_vcpu 
*vcpu)
  * ARM64 versions of the TIMER registers, always available on arm64
  */
 
-#define NUM_TIMER_REGS 3
+#define NUM_TIMER_REGS 4
 
 static bool is_timer_reg(u64 index)
 {
@@ -599,6 +599,7 @@ static bool is_timer_reg(u64 index)
case KVM_REG_ARM_TIMER_CTL:
case KVM_REG_ARM_TIMER_CNT:
case KVM_REG_ARM_TIMER_CVAL:
+   case KVM_REG_ARM_TIMER_OFFSET:
return true;
}
return false;
@@ -614,6 +615,9 @@ static int copy_timer_indices(struct kvm_vcpu *vcpu, u64 
__user *uindices)
uindices++;
if (put_user(KVM_REG_ARM_TIMER_CVAL, uindices))
return -EFAULT;
+   uindices++;
+   if (put_user(KVM_REG_ARM_TIMER_OFFSET, uindices))
+   return -EFAULT;
 
return 0;
 }
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 9d65d4a29f81..615f9314f6a5 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -21,6 +21,7 @@ enum kvm_arch_timer_regs {
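A hedged sketch of the intended save/restore flow from the VMM side
(hypothetical helpers; vcpu_fd is an open vCPU file descriptor). Per the
warning above, a VMM should do this instead of also rewriting
KVM_REG_ARM_TIMER_CNT or KVM_REG_ARM_TIMER_CVAL:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>
  #include <err.h>

  static uint64_t save_vtimer_offset(int vcpu_fd)
  {
      uint64_t offset;
      struct kvm_one_reg reg = {
          .id   = KVM_REG_ARM_TIMER_OFFSET,
          .addr = (uint64_t)&offset,
      };

      if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
          err(1, "KVM_GET_ONE_REG");
      return offset;
  }

  static void restore_vtimer_offset(int vcpu_fd, uint64_t offset)
  {
      struct kvm_one_reg reg = {
          .id   = KVM_REG_ARM_TIMER_OFFSET,
          .addr = (uint64_t)&offset,
      };

      if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg))
          err(1, "KVM_SET_ONE_REG");
  }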

[PATCH v6 05/21] KVM: x86: Expose TSC offset controls to userspace

2021-08-04 Thread Oliver Upton
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.

Cc: David Matlack 
Cc: Sean Christopherson 
Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/devices/vcpu.rst |  57 +
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/uapi/asm/kvm.h |   4 +
 arch/x86/kvm/x86.c  | 109 
 4 files changed, 171 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst 
b/Documentation/virt/kvm/devices/vcpu.rst
index 2acec3b9ef65..3b399d727c11 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure 
for this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.rst for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_VCPU_TSC_CTRL
+===
+
+:Architectures: x86
+
+4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
+
+:Parameters: 64-bit unsigned TSC offset
+
+Returns:
+
+======= ======================================
+-EFAULT Error reading/writing the provided
+        parameter address.
+-ENXIO  Attribute not supported
+======= ======================================
+
+Specifies the guest's TSC offset relative to the host's TSC. The guest's
+TSC is then derived by the following equation:
+
+  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
+
+This attribute is useful for the precise migration of a guest's TSC. The
+following describes a possible algorithm to use for the migration of a
+guest's TSC:
+
+From the source VMM process:
+
+1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0),
+   kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0).
+
+2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
+   guest TSC offset (off_n).
+
+3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
+   guest's TSC (freq).
+
+From the destination VMM process:
+
+4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
+   (k_0) and realtime nanoseconds (r_0) in their respective fields.
+   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+   structure. KVM will advance the VM's kvmclock to account for elapsed
+   time since recording the clock values.
+
+5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and
+   kvmclock nanoseconds (k_1).
+
+6. Adjust the guest TSC offsets for every vCPU to account for (1) time
+   elapsed since recording state and (2) difference in TSCs between the
+   source and destination machine:
+
+   new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
+
+7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
+   respective value derived in the previous step.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d6376ca8efce..e9bfc00692fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1064,6 +1064,7 @@ struct kvm_arch {
u64 last_tsc_nsec;
u64 last_tsc_write;
u32 last_tsc_khz;
+   u64 last_tsc_offset;
u64 cur_tsc_nsec;
u64 cur_tsc_write;
u64 cur_tsc_offset;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a6c327f8ad9e..0b22e1e84e78 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -503,4 +503,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91aea751d621..3fdad71e5a36 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2466,6 +2466,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, 
u64 offset, u64 tsc,
kvm->arch.last_tsc_nsec = ns;
kvm->arch.last_tsc_write = tsc;
kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+   kvm->arch.last_tsc_offset = offset;
 
vcpu->arch.last_guest_tsc = tsc;
 
@@ -4924,6 +4925,109 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
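Step 6's arithmetic, spelled out as a sketch. Variable names follow the
documentation above; the one added assumption is that freq comes from
KVM_GET_TSC_KHZ (kHz), so elapsed kvmclock nanoseconds must be converted to
TSC ticks:

  #include <stdint.h>

  /* new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1 */
  static uint64_t new_tsc_offset(uint64_t t_0, uint64_t t_1,
                                 uint64_t k_0, uint64_t k_1,
                                 uint64_t off_n, uint64_t tsc_khz)
  {
      uint64_t elapsed_ns = k_1 - k_0;
      /* kHz * ns / 10^6 = ticks; may overflow for very long blackouts */
      uint64_t elapsed_ticks = elapsed_ns * tsc_khz / 1000000ULL;

      return t_0 + off_n + elapsed_ticks - t_1;
  }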

Re: [PATCH v2 0/2] KVM: arm64: Prevent kmemleak from accessing HYP data

2021-08-04 Thread Marc Zyngier
On Mon, 2 Aug 2021 13:38:28 +0100, Marc Zyngier wrote:
> This is a rework of the patch previously posted at [1].
> 
> The gist of the problem is that kmemleak can legitimately access data
> that has been removed from the kernel view, for two reasons:
> 
> (1) .hyp.rodata is lumped together with the BSS
> (2) there is no separation of the HYP BSS from the kernel BSS
> 
> [...]

Applied to next, thanks!

[1/2] arm64: Move .hyp.rodata outside of the _sdata.._edata range
  commit: eb48d154cd0dade56a0e244f0cfa198ea2925ed3
[2/2] KVM: arm64: Unregister HYP sections from kmemleak in protected mode
  commit: 47e6223c841e029bfc23c3ce594dac5525cebaf8

Cheers,

M.
-- 
Without deviation from the norm, progress is not possible.




Re: [PATCH v6 19/21] KVM: arm64: Emulate physical counter offsetting on non-ECV systems

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:17AM +, Oliver Upton wrote:
> Unfortunately, ECV hasn't yet arrived in any tangible hardware. At the
> same time, controlling the guest view of the physical counter-timer is
> useful. Support guest counter-timer offsetting on non-ECV systems by
> trapping guest accesses to the physical counter-timer. Emulate reads of
> the physical counter in the fast exit path.
> 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/include/asm/sysreg.h |  1 +
>  arch/arm64/kvm/arch_timer.c | 53 +++--
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 29 ++
>  arch/arm64/kvm/hyp/nvhe/timer-sr.c  | 11 -
>  4 files changed, 70 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index c34672aa65b9..e49790ae5da4 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -505,6 +505,7 @@
>  #define SYS_AMEVCNTR0_MEM_STALL  SYS_AMEVCNTR0_EL0(3)
>  
>  #define SYS_CNTFRQ_EL0   sys_reg(3, 3, 14, 0, 0)
> +#define SYS_CNTPCT_EL0   sys_reg(3, 3, 14, 0, 1)
>  
>  #define SYS_CNTP_TVAL_EL0sys_reg(3, 3, 14, 2, 0)
>  #define SYS_CNTP_CTL_EL0 sys_reg(3, 3, 14, 2, 1)
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 9ead94aa867d..b7cb63acf2a0 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -51,7 +51,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
> struct arch_timer_context *timer,
> enum kvm_arch_timer_regs treg);
> -static void kvm_timer_enable_traps_vhe(void);
> +static void kvm_timer_enable_traps_vhe(struct kvm_vcpu *vcpu);
>  
>  u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  {
> @@ -175,6 +175,12 @@ static void timer_set_guest_offset(struct 
> arch_timer_context *ctxt, u64 offset)
>   }
>  }
>  
> +static bool ptimer_emulation_required(struct kvm_vcpu *vcpu)
> +{
> + return timer_get_offset(vcpu_ptimer(vcpu)) &&
> + !cpus_have_const_cap(ARM64_ECV);

Whenever I see a static branch check and something else in the same
condition, I always wonder if we could trim a few instructions for
the static branch is false case by testing it first.

> +}
> +
>  u64 kvm_phys_timer_read(void)
>  {
>   return timecounter->cc->read(timecounter->cc);
> @@ -184,8 +190,13 @@ static void get_timer_map(struct kvm_vcpu *vcpu, struct 
> timer_map *map)
>  {
>   if (has_vhe()) {
>   map->direct_vtimer = vcpu_vtimer(vcpu);
> - map->direct_ptimer = vcpu_ptimer(vcpu);
> - map->emul_ptimer = NULL;
> + if (!ptimer_emulation_required(vcpu)) {
> + map->direct_ptimer = vcpu_ptimer(vcpu);
> + map->emul_ptimer = NULL;
> + } else {
> + map->direct_ptimer = NULL;
> + map->emul_ptimer = vcpu_ptimer(vcpu);
> + }
>   } else {
>   map->direct_vtimer = vcpu_vtimer(vcpu);
>   map->direct_ptimer = NULL;
> @@ -671,7 +682,7 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>   timer_emulate(map.emul_ptimer);
>  
>   if (has_vhe())
> - kvm_timer_enable_traps_vhe();
> + kvm_timer_enable_traps_vhe(vcpu);
>  }
>  
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> @@ -1392,22 +1403,29 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>   * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
>   * and this makes those bits have no effect for the host kernel execution.
>   */
> -static void kvm_timer_enable_traps_vhe(void)
> +static void kvm_timer_enable_traps_vhe(struct kvm_vcpu *vcpu)
>  {
>   /* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
>   u32 cnthctl_shift = 10;
> - u64 val;
> + u64 val, mask;
> +
> + mask = CNTHCTL_EL1PCEN << cnthctl_shift;
> + mask |= CNTHCTL_EL1PCTEN << cnthctl_shift;
>  
> - /*
> -  * VHE systems allow the guest direct access to the EL1 physical
> -  * timer/counter.
> -  */
>   val = read_sysreg(cnthctl_el2);
> - val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
> - val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
>  
>   if (cpus_have_const_cap(ARM64_ECV))
>   val |= CNTHCTL_ECV;
> +
> + /*
> +  * VHE systems allow the guest direct access to the EL1 physical
> +  * timer/counter if offsetting isn't requested on a non-ECV system.
> +  */
> + if (ptimer_emulation_required(vcpu))
> + val &= ~mask;
> + else
> + val |= mask;
> +
>   write_sysreg(val, cnthctl_el2);
>  }
>  
> @@ -1462,9 +1480,6 @@ static int kvm_arm_timer_set_attr_offset(struct 
> kvm_vcpu *vcpu,
>   u64 __user *uaddr = (u64 __user *)(long)attr->addr;
> 
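Concretely, the reordering Drew is hinting at would look like this (a sketch
only): on ECV-capable hardware the patched-in static branch short-circuits
the expression, so the vcpu_ptimer() load and offset read are skipped
entirely.

  static bool ptimer_emulation_required(struct kvm_vcpu *vcpu)
  {
      /* test the cheap patched-in branch before touching memory */
      return !cpus_have_const_cap(ARM64_ECV) &&
             timer_get_offset(vcpu_ptimer(vcpu));
  }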

Re: [PATCH v6 20/21] selftests: KVM: Test physical counter offsetting

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:18AM +, Oliver Upton wrote:
> Test that userspace adjustment of the guest physical counter-timer
> results in the correct view within the guest.
> 
> Cc: Andrew Jones 
> Signed-off-by: Oliver Upton 
> ---
>  .../selftests/kvm/include/aarch64/processor.h | 12 +++
>  .../kvm/system_counter_offset_test.c  | 31 +--
>  2 files changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h 
> b/tools/testing/selftests/kvm/include/aarch64/processor.h
> index 3168cdbae6ee..7f53d90e9512 100644
> --- a/tools/testing/selftests/kvm/include/aarch64/processor.h
> +++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
> @@ -141,4 +141,16 @@ static inline uint64_t read_cntvct_ordered(void)
>   return r;
>  }
>  
> +static inline uint64_t read_cntpct_ordered(void)
> +{
> + uint64_t r;
> +
> + __asm__ __volatile__("isb\n\t"
> +  "mrs %0, cntpct_el0\n\t"
> +  "isb\n\t"
> +  : "=r"(r));
> +
> + return r;
> +}
> +
>  #endif /* SELFTEST_KVM_PROCESSOR_H */
> diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c 
> b/tools/testing/selftests/kvm/system_counter_offset_test.c
> index ac933db83d03..82d26a45cc48 100644
> --- a/tools/testing/selftests/kvm/system_counter_offset_test.c
> +++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
> @@ -57,6 +57,9 @@ static uint64_t host_read_guest_system_counter(struct 
> test_case *test)
>  
>  enum arch_counter {
>   VIRTUAL,
> + PHYSICAL,
> + /* offset physical, read virtual */
> + PHYSICAL_READ_VIRTUAL,
>  };
>  
>  struct test_case {
> @@ -68,32 +71,54 @@ static struct test_case test_cases[] = {
>   { .counter = VIRTUAL, .offset = 0 },
>   { .counter = VIRTUAL, .offset = 180 * NSEC_PER_SEC },
>   { .counter = VIRTUAL, .offset = -180 * NSEC_PER_SEC },
> + { .counter = PHYSICAL, .offset = 0 },
> + { .counter = PHYSICAL, .offset = 180 * NSEC_PER_SEC },
> + { .counter = PHYSICAL, .offset = -180 * NSEC_PER_SEC },
> + { .counter = PHYSICAL_READ_VIRTUAL, .offset = 0 },
> + { .counter = PHYSICAL_READ_VIRTUAL, .offset = 180 * NSEC_PER_SEC },
> + { .counter = PHYSICAL_READ_VIRTUAL, .offset = -180 * NSEC_PER_SEC },
>  };
>  
>  static void check_preconditions(struct kvm_vm *vm)
>  {
> - if (vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_OFFSET))
> + if (vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_OFFSET) &&
> + !_vcpu_has_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
> +KVM_ARM_VCPU_TIMER_OFFSET))
>   return;
>  
> - print_skip("KVM_REG_ARM_TIMER_OFFSET not supported; skipping test");
> + print_skip("KVM_REG_ARM_TIMER_OFFSET|KVM_ARM_VCPU_TIMER_OFFSET not supported; skipping test");
>   exit(KSFT_SKIP);
>  }
>  
>  static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
>  {
> + uint64_t cntvoff, cntpoff;
>   struct kvm_one_reg reg = {
>   .id = KVM_REG_ARM_TIMER_OFFSET,
> - .addr = (__u64)&test->offset,
> + .addr = (__u64)&cntvoff,
>   };
>  
> + if (test->counter == VIRTUAL) {
> + cntvoff = test->offset;
> + cntpoff = 0;
> + } else {
> + cntvoff = 0;
> + cntpoff = test->offset;
> + }
> +
>   vcpu_set_reg(vm, VCPU_ID, &reg);
> + vcpu_access_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
> + KVM_ARM_VCPU_TIMER_OFFSET, &cntpoff, true);
>  }
>  
>  static uint64_t guest_read_system_counter(struct test_case *test)
>  {
>   switch (test->counter) {
>   case VIRTUAL:
> + case PHYSICAL_READ_VIRTUAL:
>   return read_cntvct_ordered();
> + case PHYSICAL:
> + return read_cntpct_ordered();
>   default:
>   GUEST_ASSERT(0);
>   }
> -- 
> 2.32.0.605.g8dce9f2422-goog
>

Reviewed-by: Andrew Jones 



Re: [PATCH v5 11/13] KVM: arm64: Provide userspace access to the physical counter offset

2021-08-04 Thread Oliver Upton
Hi Marc,

On Fri, Jul 30, 2021 at 9:48 AM Oliver Upton  wrote:
>
> On Fri, Jul 30, 2021 at 9:18 AM Marc Zyngier  wrote:
> > You want the ARM FVP model, or maybe even the Foundation model. It has
> > support all the way to ARMv8.7 apparently. I personally use the FVP,
> > get in touch offline and I'll help you with the setup.
> >
> > In general, I tend to trust the ARM models a lot more than QEMU for
> > the quality of the emulation. You can tune it in some bizarre way
> > (the cache modelling is terrifying), and it will definitely do all
> > kind of crazy reordering and speculation.
>
> Awesome, thanks. I'll give this a try.
>

I have another spin of this series ready to kick out the door that
implements ECV support but ran into some issues testing it... Seems
that the ARM Foundation model only implements ECV=0x01, when we need
ECV=0x02 for CNTPOFF_EL2 to be valid. Any thoughts, or shall I just
send out the series and stare at it long enough to make sure the ECV
parts look right ;-)

--
Thanks,
Oliver


Re: [PATCH v6 18/21] KVM: arm64: Configure timer traps in vcpu_load() for VHE

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:16AM +, Oliver Upton wrote:
> In preparation for emulated physical counter-timer offsetting, configure
> traps on every vcpu_load() for VHE systems. As before, these trap
> settings do not affect host userspace, and are only active for the
> guest.
> 
> Suggested-by: Marc Zyngier 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/kvm/arch_timer.c  | 10 +++---
>  arch/arm64/kvm/arm.c |  4 +---
>  include/kvm/arm_arch_timer.h |  2 --
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index f15058612994..9ead94aa867d 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -51,6 +51,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
> struct arch_timer_context *timer,
> enum kvm_arch_timer_regs treg);
> +static void kvm_timer_enable_traps_vhe(void);
>  
>  u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  {
> @@ -668,6 +669,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  
>   if (map.emul_ptimer)
>   timer_emulate(map.emul_ptimer);
> +
> + if (has_vhe())
> + kvm_timer_enable_traps_vhe();
>  }
>  
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> @@ -1383,12 +1387,12 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>  }
>  
>  /*
> - * On VHE system, we only need to configure the EL2 timer trap register once,
> - * not for every world switch.
> + * On VHE system, we only need to configure the EL2 timer trap register on
> + * vcpu_load(), but not every world switch into the guest.
>   * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
>   * and this makes those bits have no effect for the host kernel execution.
>   */
> -void kvm_timer_init_vhe(void)
> +static void kvm_timer_enable_traps_vhe(void)
>  {
>   /* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
>   u32 cnthctl_shift = 10;
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9a2b8f27792..47ea1e1ba80b 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1558,9 +1558,7 @@ static void cpu_hyp_reinit(void)
>  
>   cpu_hyp_reset();
>  
> - if (is_kernel_in_hyp_mode())
> - kvm_timer_init_vhe();
> - else
> + if (!is_kernel_in_hyp_mode())
>   cpu_init_hyp_mode();
>  
>   cpu_set_hyp_vector();
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 615f9314f6a5..254653b42da0 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -87,8 +87,6 @@ u64 kvm_phys_timer_read(void);
>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
>  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
>  
> -void kvm_timer_init_vhe(void);
> -
>  bool kvm_arch_timer_get_input_level(int vintid);
>  
>  #define vcpu_timer(v)(&(v)->arch.timer_cpu)
> -- 
> 2.32.0.605.g8dce9f2422-goog
>

Reviewed-by: Andrew Jones 



Re: [PATCH v6 13/21] KVM: arm64: Allow userspace to configure a vCPU's virtual offset

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:11AM +, Oliver Upton wrote:
> Allow userspace to access the guest's virtual counter-timer offset
> through the ONE_REG interface. The value read or written is defined to
> be an offset from the guest's physical counter-timer. Add some
> documentation to clarify how a VMM should use this and the existing
> CNTVCT_EL0.
> 
> Signed-off-by: Oliver Upton 
> ---
>  Documentation/virt/kvm/api.rst| 10 ++
>  arch/arm64/include/uapi/asm/kvm.h |  1 +
>  arch/arm64/kvm/arch_timer.c   | 11 +++
>  arch/arm64/kvm/guest.c|  6 +-
>  include/kvm/arm_arch_timer.h  |  1 +
>  5 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8d4a3471ad9e..28a65dc89985 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -2487,6 +2487,16 @@ arm64 system registers have the following id bit 
> patterns::
>   derived from the register encoding for CNTV_CVAL_EL0.  As this is
>   API, it must remain this way.
>  
> +.. warning::
> +
> + The value of KVM_REG_ARM_TIMER_OFFSET is defined as an offset from
> + the guest's view of the physical counter-timer.
> +
> + Userspace should use either KVM_REG_ARM_TIMER_OFFSET or
> + KVM_REG_ARM_TIMER_CVAL to pause and resume a guest's virtual
> + counter-timer. Mixed use of these registers could result in an
> + unpredictable guest counter value.
> +
>  arm64 firmware pseudo-registers have the following bit pattern::
>  
>0x6030  0014 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index b3edde68bc3e..949a31bc10f0 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -255,6 +255,7 @@ struct kvm_arm_copy_mte_tags {
>  #define KVM_REG_ARM_TIMER_CTLARM64_SYS_REG(3, 3, 14, 3, 1)
>  #define KVM_REG_ARM_TIMER_CVAL   ARM64_SYS_REG(3, 3, 14, 0, 2)
>  #define KVM_REG_ARM_TIMER_CNTARM64_SYS_REG(3, 3, 14, 3, 2)
> +#define KVM_REG_ARM_TIMER_OFFSET ARM64_SYS_REG(3, 4, 14, 0, 3)
>  
>  /* KVM-as-firmware specific pseudo-registers */
>  #define KVM_REG_ARM_FW   (0x0014 << 
> KVM_REG_ARM_COPROC_SHIFT)
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 4c2b763a8849..a8815b09da3e 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -868,6 +868,10 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 
> regid, u64 value)
>   timer = vcpu_vtimer(vcpu);
>   kvm_arm_timer_write(vcpu, timer, TIMER_REG_CVAL, value);
>   break;
> + case KVM_REG_ARM_TIMER_OFFSET:
> + timer = vcpu_vtimer(vcpu);
> + update_vtimer_cntvoff(vcpu, value);
> + break;
>   case KVM_REG_ARM_PTIMER_CTL:
>   timer = vcpu_ptimer(vcpu);
>   kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
> @@ -912,6 +916,9 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *vcpu, u64 
> regid)
>   case KVM_REG_ARM_TIMER_CVAL:
>   return kvm_arm_timer_read(vcpu,
> vcpu_vtimer(vcpu), TIMER_REG_CVAL);
> + case KVM_REG_ARM_TIMER_OFFSET:
> + return kvm_arm_timer_read(vcpu,
> +   vcpu_vtimer(vcpu), TIMER_REG_OFFSET);
>   case KVM_REG_ARM_PTIMER_CTL:
>   return kvm_arm_timer_read(vcpu,
> vcpu_ptimer(vcpu), TIMER_REG_CTL);
> @@ -949,6 +956,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>   val = kvm_phys_timer_read() - timer_get_offset(timer);
>   break;
>  
> + case TIMER_REG_OFFSET:
> + val = timer_get_offset(timer);
> + break;
> +
>   default:
>   BUG();
>   }
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 1dfb83578277..17fc06e2b422 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -591,7 +591,7 @@ static unsigned long num_core_regs(const struct kvm_vcpu 
> *vcpu)
>   * ARM64 versions of the TIMER registers, always available on arm64
>   */
>  
> -#define NUM_TIMER_REGS 3
> +#define NUM_TIMER_REGS 4
>  
>  static bool is_timer_reg(u64 index)
>  {
> @@ -599,6 +599,7 @@ static bool is_timer_reg(u64 index)
>   case KVM_REG_ARM_TIMER_CTL:
>   case KVM_REG_ARM_TIMER_CNT:
>   case KVM_REG_ARM_TIMER_CVAL:
> + case KVM_REG_ARM_TIMER_OFFSET:
>   return true;
>   }
>   return false;
> @@ -614,6 +615,9 @@ static int copy_timer_indices(struct kvm_vcpu *vcpu, u64 
> __user *uindices)
>   uindices++;
>   if (put_user(KVM_REG_ARM_TIMER_CVAL, uindices))
>   return -EFAULT;
> + uindices++;
> + if (put_user(KVM_REG_ARM_TIMER_OFFSET, uindices))
> + return -EFAULT;
>  
>   return 0;
> 

Re: [PATCH v6 12/21] KVM: arm64: Separate guest/host counter offset values

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:10AM +, Oliver Upton wrote:
> In some instances, a VMM may want to update the guest's counter-timer
> offset in a transparent manner, meaning that changes to the hardware
> value do not affect the synthetic register presented to the guest or the
> VMM through said guest's architectural state. Lay the groundwork to
> separate guest offset register writes from the hardware values utilized
> by KVM.
> 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/kvm/arch_timer.c  | 48 
>  include/kvm/arm_arch_timer.h |  3 +++
>  2 files changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index c0101db75ad4..4c2b763a8849 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -87,6 +87,18 @@ static u64 timer_get_offset(struct arch_timer_context 
> *ctxt)
>   struct kvm_vcpu *vcpu = ctxt->vcpu;
>  
>   switch(arch_timer_ctx_index(ctxt)) {
> + case TIMER_VTIMER:
> + return ctxt->host_offset;
> + default:
> + return 0;
> + }
> +}
> +
> +static u64 timer_get_guest_offset(struct arch_timer_context *ctxt)
> +{
> + struct kvm_vcpu *vcpu = ctxt->vcpu;
> +
> + switch (arch_timer_ctx_index(ctxt)) {
>   case TIMER_VTIMER:
>   return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
>   default:
> @@ -132,13 +144,31 @@ static void timer_set_offset(struct arch_timer_context 
> *ctxt, u64 offset)
>  
>   switch(arch_timer_ctx_index(ctxt)) {
>   case TIMER_VTIMER:
> - __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
> + ctxt->host_offset = offset;
>   break;
>   default:
>   WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
>   }
>  }
>  
> +static void timer_set_guest_offset(struct arch_timer_context *ctxt, u64 
> offset)
> +{
> + struct kvm_vcpu *vcpu = ctxt->vcpu;
> +
> + switch (arch_timer_ctx_index(ctxt)) {
> + case TIMER_VTIMER: {
> + u64 host_offset = timer_get_offset(ctxt);
> +
> + host_offset += offset - __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
> + __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
> + timer_set_offset(ctxt, host_offset);
> + break;
> + }
> + default:
> + WARN_ONCE(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
> + }
> +}
> +
>  u64 kvm_phys_timer_read(void)
>  {
>   return timecounter->cc->read(timecounter->cc);
> @@ -749,7 +779,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  
>  /* Make offset updates for all timer contexts atomic */
>  static void update_timer_offset(struct kvm_vcpu *vcpu,
> - enum kvm_arch_timers timer, u64 offset)
> + enum kvm_arch_timers timer, u64 offset,
> + bool guest_visible)
>  {
>   int i;
>   struct kvm *kvm = vcpu->kvm;
> @@ -758,13 +789,20 @@ static void update_timer_offset(struct kvm_vcpu *vcpu,
>   lockdep_assert_held(&kvm->lock);
>  
>   kvm_for_each_vcpu(i, tmp, kvm)
> - timer_set_offset(vcpu_get_timer(tmp, timer), offset);
> + if (guest_visible)
> + timer_set_guest_offset(vcpu_get_timer(tmp, timer),
> +offset);
> + else
> + timer_set_offset(vcpu_get_timer(tmp, timer), offset);
>  
>   /*
>* When called from the vcpu create path, the CPU being created is not
>* included in the loop above, so we just set it here as well.
>*/
> - timer_set_offset(vcpu_get_timer(vcpu, timer), offset);
> + if (guest_visible)
> + timer_set_guest_offset(vcpu_get_timer(vcpu, timer), offset);
> + else
> + timer_set_offset(vcpu_get_timer(vcpu, timer), offset);
>  }
>  
>  static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
> @@ -772,7 +810,7 @@ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, 
> u64 cntvoff)
>   struct kvm *kvm = vcpu->kvm;
>  
>   mutex_lock(&kvm->lock);
> - update_timer_offset(vcpu, TIMER_VTIMER, cntvoff);
> + update_timer_offset(vcpu, TIMER_VTIMER, cntvoff, true);
>   mutex_unlock(&kvm->lock);
>  }
>  
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 51c19381108c..9d65d4a29f81 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -42,6 +42,9 @@ struct arch_timer_context {
>   /* Duplicated state from arch_timer.c for convenience */
>   u32 host_timer_irq;
>   u32 host_timer_irq_flags;
> +
> + /* offset relative to the host's physical counter-timer */
> + u64 host_offset;
>  };
>  
>  struct timer_map {
> -- 
> 2.32.0.605.g8dce9f2422-goog
>

 
Reviewed-by: Andrew Jones 
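
For illustration, the bookkeeping that timer_set_guest_offset() introduces
reduces to the following standalone sketch (plain C, not kernel code; the
struct and function names here are invented for the example). A guest-visible
write to CNTVOFF_EL2 carries its delta over into the hardware offset, so any
hidden host-side adjustment survives:

#include <stdint.h>

struct timer_offsets {
	uint64_t cntvoff;	/* guest-visible CNTVOFF_EL2 */
	uint64_t host_offset;	/* offset KVM actually applies in hardware */
};

/* guest_counter = host_counter - host_offset */
static void set_guest_offset(struct timer_offsets *t, uint64_t offset)
{
	/* carry the CNTVOFF_EL2 delta over into the hardware offset */
	t->host_offset += offset - t->cntvoff;
	t->cntvoff = offset;
}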

Re: [PATCH v6 17/21] KVM: arm64: Allow userspace to configure a guest's counter-timer offset

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:15AM +, Oliver Upton wrote:
> Presently, KVM provides no facilities for correctly migrating a guest
> that depends on the physical counter-timer. While most guests (barring
> NV, of course) should not depend on the physical counter-timer, an
> operator may wish to provide a consistent view of the physical
> counter-timer across migrations.
> 
> Provide userspace with a new vCPU attribute to modify the guest
> counter-timer offset. Unlike KVM_REG_ARM_TIMER_OFFSET, this attribute is
> hidden from the guest's architectural state. The value offsets *both*
> the virtual and physical counter-timer views for the guest. Only support
> this attribute on ECV systems as ECV is required for hardware offsetting
> of the physical counter-timer.
> 
> Signed-off-by: Oliver Upton 
> ---
>  Documentation/virt/kvm/devices/vcpu.rst |  28 ++
>  arch/arm64/include/asm/kvm_asm.h|   2 +
>  arch/arm64/include/asm/sysreg.h |   2 +
>  arch/arm64/include/uapi/asm/kvm.h   |   1 +
>  arch/arm64/kvm/arch_timer.c | 122 +++-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c  |   6 ++
>  arch/arm64/kvm/hyp/nvhe/timer-sr.c  |   5 +
>  arch/arm64/kvm/hyp/vhe/timer-sr.c   |   5 +
>  include/clocksource/arm_arch_timer.h|   1 +
>  9 files changed, 169 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> index 3b399d727c11..3ba35b9d9d03 100644
> --- a/Documentation/virt/kvm/devices/vcpu.rst
> +++ b/Documentation/virt/kvm/devices/vcpu.rst
> @@ -139,6 +139,34 @@ configured values on other VCPUs.  Userspace should configure the interrupt
>  numbers on at least one VCPU after creating all VCPUs and before running any
>  VCPUs.
>  
> +2.2. ATTRIBUTE: KVM_ARM_VCPU_TIMER_OFFSET
> +-----------------------------------------
> +
> +:Parameters: in kvm_device_attr.addr the address for the timer offset is a
> +             pointer to a __u64
> +
> +Returns:
> +
> +  ======= ======================================
> +  -EFAULT Error reading/writing the provided
> +          parameter address
> +  -ENXIO  Timer offsetting not implemented
> +  ======= ======================================
> +
> +Specifies the guest's counter-timer offset from the host's virtual counter.
> +The guest's physical counter value is then derived by the following
> +equation:
> +
> +  guest_cntpct = host_cntvct - KVM_ARM_VCPU_TIMER_OFFSET
> +
> +The guest's virtual counter value is derived by the following equation:
> +
> +  guest_cntvct = host_cntvct - KVM_REG_ARM_TIMER_OFFSET
> + - KVM_ARM_VCPU_TIMER_OFFSET
> +
> +KVM does not allow the use of varying offset values for different vCPUs;
> +the last written offset value will be broadcast to all vCPUs in a VM.
> +
>  3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
>  ==================================
>  
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 9f0bf2109be7..ab1c8fdb0177 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -65,6 +65,7 @@
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize   19
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp20
>  #define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc21
> +#define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntpoff22
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -200,6 +201,7 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  
>  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> +extern void __kvm_timer_set_cntpoff(u64 cntpoff);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 4dfc44066dfb..c34672aa65b9 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -586,6 +586,8 @@
>  #define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
>  #define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
>  
> +#define SYS_CNTPOFF_EL2  sys_reg(3, 4, 14, 0, 6)
> +
>  /* VHE encodings for architectural EL0/1 system registers */
>  #define SYS_SCTLR_EL12   sys_reg(3, 5, 1, 0, 0)
>  #define SYS_CPACR_EL12   sys_reg(3, 5, 1, 0, 2)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 949a31bc10f0..15150f8224a1 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -366,6 +366,7 @@ struct kvm_arm_copy_mte_tags {
>  #define KVM_ARM_VCPU_TIMER_CTRL  1
>  #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER  0
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER  1
> +#define   KVM_ARM_VCPU_TIMER_OFFSET  2
>  #define KVM_ARM_VCPU_PVTIME_CTRL 2
>  #define   
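
To make the intended use of the attribute concrete, here is a minimal sketch
of the restore-side call a VMM might make, assuming headers from a kernel with
this series applied (so the UAPI constants above exist); the vcpu_fd and the
saved/sampled counter values (host_cntvct, guest_cntpct) are assumptions of
the example:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int restore_counter_offset(int vcpu_fd, uint64_t host_cntvct,
				  uint64_t guest_cntpct)
{
	/* per the documentation above: guest_cntpct = host_cntvct - offset */
	uint64_t offset = host_cntvct - guest_cntpct;
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_TIMER_CTRL,
		.attr  = KVM_ARM_VCPU_TIMER_OFFSET,
		.addr  = (uint64_t)&offset,
	};

	/* vCPU-scoped device attr; errno is ENXIO when offsetting is absent */
	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}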

Re: [PATCH v3] memblock: make memblock_find_in_range method private

2021-08-04 Thread Catalin Marinas
On Tue, Aug 03, 2021 at 10:07:37PM +0300, Mike Rapoport wrote:
> On Tue, Aug 03, 2021 at 07:05:26PM +0100, Catalin Marinas wrote:
> > On Tue, Aug 03, 2021 at 09:42:18AM +0300, Mike Rapoport wrote:
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 8490ed2917ff..0bffd2d1854f 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -74,6 +74,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
> > >  static void __init reserve_crashkernel(void)
> > >  {
> > >   unsigned long long crash_base, crash_size;
> > > + unsigned long long crash_max = arm64_dma_phys_limit;
> > >   int ret;
> > >  
> > >   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> > > @@ -84,33 +85,18 @@ static void __init reserve_crashkernel(void)
> > >  
> > >   crash_size = PAGE_ALIGN(crash_size);
> > >  
> > > - if (crash_base == 0) {
> > > - /* Current arm64 boot protocol requires 2MB alignment */
> > > - crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
> > > - crash_size, SZ_2M);
> > > - if (crash_base == 0) {
> > > - pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > > - crash_size);
> > > - return;
> > > - }
> > > - } else {
> > > - /* User specifies base address explicitly. */
> > > - if (!memblock_is_region_memory(crash_base, crash_size)) {
> > > - pr_warn("cannot reserve crashkernel: region is not 
> > > memory\n");
> > > - return;
> > > - }
> > > + /* User specifies base address explicitly. */
> > > + if (crash_base)
> > > + crash_max = crash_base + crash_size;
> > >  
> > > - if (memblock_is_region_reserved(crash_base, crash_size)) {
> > > - pr_warn("cannot reserve crashkernel: region overlaps 
> > > reserved memory\n");
> > > - return;
> > > - }
> > > -
> > > - if (!IS_ALIGNED(crash_base, SZ_2M)) {
> > > - pr_warn("cannot reserve crashkernel: base address is 
> > > not 2MB aligned\n");
> > > - return;
> > > - }
> > > + /* Current arm64 boot protocol requires 2MB alignment */
> > > + crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
> > > +crash_base, crash_max);
> > > + if (!crash_base) {
> > > + pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> > > + crash_size);
> > > + return;
> > >   }
> > > - memblock_reserve(crash_base, crash_size);
> > 
> > We'll miss a bit on debug information provided to the user in case of a
> > wrong crash_base/size option on the command line. Not sure we care much,
> > though the alignment would probably be useful (maybe we document it
> > somewhere).
> 
> It is already documented:
> 
> Documentation/admin-guide/kdump/kdump.rst:
>On arm64, use "crashkernel=Y[@X]".  Note that the start address of
>the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).

Thanks for the pointer.

> > What I haven't checked is whether memblock_phys_alloc_range() aims to
> > get a 2MB aligned end (size) as well. If crash_size is not 2MB aligned,
> > crash_max wouldn't be either and the above could fail. We only care
> > about the crash_base to be aligned but the memblock_phys_alloc_range()
> > doc says that both the start and size would be aligned to this.
> 
> The doc lies :)
> 
> memblock_phys_alloc_range() boils down to 
> 
>   for_each_free_mem_range_reverse(i, nid, flags, &this_start, &this_end,
>   NULL) {
> 
>   /* clamp this_{start,end} to the user defined limits */
> 
>   cand = round_down(this_end - size, align);
>   if (cand >= this_start)
>   return cand;
>   }

Alright, it should work then. For arm64:

Reviewed-by: Catalin Marinas 
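
For completeness, the candidate computation quoted above can be checked in
isolation. A throwaway example with made-up sizes (SZ_2M and round_down are
reimplemented here, outside the kernel) shows that an unaligned crash_size
still yields a 2MiB-aligned base:

#include <stdint.h>
#include <stdio.h>

#define SZ_2M			0x200000ULL
#define round_down(x, y)	((x) & ~((y) - 1))	/* y: power of two */

int main(void)
{
	uint64_t this_end = 0x80000000ULL;	/* assumed end of a free range */
	uint64_t size = 0x20001000ULL;		/* crash_size, NOT 2MiB aligned */
	uint64_t cand = round_down(this_end - size, SZ_2M);

	/* prints base=0x5fe00000 aligned=1: only the base gets aligned */
	printf("base=%#llx aligned=%d\n",
	       (unsigned long long)cand, !(cand & (SZ_2M - 1)));
	return 0;
}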


Re: [PATCH v6 11/21] KVM: arm64: Refactor update_vtimer_cntvoff()

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:09AM +, Oliver Upton wrote:
> Make the implementation of update_vtimer_cntvoff() generic w.r.t. guest
> timer context and spin off into a new helper method for later use.
> Require callers of this new helper method to grab the kvm lock
> beforehand.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/kvm/arch_timer.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 3df67c127489..c0101db75ad4 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -747,22 +747,32 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>   return 0;
>  }
>  
> -/* Make the updates of cntvoff for all vtimer contexts atomic */
> -static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
> +/* Make offset updates for all timer contexts atomic */
> +static void update_timer_offset(struct kvm_vcpu *vcpu,
> + enum kvm_arch_timers timer, u64 offset)
>  {
>   int i;
>   struct kvm *kvm = vcpu->kvm;
>   struct kvm_vcpu *tmp;
>  
> - mutex_lock(&kvm->lock);
> + lockdep_assert_held(&kvm->lock);
> +
>   kvm_for_each_vcpu(i, tmp, kvm)
> - timer_set_offset(vcpu_vtimer(tmp), cntvoff);
> + timer_set_offset(vcpu_get_timer(tmp, timer), offset);
>  
>   /*
>* When called from the vcpu create path, the CPU being created is not
>* included in the loop above, so we just set it here as well.
>*/
> - timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
> + timer_set_offset(vcpu_get_timer(vcpu, timer), offset);
> +}
> +
> +static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
> +{
> + struct kvm *kvm = vcpu->kvm;
> +
> + mutex_lock(&kvm->lock);
> + update_timer_offset(vcpu, TIMER_VTIMER, cntvoff);
> + mutex_unlock(&kvm->lock);
>  }
>  
> -- 
> 2.32.0.605.g8dce9f2422-goog
>

Reviewed-by: Andrew Jones 
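
The locking contract is easiest to see from a would-be second caller. A
hypothetical physical-timer variant (no such caller exists in this series)
would mirror update_vtimer_cntvoff():

static void update_ptimer_offset(struct kvm_vcpu *vcpu, u64 offset)
{
	struct kvm *kvm = vcpu->kvm;

	/* the generic helper only asserts kvm->lock, so take it here */
	mutex_lock(&kvm->lock);
	update_timer_offset(vcpu, TIMER_PTIMER, offset);
	mutex_unlock(&kvm->lock);
}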



Re: [PATCH v6 14/21] selftests: KVM: Add helper to check for register presence

2021-08-04 Thread Andrew Jones
On Wed, Aug 04, 2021 at 08:58:12AM +, Oliver Upton wrote:
> The KVM_GET_REG_LIST vCPU ioctl returns a list of supported registers
> for a given vCPU. Add a helper to check if a register exists in the list
> of supported registers.
> 
> Signed-off-by: Oliver Upton 
> ---
>  .../testing/selftests/kvm/include/kvm_util.h  |  2 ++
>  tools/testing/selftests/kvm/lib/kvm_util.c| 19 +++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 1b3ef5757819..077082dd2ca7 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
>  void vcpu_fpu_get(struct kvm_vm *vm, uint32_t vcpuid,
>                    struct kvm_fpu *fpu);
>  void vcpu_fpu_set(struct kvm_vm *vm, uint32_t vcpuid,
>                    struct kvm_fpu *fpu);
> +
> +bool vcpu_has_reg(struct kvm_vm *vm, uint32_t vcpuid, uint64_t reg_id);
>  void vcpu_get_reg(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_one_reg *reg);
>  void vcpu_set_reg(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_one_reg *reg);
>  #ifdef __KVM_HAVE_VCPU_EVENTS
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 0fe66ca6139a..a5801d4ed37d 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1823,6 +1823,25 @@ void vcpu_fpu_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_fpu *fpu)
>   ret, errno, strerror(errno));
>  }
>  
> +bool vcpu_has_reg(struct kvm_vm *vm, uint32_t vcpuid, uint64_t reg_id)
> +{
> + struct kvm_reg_list *list;
> + bool ret = false;
> + uint64_t i;
> +
> + list = vcpu_get_reg_list(vm, vcpuid);
> +
> + for (i = 0; i < list->n; i++) {
> + if (list->reg[i] == reg_id) {
> + ret = true;
> + break;
> + }
> + }
> +
> + free(list);
> + return ret;
> +}
> +
>  void vcpu_get_reg(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_one_reg *reg)
>  {
>   int ret;
> -- 
> 2.32.0.605.g8dce9f2422-goog
>

Reviewed-by: Andrew Jones 
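
A typical call site for the helper would be a guard at the top of a test,
e.g. the following sketch (the vm/VCPU_ID scaffolding is assumed, and
KVM_REG_ARM_TIMER_CNT is just an example register ID):

	/* skip gracefully when KVM does not expose the register */
	if (!vcpu_has_reg(vm, VCPU_ID, KVM_REG_ARM_TIMER_CNT)) {
		print_skip("KVM_REG_ARM_TIMER_CNT not supported");
		exit(KSFT_SKIP);
	}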
